SCALABLE DATA SERVICES

2110414 Large Scale Computing Systems
Natawut Nupairoj, Ph.D.

Outline
- Overview
- MySQL Database Clustering
- GlusterFS
- Memcached

Overview

Problems of Data Services
- Data retrieval is usually the bottleneck
  - Searching
  - Transferring
- Basic performance improvement schemes
  - Data partitioning
  - Data replication (requires maintaining consistency)
- Other techniques
  - Database clustering
  - High-performance file systems
  - In-memory caching
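As a rough illustration of the two basic schemes, the sketch below partitions keys across servers by hash and places extra copies on the following servers. The class name `ShardPlacement`, the key format, and the "next server" replica rule are assumptions made for this example; real systems use more elaborate placement and must also keep the copies consistent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: hash partitioning plus simple replica placement.
final class ShardPlacement {
    // Partitioning: map a key to one of `servers` shards by its hash.
    static int primaryFor(String key, int servers) {
        return Math.floorMod(key.hashCode(), servers);
    }

    // Replication: keep `copies` copies, on the primary and the servers after it.
    static List<Integer> replicasFor(String key, int servers, int copies) {
        if (copies < 1 || copies > servers) {
            throw new IllegalArgumentException("copies must be between 1 and servers");
        }
        List<Integer> placement = new ArrayList<>();
        int primary = primaryFor(key, servers);
        for (int i = 0; i < copies; i++) {
            placement.add((primary + i) % servers);
        }
        return placement;
    }

    public static void main(String[] args) {
        // Prints the two server indexes that would hold the (assumed) key "user:42".
        System.out.println(replicasFor("user:42", 4, 2));
    }
}
```

Partitioning spreads the search and transfer load, while replication adds read capacity and fault tolerance at the cost of keeping every copy up to date.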

MySQL Database Clustering
Adapted from G. Vanderkelen, MySQL Cluster: An Introduction, 2006

Quick intro to MySQL
- MySQL is a DBMS that runs on most operating systems
- Reputation for speed, quality, reliability, and ease of use
- Storage engines (MyISAM, InnoDB, ...)
- Supports standard SQL and other features
  - Stored procedures
  - Triggers, updatable views, cursors
  - Precision math
  - Data dictionary (INFORMATION_SCHEMA database)
  - and more
- Many connectors and APIs available

MySQL Architecture
- API (PHP, Java, C/C++, .NET, ...)
- Connection pool: authentication, limits, caches, ...
- SQL interface: DML/DDL, stored procedures, triggers, views, ...
- Parser: SQL translation, object privileges
- Optimizer: execution plan, statistics
- Caches and buffers: query cache, global and per engine
- Storage engines: MyISAM, InnoDB, Memory, Federated, NDB, CSV, Merge, Archive, Blackhole, (custom)

What is MySQL Cluster?
- In-memory storage
  - Data and indices are held in memory
  - Check-pointed to disk
- Shared-nothing architecture
- No single point of failure
- Synchronous replication between nodes
- Fail-over in case of node failure
- Row-level locking
- Hot backups

Cluster Nodes
- Participating processes are called nodes
  - Nodes can run on the same computer
- Two tiers in a cluster
  - SQL layer: SQL nodes (also called API nodes)
  - Storage layer: data nodes, management nodes

Components of a Cluster
[Diagram] Applications (PHP, JBoss, .NET) connect to the SQL layer, where each SQL node is a MySQL server (mysqld) using the NDB engine. The SQL nodes connect to the storage layer, which consists of data nodes (ndbd) and management nodes (ndb_mgmd).

Data nodes
- Contain data and indexes
- Used for transaction coordination
- Each data node is connected to the others
- Shared-nothing architecture
- Up to 48 data nodes

SQL nodes
- Usually MySQL servers
- Also called API nodes
- Each SQL node is connected to all data nodes
- Applications access data using SQL
- An API node can also be a native NDB application (e.g. ndb_restore) or a client application written using the NDB API

Management nodes
- Control setup and configuration
- Needed at startup by the other nodes
- A running cluster can continue without them
- Can act as arbitrator during network partitioning
- Hold the cluster logs
- Accessible using the ndb_mgm CLI

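A Configuration
[Diagram] Three applications connect to two MySQL servers (SQL nodes using NDB); one of the servers also hosts the management node (ndb_mgmd). Both SQL nodes connect to the data nodes: two ndbd processes forming Node Group 1.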

Failure: MySQL server
- Applications can use the other MySQL server
- mysqld reconnects
[Diagram] The same configuration: three applications, two MySQL servers (one also running ndb_mgmd), and Node Group 1 with two ndbd data nodes.
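The application-side half of this scenario can be sketched as trying the SQL nodes in order and using the first one that responds. The class name `SqlNodeFailover`, the host names, and the `isReachable` check are all hypothetical; in a real application the check would be an actual connection attempt to mysqld.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: pick the first SQL node that accepts a connection.
final class SqlNodeFailover {
    static Optional<String> pick(List<String> sqlNodes, Predicate<String> isReachable) {
        for (String node : sqlNodes) {
            if (isReachable.test(node)) {
                return Optional.of(node); // first healthy node wins
            }
        }
        return Optional.empty(); // every SQL node is down
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("sql-a", "sql-b"); // assumed host names
        // Simulate a failure of sql-a; a real check would try to connect.
        Optional<String> chosen = pick(nodes, node -> !node.equals("sql-a"));
        System.out.println(chosen.orElse("no SQL node available"));
    }
}
```

Because every SQL node sees the same data in the data nodes, the application can switch servers without losing access to its data.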

Failure: Data Node
- The other data nodes detect the failure
- The affected transaction is aborted
- At least 1 node per node group is needed
- 0 nodes in a group = cluster shutdown
[Diagram] The same configuration: three applications, two MySQL servers (one also running ndb_mgmd), and Node Group 1 with two ndbd data nodes.
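The survival rule on this slide (at least one live node in every node group) can be written down as a small check. This is only a sketch of the rule as stated here; the class name `NodeGroupCheck` and the node names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "min. 1 node per group" rule.
final class NodeGroupCheck {
    // The cluster survives only if every node group still has a live data node.
    static boolean clusterSurvives(List<List<String>> nodeGroups, Set<String> failedNodes) {
        for (List<String> group : nodeGroups) {
            boolean anyAlive = false;
            for (String node : group) {
                if (!failedNodes.contains(node)) {
                    anyAlive = true;
                    break;
                }
            }
            if (!anyAlive) {
                return false; // 0 nodes left in this group: shutdown
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("ndbd1", "ndbd2"),   // node group 1 (assumed names)
                Arrays.asList("ndbd3", "ndbd4"));  // node group 2
        Set<String> failed = new HashSet<>(Arrays.asList("ndbd1", "ndbd3"));
        System.out.println(clusterSurvives(groups, failed)); // one node left per group
    }
}
```

With two nodes per group, the cluster tolerates one failure in each group but not the loss of both nodes of the same group.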

Example: Web Sessions
Without Cluster
- One MySQL server holds the data
- Single point of failure
[Diagram] A load balancer in front of four Apache/PHP servers, all using a single MySQL server (mysqld).

Web Sessions
With Cluster
- MySQL is distributed
- No single point of failure
- Shared storage, but redundant
[Diagram] A load balancer in front of four Apache + PHP servers, each with its own mysqld, backed by four ndbd data nodes.

GlusterFS

What is GlusterFS?
- Open source, clustered file system
- Scales up to several petabytes for thousands of clients
- Aggregates disk and memory resources into a single global namespace over the network
- Leverages commodity hardware
- Leads to storage virtualization
- Allows administrators to dynamically expand, shrink, rebalance, and migrate volumes
- Provides linear scalability, high performance, high availability, and ease of management

GlusterFS Overview

GlusterFS Architecture

GlusterFS Load Balancing Mode
- Data are stored as files and folders
- Uses tokens
  - Extended attributes of a file
  - Identify the location of a file
  - Distributed across directories
- No need for a dedicated metadata server
- Gluster translates the requested file name to a token and accesses the file directly
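The idea of locating a file from its name alone, with no metadata server, can be sketched as hashing the name and looking up which storage server owns that part of the hash space. This is a simplified illustration and not Gluster's actual algorithm: CRC32 stands in for the real hash function, the hash space is split into equal ranges, and `BrickLocator` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: name-based file placement without a metadata server.
final class BrickLocator {
    private static final long HASH_SPACE = 1L << 32; // CRC32 values are 32-bit

    // Stand-in "token": a 32-bit hash of the file name.
    static long hashName(String fileName) {
        CRC32 crc = new CRC32();
        crc.update(fileName.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Each storage server owns one contiguous, equally sized hash range.
    static int brickFor(String fileName, int brickCount) {
        long rangeSize = (HASH_SPACE + brickCount - 1) / brickCount; // round up
        return (int) (hashName(fileName) / rangeSize);
    }

    public static void main(String[] args) {
        // Any client computes the same answer, so no lookup service is needed.
        System.out.println(brickFor("reports/2010/summary.txt", 4));
    }
}
```

Every client that knows the range layout computes the same location independently, which is what removes the metadata server from the access path.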

GlusterFS Replication Mode
- Supports automatic replication across multiple storage servers
- Provides high availability (automatic fail-over) and automatic self-healing
- Uses load balancing to access replicated instances
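A toy model may help make the three properties concrete: writes go to every live replica, reads fall over to a replica that is still up, and a recovered replica is healed by copying from a healthy one. This is an in-memory sketch under simplifying assumptions (no deletes, no concurrent writers, reads always use the first live replica rather than balancing load); `ReplicatedVolume` and its methods are hypothetical and do not reflect Gluster's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: replication, fail-over and self-healing in memory.
final class ReplicatedVolume {
    private final List<Map<String, String>> replicas = new ArrayList<>();
    private final boolean[] up;

    ReplicatedVolume(int replicaCount) {
        up = new boolean[replicaCount];
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new HashMap<>()); // one map stands in for one storage server
            up[i] = true;
        }
    }

    void fail(int replica) { up[replica] = false; }

    // Replication: write to every replica that is currently up.
    void write(String path, String data) {
        for (int i = 0; i < up.length; i++) {
            if (up[i]) replicas.get(i).put(path, data);
        }
    }

    // Fail-over: read from the first replica that is up.
    String read(String path) {
        for (int i = 0; i < up.length; i++) {
            if (up[i]) return replicas.get(i).get(path);
        }
        throw new IllegalStateException("no replica available");
    }

    // Self-healing: copy missed writes from a live replica, then rejoin.
    void heal(int recovered) {
        for (int i = 0; i < up.length; i++) {
            if (i != recovered && up[i]) {
                replicas.get(recovered).putAll(replicas.get(i));
                up[recovered] = true;
                return;
            }
        }
        throw new IllegalStateException("no healthy replica to heal from");
    }

    public static void main(String[] args) {
        ReplicatedVolume volume = new ReplicatedVolume(2);
        volume.fail(0);
        volume.write("/a.txt", "hello"); // only replica 1 receives this write
        volume.heal(0);                  // replica 0 catches up
        volume.fail(1);
        System.out.println(volume.read("/a.txt")); // served by the healed replica
    }
}
```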

Memcached

What is Memcached?
- General-purpose, high-performance, open source distributed memory caching system
  - A giant hash table distributed across multiple machines
- Speeds up dynamic database-driven websites by caching data and objects in RAM
- Used by many popular web sites
  - LiveJournal, Wikipedia, Facebook, Flickr, Twitter, YouTube
- APIs are available in many languages
  - PHP, Java, Python, Perl, C, MySQL API

Basic Memcached Operations

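Memcached with Java

    // Assumes the spymemcached client library, whose API these calls match.
    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    MemcachedClient c = new MemcachedClient(
        new InetSocketAddress("127.0.0.1", 11211));  // local server, default port
    c.set("somekey", 3600, someobject);              // store for 3600 seconds
    Object myobject = c.get("somekey");              // null on a cache miss
    c.delete("somekey");                             // remove the entry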

Memcached and MySQL
- Cache the results of database queries
  - e.g. the result of SELECT * FROM users WHERE userid = ? is cached as a (userid : user) key-value pair
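The usual way to combine the two is to check the cache first and fall back to the database only on a miss. The sketch below shows that pattern with stand-ins so it runs on its own: a `HashMap` plays the role of Memcached and a `Function` plays the role of the SQL query. The class `UserCache`, the key format `userid:<id>`, and the `databaseHits` counter are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: look in the cache first, query the database on a miss.
final class UserCache {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Memcached
    private final Function<Integer, String> database;          // stand-in for the SELECT
    int databaseHits = 0;                                       // counts real queries

    UserCache(Function<Integer, String> database) {
        this.database = database;
    }

    String getUser(int userId) {
        String key = "userid:" + userId;
        String user = cache.get(key);
        if (user == null) {                  // cache miss
            user = database.apply(userId);   // run the query
            databaseHits++;
            if (user != null) {
                cache.put(key, user);        // store the result for next time
            }
        }
        return user;
    }

    // Call after an UPDATE so that stale data is not served.
    void invalidate(int userId) {
        cache.remove("userid:" + userId);
    }

    public static void main(String[] args) {
        UserCache users = new UserCache(id -> "user-" + id);
        users.getUser(7);
        users.getUser(7); // served from the cache
        System.out.println("database queries: " + users.databaseHits);
    }
}
```

Invalidating on update is one simple way to keep the cache consistent with MySQL; setting an expiry time on each entry, as in the `set` call with 3600 seconds, is another.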

Putting It All Together: Facebook Architecture

References
- P. Strassmann, Introduction to Virtualization, http://www.strassmann.com/pubs/gmu/2008-10.pdf, George Mason University, 2008
- Gluster Community, Gluster 3.1.x Documentation, http://www.gluster.com/community/documentation/index.php/main_page, 2010
- Designing and Implementing Scalable Applications with Memcached and MySQL, MySQL White Paper, 2008
- Wikipedia, Memcached, http://en.wikipedia.org/wiki/memcached, 2010