Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com



Similar documents
Welcome to Virtual Developer Day MySQL!

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

MySQL and Hadoop Big Data Integration

MySQL Strategy. Morten Andersen, MySQL Enterprise Sales. Copyright 2014 Oracle and/or its affiliates. All rights reserved.

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)


<Insert Picture Here> MySQL in the Oracle Ecosystem

MySQL Administration and Management Essentials

MySQL Security: Best Practices

<Insert Picture Here> MySQL Update

SCALABLE DATA SERVICES

Constructing a Data Lake: Hadoop and Oracle Database United!

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

MySQL: Community and Enterprise soul. Marco Carlessi - (Oracle) MySQL Senior Sales Consultant

MySQL Enterprise Monitor

MySQL ENTEPRISE EDITION

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

MySQL Enterprise Backup

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

<Insert Picture Here> Playing in the Same Sandbox: MySQL and Oracle

DBA Tutorial Kai Voigt Senior MySQL Instructor Sun Microsystems Santa Clara, April 12, 2010

Oracle Database 12c Plug In. Switch On. Get SMART.

D61830GC30. MySQL for Developers. Summary. Introduction. Prerequisites. At Course completion After completing this course, students will be able to:

XpoLog Competitive Comparison Sheet

Hypertable Architecture Overview

MySQL Enterprise Edition

Best Practices for Using MySQL in the Cloud

Tushar Joshi Turtle Networks Ltd

the missing log collector Treasure Data, Inc. Muga Nishizawa

Big Data Course Highlights

Open Source DBMS CUBRID 2008 & Community Activities. Byung Joo Chung bjchung@cubrid.com

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication with Tungsten

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

Database Scalability and Oracle 12c

MySQL Backups: From strategy to Implementation

Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier

Database as a Service (DaaS) Version 1.02

High Availability Solutions for the MariaDB and MySQL Database

Integrating VoltDB with Hadoop

CHAPTER 1 - JAVA EE OVERVIEW FOR ADMINISTRATORS

WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Blackboard Learn TM, Release 9 Technology Architecture. John Fontaine

Oracle Database 11g Comparison Chart

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes

Moving From Hadoop to Spark

MySQL Technical Overview

Implement Hadoop jobs to extract business value from large and varied data sets

TIBCO Spotfire Statistics Services Installation and Administration. Release 5.5 May 2013

Data Integration Checklist

Informatica Data Replication FAQs

Application-Tier In-Memory Analytics Best Practices and Use Cases

owncloud Architecture Overview

<Insert Picture Here> MySQL Security In A Cloudy World

Assignment # 1 (Cloud Computing Security)

Big Data Analytics Nokia

INTRODUCTION ADVANTAGES OF RUNNING ORACLE 11G ON WINDOWS. Edward Whalen, Performance Tuning Corporation

Hadoop and MySQL for Big Data

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Oracle WebLogic Server 11g Administration

MySQL Cluster New Features. Johan Andersson MySQL Cluster Consulting johan.andersson@sun.com

MySQL CLUSTER. Scaling write operations, as well as reads, across commodity hardware. Low latency for a real-time user experience

Apache Sentry. Prasad Mujumdar

MySQL Reference Architectures for Massively Scalable Web Infrastructure

TIBCO Spotfire Statistics Services Installation and Administration Guide

Upcoming Announcements

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Qlik Sense Enabling the New Enterprise

Microsoft SQL Database Administrator Certification

Data processing goes big

Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.

Actian Vector in Hadoop

Deploying Hadoop with Manager

Alexander Rubin Principle Architect, Percona April 18, Using Hadoop Together with MySQL for Data Analysis

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Who did what, when, where and how MySQL Audit Logging. Jeremy Glick & Andrew Moore 20/10/14

Safe Harbor Statement

6231A - Maintaining a Microsoft SQL Server 2008 Database

Social Networks and the Richness of Data

Using distributed technologies to analyze Big Data

owncloud Architecture Overview

Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH

<Insert Picture Here> Oracle In-Memory Database Cache Overview

Tier Architectures. Kathleen Durant CS 3200

Big Data, SAP HANA. SUSE Linux Enterprise Server for SAP Applications. Kim Aaltonen

Performance and Scalability Overview

Hadoop Job Oriented Training Agenda

Lofan Abrams Data Services for Big Data Session # 2987

New Features in MySQL 5.0, 5.1, and Beyond

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

IBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, Integration Guide IBM

<Insert Picture Here> Big Data

Firebird. A really free database used in free and commercial projects

SQL Server What s New? Christopher Speer. Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft.

TIBCO Spotfire Statistics Services Installation and Administration

Database Administration with MySQL

Performance and Scalability Overview

Transcription:

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com

Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A

Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decision. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle.

DRIVING MySQL INNOVATION MySQL Enterprise Monitor 2.2 MySQL Cluster 7.1 MySQL Cluster Manager 1.0 MySQL Workbench 5.2 MySQL Database 5.5 MySQL Enterprise Backup 3.5 MySQL Enterprise Monitor 2.3 MySQL Cluster Manager 1.1 All GA! MySQL Enterprise Backup 3.7 Oracle VM Template for MySQL Enterprise Edition MySQL Enterprise Oracle Certifications MySQL Windows Installer MySQL Enterprise Security MySQL Enterprise Scalability All GA! MySQL Database 5.6 DMR* MySQL Cluster 7.2 DMR MySQL Labs! MySQL Cluster 7.2 MySQL Cluster Manager 1.4 MySQL Utilities 1.0.6 MySQL Migration Wizard MySQL Enterprise Backup 3.9 MySQL Enterprise Audit MySQL Database 5.6 MySQL Cluster 7.3 All GA! MySQL Database 5.7.2 DMR A BETTER MySQL ( early and often ) 2010 McKinsey Global Institute 2011 2012-13 *Development Milestone Release

Pluggable Storage Engines Architecture MySQL Server Clients and Apps Connectors Native C API, JDBC, ODBC,.Net, PHP, Ruby, Python, VB, Perl Enterprise Management Services and Utilities Backup & Recovery Security Replication Cluster Partitioning Instance Manager Information_Schema MySQL Workbench SQL Interface DDL, DML, Stored Procedures, Views, Triggers, Etc.. Connection Pool Authentication Thread Reuse Connection Limits Check Memory Caches Parser Query Translation, Object Privileges Optimizer Access Paths, Statistics Caches Global and Engine Specific Caches and Buffers Pluggable Storage Engines Memory, Index and Storage Management InnoDB MyISAM Cluster Etc Partners Community More.. Filesystems, Files and Logs Redo, Undo, Data, Index, Binary, Error, Query and Slow 2010 Oracle Corporation Proprietary and Confidential

Industry Leaders Rely on MySQL Web & Enterprise OEM & ISVs Cloud

MySQL 5.6: In Summary IMPROVED PERFORMANCE AND SCALABILITY Scales to 48 CPU Threads Up to 230% performance gain over MySQL 5.5 IMPROVED INNODB Better transactional throughput and availability IMPROVED OPTIMIZER Better query exec times and diagnostics for query tuning and debugging IMPROVED REPLICATION Higher performance, availability and data integrity IMPROVED PERFORMANCE SCHEMA Better Instrumentation, User/Application level statistics and monitoring New! NoSQL ACCESS TO INNODB McKinsey Global Institute Fast, Key Value access with full ACID compliance, better developer agility

MySQL 5.6: Best Replication Features Ever PERFORMANCE Multi-Threaded Slaves Binary Log Group Commit Optimized Row-Based Replication FAILOVER & RECOVERY Global Transaction Identifiers Replication Failover & Admin Utilities Crash Safe Slaves DATA INTEGRITY Replication Event Checksums DEV/OPS AGILITY Time Delayed Replication McKinsey Global Institute Remote Binlog Backup Informational Log Events

Leading Use-Case, On-Line Retail Users Browsing Recommendations Profile, Purchase History Social media updates Preferences Brands Liked Web Logs: Pages Viewed Comments Posted Recommendations Telephony Stream

MySQL in the Big Data Lifecycle BI Solutions DECIDE ACQUIRE Applier ANALYZE ORGANIZE

Download the MySQL Guide to Big Data: http://www.mysql.com/why-mysql/white-papers/mysql-and-hadoop-guide-to-big-data-integration/ *Leading Hadoop Vendor

MySQL in the Big Data Lifecycle DECIDE ACQUIRE NoSQL Interfaces for MySQL Database MySQL Cluster ANALYZE ORGANIZE

MySQL NoSQL Interface Design Goals: Fast, Flexible and Safe Blazing Fast Key / Value Queries Fully transactional / ACID NoSQL + SQL across same Data Set Combined with Schema Flexibility: Online DDL

How Memcached is used with MySQL separately Memcached is in-memory key-value store for small data It is one of the most widely used In- Memory cache implementations for social network websites Memcached has a simple and open protocol as opposed to a rich client bound to a specific language, and implementation makes it portable across a wide variety of languages and environments client SQL query Save/retrieve copy Simple protocol Set/get/add/delete MySQL Memcached Memcached

InnoDB as a Key Value store Combine the best of the NoSQL world and SQL world Memcached listens on specific ports as the front end, directs requests directly to InnoDB Simple commands, much smaller network transmit packages Persistent storage from InnoDB Index on the key column Full ACID compliance Bypass Optimizer and QP layer of MySQL and directly access the storage engine Dual access of data (SQL and Memcached)

MySQL 5.6: NoSQL Interface to InnoDB Memcached API Key-value access to InnoDB Bypasses SQL parsing Implemented via: - Memcached plug-in to mysqld - Memcached mapped to native InnoDB API - Use existing Memcached clients - Shared process for ultra-low latency

TPS Performance 80000 MySQL 5.6: NoSQL Benchmarking 70000 60000 50000 40000 30000 Memcached API SQL 20000 10000 0 8 32 128 512 Client Connections Up to 9x Higher SET / INSERT Throughput

MySQL Cluster: Multiple NoSQL Interfaces Mix & Match

Millions of UPDATEs per Second 25 1.2 Billion UPDATEs per Minute 20 15 10 5 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 MySQL Cluster Data Nodes NoSQL C++ API, flexasynch benchmark 30 x Intel E5-2600 Intel Servers, 2 socket, 64GB ACID Transactions, with Synchronous Replication

MySQL in the Big Data Lifecycle DECIDE ACQUIRE Import Data Apache Sqoop MySQL Hadoop Applier ANALYZE ORGANIZE

Apache Sqoop Apache TLP, part of Hadoop project Developed by Cloudera Bulk data import and export Between Hadoop (HDFS) and external data stores JDBC Connector architecture Supports plug-ins for specific functionality Fast Path Connector developed for MySQL

Importing Data Submit Map Only Job Sqoop Import Gather Metadata Sqoop Job Map HDFS Storage Map Transactional Data Map Map Hadoop Cluster

Hadoop Applier: Design Uses MySQL replication techniques for real time integration Binlog API uses Binary Log to rapidly fetch new data from a running server via the replication protocol MySQL Binlog comprised of events, each event represents a database change Hadoop Applier receives the events using the Binlog API, and writes the changes into a file in Hadoop Distributed File System Other tools in Hadoop Ecosystem,such as Apache Hive, can then consume this data

Hadoop Applier: Implementation Replicates rows inserted into a table in MySQL to Hadoop Distributed File System Uses an API provided by libhdfs, a C library to manipulate files in HDFS The library comes pre-compiled with Hadoop Distributions Connects to the MySQL master (or reads the binary log generated by MySQL) to: Fetch the row insert events occurring on the master Decode these events, extracting data inserted into each field of the row Separate the data by the desired field delimiters and row delimiters Use content handlers to get it in the format required Append it to a text file in HDFS

Integration with HIVE Hive runs on top of Hadoop. Install HIVE on the hadoop master node Set the default datawarehouse directory same as the base directory into which Hadoop Applier writes Create similar schema's on both MySQL & Hive Timestamps are inserted as first field in HDFS files Data is stored in 'datafile1.txt' by default The working directory is base_dir/db_name.db/tb_name SQL Query CREATE TABLE t (i INT); Hive QL CREATE TABLE t ( time_stamp INT, i INT) [ROW FORMAT DELIMITED] STORED AS TEXTFILE;

Mapping Between MySQL and HDFS Schema

MySQL in the Big Data Lifecycle DECIDE Analyze Export Data Decide ANALYZE

Analyze Big Data

Export Data Submit Map Only Job Sqoop Export Gather Metadata Sqoop Job Map HDFS Storage Map Transactional & Analytics Data Map Map Hadoop Cluster

MySQL Reporting Database for BI

MySQL Operational Database for Web

Data Analysis: MySQL Enterprise Edition Highest Levels of Security, Performance and Availability MySQL Enterprise Security MySQL Enterprise Audit Oracle Premier Lifetime Support Oracle Product Certifications/Integrations MySQL Enterprise Monitor/Query Analyzer MySQL Enterprise Scalability MySQL Enterprise High Availability MySQL Enterprise Backup MySQL Workbench

MySQL Enterprise Monitor with Query Analyzer Tune Analytical Queries Enhance DevOps Agility

Scaling, Security and Data Protection MySQL Enterprise Scalability MySQL Enterprise Security MySQL Enterprise Audit MySQL Enterprise Backup

MySQL Enterprise Backup Online Backup for InnoDB Full, Incremental, Partial Backups (scriptable interface) Compression Point in Time, Full, Partial Recovery options Metadata on status, progress, history Unlimited Database Size Cross-Platform Windows, Linux, Unix MEB Backup Files Certified with Oracle Secure Backup mysqlbackup MySQL Database Files Ensures quick, online backup and recovery of your MySQL apps.

MySQL Enterprise Security MySQL External Authentication PAM (Pluggable Authentication Modules) Access external authentication methods Standard interface (Unix, LDAP, others) proxied and non-proxied users Windows Access native Windows services Authenticate users already logged into Windows (Windows Active Directory) Pluggable Authentication API Integrates MySQL with existing security infrastructures and SOPs.

5.5 MySQL Enterprise Scalability MySQL Thread Pool MySQL default thread-handling excellent performance, can limit scalability as connections grow MySQL Thread Pool improves sustained performance/scale as user connections grow

Thread Pool

MySQL Enterprise Audit Policy-based Auditing for MySQL Applications Out-of-the-box logging of connections, logins, query activity across all or specific MySQL servers User defined policies, filtering and log rotation Dynamically enabled, disabled: no server restart XML-based audit stream per Oracle audit specification Easily implemented via MySQL 5.5 Audit API MySQL 5.5.28 and higher Get it here: support.oracle.com and edelivery.oracle.com Adds regulatory compliance to MySQL applications

Oracle Premier Support for MySQL Rely on The Experts - Get Unique Benefits Straight from the Source Largest Team of MySQL Experts Backed by MySQL Developers Forward Compatible Hot Fixes MySQL Maintenance Releases MySQL Support in 29 Languages 24/7/365 Unlimited Incidents Knowledge Base MySQL Consultative Support Only From Oracle "The MySQL support service has been essential in helping us with troubleshooting and providing recommendations for the production cluster, Thanks." -- Carlos Morales Playfulplay.com

Summary MySQL + Hadoop: widely deployed solution Best of both worlds SQL + NoSQL Access Tools and expertise to support you End to end Oracle solutions for Big Data Integrate for Insight

Next Steps Download the Guide http://www.mysql.com/why-mysql/whitepapers/mysql_wp_hadoop.php Try Out MySQL 5.6 http://www.mysql.com/downloads/mysql/ Engage MySQL Consulting http://www.mysql.com/consulting/

Thank you!