Database Scalability and Oracle 12c




Database Scalability and Oracle 12c Marcelle Kratochvil CTO Piction ACE Director All Data/Any Data marcelle@piction.com

Warning: I will be covering topics and saying things that will cause a rethink in how you view the database world... No regrets.

Do these scale? Java, MySQL, PL/SQL, Securefiles, Clustered databases, Object Oriented, 3-Tier, File Systems.

What about these metrics? TPC Benchmarks, Hardware, Disk Speed, Parallelism, Windows, Unix.

Focus is traditional view: How much data can be stored? How many queries can be run? How many concurrent users? How BIG???

Focus is on big...

Scalability is more. Introducing new concepts: Transparent Scalability, Bi-Directional, New Dimensions. All scalability comes at a price (sacrifice).

Scalability comes at a Price. Common method to achieve: Delayed Consistency. Data Warehouse Analytics (summary). Snapshot replication. Log Mining (reverse data from REDO logs). Coarse Grain. Search engines: Google (trawled pages are not indexed in real time).

Scalability comes at a Price. Common method to achieve: Batch and Queue. Advanced Queuing. Ensure consistent throughput.
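The batch-and-queue idea can be sketched in a few lines of Python. This is an illustration only and is not Oracle Advanced Queuing: the names `work_queue` and `drain_in_batches`, and the batch size of 4, are assumptions made for the example. Requests are accepted into a queue straight away and processed later in fixed-size batches, which gives up immediacy in return for steadier throughput.

```python
from collections import deque


def drain_in_batches(queue, batch_size):
    """Empty the queue, returning its items grouped into fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    while queue:
        # Take up to batch_size items; the final batch may be smaller.
        take = min(batch_size, len(queue))
        batches.append([queue.popleft() for _ in range(take)])
    return batches


# Hypothetical workload: ten requests arrive in one burst.
work_queue = deque(f"request-{n}" for n in range(10))
batches = drain_in_batches(work_queue, batch_size=4)
for batch in batches:
    print(len(batch), batch)
```

A real queueing system would also persist the queue and handle failures; this sketch only shows the batching step.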

Theory vs Practicality. New technology doesn't always naturally scale: XML, Object Oriented (clients vs server), Relational.

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/xmlschema-instance" xmlns:xsd="http://www.w3.org/1999/xmlschema">
  <SOAP-ENV:Body>
    <WEBSERVICE xmlns="http://host.piction.com/" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">webservice
      <myxml_values_are_easy_to_read:this_is_thename_of_person xmlns:d='http://www.develop.com/student' xmlns:i='urn:schemas-develop-com:identifiers' xmlns:p='urn:schemas-develop-com:programming-languages'>
        <unique_id:identifier_pk>3235329</unique_id:identifier_pk>
        <full_name_of_person>Jane Doe</full_name_of_person>
        <another_namespace:language>C#</another_namespace:language>
        <myxml_values_are_easy_to_read:rating>9.5</myxml_values_are_easy_to_read:rating>
      </myxml_values_are_easy_to_read:this_is_thename_of_person>
    </WEBSERVICE>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
888 characters. Hard to read. Time to generate, time to transmit, time to parse, time to process.

{ "pk": "3235329", "nm": "Jane Doe", "l": "C#", "rt": "9.5" }
68 characters. Full use. XML does not encourage scalability. JSON does. The sacrifice: meaning in attribute names (maintenance).
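The size difference between the two encodings can be measured directly. The sketch below is an illustration, not the slide's exact payloads: the long element names in `long_names` and the root element `person_record` are invented for the example, and it uses plain XML rather than a full SOAP envelope, so the gap it reports is smaller than the one quoted on the slides.

```python
import json
import xml.etree.ElementTree as ET

# The record from the slide, with the short JSON keys.
record = {"pk": "3235329", "nm": "Jane Doe", "l": "C#", "rt": "9.5"}

# Compact JSON: short keys and no optional whitespace.
json_payload = json.dumps(record, separators=(",", ":"))

# The same record as XML with descriptive element names (hypothetical names).
long_names = {
    "pk": "unique_identifier_pk",
    "nm": "full_name_of_person",
    "l": "programming_language",
    "rt": "rating_out_of_ten",
}
root = ET.Element("person_record")
for key, value in record.items():
    ET.SubElement(root, long_names[key]).text = value
xml_payload = ET.tostring(root, encoding="unicode")

json_bytes = len(json_payload.encode("utf-8"))
xml_bytes = len(xml_payload.encode("utf-8"))
print("JSON bytes:", json_bytes)
print("XML bytes: ", xml_bytes)
```

The XML cost comes from writing every descriptive name twice (open and close tag); the JSON saving comes from the short keys, which is exactly the maintenance sacrifice the slide names.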

On a serious point, who understands network scalability? NOT: any of the big vendors. Clue: who came up with BitTorrent? Finally someone was getting it.

MOST VENDORS: The heart, the core of their architecture is client/server or 3-Tier. The network has infinite capacity in their eyes. Doomed to fail.

Key Lesson: The language, the tool, the environment used can encourage scalable practices. How well does PL/SQL rate vs Java or Python?

Transparent Scalability: Conceptual to Physical without change. Can I take my conceptual design (tables) and install it without change? No change to # columns. No change to data types. No change to column lengths. No changes for performance. 1. Can this be done? (max columns) 2. How many users, how much storage, before it breaks?

Flexibility in design, maintenance, ad hoc queries and upgrades comes at a price. Different databases have different break points. All databases can be made to scale, but not all transparently scale well. [Diagram labels: Scalability (Low to High); Conceptual, Logical, Physical; Entry Point; Break Point; Redesign]

Rigid comes at the price of flexibility. [Diagram labels: Scalability (Low to High); Conceptual, Logical, Physical; Rigid; Fixed; Entry Point; Break Point]

Database Size vs Support Size - Does this scale also?
Hosted: Remote access; Customer only; 100 GB+; Outsourced.
Small: No DBAs; No Sys admin; About 500 GB data; Black box soln.
Medium: No DBAs; Some Sys admin; About 1-5 TB data; Remote admin.
Large: DBA Team; Skilled Sys admin; 1-50 TB data; Secure; Replicate; DMZ.

What separates database vendors is transparent scalability, not scalability based on impractical metrics.

Database Terroir: set of special characteristics. Scalability: "You keep using that word. I do not think it means what you think it means." - Inigo Montoya. Memory, Concurrency, CPU, Storage, Network. The forgotten metric.

[Graph: breakpoint regions (MPP, Cluster, Process Memory, Multi-CPU, Caching, Locking) against #users, from 1 to 10000] This graph shows the common breakpoint regions databases experience as the number of concurrent users grows.

Hardware Scalability? [Diagram: banks of CPUs with Memory blocks; NUMA]

NUMA: Non-Uniform Memory Access. [Diagram: two groups of four CPUs, each with Memory]

Cluster: Bottleneck. [Diagram: Memory and CPU blocks]

MPP: Application. Replicate and/or Distribute (sharding). [Diagram: Memory and CPU blocks]

Does TCP/IP scale? Yes: The internet. It's open. No: IP addresses v4. The stack overhead. Video. Complex. [Diagram: UI, Application Tier and Database Tier over a Client / Server Connection; layers: Oracle Application, Net Foundation Layer, Transport Layer, Internet Layer, Link Layer, Physical Layer; Routers / Hubs / Switches / Firewalls / VPN; same network layers but in reverse order]

Scalability Metric: Storage + Cost. [Graph: Cost against Data Volume for Oracle, Hadoop and MySQL]

So again, do these transparently scale? Java: Yes and No. MySQL: Yes and No. PL/SQL: Yes. Securefiles: Singular Yes. Clustered databases: No. Object Oriented: (client is different to the server). 3-Tier: No. File Systems: No.

Scalability Issues with Relational: ACID transactions. More locked rows, fewer updates. Breaks data down, then assembles it. SQL queries: CPU and memory.

Sacrifices made to Scale (issue: sacrifice). ACID transactions: delayed consistency. More locked rows, fewer updates: dirty reads, read only. Breaks data down, then assembles it: Object Oriented, no ad hoc querying. SQL queries, CPU and memory: hash indexing (NoSQL), no joins, replicated data, links, single focus access.

Bi-Directional Scalability: Scaling Small is as important as Scaling Large. Small, Large: Ease of use. Use with minimal skills. CPU/Memory. Device.

The sacrifice: To get more x, sacrifice y. MPP: complexity, consistency, longer dev time. Data load: concurrency. More transactions: consistency. Data access: complexity. Faster queries: indexes, 8x more storage. Faster DML: locally managed tablespaces waste storage. Faster backups: bitmap block tracker, sacrifice CPU + storage.

Additional Dimensions: Licence, Data (storage), Development, Users (memory), Backup, Queries (CPU), Multimedia, Large Query (Parallel), Loading, Managing, Delivery.

So which scalability dimension? Licence or Size? Business dependent. TPC Benchmarks.

Challenges: Social networks: more users. Search engines: very fast queries. Do vendors understand scalability?

Microsoft. Yes: Small scale; Novice administrators. No: Network (RDC); File system (NTFS). But: GUI and Tools (Office).

Facebook: Single business focus (Social Network). Scalability: hundreds of millions of users. Sacrificed: data consistency for large # users; user interface for large # users. Horizontal Scalability (MPP): spread load arbitrarily across many machines; cheap commodity hardware. Mixed solutions.

How Facebook Scales: Different databases for different needs. Increases storage. Storage is cheap. Sacrifice storage for scalability. Efficiency is a separate effort from scaling.

Facebook solutions: MySQL: sharding, automation, monitoring; heavy investment in operations and performance engineering; 50,000+ servers. Cassandra (Distributes storage): Reliability; Inbox search (NoSQL); BigTable (Distributed Hash).

Facebook solutions: Hive: Data Warehouse on top of Hadoop; data summarisation (query analysis). HipHop: PHP to optimized C++. Scribe: log data streamed in real time.

Google: Cache keys in global memory, not memcache. Compression: sacrifice CPU to improve I/O. One HTTP request retrieves all data. MPP, distributed but hardware in close proximity.
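The compression trade-off (spend CPU to move fewer bytes) can be seen with Python's standard `zlib` module. This is a generic sketch and says nothing about what Google actually runs; the payload is invented and deliberately repetitive, so it compresses far better than typical real data would.

```python
import zlib

# Hypothetical payload: a header plus 200 identical rows (assumption for the demo).
payload = ("id,name,language,rating\n" + "3235329,Jane Doe,C#,9.5\n" * 200).encode("utf-8")

# CPU is spent here by the writer...
compressed = zlib.compress(payload, 6)
# ...and here by the reader, in exchange for less I/O in between.
restored = zlib.decompress(compressed)

print(len(payload), "bytes raw,", len(compressed), "bytes compressed")
```

Whether the trade pays off depends on how compressible the data is and on whether CPU or I/O is the scarcer resource.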

Google: Interface cheats: will return great results, but not accurate; might say 2 million answers, but doesn't cache. Sharding: for MPP. Darwinian infrastructure: try multiple scalability solutions; let the best win and propagate.

Sharding: Small part of the whole. Partitioning of data. Easily manage partitions. Faster access to partitions. Enables parallel access. Useful for video streaming. Used for MPP scaling.
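A minimal sketch of hash-based sharding, assuming four in-memory dictionaries stand in for four database servers. The names `NUM_SHARDS`, `shard_for`, `put` and `get` are hypothetical, and hashing the key is only one partitioning scheme; range or directory-based sharding are common alternatives.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the example
shards = [{} for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(key):
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    # Only one shard is touched per lookup, so shards can be read in parallel.
    return shards[shard_for(key)].get(key)


for user_id in ("alice", "bob", "carol", "dave"):
    put(user_id, {"owner": user_id})

print([len(shard) for shard in shards])
```

With a plain modulo scheme like this, changing `NUM_SHARDS` remaps most keys, which is one reason production systems often use consistent hashing or a lookup directory instead.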

Memcache: General-purpose distributed memory caching system. Speeds up dynamic database-driven websites by caching data and objects in RAM. Reduces the number of times an external data source (such as a database or API) must be read.
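The read-reduction effect can be sketched with a cache-aside lookup. This example uses an ordinary in-process dictionary rather than a memcached client, and `CachedStore` and `load_profile` are hypothetical names; a real deployment would also need expiry and invalidation when the source data changes.

```python
class CachedStore:
    """Cache-aside reads: check RAM first, fall back to the slow source."""

    def __init__(self, load_from_source):
        self._load = load_from_source
        self._cache = {}
        self.source_reads = 0  # how many times the external source was hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.source_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value


def load_profile(user_id):
    # Stand-in for a database or API call (assumption for the example).
    return {"id": user_id, "name": f"user-{user_id}"}


store = CachedStore(load_profile)
first = store.get(42)   # miss: reads the source
second = store.get(42)  # hit: served from RAM
print("source reads:", store.source_reads)
```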

So what is the track to upward scaling? Objects (co-locate data). Has index access (minimal I/O). No ACID. MPP (low cost, commodity hardware, no backups, distributed).

The lesson to learn: There is no one solution to scalability. One needs multiple tools, products, solutions. The further upwards (or downwards) the goal of scaling, the more sacrifices are needed. The transparent scaling benchmarks change as hardware changes.

Does 3 Tier Scale? Yes: Proven. No: Architecture discourages efficiency. Separating Application from Data scales. Scalability comes when the application and data are kept together. 3 Tier scales when the network bandwidth is not an issue.

Achilles Heel is the network. [Diagram: User Interface Tier; Application Middle Tier; Natural Network Bottleneck; Database Tier]

3-Tier not practical for: Multimedia apps (multimedia stored in the database). MPP apps (distributed architecture not suited). Separating developers from the database: discourages efficient queries. Scales well with legacy systems.

Future Challenges: Scaling naturally to MPP. SQL is very complex to distribute to MPP. Shared data update is complex to move to MPP. Social networks port well: mostly one user updates their own data and the rest is read-only; delayed consistency.

Future Challenges: Scaling with multimedia. Images in or out of database. Filesystem. Sharding, caching, performance. Multimedia (MM) is a good candidate: insert mostly, rare to update, multi-user read. Challenge is delivery over networks.

Future Challenges: Scaling with Virtualization. Shared memory, shared CPU, shared disk, shared resources. Great for management. Weakness is if all VMs have heavy usage.

Does Oracle understand Scalability? Relational scaling upwards: yes. Scaling downwards: mostly no, but starting. Binary data: yes but mostly no. Super scalability using MPP: no. Hardware: yes. Transparent scalability: better than the others. Licence: no, maybe with MySQL. Network scalability: no (but no vendor really does).

For questions Marcelle Kratochvil marcelle@piction.com