Database Scalability and Oracle 12c Marcelle Kratochvil CTO Piction ACE Director All Data/Any Data marcelle@piction.com
Warning I will be covering topics and saying things that will cause a rethink in how you view the Database world... No regrets
Do these scale? Java MySQL PL/SQL Securefiles Clustered databases Object Oriented 3-Tier File Systems
What about these metrics? TPC Benchmarks Hardware Disk Speed Parallelism Windows Unix
Focus is traditional view How much data can be stored How many queries can be run How many concurrent users How BIG???
Focus is on big...
Scalability is more Introducing new concepts Transparent Scalability Bi-Directional New Dimensions All scalability comes at a price (sacrifice)
Scalability comes at a Price. Common method to achieve: Delayed Consistency. Data Warehouse; Analytics (summary); Snapshot replication; Log Mining (reverse data from REDO logs); Coarse Grain; Search engines, e.g. Google (trawled pages are not indexed in real time)
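A minimal sketch of delayed consistency via snapshot replication, assuming an in-memory dict stands in for the primary database. The class and variable names are hypothetical and this is not Oracle's replication mechanism; it only shows that reads served from a snapshot stay stale until the next refresh.

```python
import copy


class SnapshotReplica:
    """Read-only copy of a primary store, refreshed on demand.

    Hypothetical illustration: reads between refreshes may be stale.
    """

    def __init__(self, primary):
        self._primary = primary
        self._snapshot = {}
        self.refresh()

    def refresh(self):
        # Copy the primary's current state; later writes are not seen
        # until refresh() is called again.
        self._snapshot = copy.deepcopy(self._primary)

    def read(self, key):
        return self._snapshot.get(key)


primary = {"orders": 10}
replica = SnapshotReplica(primary)

primary["orders"] = 11           # the write reaches the primary only
stale = replica.read("orders")   # the replica still serves the old value
replica.refresh()
fresh = replica.read("orders")   # now consistent with the primary
print(stale, fresh)
```

The price paid is visible in the `stale` read: queries are cheap and never block writers, but they answer from the past.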
Scalability comes at a Price Common method to achieve Batch and Queue Advanced Queuing Ensure consistent throughput
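A minimal sketch of batch and queue, assuming an in-process `deque` stands in for a real message queue. It does not use Oracle Advanced Queuing; `drain_in_batches` and its parameters are hypothetical names. Work is accepted immediately and applied later in fixed-size batches, which keeps throughput consistent at the cost of delay.

```python
from collections import deque


def drain_in_batches(pending, batch_size, apply_batch):
    """Apply queued items in groups of at most batch_size.

    Returns the number of batches applied.
    """
    batches = 0
    while pending:
        size = min(batch_size, len(pending))
        batch = [pending.popleft() for _ in range(size)]
        apply_batch(batch)
        batches += 1
    return batches


pending = deque(range(10))   # ten queued requests
applied = []                 # each entry is one batch as it was applied
batch_count = drain_in_batches(pending, 4, applied.append)
print(batch_count, applied)
```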
Theory vs Practicality New technology doesn't always naturally scale XML Object Oriented (clients vs server) Relational
<SOAP-ENV:Envelope xmlns:soap-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/xmlschema-instance" xmlns:xsd="http://www.w3.org/1999/xmlschema">
  <SOAP-ENV:Body>
    <WEBSERVICE xmlns="http://host.piction.com/" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">webservice
      <myxml_values_are_easy_to_read:this_is_thename_of_person xmlns:d='http://www.develop.com/student' xmlns:i='urn:schemas-develop-com:identifiers' xmlns:p='urn:schemas-develop-com:programming-languages'>
        <unique_id:identifier_pk>3235329</unique_id:identifier_pk>
        <full_name_of_person>jane Doe</full_name_of_person>
        <another_namespace:language>c#</another_namespace:language>
        <myxml_values_are_easy_to_read:rating>9.5</myxml_values_are_easy_to_read:rating>
      </myxml_values_are_easy_to_read:this_is_thename_of_person>
    </WEBSERVICE>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
888 Characters: Hard to read; Time to generate; Time to transmit; Time to parse; Time to process
{ "pk": "3235329", "nm": "Jane Doe", "l": "C#", "rt": "9.5" }
68 Characters. Full use. XML does not encourage scalability. JSON does - sacrifice: meaning in attribute names (maintenance)
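A rough sketch of the size comparison, assuming a hypothetical four-field record. The long XML element names below are invented to mimic self-describing tags and are not taken from a real service; exact sizes will differ from the slide's 888 and 68 characters.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record with short JSON keys.
record = {"pk": "3235329", "nm": "Jane Doe", "l": "C#", "rt": "9.5"}

# Invented verbose element names standing in for self-describing XML tags.
LONG_NAMES = {
    "pk": "unique_identifier_primary_key",
    "nm": "full_name_of_person",
    "l": "programming_language",
    "rt": "rating_out_of_ten",
}


def to_json(rec):
    # Compact separators drop the optional whitespace.
    return json.dumps(rec, separators=(",", ":"))


def to_xml(rec):
    root = ET.Element("person_record")
    for key, value in rec.items():
        child = ET.SubElement(root, LONG_NAMES[key])
        child.text = value
    return ET.tostring(root, encoding="unicode")


json_payload = to_json(record)
xml_payload = to_xml(record)
print(len(json_payload), len(xml_payload))
```

Every extra character is generated, transmitted and parsed on each request, so the gap is multiplied by the request volume. The short JSON keys buy that saving by giving up readable attribute names.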
On a serious point, who understands network scalability? NOT: any of the big vendors. Clue: who came up with BitTorrent? Finally someone was getting it
MOST VENDORS The heart, the core of their architecture is client/server or 3-Tier The network has infinite capacity in their eyes Doomed to fail
Key Lesson The language, the tool, the environment used can encourage scalable practices How well does PL/SQL rate vs Java or Python?
Transparent Scalability Conceptual to Physical without change Can I take my conceptual design (tables) and install it without change? No change to # columns No change to data types No change to column lengths No changes for performance 1. Can this be done? (max columns) 2. How many users, how much storage, before it breaks?
Flexibility in design, maintenance, ad hoc queries and upgrades comes at a price. [Chart: scalability (low to high) across conceptual, logical and physical design, marking the entry point, the break point and redesign.] Different databases have different break points. All databases can be made to scale, but not all transparently scale well
[Chart: scalability (low to high) for a rigid, fixed design across conceptual, logical and physical, marking the entry point and the break point.] Rigid comes at the price of flexibility
Database Size vs Support Size - Does this scale also? Hosted Small Medium Large Remote access Customer only 100GB+ Outsourced No DBAs No Sys admin About 500GB Data Black box soln No DBAs Some Sys admin About 1-5TB Data Remote admin DBA Team Skilled Sys admin 1-50TB data Secure Replicate DMZ
What separates database vendors is Transparent scalability not scalability based on impractical metrics
Database Terroir: set of special characteristics. Scalability: "You keep using that word. I do not think it means what you think it means." - Inigo Montoya. Memory; Concurrency; CPU; Storage; Network (the forgotten metric)
[Graph: breakpoint regions labelled MPP, Cluster, Process Memory, Multi-CPU, Caching and Locking, plotted against #users from 1 to 10000.] This graph shows the common breakpoint regions databases experience as the number of concurrent users grows.
Hardware Scalability? [Diagram: many CPUs attached to memory banks, leading to NUMA]
NUMA: Non Uniform Memory Architecture [Diagram: groups of CPUs, each group with its own memory]
Cluster: Bottleneck [Diagram: CPUs and memory across clustered nodes]
MPP: Replicate and/or Distribute (sharding) [Diagram: application over nodes, each with its own CPUs and memory]
Does TCP/IP scale? Yes: the internet; it's open. No: IP addresses v4; the stack overhead; video; complex. [Diagram: UI, Application Tier and Database Tier joined by a client / server connection through Oracle Application, Net Foundation Layer, Transport Layer, Internet Layer, Link Layer and Physical Layer (routers / hubs / switches / firewalls / VPN), then the same network layers in reverse order.]
Scalability Metric: Storage + Cost [Chart: cost against data volume for Oracle, Hadoop and MySQL]
So again, do these transparently scale? Java: Yes and No. MySQL: Yes and No. PL/SQL: Yes. Securefiles: Singular, Yes. Clustered databases: No. Object Oriented: client is different to the server. 3-Tier: No. File Systems: No
Scalability Issues with Relational: ACID transactions; more locked rows, fewer updates; breaks data down, then assembles it; SQL queries, CPU and memory
Sacrifices made to Scale: ACID transactions -> delayed consistency. More locked rows, fewer updates -> dirty reads, read only. Breaks data down, then assembles it -> Object Oriented, no ad hoc querying. SQL queries, CPU and memory -> hash indexing (NoSQL), no joins, replicated data, links, single focus access
Bi-Directional Scalability Scaling Small is as important as Scaling Large Small Large Ease of use Use with minimal skills CPU/Memory Device
The sacrifice: to get more x, sacrifice y. MPP: complexity, consistency, longer dev time. Data load: concurrency. More transactions: consistency. Data access: complexity. Faster queries: indexes, 8x more storage. Faster DML: locally managed tablespaces waste storage. Faster backups: bitmap block tracker, sacrifice CPU + storage
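A small sketch of the index trade-off, assuming SQLite (from the Python standard library) as a stand-in for any relational database. The table, column and index names are hypothetical, and the storage growth seen here says nothing about the 8x figure. It only shows that the index that speeds up a lookup is paid for in extra pages.

```python
import sqlite3


def pages_used(conn):
    # PRAGMA page_count reports the number of pages in the database.
    return conn.execute("PRAGMA page_count").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    (("name_%06d" % i,) for i in range(20000)),
)
conn.commit()
pages_before = pages_used(conn)

# The index makes lookups by name cheap, at the cost of extra storage.
conn.execute("CREATE INDEX person_name_idx ON person (name)")
conn.commit()
pages_after = pages_used(conn)

# Ask the planner how it would run a lookup by name now.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM person WHERE name = ?",
    ("name_000042",),
).fetchall()
print(pages_before, pages_after, plan)
```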
Additional Dimensions Licence Data (storage) Development Users (memory) Backup Queries (CPU) Multimedia Large Query (Parallel) Loading Managing Delivery
So which scalability dimension? Licence or Size? Business dependent. TPC Benchmarks
Challenges Social networks more users Search engines very fast queries Do vendors understand scalability?
Yes Small scale No Network (RDC) Microsoft Novice administrators File system (NTFS) But. GUI and Tools (Office)
Facebook: single business focus (social network). Hundreds of millions of users. Scalability sacrificed: data consistency for large # users; user interface for large # users. Horizontal Scalability (MPP): spread load arbitrarily across many machines; cheap commodity hardware; mixed solutions
How Facebook Scales Different databases for different needs Increases storage Storage is cheap Sacrifice storage for scalability Efficiency is a separate effort from scaling
Facebook solutions MySQL Sharding, automation, monitoring heavy investment in operations and performance engineering. 50,000+ servers Cassandra (Distributes storage) Reliability Inbox search (NoSQL) BigTable (Distributed Hash)
Facebook solutions: Hive: data warehouse on top of Hadoop, data summarisation (query analysis). HipHop: PHP to optimized C++. Scribe: log data streamed in real time
Google Cache keys in global memory, not memcache Compression Sacrifice CPU to improve I/O One HTTP request retrieves all data MPP, distributed but hardware in close proximity
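A minimal sketch of sacrificing CPU to improve I/O, using `zlib` from the Python standard library. The row data is invented for illustration and says nothing about how Google compresses data. The bytes that would travel over disk or network shrink, and the cost is the CPU time spent in `compress` and `decompress`.

```python
import zlib

# Invented, repetitive rows; real data will compress differently.
rows = ["user_%d,active,2013-01-01" % i for i in range(1000)]


def pack(lines):
    """Return the raw bytes and their compressed form."""
    raw = "\n".join(lines).encode("utf-8")
    return raw, zlib.compress(raw, 6)


raw, packed = pack(rows)
restored = zlib.decompress(packed)
print(len(raw), len(packed))
```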
Google: Interface cheats: will return great results, but not accurate; might say 2 million answers, but doesn't cache. Sharding: for MPP. Darwinian infrastructure: try multiple scalability solutions; let the best win and propagate
Sharding Small part of the whole Partitioning of data Easily manage partitions Faster access to partitions Enables Parallel Access Useful for video streaming Used for MPP scaling
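A minimal sketch of hash-based sharding, assuming four in-memory dicts stand in for four database servers. The function names and key format are hypothetical. Each key is routed to exactly one shard, so a lookup touches a small part of the whole.

```python
import hashlib

SHARD_COUNT = 4
shards = [dict() for _ in range(SHARD_COUNT)]  # stand-ins for servers


def shard_for(key, shard_count=SHARD_COUNT):
    # A stable hash keeps the routing the same across processes,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shard_count


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


for i in range(100):
    put("user:%d" % i, {"id": i})

print([len(s) for s in shards])
```

The sacrifice shows up as soon as a query spans shards: a join or a count across all users has to visit every partition and merge the results in the application.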
Memcache General-purpose distributed memory caching system Speed up dynamic database-driven websites Caching data and objects in RAM Reduces the number of times an external data source (such as a database or API) must be read.
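A minimal sketch of the cache-aside pattern that such a cache supports, assuming a plain dict in place of a real memcached client. The `CacheAside` class and the `slow_lookup` function are hypothetical. The external source is read only on a miss; repeat reads are served from RAM.

```python
class CacheAside:
    """Check the cache first; load from the source only on a miss."""

    def __init__(self, load_from_source):
        self._load = load_from_source
        self._cache = {}
        self.source_reads = 0  # how often the external source was hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.source_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value


def slow_lookup(key):
    # Hypothetical stand-in for a database or API call.
    return key.upper()


cache = CacheAside(slow_lookup)
first = cache.get("jane")    # miss: reads the source
second = cache.get("jane")   # hit: served from memory
print(first, second, cache.source_reads)
```

The sketch never invalidates entries, so an update to the source would not be seen. That is the same delayed-consistency price described earlier.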
So what is the track to upward scaling? Objects (co-locate data) Has index access (minimal i/o) No ACID MPP (low cost, commodity hardware, no backups, distributed)
The lesson to learn There is no one solution to scalability One needs multiple tools, products, solutions The further upwards (or downwards) the goal of scaling, the more sacrifices that are needed The transparent scaling benchmarks change as hardware changes
Does 3 Tier Scale? Yes: proven. No: architecture discourages efficiency. Separating Application from Data scales. Scalability comes when the application and data are kept together. 3 Tier scales when the network bandwidth is not an issue
Achilles Heel is the network User Interface Tier Application Middle Tier Natural Network Bottleneck Database Tier
3-Tier not practical for: Multimedia apps (multimedia stored in the database); MPP apps (distributed architecture not suited); Separating developers from the database (discourages efficient queries). Scales well with legacy systems
Future Challenges Scaling naturally to MPP SQL very complex to distribute to MPP Shared data update complex to move to MPP Social Networks port well Mostly one user update own data, rest RO Delayed consistency
Future Challenges Scaling with multimedia Images in or out of database Filesystem Sharding, caching, performance MM good candidate Insert mostly, rare to update Multi user read Challenge is delivery over networks
Future Challenges: Scaling with Virtualization. Shared Memory; Shared CPU; Shared Disk; Shared Resources. Great for Management. Weakness is if all VMs have heavy usage
Does Oracle understand Scalability? Relational scaling upwards: yes. Scaling downwards: mostly no, but starting. Binary data: yes, but mostly no. Super scalability using MPP: no. Hardware: yes. Transparent scalability: better than the others. Licence: no, maybe with MySQL. Network scalability: no (but no vendor really does)
For questions Marcelle Kratochvil marcelle@piction.com