Linux on Power Open Source Databases Kevin Lawrence IBM - NA Power Systems - Server Solutions Ecosystem Open Source Databases 2016 IBM Corporation
Linux on Power - Open Source Databases

By 2018, more than 70% of new in-house applications will be developed on an OSDBMS, and 50% of existing commercial RDBMS instances will have been converted or will be in process. *

* Gartner, The State of Open Source RDBMSs, 2015, by Donald Feinberg and Merv Adrian, published April 21, 2015.
Database Ecosystem

- Many database choices, spanning commercial to open source products and relational to non-relational models; no single winner takes all.
- Relational DBs: strengths are transactional integrity and a large ecosystem around SQL.
- NoSQL DBs: much lower cost, and they give clients a simple data model with dynamic control over storing and retrieving primarily unstructured data types.

The four primary flavors of NoSQL DBs are all available on POWER8:
- Key/Value Store (example: Redis)
- Document Store (example: MongoDB)
- Columnar Store (example: Cassandra)
- Graph Store (example: Neo4j)
Types of Databases

Relational database management systems (RDBMS) support the relational (table-oriented) data model. The schema of a table (relation schema) is defined by the table name and a fixed number of attributes with fixed data types. A record (entity) corresponds to a row in the table and consists of the values of each attribute. (An open source example is Postgres/EnterpriseDB.)

Document databases (e.g. MongoDB) store data in documents; each document contains one or more fields. Data can be queried based on any combination of fields in a document. The appeal of these systems is that they are very general purpose, have large application ecosystems, and map well onto many of today's object-oriented programming styles.

Key-value store databases (e.g. Redis) are the most basic type of non-relational DB. They store a key and its associated values.

Wide column stores (e.g. Cassandra) let the number of columns stored vary from row to row. The appeal of these systems is their very high performance and scalability.

Graph databases (e.g. Neo4j) focus on storing relationships and can be queried to discover both simple and more complex relationships in the data.
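To make the five data models concrete, here is a minimal sketch that holds the same customer information in each shape. It is an illustration only: Python's built-in sqlite3 module stands in for a relational engine, plain dictionaries and lists stand in for the NoSQL stores, and every table, key, and field name is hypothetical.

```python
import sqlite3

# Relational: a fixed schema, one row per record.
# (sqlite3 is only a stand-in for an RDBMS such as Postgres.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Austin')")
relational_row = conn.execute(
    "SELECT name, city FROM customer WHERE id = 1"
).fetchone()

# Document: one self-describing document with nested fields.
document = {
    "_id": 1,
    "name": "Ada",
    "city": "Austin",
    "orders": [{"sku": "A-100", "qty": 2}],
}

# Key-value: opaque values looked up by key only.
key_value = {"customer:1:name": "Ada", "customer:1:city": "Austin"}

# Wide column: each row key has its own set of columns,
# so rows need not share the same columns.
wide_column = {
    "customer:1": {"name": "Ada", "city": "Austin"},
    "customer:2": {"name": "Bo"},
}

# Graph: nodes plus labeled relationships between them.
nodes = {1: {"name": "Ada"}, 2: {"name": "Bo"}}
edges = [(1, "KNOWS", 2)]
```

The relational row must fit the declared columns, while the second wide-column row simply omits `city`, and the graph keeps the relationship itself as first-class data.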
Types of Databases with Open Source Examples

- Document - Example: MongoDB
- Key-value - Example: Redis
- Relational - Example: EnterpriseDB
- Wide column store - Example: Cassandra
- Graph - Example: Neo4j
Common Linux on Power OSDBs

MongoDB
  Classification: NoSQL - Document Store
  Optimized for: Document model, document stores, semi-structured or unstructured data
  Common use cases: Single view of customer records, enterprise content management, catalogs, personalization

Redis
  Classification: NoSQL - In-memory Key-Value Store
  Optimized for: Data queues, strings, lists, counts, caching, statistics, text, session IDs, pictures, videos
  Common use cases: Live in-memory cache, data queues, user session data, shopping cart data, messaging

Cassandra
  Classification: NoSQL - Wide Column Store
  Optimized for: NoSQL environments that need very high performance and scalability, very high data volumes
  Common use cases: Fraud detection, Internet of Things sensor data, log data, telco call detail records

Neo4j
  Classification: NoSQL - Graph Store
  Optimized for: Data stored as edges, nodes, or attributes (graphs)
  Common use cases: Fraud detection, social network analysis, location-aware apps, master data management, machine learning

Postgres (EnterpriseDB)
  Classification: Open source object-relational database
  Optimized for: Wide variety of transactional work at lower TCO; relational/structured queries to object store and retrieval
  Common use cases: Oracle RDBMS migrations and takeouts

MariaDB
  Classification: Open source relational database
  Optimized for: Lower-cost transactional SQL-based queries and updates
  Common use cases: Migrations from Oracle MySQL, Turbo LAMP stack
Redis

Main points: Stores simple values or data structures by key. Blazing fast.

Exploits POWER8: Redis Labs on Power uses IBM POWER8 servers, the IBM FlashSystem, the IBM CAPI-Flash card, and the Redis Labs Enterprise Cluster (RLEC) for Flash software.

Other features: Master-slave replication, automatic failover.

Best used: For rapidly changing data with a foreseeable database size (it should fit mostly in memory).

For example: Storing real-time stock prices, real-time analytics, leaderboards, real-time communication, and anywhere you previously used memcached.
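The leaderboard use case can be sketched without a running server. The class below is a hypothetical in-memory stand-in for a Redis sorted set, not the Redis client API; the comments name the Redis commands (ZADD, ZINCRBY, ZREVRANGE) that play the corresponding role on a real deployment.

```python
class Leaderboard:
    """In-memory stand-in for a sorted set: member -> score."""

    def __init__(self):
        self._scores = {}

    def add(self, member, score):
        # Role of ZADD: set (or overwrite) a member's score.
        self._scores[member] = float(score)

    def increment(self, member, delta):
        # Role of ZINCRBY: bump a score, creating the member at 0 if absent.
        self._scores[member] = self._scores.get(member, 0.0) + delta
        return self._scores[member]

    def top(self, n):
        # Role of ZREVRANGE 0 n-1 WITHSCORES: highest scores first.
        ranked = sorted(
            self._scores.items(),
            key=lambda item: (item[1], item[0]),
            reverse=True,
        )
        return ranked[:n]


board = Leaderboard()
board.add("alice", 120)
board.add("bob", 95)
board.increment("bob", 40)
top_two = board.top(2)
```

This sketch sorts on every read for brevity; a real sorted set keeps members ordered as they are written, which is what makes the pattern suit rapidly changing data.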
MongoDB

Main point: Retains some friendly properties of SQL (queries, indexes).

Exploits POWER8 features: Performance; testing of MongoDB with CAPI Flash on POWER8 is just starting.

Other features: Master/slave replication (automatic failover with replica sets), sharding, integrated text search, geospatial indexing, data center awareness.

Best used: If you need dynamic queries. If you prefer to define indexes rather than map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example: Most popular NoSQL document DB.
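"Dynamic queries" means the filter is itself a piece of data naming any combination of fields, rather than a query fixed in advance. The sketch below imitates that idea over a list of Python dictionaries. It is not the MongoDB driver: `matches` and `find` are hypothetical helpers that support exact-match filters only.

```python
def matches(doc, query):
    """True if every field named in the query is present in doc with an equal value."""
    return all(field in doc and doc[field] == value for field, value in query.items())


def find(collection, query):
    """Return the documents that satisfy an exact-match query document."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection need not share the same fields.
people = [
    {"name": "Ada", "city": "Austin", "age": 36},
    {"name": "Bo", "city": "Austin"},
    {"name": "Cy", "city": "Boston", "age": 36},
]

in_austin = find(people, {"city": "Austin"})
austin_and_36 = find(people, {"city": "Austin", "age": 36})
```

The sketch scans every document; in a document database, defining an index on the queried fields is what avoids that full scan.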
Cassandra

Main point: Stores huge datasets and retrieves them in "almost" SQL (CQL3).

Exploits POWER8 features: Apache

Other features:
- CQL3 is the official interface and is very similar to SQL, but with some limitations that come from the scalability (most notably: no JOINs, no aggregate functions).
- Querying by key or key range (secondary indices are also available).
- Highly scalable and highly available, with no single point of failure.
- NoSQL column family implementation.
- Very high write throughput and good read throughput. Writes can be much faster than reads (when reads are disk-bound).
- SQL-like query language (since 0.8) and support for search through secondary indexes.
- Tunable consistency and support for replication.
- Flexible schema.
- Map/reduce possible with Apache Hadoop.
- Very good and reliable cross-datacenter replication.

Best used: When you need to store data so huge that it doesn't fit on a single server, but still want a friendly, familiar interface to it.

For example: Web analytics (counting hits by hour, by browser, by IP, etc.), transaction logging, data collection from huge sensor arrays.
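"Tunable consistency" comes down to simple arithmetic: with a replication factor of N, a read is guaranteed to see the latest acknowledged write when the replicas written (W) plus the replicas read (R) exceed N, so the two sets must overlap. The sketch below only illustrates that rule. The function names are hypothetical and nothing here calls Cassandra.

```python
def quorum(replication_factor):
    """Size of a majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when the read and write replica sets must overlap (R + W > N)."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)

# Quorum writes plus quorum reads always overlap on at least one replica.
strong = read_sees_latest_write(rf, q, q)

# Writing to one replica and reading from one is faster,
# but the read may land on a replica that has not seen the write yet.
fast_but_weak = read_sees_latest_write(rf, 1, 1)
```

Choosing lower W and R trades that guarantee for latency and availability, which is the "tunable" part.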
Neo4j

Main point: NoSQL graph database optimized for connected data.

Exploits POWER8 features:
- Neo4j on POWER8 offers 56 TB of extended memory, drastically increasing the size at which real-time graph queries are possible.
- Real-time graph processing with Neo4j on POWER8 supports both standard operational requirements and analytic insights that normally require offline processing.
- IBM POWER8 hardware allows Neo4j to scale both up and out for graphs of greater size than ever before.

Other features:
- HTTP/REST (or embedding in Java).
- Full ACID (Atomicity, Consistency, Isolation, Durability) conformity (including durable data).
- Integrated pattern-matching-based query language ("Cypher").
- Indexing of keys, nodes and relationships.
- Advanced path-finding with multiple algorithms.
- Optimized for reads.
- Has transactions (in the Java API).
- Clustering, replication, caching, online backup, advanced monitoring and High Availability are commercially licensed.

Best used: For graph-style, rich or complex, interconnected data.

For example: Searching routes in social relations, public transport links, road maps, or network topologies.
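Route searching in social relations is, at its simplest, a shortest-path search over nodes and relationships. The sketch below runs a breadth-first search over a small hypothetical friendship graph held in a Python dictionary. It is an illustration of the idea, not Cypher and not one of Neo4j's own path-finding algorithms.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the back-pointers from goal to start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical "knows" relationships, stored as adjacency lists.
friends = {
    "ann": ["bob", "cara"],
    "bob": ["ann", "dev"],
    "cara": ["ann", "dev"],
    "dev": ["bob", "cara", "eli"],
    "eli": ["dev"],
}

route = shortest_path(friends, "ann", "eli")
```

A graph database stores these relationships directly, so a traversal like this follows them from node to node rather than joining tables.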
EnterpriseDB (Postgres)

Main point: Enterprise-class, open source, relational database.

- Easily integrates with or supplants Oracle DB: many applications written for Oracle run on Postgres Advanced Server without modification, and Oracle-skilled developers can use it with minimal retraining.
- Performance: EDB running on POWER8 brings a cost-effective, enterprise-class solution to CIOs and IT managers running Red Hat Enterprise Linux 7.x on little-endian POWER8. EDB Postgres Advanced Server on POWER8 offers 2x higher performance over Intel-based systems for OLTP applications, high-performance multi-threading, more cache and greater data bandwidth.
- Scalability: Reliably handles multi-terabyte data sets supporting millions of users with guaranteed transactional integrity and continuous availability.
- TCO: Reduces operating costs by requiring fewer systems at a lower acquisition cost.
- DBMS convergence: Supports traditional structured, semi-structured, and unstructured data types to reduce the need to deploy costly, one-off NoSQL data silos; supports adoption of Postgres and migration of workloads from proprietary databases.
- Services: Brings together two industry leaders committed to open source offerings. The EDB Postgres Management, Integration, and Migration Suites support replication, HA, database monitoring/management and data integration for mission-critical enterprise applications.
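"Transactional integrity" means a group of changes either all take effect or none do. The sketch below demonstrates that with Python's built-in sqlite3 module, used here purely as a stand-in for a relational engine; it is not EDB Postgres, and the `account` table and `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically; returns False and changes nothing if src would go negative."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, 1, 2, 30)        # succeeds
rejected = transfer(conn, 1, 2, 500)  # would overdraw account 1
final = balances(conn)
```

The rejected transfer had already credited the destination account before the constraint fired; the rollback undoes that credit, so the data never shows a half-applied transfer.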
Modernize your Database with POWER8 and EnterpriseDB

- 30% fewer servers
- 84% reduction in SW licensing cost with fewer cores and EnterpriseDB
- 29% reduction in HW costs and maintenance
- 68% reduction in core count
- 79% 3-year TCO reduction

[Chart: Solution TCO for 3 years. S822LC/20c/2.926 with EnterpriseDB vs. HP DL380p/Broadwell (2s) with Oracle EE. Cost categories: Environmentals, HW, SW.]

Assumptions: 7 Power S822LC servers (65% utilization) have performance equivalent to 10 x86 servers (40% utilization).
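The server and core percentages can be checked from the stated assumptions. The short calculation below is a sanity check, not IBM's pricing model. The 20 cores per Power server comes from the "S822LC/20c" label; the 44 cores per x86 server is an assumption, since this slide gives only "(2s)" for the HP system.

```python
def reduction_pct(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


power_servers = 7
x86_servers = 10
power_cores_per_server = 20  # from the "S822LC/20c" chart label
x86_cores_per_server = 44    # ASSUMPTION: a 2-socket, 44-core x86 server

server_reduction = reduction_pct(x86_servers, power_servers)
core_reduction = reduction_pct(
    x86_servers * x86_cores_per_server,
    power_servers * power_cores_per_server,
)

print(f"servers: {server_reduction:.0f}% fewer")
print(f"cores:   {core_reduction:.0f}% fewer")
```

Under that assumption, 140 cores replace 440, which matches the 30% and 68% figures on the slide. The licensing, hardware, and TCO percentages depend on price inputs that the slide does not list, so they are not recomputed here.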
Modernize your Database with POWER8/PowerKVM and MongoDB vs x86/VMware and Oracle EE

- 90% reduction in SW licensing cost with fewer cores and MongoDB
- 23% reduction in HW costs and maintenance
- 45% reduction in core count
- 85% 3-year TCO reduction

[Chart: Solution TCO for 3 years. S822LC/20c/2.926 with MongoDB vs. HP DL380/BWL/44c/2.2 with Oracle EE. Cost categories: Environmentals, HW, SW.]

Assumptions: 7 x Power S822LC/20c servers with PowerKVM (40% utilization) have performance equivalent to 10 x HP DL380/E5-2699 v4/44c servers with VMware (40% utilization). Performance is based on SPECint_rate.
Hortonworks Announcement

Announced at IBM Edge: Hortonworks HDP is coming to Power!

What is Hortonworks HDP? It is an enterprise-ready open source Apache Hadoop distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust analytics that accelerate decision making and innovation.
Trademarks and notes

IBM Corporation 2016

IBM, the IBM logo and ibm.com are registered trademarks, and other company, product, or service names may be trademarks or service marks of International Business Machines Corporation in the United States, other countries, or both. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Other company, product, and service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

IBM and IBM Credit LLC do not, nor intend to, offer or provide accounting, tax or legal advice to clients. Clients should consult with their own financial, tax and legal advisors. Any tax or accounting treatment decisions made by or on behalf of the client are the sole responsibility of the customer.

IBM Global Financing offerings are provided through IBM Credit LLC in the United States, IBM Canada Ltd. in Canada, and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates and availability are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Some offerings are not available in certain countries. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.
Welcome to the Waitless World.