The First Hybrid, In-Memory RDBMS Powered by Hadoop and Spark
Copyright 2015 Splice Machine. All rights reserved. The content of this white paper, including the ideas and concepts contained within, is the property of Splice Machine, Inc. This document is considered proprietary and confidential, and should not be reproduced or reused in any way without the permission of Splice Machine. Published 11/15
Table of Contents

- Executive Summary
- Decisions in the Moment
- Inadequate Technology Options
- The Advantages of SQL, the Scale Out of NoSQL, and the Performance of In-Memory
- Proven Technologies
- Comparison to Other Hadoop Technologies
- Cross-Industry Use Cases
- Applications
- Key Features and Benefits of Splice Machine
- Conclusion
Executive Summary

Business is moving faster than ever before, driven by the proliferation of social media, cloud computing, and mobile devices. To remain competitive, companies need to make decisions in the moment. Real-time decision-making demands technology that can manage the massive increases in both data volume and velocity that Big Data brings to the table.

Splice Machine builds upon proven open-source technologies such as Hadoop, HBase, Apache Spark, and Apache Derby to create an innovative, general-purpose database that provides the following benefits when compared to traditional RDBMSs such as Oracle, IBM DB2, or MySQL:

- 10-20x Faster: leverages HBase, the distributed NoSQL database, as well as in-memory cluster computing from Spark
- ¼ the Cost: scales out on commodity hardware using Hadoop
- ANSI SQL: leverages existing SQL-based analysts, reports, and applications without rewrites
- Distributed Transactions: ensures reliable updates across multiple rows and tables, based on advanced research by Google
- Flexible: provides excellent performance for simultaneous OLTP and OLAP workloads
- Elastic: increases or decreases scale in just a few minutes

Use Cases

The Splice Machine RDBMS is ideal for a broad range of use cases, including digital marketing, ETL acceleration, operational data lakes, data warehouse offloads, IoT applications, web, mobile, and social applications, and operational applications. It also enables a new class of data-driven applications that only Splice Machine can power in real-time.
Decisions in the Moment

Being data-driven is almost cliché in business today, but the little-known fact is that most business decisions are still made using stale data. Today's systems can only catch up to yesterday's data. What businesses need is the ability to make decisions in the moment. However, this is extraordinarily challenging because increasing data volume and velocity are overwhelming traditional databases.

Data volume is growing 30-40% per year,[1] approximately an order of magnitude faster than IT budgets, which are growing at 3-4% per year.[2] Reaching this scale with traditional RDBMSs, the backbone of IT, requires you to scale up with increasingly larger, specialized servers from companies such as Oracle and IBM, but this is cost prohibitive.

fig. 1 By shortening the decision lifecycle, data-driven companies will create competitive advantage. [The figure contrasts a real-time, data-driven business, which collects, analyzes, and acts on data in seconds, with current business practice: one day for ETL, days for a human analyst to analyze, and days to weeks for humans to act.]

Data velocity has also accelerated greatly, driven by real-time data streams from social media, programmatic marketing, mobile devices, and the Internet of Things (IoT). For instance, in digital marketing, companies want to deliver real-time, personalized experiences across channels based on the last five clicks to drive greater engagement and conversion. This requires making decisions in the moment with real-time data, not after the fact with day-old data ETL'd (Extracted, Transformed, and Loaded) from another system. It also demands databases that can simultaneously handle OLTP and OLAP workloads to eliminate the ETL delay for analysis.

[1] Source: 2013 IBM Briefing Book
[2] Source: Gartner, Worldwide IT Spending Forecast, 3Q13 Update
Inadequate Technology Options

Current technology solutions cannot support real-time, data-driven businesses that may need to process terabytes or even petabytes of data to make decisions in the moment.

Traditional Commercial RDBMSs
Traditional RDBMSs (Relational Database Management Systems) such as Oracle or IBM DB2 can support real-time updates, but they require expensive, specialized hardware to scale up. At up to millions of dollars per system, these specialized systems quickly become cost prohibitive.

Traditional Open-Source RDBMSs
Traditional open-source databases such as MySQL and PostgreSQL are unable to scale beyond a few terabytes without manual sharding. Manual sharding requires a partial rewrite of every application and becomes a maintenance nightmare when shards must periodically be rebalanced. For most companies, manual sharding is not a viable option.

In-Memory-Only Databases
New in-memory-only databases provide very fast performance, but all of the data must be stored in memory, which can become very expensive beyond a few terabytes. To solve these cost issues while still maintaining their speed advantages, in-memory databases need to spill to disk and handle node failures without dropping queries.

Hadoop
New Big Data technologies such as Hadoop and HBase provide cost-effective, proven scalability from terabytes to petabytes on inexpensive commodity servers. However, they provide very limited support for SQL, the de facto data language in most organizations. The lack of full ANSI SQL support is a major barrier to Hadoop adoption, as companies cannot leverage existing investments in SQL-trained resources or in SQL-based applications and Business Intelligence (BI) tools. These factors have combined to prevent companies from getting value out of their current Hadoop deployments beyond using them as a massive data store.
Splice Machine: The Advantages of SQL, the Scale-Out of NoSQL, and the Performance of In-Memory

Splice Machine delivers a unique hybrid of state-of-the-art technologies:

- Distributed, Transactional SQL: support existing applications without rewrites or retraining of existing SQL-based analysts
- Scale-Out: cost-effectively scale out on commodity hardware with proven technology from Hadoop and Spark
- In-Memory: experience outstanding performance for OLAP queries with in-memory technology from Spark

Cost-Effective, Hybrid Architecture
The Splice Machine RDBMS is an innovative hybrid of in-memory computing from Spark and disk-based storage from Hadoop. Unlike in-memory-only databases, Splice Machine does not force companies to put all of their data in memory, which can become prohibitively expensive as data volumes grow. Splice Machine uses Spark's in-memory computation to materialize the intermediate results of long-running queries, but leverages the power of HBase to durably store and access data at scale.

State-of-the-Art Optimizer
Splice Machine started with Apache Derby, an ANSI SQL Java database, and replaced its storage layer with HBase/Hadoop and Spark. The Splice Machine optimizer automatically evaluates each query and sends it to the right computation engine: OLTP queries (i.e., small reads/writes and range queries) go to HBase/Hadoop, while OLAP queries (i.e., large joins or aggregations) go to Spark. Example queries of each type appear at the end of this section.

Mixed OLTP/OLAP Workloads
The Splice Machine RDBMS is a high-performance database designed to simultaneously handle OLTP and OLAP queries. With separate processes and resource management for Hadoop and Spark, Splice Machine can ensure that OLAP queries do not interfere with OLTP queries.

Questions to consider when selecting your next database:
- Can it support simultaneous OLTP and OLAP queries?
- Does it scale out on commodity hardware?
- Does it have in-memory technology?
- Can it support ACID transactions?
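To make the optimizer's routing distinction concrete, here is a short, hedged sketch in standard SQL. The schema (orders, customers) is hypothetical, and the routing notes in the comments restate the behavior described above rather than actual Splice Machine output.

-- OLTP query: a small, key-based read that the optimizer described
-- above would route to HBase for a low-latency lookup.
SELECT order_id, status, total
FROM orders
WHERE order_id = 1000042;

-- OLTP write: a single-row, transactional update.
UPDATE orders
SET status = 'SHIPPED'
WHERE order_id = 1000042;

-- OLAP query: a large join and aggregation over many rows that would
-- instead be routed to Spark for in-memory processing.
SELECT c.region, SUM(o.total) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2015-01-01'
GROUP BY c.region
ORDER BY revenue DESC;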
Proven Technologies

Splice Machine marries three proven technology stacks: Apache Derby, HBase/Hadoop, and Spark.

Apache Derby
With over 15 years of development, Apache Derby is a Java-based SQL database. Splice Machine chose Derby because it is a full-featured ANSI SQL database, lightweight (<3 MB), and easy to embed into the HBase/Hadoop stack.

fig. 2 The Splice Machine Solution Stack [The diagram shows operational apps and SQL/BI tools connecting through Splice Machine's ODBC/JDBC and bulk import/export interfaces to the Splice parser, planners, optimizer, and executor, derived from Derby, which dispatch OLTP work to HBase/HDFS on Hadoop and OLAP work to Spark.]
HBase and Hadoop
HBase and Hadoop have become the leading platforms for distributed computing. HBase is based on Google's Bigtable technology and uses the Hadoop Distributed File System (HDFS) for reliable and available persistent storage. HBase provides auto-sharding and failover technology for scaling database tables across multiple servers. It also enables real-time, incremental writes on top of the immutable Hadoop file system. HBase and Hadoop are the only technologies proven to scale to dozens of petabytes on commodity servers and are used by companies such as Facebook, Twitter, Adobe, and Salesforce.com. Splice Machine chose HBase and Hadoop because of their proven auto-sharding, replication, and failover technology.

Apache Spark
Apache Spark has emerged as the leading in-memory cluster computing engine. It has very efficient in-memory processing, with spill-to-disk so that queries that do not fit in memory do not fail. Unlike many other in-memory technologies, Spark is also resilient to node failures: it can retry work on other active nodes after a failure so that a computation does not fail.

Splice Machine integrates these technology stacks by replacing the storage engine in Apache Derby with HBase and its query executor with HBase and Spark. Splice Machine retains the Apache Derby parser, but it redesigned the planner, optimizer, and executor to leverage the distributed HBase computation engine. Each physical HBase node has two separate processes and memory spaces: the HBase Region Server, with the Splice Machine parser, planner, and optimizer embedded, and the Spark computation engine. The Splice Machine executor compiles its planned computation into bytecode that is serialized to each distributed HBase region (i.e., data shard) via HBase co-processors, or to each Spark worker, depending on which computation engine is selected for the statement. This enables massive parallelization by pushing the computation down to each distributed data shard.

Compatible With Any Hadoop Distribution
Splice Machine can be used with any standard Hadoop distribution that has both HBase and Spark. Supported Hadoop distributions include Cloudera, MapR, and Hortonworks.
Comparison to Other Hadoop Technologies

The SQL-on-Hadoop space has become crowded with many technologies. Unlike Splice Machine, these technologies are designed for analytics only and cannot support high volumes of real-time updates. This means that updates require extensive and time-consuming ETL (Extract, Transform, and Load) processes that delay reports, which often run only on a daily basis. Most of these technologies also provide limited and non-standard dialects of SQL.

Compared to SQL-on-Hadoop technologies, the Splice Machine RDBMS is unique as the only transactional ANSI SQL database leveraging both Hadoop and Spark. Transactions ensure that real-time updates can be made reliably, without data loss or corruption. It is also the only general-purpose SQL database on Hadoop that can simultaneously support both operational and analytical workloads.
Cross-Industry Use Cases

General-purpose databases such as MySQL and Oracle are used across a range of operational (OLTP) and analytical (OLAP) applications. As a general-purpose database that can scale out with commodity servers, Splice Machine can be used anywhere that MySQL or Oracle can be used, but it really excels once the database size exceeds a few terabytes.

Unlock the Value in Hadoop
Hadoop has become a standard for storing massive amounts of data, but it is a file system, not a database. With Splice Machine, companies can use Hadoop to power operational applications and analytics on real-time data.

NoSQL Falls Short
Companies requiring cost-effective scaling on commodity hardware have historically turned to NoSQL solutions such as MongoDB, Cassandra, or HBase. However, the scalability comes at a great cost: little or no SQL language support, as well as a lack of transactions, joins, aggregations, and secondary indexes. Application developers quickly discover these omissions and are then forced to recreate that functionality for each application, a costly, time-consuming, and error-prone effort (the sketch at the end of this section shows the kind of SQL functionality involved). Splice Machine enables these developers to enjoy the best of both worlds: standard SQL AND scale-out on commodity hardware.

Big Iron Too Expensive
Traditional RDBMSs such as Oracle and IBM DB2 typically require specialized server hardware to scale beyond a few terabytes. Each of these servers can cost millions of dollars, and even small spikes in volume can require massive spending to meet requirements. For most companies, the cost becomes prohibitive for terabytes, let alone petabytes, of data. Splice Machine enables highly cost-effective scaling out on commodity servers for companies currently trapped with spiraling costs from Big Iron.

MySQL Can't Scale
Open-source databases such as MySQL and PostgreSQL can no longer scale once the data volume exceeds a few terabytes. Manual sharding is then the only option to handle increased data volumes, and it requires a partial rewrite of every application as well as ongoing rebalancing of shards. Splice Machine can help those companies scale beyond a few terabytes with the proven auto-sharding capability of HBase.
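To illustrate the point about NoSQL omissions, the hedged sketch below uses only standard SQL features (a secondary index, a join, and an aggregation) that a relational database provides out of the box but that typically must be re-implemented in application code on a NoSQL store. The schema is hypothetical.

-- Hypothetical schema: page-view events joined to a user table.
CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    region  VARCHAR(32)
);

CREATE TABLE page_views (
    view_id   BIGINT PRIMARY KEY,
    user_id   BIGINT,
    url       VARCHAR(256),
    viewed_at TIMESTAMP
);

-- A secondary index: declarative in an RDBMS, hand-built on most NoSQL stores.
CREATE INDEX idx_views_user ON page_views (user_id, viewed_at);

-- A join plus an aggregation: page views per region, with no application-side code.
SELECT u.region, COUNT(*) AS views
FROM page_views v
JOIN users u ON v.user_id = u.user_id
GROUP BY u.region;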
Data Warehouse Offload
Teradata and Netezza data warehouses employ very expensive, specialized hardware. Many companies recognize that up to 50% of the workloads on those systems (e.g., operational reports) can be offloaded to less expensive, scale-out systems. With Spark's in-memory technology, the Splice Machine RDBMS can deliver similar performance at a dramatically lower cost.

Future-Proof New Apps
It can often be difficult to anticipate future data requirements for new applications. However, as businesses become more data intensive, it is likely that many applications will exceed a few terabytes of data in their lifetime. No company wants to build an application and then endure a costly database migration a few years later because data growth exceeded expectations. Splice Machine provides a future-proof database platform that can scale cost-effectively from gigabytes to petabytes for new applications.
Applications

General-purpose databases like Splice Machine can be used in almost every industry, across many different real-time applications. However, Splice Machine is unique in its ability to enable a new breed of real-time, analytical applications that can collect, analyze, AND react to data in real-time to transform industries. The rest of this section highlights a few examples of real-time applications that Splice Machine can power.

Digital Marketing
Consumer marketing optimizes interactions with millions of consumers across websites, mobile devices, emails, and ads. The holy grail is real-time personalization, which shows the right message, to the right person, at the right time. Traditional solutions employ analytical models based only on yesterday's data. Splice Machine enables companies to make decisions in the moment with real-time data, not after the fact with day-old data ETL'd (Extracted, Transformed, and Loaded) from another system.

fig. 3 Splice Machine's ability to process real-time updates enables retail campaigns that provide true real-time personalization. [The diagram shows retail brands running cross-channel campaigns through a marketing platform (Unica), with Splice Machine powering real-time personalization and real-time actions toward consumers.]
ETL Acceleration

fig. 4 ETL issues create inertia and hidden costs for businesses [The diagram traces data from systems of record (ERP, CRM) through an ODS and ETL into a data warehouse, and catalogs the pain points: data that is hours or days old when the business needs it in near real-time, reports delayed when errors or performance issues cause a missed ETL window, weeks or months to change or create new reports, constant updating of ETL scripts to handle changing sources and reports, ongoing database tuning, expensive scale-up hardware and proprietary software, and pipelines that can take hours or even days to finish.]

The ETL process is a hidden burden within most IT departments, raising costs and slowing down decision-making. With Big Data, ETL pipelines have become an even bigger bottleneck for the applications and analysts who rely on the data flowing through them. By replacing overwhelmed operational data stores (ODSs) built on RDBMSs like Oracle with the Splice Machine RDBMS, companies can drive ETL lag down from days and hours to minutes and seconds. The Splice Machine RDBMS combines the best of all worlds: the transactional integrity of an RDBMS, the proven scale-out of Hadoop, and the in-memory performance of Spark. Together, these provide an easier way to scale out the capacity of the ETL pipeline.
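The hedged sketch below illustrates this pattern: because the database is transactional, a raw feed can be landed in a staging table and transformed in place with plain SQL instead of in an external ETL tool. The table and column names are hypothetical.

-- Land raw events in a staging table (e.g., via bulk import), then
-- transform and aggregate in place with standard SQL.
INSERT INTO daily_sales_summary (store_id, sale_date, total_amount)
SELECT store_id, CAST(sold_at AS DATE), SUM(amount)
FROM raw_sales_staging
WHERE sold_at >= TIMESTAMP '2015-11-01 00:00:00'
  AND sold_at <  TIMESTAMP '2015-11-02 00:00:00'
GROUP BY store_id, CAST(sold_at AS DATE);

-- Because updates are transactional, the summarized rows can be purged
-- while reports continue to read a consistent snapshot of the summary.
DELETE FROM raw_sales_staging
WHERE sold_at >= TIMESTAMP '2015-11-01 00:00:00'
  AND sold_at <  TIMESTAMP '2015-11-02 00:00:00';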
Operational Data Lake
An operational data lake (ODL) is a modern replacement for older Operational Data Stores (ODSs). An ODL enables offloading real-time reporting and analytics from more expensive OLTP and data warehouse systems, as well as performing aggregation and transformation for ETL processes. Compared to traditional ODSs, an ODL provides a 20x price/performance advantage with scale-out technology. It can also handle semi-structured and unstructured data as part of a larger Hadoop-based data lake. Many companies find that replacing an ODS with an operational data lake represents an excellent first project in Big Data.

Internet of Things

fig. 5 Splice Machine powers system telemetry applications that transform the way devices are serviced. [The diagram shows internet-enabled devices streaming system telemetry to a custom monitoring app backed by Splice Machine, which issues remote resets.]

Telemetry is a stream of data from devices used to detect conditions that require action in real time. These conditions can be system failures, resource optimizations, fraudulent actions, marketing or service opportunities, or intrusions. System-intensive industries such as telecoms, traditional utilities, internet service providers (ISPs), and television cable/satellite companies use real-time telemetry to proactively detect failures, isolate them, and then attempt remote resets to prevent service calls and the dispatch of onsite technicians. Cross-industry examples include server monitoring that detects degradation of website performance and follows up with offers to the affected users to increase customer retention, and cyberthreat security applications that monitor firewalls and correlate real-time activity with historic firewall logs to determine whether a threat is real. Traditional systems are unable to ingest and analyze the sheer volume of system telemetry data; Splice Machine has the scalability to ingest, analyze, and react to massive volumes of telemetry data.
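As a hedged sketch of such a telemetry workload, the standard SQL below ingests device readings into one table and runs a monitoring query over a recent window. The schema and thresholds are hypothetical, and the interval arithmetic uses ANSI syntax, which varies slightly across dialects.

-- Hypothetical telemetry table: one row per device reading.
CREATE TABLE device_telemetry (
    device_id   BIGINT,
    reported_at TIMESTAMP,
    metric      VARCHAR(32),
    reading     DOUBLE PRECISION
);

-- High-volume, real-time inserts land as OLTP writes.
INSERT INTO device_telemetry
VALUES (42, CURRENT_TIMESTAMP, 'cpu_temp', 91.5);

-- Monitoring query: devices whose average temperature over the last
-- five minutes exceeds a threshold, i.e., candidates for a remote reset.
SELECT device_id, AVG(reading) AS avg_temp
FROM device_telemetry
WHERE metric = 'cpu_temp'
  AND reported_at > CURRENT_TIMESTAMP - INTERVAL '5' MINUTE
GROUP BY device_id
HAVING AVG(reading) > 85.0;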
Precision Medicine
As healthcare costs continue to rise faster than inflation, countries such as the United States have started to shift away from activity-based payments to outcome-based payments. Under the Affordable Care Act, hospitals in the United States will not be paid if a patient is readmitted in less than 30 days. This provides a powerful incentive to improve the long-term effectiveness of treatments. The medical industry has begun to focus on precision medicine, which uses genomic, clinical, and diagnostic data to tailor a treatment to a particular individual. Genomic data, which can be massive, can help predict outcomes and possible complications, as well as tailor medications for complex diseases such as cancer. Clinical data includes Electronic Health Records (EHRs) to ensure that care is coordinated across multiple treatments and medical providers. Diagnostic data, especially real-time device data from both the hospital and the home, can trigger intervention before complications reach a tipping point and require hospital readmission. Traditional systems are unable to handle the massive volume of data required for precision medicine. Splice Machine can support precision medicine by handling the huge stores of genomic data, providing the structure for EHRs, and ingesting large quantities of real-time device data.

Operational Applications
Traditional RDBMSs typically power real-time operational systems (OLTP), including enterprise software as well as web, mobile, and social applications. These systems require a high volume of real-time reads and writes with transactional integrity. With ACID (Atomicity, Consistency, Isolation, Durability) transactions, Splice Machine can power these applications while simultaneously providing cost-effective scaling with commodity hardware.

Operational Analytics
Operational analytical systems (OLAP) on traditional RDBMSs have traditionally supported real-time, ad-hoc queries on transactional and aggregated data. However, as that transactional data is augmented with an explosion of real-time web, social, mobile, and machine-generated data, traditional systems cannot scale. Specialized appliances such as Teradata and Netezza can scale to handle massive data sets, but they are expensive and require extensive ETL. Splice Machine provides the best of both worlds: real-time operational analytics on a Big Data scale.
Key Features and Benefits of Splice Machine

Standard ANSI SQL to Leverage Existing SQL Tools and Skill Sets
Leveraging the proven SQL processing of Apache Derby, Splice Machine provides a true RDBMS on Hadoop. Splice Machine is SQL-99 compliant and includes functionality such as:

- Transactions
- Joins
- Secondary indexes
- Aggregations
- Sub-queries
- Triggers
- Constraints
- User-defined functions (UDFs)
- Column-level security
- Stored procedures
- Views
- Virtual tables
- Window functions

Standard SQL enables companies to leverage existing SQL-trained resources to develop and maintain applications on Splice Machine (a brief example of this SQL surface follows at the end of this section). Existing applications can migrate to Splice Machine with minimal effort, thanks to ODBC and JDBC drivers that provide seamless connectivity to BI tools such as IBM Cognos, SAP BusinessObjects, Tableau, and MicroStrategy; ETL tools like Informatica and Ab Initio; statistical tools such as SAS and R; and SQL tools such as Toad and DbVisualizer.

Query Acceleration with Advanced In-Memory Technology
Splice Machine embeds Apache Spark, a fast, open-source engine for large-scale data processing, to accelerate OLAP queries. Spark has very efficient in-memory processing, but it can spill to disk (instead of dropping the query) if query processing exceeds available memory. More importantly, Spark is unique in its resilience to node failures, which can happen in a commodity cluster. Other in-memory technologies will drop all queries associated with a failed node, while Spark uses lineage to regenerate its in-memory Resilient Distributed Datasets (RDDs) on another node and therefore does not require redundant storage.
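As a hedged illustration of that SQL-99 surface, the statements below exercise a few of the listed features (a constraint, a secondary index, a trigger, and a window function) against a hypothetical schema. They show standard syntax rather than Splice Machine-specific extensions.

-- Constraint: balances may never go negative.
CREATE TABLE accounts (
    account_id BIGINT PRIMARY KEY,
    balance    DECIMAL(12,2) NOT NULL CHECK (balance >= 0)
);

CREATE TABLE account_audit (
    account_id BIGINT,
    changed_at TIMESTAMP
);

-- Secondary index on a non-key column.
CREATE INDEX idx_accounts_balance ON accounts (balance);

-- Trigger: record every update for auditing.
CREATE TRIGGER trg_account_audit
AFTER UPDATE ON accounts
REFERENCING NEW AS n
FOR EACH ROW
INSERT INTO account_audit VALUES (n.account_id, CURRENT_TIMESTAMP);

-- Window function: rank accounts by balance without collapsing rows.
SELECT account_id,
       balance,
       RANK() OVER (ORDER BY balance DESC) AS balance_rank
FROM accounts;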
Cost-Effective Scaling with Commodity Hardware
Splice Machine leverages the proven auto-sharding of HBase to scale with commodity servers. At companies such as Facebook, HBase has scaled to dozens of petabytes using inexpensive commodity hardware. HBase horizontally partitions, or splits, each table into smaller chunks, or shards, that are distributed across multiple servers. For performance reasons, it automatically splits shards that become too large. This auto-sharding occurs automatically, without impacting applications built on Splice Machine.

fig. 6 HBase architecture [The diagram shows a Splice Machine client connecting through ZooKeeper and the HMaster to HRegionServers; each HRegion maintains an HLog and stores with MemStores and StoreFiles (HFiles), persisted through DFS clients to Hadoop DataNodes.]
Unprecedented Support for Simultaneous OLTP and OLAP Workloads
The hybrid in-memory architecture of the Splice Machine RDBMS is designed to provide high performance for simultaneous OLTP and OLAP queries. Spark provides the performance of in-memory technology for OLAP queries, while HBase can scale out to petabytes for OLTP queries.

Advanced Optimizer. The Splice Machine RDBMS delivers a seamless integration of both Spark and HBase. The Splice Machine optimizer automatically evaluates each query and sends it to the right computation engine: OLTP queries (i.e., small reads/writes and range queries) go to HBase/Hadoop, while OLAP queries (i.e., large joins or aggregations) go to Spark.

Isolated Resource Management. Splice Machine uses advanced resource management to ensure high performance for simultaneous OLTP and OLAP queries. With separate processes and resource management for Hadoop and Spark, the Splice Machine RDBMS can ensure that large, complex OLAP queries do not interfere with time-sensitive OLTP queries. Users can set custom priority levels for OLAP queries to ensure that important reports are not blocked behind massive batch processes that consume all cluster resources.
Real-Time Updates with Transactional Integrity
Database transactions ensure that real-time updates can be executed reliably, without data loss or corruption. Even in analytical applications, transactions are important. For instance, transactions guarantee that data and secondary indexes are updated atomically. Transactions also enable zero-downtime updates or ETL to data warehouses, because data can be updated while reports simultaneously see a consistent view of the data.

Splice Machine provides full ACID (Atomicity, Consistency, Isolation, Durability) transactions across rows and tables. It uses a form of Multi-Version Concurrency Control (MVCC) called snapshot isolation that does not change records in place; instead, it creates a new version with a unique timestamp. Each transaction can use different versions of each record to create its own snapshot of the database. With each transaction having its own snapshot, transactions can execute concurrently without locking reads. This leads to very high throughput and avoids troublesome deadlock conditions.

Splice Machine developed a state-of-the-art snapshot design based on the latest research on distributed transactions. It extended the research of Google's Percolator system, Yahoo Labs, and the University of Waterloo on HBase transactions with support for very large transactions and increased throughput for bulk loads.

fig. 7 High-concurrency ACID transactions [The diagram shows versioned Share Quantity values (3,000 at T1; 5,000 at T3; 2,000 at T7; 4,000 at T12) and Share Price values ($15.27 at T0; $15.74 at T2; $15.65 at T5; $15.11 at T7). A virtual snapshot taken between T5 and T7 reads the latest versions visible to it, so value held = share quantity x share price = 5,000 x $15.65.]
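A hedged sketch of what snapshot isolation means in practice, written as two interleaved SQL sessions against a hypothetical holdings table: the reader's view is fixed when its transaction begins, so the concurrent writer neither blocks it nor changes what it sees.

-- Session A: a long-running analytical reader.
START TRANSACTION;                            -- snapshot taken here, say at T5
SELECT SUM(quantity * price) AS value_held
FROM holdings;                                -- sees quantity = 5,000, price = $15.65

-- Session B: a concurrent writer. Under snapshot isolation it does not
-- block Session A; it simply creates new record versions (here, at T7).
START TRANSACTION;
UPDATE holdings SET quantity = 2000, price = 15.11;
COMMIT;

-- Session A, continued: a reread returns the same result, because its
-- snapshot remains pinned at T5 and no read locks were ever taken.
SELECT SUM(quantity * price) AS value_held
FROM holdings;                                -- still 5,000 x $15.65
COMMIT;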
High-Performance, Distributed Computing Architecture
Splice Machine is a high-performance, distributed computing architecture that delivers massive parallelization for predicates, joins, aggregations, and functions by pushing computation down to each distributed data shard. On each HBase physical node, the Splice Machine RDBMS has a separate process and memory space for HBase and Spark. It places its parser, planner, optimizer, and executor in the HBase Region Server process and uses HBase co-processors to distribute OLTP computation across regions (i.e., shards).

Built on top of HBase and Hadoop, Splice Machine also provides proven auto-sharding, failover, and replication. Splice Machine supports any standard Hadoop distribution (i.e., Cloudera, MapR, Hortonworks).

fig. 8 The Distributed Computing Architecture of HBase with Splice Machine [The diagram shows ZooKeeper, the HMaster, and the Spark Master coordinating HBase Region Servers that embed the Splice parser, planner, and optimizer. Within each region, Splice executors run as HBase co-processors with snapshot isolation and indexes, while Spark workers hold RDDs and executor caches running Splice executor tasks over HDFS.]
Flexible, General-Purpose Database Platform
Splice Machine is a flexible database platform that can power both OLAP and OLTP workloads and efficiently handle sparse data. The storage layer for Splice Machine is HBase, a schema-less key-value store. This gives Splice Machine greater flexibility in storing data and updating schemas.

Sparse Data Support. Splice Machine can efficiently support super-schemas with sparse data. For instance, if a table has 1,000 columns and a row uses only two of them, that row stores data only for those two columns; no extra storage is used to store nulls for the other 998 columns. Splice Machine also enables fast, flexible schema updates: columns can be added to tables without full table scans (see the schema sketch at the end of this section).

MapReduce Input/Output Formats. Parallelized Splice Machine queries use HBase co-processors, not MapReduce, for distributed computation on data stored in the Hadoop Distributed File System (HDFS). However, Splice Machine does provide an API to run MapReduce, Hive, and Spark jobs against data stored in Splice Machine. This enables users to perform custom, batch-oriented analyses across the entire data set in Splice Machine using other technologies such as Storm, Kafka, Pig, MLlib, or Mahout.

ODBC/JDBC Support. Splice Machine also provides many options to access its data and to deploy across servers. Applications and analysts can use SQL through a command-line interface (CLI) as well as JDBC/ODBC connections. Developers can therefore embed SQL in their code in a variety of programming languages, including Java, Scala, Python, C++, and JavaScript.

Access External Data and Libraries. The Splice Machine RDBMS includes the ability to easily access external data and libraries. It can execute federated queries on data in external databases and files using Virtual Table Interfaces (VTIs). It can also execute all pre-built Spark libraries (over 130 and growing) for machine learning, stream analysis, data integration, and graph modeling.

Compatible With Any Hadoop Distribution. Operations teams can continue to use their existing Hadoop distributions (e.g., Cloudera, MapR, Hortonworks), as Splice Machine installs quickly on each HBase Region Server. Through HBase's integration with YARN, Splice Machine can share a Hadoop cluster with other workloads.
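To make the sparse-data point concrete, here is a hedged sketch in standard SQL with hypothetical names: a wide table in which most columns are null for any given row, plus an online column addition. In the HBase-backed storage model described above, only populated columns consume storage, and the ALTER does not trigger a full table scan.

-- A hypothetical super-schema: most columns are null for any given row.
CREATE TABLE device_profile (
    device_id BIGINT PRIMARY KEY,
    attr_0001 VARCHAR(100),
    attr_0002 VARCHAR(100)
    -- ...imagine hundreds more sparse attribute columns...
);

-- A row that populates only two columns; the nulls in the remaining
-- columns consume no storage in a key-value-backed layout.
INSERT INTO device_profile (device_id, attr_0001)
VALUES (7, 'firmware-2.1');

-- Schema evolution: add a column without rewriting existing rows.
ALTER TABLE device_profile ADD COLUMN attr_0003 VARCHAR(100);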
Conclusion

As the only hybrid, in-memory RDBMS powered by Hadoop and Spark, Splice Machine presents unlimited possibilities to application developers and database architects. Most of all, it eliminates the compromises that have been part of any database platform selection to date. You can have proven scale-out with commodity hardware, the performance of in-memory technology, AND standard SQL to maximize your ability to leverage current staff, operations, and applications. No longer will you be locked into specialized hardware costing millions per server. You will have a massively scalable, general-purpose database that can handle operational AND analytical workloads. Splice Machine can also help you unlock the value of the data trapped in your current Hadoop deployment.

Compared to traditional RDBMSs such as Oracle, IBM DB2, or MySQL, the Splice Machine RDBMS provides the following benefits:

- 10-20x Faster: leverages HBase, the distributed NoSQL database, as well as in-memory cluster computing from Spark
- ¼ the Cost: scales out on commodity hardware using Hadoop and HBase
- ANSI SQL: leverages existing SQL-based analysts, reports, and applications without rewrites
- Distributed Transactions: ensures reliable updates across multiple rows and tables, based on advanced research by Google
- Flexible: provides excellent performance for simultaneous OLTP and OLAP workloads
- Elastic: increases or decreases scale in just a few minutes

With an innovative, hybrid architecture, Splice Machine is uniquely qualified to power data-driven businesses that can harness real-time insights to take better actions. Companies can use Splice Machine to make decisions in the moment to leapfrog their competition.

Contact Us
Please contact us at [email protected] to learn more or see a demo.