Big Data Landscape for Databases
1 Big Data Landscape for Databases Bob Baran Senior Sales Engineer May 12, 2015
2 Typical Database Workloads

Workloads span a spectrum from operational (left) to analytical (right):

OLTP Applications
- Typical databases: MySQL, Oracle
- Use cases: ERP, CRM, supply chain
- Workload strengths: real-time updates; ACID transactions; high concurrency of small reads/writes; range queries

Real-Time Web, Mobile, and IoT Applications
- Typical databases: MySQL, Oracle, MongoDB, Cassandra
- Use cases: web, mobile, social, IoT
- Workload strengths: real-time updates; high ingest rates; high concurrency of small reads/writes; range queries

Real-Time, Operational Reporting
- Typical databases: MySQL, Oracle
- Use cases: operational datastores, Crystal Reports
- Workload strengths: real-time updates; canned, parameterized reports; range queries

Ad-Hoc Analytics
- Typical databases: Greenplum, ParAccel, Netezza, Teradata
- Use cases: exploratory analytics, data mining
- Workload strengths: complex queries requiring full table scans

Enterprise Data Warehouses
- Typical databases: Oracle, Sybase IQ
- Use cases: enterprise reporting
- Workload strengths: append only; parameterized reports against historical data
3 Recent History of RDBMSs

RDBMS definition:
- Relational with joins
- ACID transactions
- Secondary indexes
- Typically row-oriented
- Operational and/or analytical workloads

By the early 2000s: limited innovation; it looked like Oracle and Teradata had won.
4 Hadoop Shakes Up Batch Analytics

- Data processing framework
- Cheap distributed file system
- Brute-force batch processing through MapReduce
- Great for batch analytics
- Great place to dump data to look at later
5 NoSQL Shakes Up Operational DBs

The NoSQL wave:
- Companies like Google, Amazon, and LinkedIn needed greater scalability and schema flexibility
- New databases developed by developers, not database people
- Provided scale-out, but lost SQL
- Worked well at web startups because, in some cases, the use cases did not need ACID and teams were willing to handle exceptions at the app level
6 Convoluted Evolution of Databases

Timeline, plotting scalability against functionality:
- 1960s: indexed files (ISAM)
- 1970s: hierarchical/network databases
- 1980s-2000s: traditional RDBMSs (scale up)
- 2005: Hadoop
- 2010: NoSQL databases
- 2013: scale-out SQL databases (scale out)
7 Mainstream user changes

Driven by web, social, mobile, and the Internet of Things:
- Major increases in scale: 30% annual data growth
- Significant requirements for semi-structured data, though relatively little unstructured

Technology adoption continuum ("What is it?" → "Should I use it?" → "Why wouldn't I use it?"):
- Scale-out SQL DBs for operational apps
- NoSQL for web apps
- Hadoop technologies for analytics
- Cloud
8 Schema on Ingest vs. Schema on Read

- Schema on ingest: apply schema to the data stream as it arrives
- Schema on read: apply schema when the application reads the data
- Structured data should always remain structured
- Use schema on read if you only use the data a few times a year; add schema if the data is used regularly
- Even "schemaless" MongoDB requires schema: item #1 of "10 Things You Should Know About Running MongoDB At Scale" by Asya Kamsky, Principal Solutions Architect at MongoDB, is "have a good schema and indexing strategy"
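The trade-off above can be sketched in a few lines of Python (illustrative only; the record format and function names are hypothetical, not from any product mentioned here): schema on ingest pays the parsing and validation cost once at write time, while schema on read stores raw bytes and re-parses on every query.

```python
import json

# Hypothetical raw feed: JSON strings with everything typed as text.
raw_records = ['{"id": "1", "amount": "19.99"}', '{"id": "2", "amount": "5.00"}']

# Schema on ingest: enforce types once, when data arrives.
def ingest(record: str) -> dict:
    doc = json.loads(record)
    return {"id": int(doc["id"]), "amount": float(doc["amount"])}

table = [ingest(r) for r in raw_records]   # typed rows, ready for any query

# Schema on read: store raw strings, pay the parsing cost on every query.
def total_amount(raw: list) -> float:
    return sum(float(json.loads(r)["amount"]) for r in raw)

print(round(sum(row["amount"] for row in table), 2))  # 24.99, from typed rows
print(round(total_amount(raw_records), 2))            # 24.99, but re-parsed each time
```

Both give the same answer; the difference is where the parsing (and the chance to reject bad data) happens.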
9 Scale-out is the future of databases

How do I scale?
- Scale up: traditional RDBMSs
- Scale out: NoSQL, NewSQL, SQL-on-Hadoop analytic engines, MPP, Hadoop
10 NoSQL

Pros:
- Easy scale-out
- Flexible schema
- Easier web development with hierarchical data structures (MongoDB)
- Cross-data center replication (Cassandra)

Cons:
- No SQL: requires retraining and app rewrites
- No joins, i.e., no cross-row/document dependencies
- No reliable updates through transactions across rows/tables
- Eventual consistency (Cassandra)
- Not designed to do the aggregations required for analytics
11 NewSQL

Pros:
- Easy scale-out
- ANSI SQL eliminates retraining and app rewrites
- Reliable updates through ACID transactions
- RDBMS functionality
- Strong cross-data center replication (NuoDB)

Cons:
- Proprietary scale-out, unproven into petabytes
- Must manage another distributed infrastructure beyond Hadoop
- Cannot leverage the Hadoop ecosystem of tools
12 NewSQL In-Memory

Pros:
- Easy scale-out
- High performance because everything is in memory
- ACID transactions within nodes

Cons:
- Memory is 10-20x more expensive
- Limited SQL
- Limited cross-node transactions
- Proprietary scale-out, unproven into petabytes
- Must manage another distributed infrastructure beyond Hadoop
- Cannot leverage the Hadoop ecosystem
13 Operational RDBMS on Hadoop

Pros:
- Easy scale-out
- Scale-out infrastructure proven into petabytes
- ANSI SQL eliminates retraining and app rewrites
- Reliable updates through ACID transactions
- Leverages the Hadoop distributed infrastructure and tool ecosystem

Cons:
- Full table scans slower than MPP DBs, but faster than traditional RDBMSs
- Existing HDFS data must be reloaded through the SQL interface
14 MPP Analytical Databases

Pros:
- Easy scale-out
- Very fast performance for full table scans
- Highly parallelized, shared-nothing architectures
- May have columnar storage (Vertica)
- No maintenance of indexes (Netezza)

Cons:
- Poor concurrency models prevent support of real-time apps
- Poor performance for range queries
- Need to redistribute all data to add nodes (hash partitioning)
- May require specialized hardware (Netezza)
- Proprietary scale-out: cannot leverage the Hadoop ecosystem of tools
15 SQL-on-Hadoop Analytical Engines

Pros:
- Easy scale-out
- Scale-out proven into petabytes
- Leverages the Hadoop distributed infrastructure
- Can leverage the Hadoop ecosystem of tools

Cons:
- Relatively immature, especially compared to MPP DBs
- Limited SQL
- Poor concurrency models prevent support of real-time apps
- No reliable updates through transactions
- Intermediate results must fit in memory (Presto)
16 Future: Hybrid In-Memory Architectures

- Memory cache with disk: unsophisticated memory management
- Pure in-memory: very expensive
- Hybrid in-memory: flexible and cost-effective; controlled by the optimizer; in-memory materialized views?
17 Summary: Future of Databases

Predicted trends:
- Scale-out dominates databases
- Developers stop worrying about data size and develop new data-driven apps
- Hybrid in-memory architecture becomes mainstream

Predicted winners:
- Hadoop becomes the de facto distributed file system
- NoSQL used for simple web apps
- Scale-out SQL RDBMSs replace traditional RDBMSs
18 Questions? Bob Baran Senior Sales Engineer May 12, 2015
19 Powering Real-Time Apps on Hadoop Bob Baran Senior Sales Engineer May 12, 2015
20 Who Are We? THE ONLY HADOOP RDBMS

Power operational applications on Hadoop:
- Affordable, scale-out: commodity hardware
- Elastic: easy to expand or scale back
- 10x better price/performance
- Transactional: real-time updates and ACID transactions
- ANSI SQL: leverage existing SQL code, tools, and skills
- Flexible: support operational and analytical workloads
21 What People are Saying

Recognized as a key innovator in databases. Quotes:
- "Scaling out on Splice Machine presented some major benefits over Oracle... automatic balancing between clusters... avoiding the costly licensing issues."
- "An alternative to today's RDBMSes, Splice Machine effectively combines traditional relational database technology with the scale-out capabilities of Hadoop."
- "The unique claim of Splice Machine is that it can run transactional applications as well as support analytics on top of Hadoop."

Awards
22 Advisory Board

The Advisory Board includes luminaries in databases and technology:
- Mike Franklin: Computer Science Chair, UC Berkeley; Director, UC Berkeley AMPLab; founder of Apache Spark
- Roger Bamford: former Principal Architect at Oracle; father of Oracle RAC
- Marie-Anne Neimat: co-founder, TimesTen Database; former VP, Database Engineering at Oracle
- Ken Rudin: Head of Analytics at Facebook; former GM of Oracle Data Warehousing
23 Combines the Best of Both Worlds

Hadoop:
- Scale-out on commodity servers
- Proven to 100s of petabytes
- Efficiently handles sparse data
- Extensive ecosystem

RDBMS:
- ANSI SQL
- Real-time, concurrent updates
- ACID transactions
- ODBC/JDBC support
24 Focused on OLTP and Real-Time Workloads

OLTP Applications
- Typical databases: MySQL, Oracle
- Use cases: ERP, CRM, supply chain
- Workload strengths: real-time updates; ACID transactions; high concurrency of small reads/writes; range queries

Real-Time Web, Mobile, and IoT Applications
- Typical databases: MySQL, Oracle, MongoDB, Cassandra
- Use cases: web, mobile, social, IoT
- Workload strengths: real-time updates; high ingest rates; high concurrency of small reads/writes; range queries

Real-Time, Operational Reporting
- Typical databases: MySQL, Oracle
- Use cases: operational datastores, Crystal Reports
- Workload strengths: real-time updates; canned, parameterized reports; range queries

Ad-Hoc Analytics
- Typical databases: Greenplum, ParAccel, Netezza, Teradata
- Use cases: exploratory analytics, data mining
- Workload strengths: complex queries requiring full table scans

Enterprise Data Warehouses
- Typical databases: Oracle, Sybase IQ
- Use cases: enterprise reporting
- Workload strengths: append only; parameterized reports against historical data
25 OLTP Campaign Management: Harte-Hanks

Overview:
- Digital marketing services provider
- Unified customer profile; real-time campaign management
- OLTP environment with BI reports

Challenges:
- Oracle RAC too expensive to scale
- Queries too slow, some taking up to half an hour
- Getting worse: expecting 30-50% data growth
- Looked for nine months for a cost-effective solution

Solution: cross-channel campaigns, real-time personalization, real-time actions

Initial results:
- 10-20x price/performance with no application, BI, or ETL rewrites
- 1/4 the cost with commodity scale-out
- 3-7x faster through parallelized queries
26 Reference Architecture: Operational Data Lake

Offload real-time reporting and analytics from expensive OLTP and DW systems:
- OLTP systems (ERP, CRM, supply chain, HR) feed the operational data lake via stream or batch updates
- The operational data lake serves operational reports and analytics plus real-time, event-driven apps
- ETL moves data onward to the data warehouse and datamarts for executive business reports and ad-hoc analytics
27 Streamlining the Structured Data Pipeline in Hadoop

Traditional Hadoop pipeline: source systems (ERP, CRM) → Sqoop → stored as flat files → apply inferred schema → SQL query engines → BI tools

Streamlined Hadoop pipeline: source systems (ERP, CRM) → existing ETL tool → stored in the same schema → BI tools

Advantages:
- Reduced operational costs with less complexity
- Reduced processing time and errors with fewer translations
- Real-time updates for data cleansing
- Better SQL support
28 Complementing Existing Hadoop-Based Data Lakes

Optimizing storage and querying of structured data as part of ELT or Hadoop query engines:
1. Schema on ingest: streamlined, structured-to-structured integration from OLTP systems (ERP, CRM, supply chain, HR)
2. Schema before read: repository for structured data or metadata from the ELT process on unstructured data
3. Schema on read: ad-hoc Hadoop queries (e.g., Pig via HCatalog) across structured and unstructured data
29 Proven Building Blocks: Hadoop and Derby

Apache Derby:
- ANSI SQL-99 RDBMS
- Java-based
- ODBC/JDBC compliant

Apache HBase/HDFS:
- Auto-sharding
- Real-time updates
- Fault tolerance
- Scalability to 100s of PBs
- Data replication
30 HBase: Proven Scale-Out

- Auto-sharding
- Scales with commodity hardware
- Cost-effective from GBs to PBs
- High availability through failover and replication
- LSM-trees
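The LSM-tree design behind HBase's write path can be sketched in miniature (an illustration of the general technique, not HBase code; real LSM stores add write-ahead logs, bloom filters, and background compaction): writes land in a sorted in-memory memtable, which is periodically flushed as an immutable sorted run on disk; reads check the memtable first, then runs from newest to oldest.

```python
import bisect

class TinyLSM:
    """Minimal LSM-tree sketch: writes go to an in-memory memtable;
    when it fills, it is flushed as an immutable sorted run."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []               # flushed runs, newest last; each a sorted list of (key, value)
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:         # flush: memtable -> sorted run
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:                     # freshest data first
            return self.memtable[key]
        for run in reversed(self.runs):              # then newest run to oldest
            i = bisect.bisect_left(run, (key,))      # binary search within a sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
db.put("row1", "v1"); db.put("row2", "v2")   # second put triggers a flush
db.put("row1", "v1-updated")                 # newer version lives in the memtable
print(db.get("row1"))    # v1-updated
print(db.get("row2"))    # v2
```

This is why LSM stores sustain high write rates: every write is an in-memory insert plus sequential flushes, never a random disk update in place.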
31 Splice Optimizations to HBase

Storage: Splice storage is optimized over raw HBase. We use bitmap indexes to store data in packed byte arrays, giving a much smaller footprint than traditional HBase. With a TPC-H schema, we found a 10x reduction in data size, requiring far less hardware and resources to perform the same workload.

Asynchronous write pipeline: HBase writes (puts) are not pipelined and block while the call is being made. Splice's write pipeline reaches speeds of over 100K writes/second per HBase node, allowing extremely high ingest speeds without more hardware or custom code.

Transactions: as scalability increases, so does the likelihood of failures. We use snapshot isolation to ensure that a failure does not corrupt existing data.

RDBMS capabilities: SQL instead of custom scans, an optimizer that chooses the best access path to the data, and core data management functions (indexes, constraints, typed columns, etc.).
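To make the bitmap-index idea concrete, here is a generic sketch of the technique (not Splice Machine's actual storage format): for a low-cardinality column, keep one bitmask per distinct value, with bit i set when row i holds that value. Predicates then become cheap bitwise operations over packed bits instead of row-by-row scans.

```python
class BitmapIndex:
    """Illustrative bitmap index over a low-cardinality column:
    one integer bitmask per distinct value."""
    def __init__(self, column):
        self.bitmaps = {}
        for row_id, value in enumerate(column):
            # Set bit row_id in the bitmask for this value.
            self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row_id)

    def rows_matching(self, value):
        """Decode a bitmask back into a list of matching row ids."""
        bits, row_id, rows = self.bitmaps.get(value, 0), 0, []
        while bits:
            if bits & 1:
                rows.append(row_id)
            bits >>= 1
            row_id += 1
        return rows

status = ["open", "closed", "open", "open", "closed"]
idx = BitmapIndex(status)
print(idx.rows_matching("open"))    # [0, 2, 3]

# AND-ing two bitmasks evaluates a conjunction of predicates in one operation:
region = BitmapIndex(["us", "eu", "us", "eu", "us"])
both = idx.bitmaps["open"] & region.bitmaps["us"]   # open AND us -> rows 0, 2
```

The storage win comes from the same packing: five row values collapse into a handful of bits per distinct value.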
32 Distributed, Parallelized Query Execution

- Parallelized computation across the cluster
- Moves computation to the data
- Utilizes HBase co-processors, running in the HBase server memory space
- No MapReduce
33 ANSI SQL-99 Coverage

- Data types: e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT
- Conditional functions: e.g., CASE, searched CASE
- DDL: e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE, DELETE, UPDATE
- Privileges: e.g., privileges for SELECT, DELETE, INSERT, EXECUTE
- Predicates: e.g., IN, BETWEEN, LIKE, EXISTS
- Cursors: e.g., updatable, read-only, positioned DELETE/UPDATE
- DML: e.g., INSERT, DELETE, UPDATE, SELECT
- Joins: e.g., INNER JOIN, LEFT OUTER JOIN
- Query specification: e.g., SELECT DISTINCT, GROUP BY, HAVING
- Set functions: e.g., UNION, ABS, MOD, ALL, CHECK
- Transactions: e.g., COMMIT, ROLLBACK, READ COMMITTED, REPEATABLE READ, READ UNCOMMITTED, snapshot isolation
- Sub-queries
- Aggregation functions: e.g., AVG, MAX, COUNT
- String functions: e.g., SUBSTRING, concatenation, UPPER, LOWER, POSITION, TRIM, LENGTH
- Triggers
- User-defined functions (UDFs)
- Views, including grouped views
- Window functions (RANK, ROW_NUMBER, ...)
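A few of the ANSI constructs listed above (DDL, DML, INNER JOIN, GROUP BY/HAVING, a subquery, and a transaction) in action. This runs against SQLite via Python's sqlite3 module purely for illustration; the schema and data are invented, and the point is only that these are standard SQL-99 constructs, not any particular engine's dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- DDL and DML from the list above
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name VARCHAR(20));
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'eng'), (2, 'sales');
    INSERT INTO emp  VALUES (1, 1, 100), (2, 1, 120), (3, 2, 90);
""")

# INNER JOIN + GROUP BY + HAVING + aggregation function
rows = conn.execute("""
    SELECT d.name, AVG(e.salary)
    FROM emp e INNER JOIN dept d ON e.dept_id = d.id
    GROUP BY d.name HAVING AVG(e.salary) > 100
""").fetchall()
print(rows)   # [('eng', 110.0)]

# Subquery inside an UPDATE, wrapped in a transaction
# (the context manager issues COMMIT on success, ROLLBACK on exception)
with conn:
    conn.execute("""
        UPDATE emp SET salary = salary + 10
        WHERE dept_id IN (SELECT id FROM dept WHERE name = 'eng')
    """)
print(conn.execute("SELECT salary FROM emp WHERE id = 1").fetchone()[0])  # 110
```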
34 Window Functions (Advanced Analytics Functions)

- Enable analytics such as running totals, moving averages, and top-N queries
- Perform calculations across a set of table rows related to the current row in the window
- Similar to aggregate functions, with two significant differences: they output one row for each input row they operate upon, and they group rows with window partitioning and frame clauses rather than GROUP BY

Splice Machine currently supports: RANK, DENSE_RANK, ROW_NUMBER, AVG, SUM, COUNT, MAX, MIN
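Two of the examples the slide names, a running total and a ranking, expressed with standard window-function syntax. This is demonstrated on SQLite (version 3.25+ supports window functions) through Python's sqlite3 module, with an invented sales table; the SQL itself is the portable part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day INTEGER, region VARCHAR(10), amount INTEGER);
    INSERT INTO sales VALUES (1,'east',10),(2,'east',20),(1,'west',30),(2,'west',5);
""")

# Running total per region: PARTITION BY splits the window, ORDER BY
# makes the SUM cumulative -- one output row per input row, unlike GROUP BY.
running = conn.execute("""
    SELECT region, day,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM sales ORDER BY region, day
""").fetchall()
print(running)   # [('east', 1, 10), ('east', 2, 30), ('west', 1, 30), ('west', 2, 35)]

# Top-N via RANK over the whole table
ranked = conn.execute("""
    SELECT region, amount, RANK() OVER (ORDER BY amount DESC)
    FROM sales ORDER BY amount DESC
""").fetchall()
print(ranked[0])  # ('west', 30, 1)
```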
35 Lockless, ACID Transactions

- Adds multi-row, multi-table transactions to HBase, with rollback
- Fast, lockless, high concurrency
- Extends research from Google Percolator, Yahoo Labs, and the University of Waterloo
- Patent-pending technology
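The core mechanism behind lockless transactions is multi-versioning with snapshot isolation. A toy sketch of the general idea (illustrative only, not Splice Machine's implementation, which extends work like Google Percolator): each committed write creates a new version stamped with a commit timestamp, and a transaction reads the latest version committed at or before its snapshot, so readers never block writers.

```python
class MVCCStore:
    """Toy multi-version store with snapshot isolation."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), append-only
        self.clock = 0       # logical commit timestamp counter

    def begin(self):
        return self.clock    # snapshot timestamp: see only commits up to now

    def read(self, key, snapshot_ts):
        # Latest version committed at or before the snapshot; no locks taken.
        visible = [(ts, v) for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return max(visible)[1] if visible else None

    def commit(self, writes):
        self.clock += 1      # new commit timestamp; older versions stay readable
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))

store = MVCCStore()
store.commit({"balance": 100})
snap = store.begin()                         # transaction T1 takes its snapshot
store.commit({"balance": 250})               # T2 commits after T1's snapshot
print(store.read("balance", snap))           # 100: T1 still sees its snapshot
print(store.read("balance", store.begin()))  # 250: a new snapshot sees the update
```

A production system adds write-write conflict detection at commit time and garbage collection of old versions; the sketch shows only the read side that makes the scheme lockless.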
36 Customer Performance Benchmarks

Typically a 10x price/performance improvement:
- Speed: 3-7x, 20x, and 7x faster in customer benchmarks
- Price/performance: 10-20x, 10x, and 30x lower
37 Applications, BI / SQL tool support via ODBC/JDBC
38 Splice Machine Safe Journey Process

- Initial Overview (1 day): Splice Machine overview; set the stage for the Rapid Assessment
- Rapid Assessment (5 days, including prep): half-day workshop; assess Splice Machine fit; identify target use cases; risk assessment of use cases; agree upon success criteria
- Proof of Concept (2 weeks): prove the client use case on a Splice Machine-hosted environment; benchmark using customer queries and schema, on customer data or generated data that resembles it
- Pilot Project (3-6 weeks): identify a paid pilot use case with limited change-management impact; install Splice Machine in the client environment; deploy the use case/application on client data; prove Splice Machine against key requirements
- Enterprise Implementation (3-10 months): Kickstart, Requirements, Design/Dev, QA Test, Cutover, Hypercare
39 Safe Journey Enterprise Implementation Stages

- Kickstart: packaged two-week program to get a new client off to a strong start on a solid foundation; incorporates Splice architecture and development courses, a risk assessment workshop, and an implementation blueprint
- Requirements: establish a clear functional and performance requirements document; can be a refresh only if the project is a port of an existing app to Splice
- Design/Dev: based on the Agile method; the phase is divided into two-week sprints; stories covering a set of required capabilities are assigned to each developer; a design doc is created, code is written, and unit tests are written and executed until they pass
- QA Test: includes performance test, end-to-end system integration test, and user acceptance test; depending on the scale of the project there may be multiple iterations of each test with break/fix cycles in between
- Parallel Ops (optional): used when an existing system is being ported to Splice Machine from another database; the new Splice Machine-based system runs side by side with the old system for a period of time
- Cutover: formal period in which the Splice-based solution goes live and the pre-existing system is deprecated
- Hypercare (optional): period of onsite support during cutover and for a period immediately following go-live
40 Common Risks and Mitigation Strategies

Data migration
- Risk: clients are typically migrating very large data sets to Splice Machine. Issues with migration of certain data types, such as dates, can waste a lot of time reloading large amounts of data.
- Solution: first migrate a small subset of tables that contains all required data types. Ensure these migrate successfully before migrating the entire database.

Changes to source schema during implementation
- Risk: changes to the schema of the source database during the course of the implementation will lead to a significant amount of rework and reloading of data, adding unplanned time to the project.
- Solution: all stakeholders agree up front to freeze the schema as of an agreed-upon date prior to the Design/Development stage.

Stored procedure conversion
- Risk: stored procedures need to be converted from the original language (e.g., PL/SQL) to Java. Complex stored procedures may include significant amounts of procedural code as well as multiple SQL statements.
- Solution: carefully review the function and design of the stored procedures to be converted.
41 Common Risks and Mitigation Strategies

SQL compatibility
- Risk: even though Splice Machine conforms to the ANSI SQL-99+ standard, virtually every database has unique syntax, and some queries may need to be modified. Additionally, SQL generated by packaged applications may not be modifiable.
- Solution: formal review of SQL syntax during the Requirements phase. Modify relevant queries during the Design/Dev phase. If a query is not modifiable, an enhancement request for Splice Machine to support the required syntax out of the box may be needed.

Indexing
- Risk: proper indexing is usually important to maximize the performance of Splice Machine, and Splice Machine indexes are likely to differ from the indexes required for a traditional RDBMS.
- Solution: ensure that query performance SLAs are clearly defined in the Requirements phase. Incorporate proper index design early in the Design/Dev phase. Assume some iteration will be required to achieve optimal indexes.

Hadoop knowledge
- Risk: project stakeholders often have limited knowledge of Hadoop and the distributed computing paradigm. This can lead to confusion about the Splice Machine value proposition and the advantages of moving to a scale-out architecture.
- Solution: include the Splice Machine Kickoff Program at the beginning of the implementation project. It includes essential training on Hadoop and related fundamental concepts critical to realizing value from a Splice Machine deployment.
42 Summary: THE ONLY HADOOP RDBMS

Power operational applications on Hadoop:
- Affordable, scale-out: commodity hardware
- Elastic: easy to expand or scale back
- 10x better price/performance
- Transactional: real-time updates and ACID transactions
- ANSI SQL: leverage existing SQL code, tools, and skills
- Flexible: support operational and analytical workloads
43 Questions? Bob Baran Senior Sales Engineer May 12, 2015
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper
Offload Enterprise Data Warehouse (EDW) to Big Data Lake Oracle Exadata, Teradata, Netezza and SQL Server Ample White Paper EDW (Enterprise Data Warehouse) Offloads The EDW (Enterprise Data Warehouse)
#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
The 3 questions to ask yourself about BIG DATA
The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.
Reference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
Building Your Big Data Team
Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.
An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise
An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to outline our
Presenters: Luke Dougherty & Steve Crabb
Presenters: Luke Dougherty & Steve Crabb About Keylink Keylink Technology is Syncsort s partner for Australia & New Zealand. Our Customers: www.keylink.net.au 2 ETL is THE best use case for Hadoop. ShanH
Using distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
Cisco IT Hadoop Journey
Cisco IT Hadoop Journey Srini Desikan, Program Manager IT 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases
Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, [email protected]
Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, [email protected] Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily
Big Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
Architectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
BIG DATA APPLIANCES. July 23, TDWI. R Sathyanarayana. Enterprise Information Management & Analytics Practice EMC Consulting
BIG DATA APPLIANCES July 23, TDWI R Sathyanarayana Enterprise Information Management & Analytics Practice EMC Consulting 1 Big data are datasets that grow so large that they become awkward to work with
Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop
IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop Frank C. Fillmore, Jr. The Fillmore Group, Inc. Session Code: E13 Wed, May 06, 2015 (02:15 PM - 03:15 PM) Platform: Cross-platform Objectives
Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014
Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/
PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor
PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor The research leading to these results has received funding from the European Union's Seventh Framework
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets
The Principles of the Business Data Lake
The Principles of the Business Data Lake The Business Data Lake Culture eats Strategy for Breakfast, so said Peter Drucker, elegantly making the point that the hardest thing to change in any organization
Dell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert [email protected]/
Oracle Database: SQL and PL/SQL Fundamentals NEW
Oracle University Contact Us: + 38516306373 Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training delivers the
NoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
Oracle Database In-Memory The Next Big Thing
Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes
Big Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
EMC/Greenplum Driving the Future of Data Warehousing and Analytics
EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC 2010 Forum Series 1 Greenplum Becomes the Foundation of EMC s Data Computing Division E M C A CQ U I R E S G R E E N P L U M Greenplum,
Lecture 10: HBase! Claudia Hauff (Web Information Systems)! [email protected]
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
CitusDB Architecture for Real-Time Big Data
CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing
Business Intelligence for Big Data
Business Intelligence for Big Data Will Gorman, Vice President, Engineering May, 2011 2010, Pentaho. All Rights Reserved. www.pentaho.com. What is BI? Business Intelligence = reports, dashboards, analysis,
Roadmap Talend : découvrez les futures fonctionnalités de Talend
Roadmap Talend : découvrez les futures fonctionnalités de Talend Cédric Carbone Talend Connect 9 octobre 2014 Talend 2014 1 Connecting the Data-Driven Enterprise Talend 2014 2 Agenda Agenda Why a Unified
Performance and Scalability Overview
Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics platform. PENTAHO PERFORMANCE ENGINEERING
Accelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More Data in Less Time
More Data in Less Time Leveraging Cloudera CDH as an Operational Data Store Daniel Tydecks, Systems Engineering DACH & CE Goals of an Operational Data Store Load Data Sources Traditional Architecture Operational
Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth
MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager [email protected]
Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
How to Choose Between Hadoop, NoSQL and RDBMS
How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A
