Big Data Landscape for Databases


1 Big Data Landscape for Databases. Bob Baran, Senior Sales Engineer. May 12, 2015

2 Typical Database Workloads (ordered from operational to analytical)
OLTP applications. Typical databases: MySQL, Oracle. Use cases: ERP, CRM, supply chain. Workload strengths: real-time updates; ACID transactions; high concurrency of small reads/writes; range queries.
Real-time web, mobile, and IoT applications. Typical databases: MongoDB, Cassandra, MySQL, Oracle. Use cases: web, mobile, social, IoT. Workload strengths: real-time updates; high ingest rates; high concurrency of small reads/writes; range queries.
Real-time, operational reporting. Typical databases: MySQL, Oracle. Use cases: operational datastores, Crystal Reports. Workload strengths: real-time updates; canned, parameterized reports; range queries.
Ad-hoc analytics. Typical databases: Greenplum, ParAccel, Netezza. Use cases: exploratory analytics, data mining. Workload strengths: complex queries requiring full table scans; append only.
Enterprise data warehouses. Typical databases: Teradata, Oracle, Sybase IQ. Use cases: enterprise reporting. Workload strengths: parameterized reports against historical data.

3 Recent History of RDBMSs. RDBMS definition: relational with joins; ACID transactions; secondary indexes; typically row-oriented; operational and/or analytical workloads. By the early 2000s: limited innovation; it looked like Oracle and Teradata had won.

4 Hadoop Shakes Up Batch Analytics. Data processing framework; cheap distributed file system; brute-force batch processing through MapReduce. Great for batch analytics, and a great place to dump data to look at later.

5 NoSQL Shakes Up Operational DBs. The NoSQL wave: companies like Google, Amazon, and LinkedIn needed greater scalability and schema flexibility; the new databases were developed by developers, not database people; they provided scale-out, but lost SQL. They worked well at web startups because, in some cases, the use cases did not need ACID and teams were willing to handle exceptions at the app level.

6 Convoluted Evolution of Databases (chart: scalability, from scale-up to scale-out, plotted against functionality). Indexed files (ISAM), 1960s; hierarchical/network databases, 1970s; traditional RDBMSs, 1980s-2000s; Hadoop, 2005; NoSQL databases, 2010; scale-out SQL databases, 2013.

7 Mainstream user changes. Driven by web, social, mobile, and the Internet of Things. Major increases in scale: 30% annual data growth. Significant requirements for semi-structured data, though relatively little unstructured. Technology adoption continuum: "What is it?" (scale-out SQL DBs for operational apps); "Should I use it?" (NoSQL for web apps, Hadoop technologies for analytics); "Why wouldn't I use it?" (cloud).

8 Schema on Ingest vs. Schema on Read. A data stream reaches the application either through schema on ingest or through schema on read. Structured data should always remain structured. Use schema on read if you only use the data a few times a year; add schema if the data is used regularly. Even schemaless MongoDB requires schema: item #1 in "10 Things You Should Know About Running MongoDB At Scale" by Asya Kamsky, Principal Solutions Architect at MongoDB, is to have a good schema and indexing strategy.
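The trade-off on slide 8 can be made concrete with a small sketch. Everything here is an assumption made for illustration: the three-field `SCHEMA`, the JSON-lines input, and the function names are hypothetical and do not come from the deck. With schema on ingest, a malformed record is rejected once, at write time; with schema on read, the raw lines are stored untouched and every query re-parses them.

```python
import json

# Hypothetical schema for illustration: field name -> Python type.
SCHEMA = {"device_id": str, "temp_c": float, "ts": int}


def conform(record, schema):
    """Return the record coerced to the schema, or raise ValueError."""
    out = {}
    for field, typ in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        try:
            out[field] = typ(record[field])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad value for {field}: {record[field]!r}") from exc
    return out


def ingest_with_schema(lines, schema):
    """Schema on ingest: validate once at write time and reject bad rows early."""
    table, rejected = [], []
    for line in lines:
        try:
            table.append(conform(json.loads(line), schema))
        except ValueError:
            rejected.append(line)
    return table, rejected


def read_with_schema(raw_lines, schema):
    """Schema on read: raw lines are stored as-is; each query pays for the parse."""
    for line in raw_lines:
        try:
            yield conform(json.loads(line), schema)
        except ValueError:
            continue  # bad rows are only discovered (and skipped) at query time


stream = [
    '{"device_id": "a1", "temp_c": 21.5, "ts": 1}',
    '{"device_id": "a2", "temp_c": "n/a", "ts": 2}',
    '{"device_id": "a3", "temp_c": 19, "ts": 3}',
]
table, rejected = ingest_with_schema(stream, SCHEMA)
on_read = list(read_with_schema(stream, SCHEMA))
```

Both paths return the same clean rows; the difference is when the cost and the errors show up, which is why the slide suggests adding schema for data that is used regularly.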

9 Scale-out is the future of databases. How do I scale? Scale up or scale out. Scale-out options: NoSQL; NewSQL; SQL-on-Hadoop (Hadoop RDBMS or analytic engines); MPP.

10 NoSQL. Pros: easy scale-out; flexible schema; easier web development with hierarchical data structures (MongoDB); cross-data center replication (Cassandra). Cons: no SQL, which requires retraining and app rewrites; no joins, i.e., no cross-row/document dependencies; no reliable updates through transactions across rows/tables; eventual consistency (Cassandra); not designed to do the aggregations required for analytics.

11 NewSQL. Pros: easy scale-out; ANSI SQL eliminates retraining and app rewrites; reliable updates through ACID transactions; RDBMS functionality; strong cross-data center replication (NuoDB). Cons: proprietary scale-out, unproven into petabytes; must manage another distributed infrastructure beyond Hadoop; cannot leverage the Hadoop ecosystem of tools.

12 NewSQL In-Memory. Pros: easy scale-out; high performance because everything is in memory; ACID transactions within nodes. Cons: memory is 10-20x more expensive; limited SQL; limited cross-node transactions; proprietary scale-out, unproven into petabytes; must manage another distributed infrastructure beyond Hadoop; cannot leverage the Hadoop ecosystem.

13 Operational RDBMS on Hadoop. Pros: easy scale-out; scale-out infrastructure proven into petabytes; ANSI SQL eliminates retraining and app rewrites; reliable updates through ACID transactions; leverages Hadoop distributed infrastructure and tool ecosystem. Cons: full table scans are slower than on MPP DBs, but faster than on traditional RDBMSs; existing HDFS data must be reloaded through the SQL interface.

14 MPP Analytical Databases. Pros: easy scale-out; very fast performance for full table scans; highly parallelized, shared-nothing architectures; may have columnar storage (Vertica); no maintenance of indexes (Netezza). Cons: poor concurrency models prevent support of real-time apps; poor performance for range queries; need to redistribute all data to add nodes (hash partitioning); may require specialized hardware (Netezza); proprietary scale-out, so cannot leverage the Hadoop ecosystem of tools.
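The redistribution cost listed for hash partitioning can be illustrated with a short sketch. It assumes the simplest possible scheme, a stable hash of the row key modulo the node count; real MPP products use their own distribution functions, and the `node_for` and `fraction_moved` helpers are hypothetical names. Under that assumption, growing a cluster from 4 to 5 nodes relocates roughly four fifths of the rows.

```python
import hashlib


def node_for(key, num_nodes):
    """Hash partitioning: a stable hash of the key, modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def fraction_moved(keys, old_nodes, new_nodes):
    """Share of keys whose owning node changes when the cluster is resized."""
    moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
    return moved / len(keys)


keys = [f"row-{i}" for i in range(10_000)]
moved = fraction_moved(keys, 4, 5)
print(f"{moved:.0%} of rows change nodes when going from 4 to 5 nodes")
```

A key stays put only when its hash gives the same remainder for both node counts, which is why nearly all data has to be reshuffled. Range-based auto-sharding, as listed for HBase later in the deck, avoids this by splitting and moving individual regions instead.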

15 SQL-on-Hadoop Analytical Engines. Pros: easy scale-out; scale-out proven into petabytes; leverages Hadoop distributed infrastructure; can leverage the Hadoop ecosystem of tools. Cons: relatively immature, especially compared to MPP DBs; limited SQL; poor concurrency models prevent support of real-time apps; no reliable updates through transactions; intermediate results must fit in memory (Presto).

16 Future: Hybrid In-Memory Architectures. Memory cache with disk: unsophisticated memory management. Pure in-memory: very expensive. Hybrid in-memory: flexible and cost-effective; controlled by the optimizer; in-memory materialized views?

17 Summary: Future of Databases. Predicted trends: scale-out dominates databases; developers stop worrying about data size and develop new data-driven apps; hybrid in-memory architecture becomes mainstream. Predicted winners: Hadoop becomes the de facto distributed file system; NoSQL is used for simple web apps; scale-out SQL RDBMSs replace traditional RDBMSs.

18 Questions? Bob Baran Senior Sales Engineer May 12, 2015

19 Powering Real-Time Apps on Hadoop Bob Baran Senior Sales Engineer May 12, 2015

20 Who Are We? The only Hadoop RDBMS: power operational applications on Hadoop. Affordable, scale-out: commodity hardware. Elastic: easy to expand or scale back. Transactional: real-time updates and ACID transactions. ANSI SQL: leverage existing SQL code, tools, and skills. Flexible: support operational and analytical workloads. 10x better price/performance.

21 What People are Saying. Recognized as a key innovator in databases. Quotes: "Scaling out on Splice Machine presented some major benefits over Oracle...automatic balancing between clusters...avoiding the costly licensing issues." "An alternative to today's RDBMSes, Splice Machine effectively combines traditional relational database technology with the scale-out capabilities of Hadoop." "The unique claim of Splice Machine is that it can run transactional applications as well as support analytics on top of Hadoop." Awards (logos not captured in the transcription).

22 Advisory Board. The advisory board includes luminaries in databases and technology. Mike Franklin: Computer Science Chair, UC Berkeley; Director, UC Berkeley AMPLab; Founder of Apache Spark. Roger Bamford: former Principal Architect at Oracle; father of Oracle RAC. Marie-Anne Neimat: Co-Founder, TimesTen Database; former VP, Database Engineering at Oracle. Ken Rudin: Head of Analytics at Facebook; former GM of Oracle Data Warehousing.

23 Combines the Best of Both Worlds. Hadoop: scale-out on commodity servers; proven to 100s of petabytes; efficiently handles sparse data; extensive ecosystem. RDBMS: ANSI SQL; real-time, concurrent updates; ACID transactions; ODBC/JDBC support.

24 Focused on OLTP and Real-Time Workloads
OLTP applications. Typical databases: MySQL, Oracle. Use cases: ERP, CRM, supply chain. Workload strengths: real-time updates; ACID transactions; high concurrency of small reads/writes; range queries.
Real-time web, mobile, and IoT applications. Typical databases: MySQL, Oracle, MongoDB, Cassandra. Use cases: web, mobile, social, IoT. Workload strengths: real-time updates; high ingest rates; high concurrency of small reads/writes; range queries.
Real-time, operational reporting. Typical databases: MySQL, Oracle. Use cases: operational datastores, Crystal Reports. Workload strengths: real-time updates; canned, parameterized reports; range queries.
Ad-hoc analytics. Typical databases: Greenplum, ParAccel, Netezza. Use cases: exploratory analytics, data mining. Workload strengths: complex queries requiring full table scans; append only.
Enterprise data warehouses. Typical databases: Teradata, Oracle, Sybase IQ. Use cases: enterprise reporting. Workload strengths: parameterized reports against historical data.

25 OLTP Campaign Management: Harte-Hanks
Overview: digital marketing services provider; unified customer profile; real-time campaign management; OLTP environment with BI reports.
Challenges: Oracle RAC too expensive to scale; queries too slow, taking up to half an hour; getting worse, with 30-50% data growth expected; looked for 9 months for a cost-effective solution.
Solution diagram: cross-channel campaigns; real-time personalization; real-time actions.
Initial results: 10-20x price/performance with no application, BI, or ETL rewrites; 1/4 the cost with commodity scale-out; 3-7x faster through parallelized queries.

26 Reference Architecture: Operational Data Lake. Offload real-time reporting and analytics from expensive OLTP and DW systems. Diagram components: OLTP systems (ERP, CRM, supply chain, HR); stream or batch updates into the operational data lake; ETL into the data warehouse and datamart; outputs are executive business reports, ad hoc analytics, operational reports and analytics, and real-time, event-driven apps.

27 Streamlining the Structured Data Pipeline in Hadoop
Traditional Hadoop pipeline: source systems (ERP, CRM); Sqoop; stored as flat files; apply inferred schema; SQL query engines; BI tools.
Streamlined Hadoop pipeline: source systems (ERP, CRM); existing ETL tool; stored in same schema; BI tools.
Advantages: reduced operational costs with less complexity; reduced processing time and errors with fewer translations; real-time updates for data cleansing; better SQL support.

28 Complementing Existing Hadoop-Based Data Lakes. Optimizing storage and querying of structured data as part of ELT or Hadoop query engines. Diagram components: OLTP systems (ERP, CRM, supply chain, HR); structured data; unstructured data; HCatalog; Pig.
1. Schema on ingest: streamlined, structured-to-structured integration.
2. Schema before read: repository for structured data or metadata from the ELT process on unstructured data.
3. Schema on read: ad-hoc Hadoop queries across structured and unstructured data.

29 Proven Building Blocks: Hadoop and Derby. Apache Derby: ANSI SQL-99 RDBMS; Java-based; ODBC/JDBC compliant. Apache HBase/HDFS: auto-sharding; real-time updates; fault tolerance; scalability to 100s of PBs; data replication.

30 HBase: Proven Scale-Out. Auto-sharding; scales with commodity hardware; cost-effective from GBs to PBs; high availability through failover and replication; LSM-trees.

31 Splice Optimizations to HBase
Storage: Splice storage is optimized over raw HBase. We use bitmap indexes to store data in packed byte arrays, which gives a much smaller footprint than traditional HBase. With a TPC-H schema, we found a 10x reduction in data size, so the same workload requires far less hardware and resources.
Asynchronous write pipeline: HBase writes (puts) are not pipelined and block while the call is being made. Splice's write pipeline allows us to reach speeds of over 100K writes/second per HBase node, which allows extremely high ingest speeds without requiring more hardware or custom code.
Transactions: As scalability increases, the likelihood of failures increases. We use snapshot isolation to make sure that a failure does not corrupt existing data.
RDBMS capabilities: SQL instead of custom scans, with an optimizer that chooses the best access path to the data, plus core data management functions (indexes, constraints, typed columns, etc.).

32 Distributed, Parallelized Query Execution. Parallelized computation across the cluster; moves computation to the data; utilizes HBase co-processors; no MapReduce. Diagram: the HBase co-processor runs inside the HBase server memory space.
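"Moves computation to the data" can be sketched as partial aggregation: each node reduces its own rows to a small partial result, and only those partials travel to the coordinator. The sketch below is a local simulation under stated assumptions: the `REGIONS` dictionary, the thread pool standing in for region servers, and both function names are hypothetical, and no HBase co-processor API is used.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical regions: each holds one slice of a table's order amounts.
REGIONS = {
    "region-1": [10.0, 20.0, 30.0],
    "region-2": [40.0],
    "region-3": [50.0, 60.0],
}


def local_aggregate(rows):
    """Runs next to the data and returns a tiny partial result, not the rows."""
    return (sum(rows), len(rows))


def distributed_avg(regions):
    """Coordinator: fan out to every region in parallel, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, regions.values()))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


avg = distributed_avg(REGIONS)
```

AVG is merged from (sum, count) pairs rather than from per-region averages, because regions hold different numbers of rows and averaging the averages would weight them incorrectly.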

33 ANSI SQL-99 Coverage
Data types: e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT
Conditional functions: e.g., CASE, searched CASE
DDL: e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE, DELETE, UPDATE
Privileges: e.g., privileges for SELECT, DELETE, INSERT, EXECUTE
Predicates: e.g., IN, BETWEEN, LIKE, EXISTS
Cursors: e.g., updatable, read-only, positioned DELETE/UPDATE
DML: e.g., INSERT, DELETE, UPDATE, SELECT
Joins: e.g., INNER JOIN, LEFT OUTER JOIN
Query specification: e.g., SELECT DISTINCT, GROUP BY, HAVING
SET functions: e.g., UNION, ABS, MOD, ALL, CHECK
Transactions: e.g., COMMIT, ROLLBACK, READ COMMITTED, REPEATABLE READ, READ UNCOMMITTED, Snapshot Isolation
Sub-queries
Aggregation functions: e.g., AVG, MAX, COUNT
String functions: e.g., SUBSTRING, concatenation, UPPER, LOWER, POSITION, TRIM, LENGTH
Triggers
User-defined functions (UDFs)
Views, including grouped views
Window functions: e.g., RANK, ROW_NUMBER

34 Window Functions (Advanced Analytics Functions). Used for analytics such as running totals, moving averages, and top-N queries. A window function performs a calculation across a set of table rows related to the current row in the window. It is similar to an aggregate function, with two significant differences: it outputs one row for each input row it operates upon, and it groups rows with window partitioning and frame clauses rather than GROUP BY. Splice Machine currently supports: RANK, DENSE_RANK, ROW_NUMBER, AVG, SUM, COUNT, MAX, MIN.
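The two differences from aggregate functions are easier to see in a worked example. The sketch below is plain Python rather than SQL and makes several assumptions: the `window` helper and the sample `sales` rows are hypothetical, ordering is descending on the measure, and the running sum uses a row-based frame (the equivalent of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). It is not how Splice Machine evaluates windows.

```python
from itertools import groupby
from operator import itemgetter


def window(rows, partition_by, order_by):
    """Emit one output row per input row, annotated with ROW_NUMBER, RANK,
    DENSE_RANK and a running SUM, computed per partition in descending order."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[partition_by], -r[order_by]))
    for _, part in groupby(ordered, key=itemgetter(partition_by)):
        rank = dense = running = 0
        prev = None
        for n, row in enumerate(part, start=1):
            if row[order_by] != prev:
                # A new value: RANK jumps to the row number, DENSE_RANK adds one.
                rank, dense, prev = n, dense + 1, row[order_by]
            running += row[order_by]
            out.append({**row, "row_number": n, "rank": rank,
                        "dense_rank": dense, "running_sum": running})
    return out


sales = [
    {"region": "east", "rep": "ann", "amount": 300},
    {"region": "east", "rep": "bob", "amount": 300},
    {"region": "east", "rep": "cy", "amount": 100},
    {"region": "west", "rep": "dee", "amount": 200},
]
result = window(sales, "region", "amount")
```

Four rows go in and four come out, whereas a GROUP BY on region would collapse them to two. The tie between the first two east rows shows how RANK (1, 1, 3) differs from DENSE_RANK (1, 1, 2).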

35 Lockless, ACID transactions. Adds multi-row, multi-table transactions with rollback to HBase. Fast, lockless, high concurrency. Extends research from Google Percolator, Yahoo Labs, and the University of Waterloo. Patent-pending technology.
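The deck names snapshot isolation as the transaction model but does not show how it works. The following is a minimal single-process sketch of the general technique, not Splice Machine's implementation: the `Store`, `Txn`, and `ConflictError` names are hypothetical, timestamps come from a simple counter, and conflicts are resolved with a first-committer-wins rule. Readers never take locks; they read the newest version committed before their snapshot.

```python
import itertools


class ConflictError(Exception):
    """Raised when a concurrent transaction already committed one of our keys."""


class Store:
    def __init__(self):
        self._clock = itertools.count(1)   # issues both start and commit timestamps
        self._versions = {}                # key -> [(commit_ts, value), ...], oldest first

    def begin(self):
        return Txn(self, next(self._clock))

    def read(self, key, start_ts):
        # Newest version committed before the reader's snapshot was taken.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts < start_ts:
                return value
        return None

    def commit(self, txn):
        # First committer wins: check every written key before applying any.
        for key in txn.writes:
            versions = self._versions.get(key)
            if versions and versions[-1][0] > txn.start_ts:
                raise ConflictError(key)
        commit_ts = next(self._clock)
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))


class Txn:
    def __init__(self, store, start_ts):
        self.store, self.start_ts, self.writes = store, start_ts, {}

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value           # buffered until commit

    def commit(self):
        self.store.commit(self)

    def rollback(self):
        self.writes.clear()                # nothing was applied, so nothing to undo


store = Store()
setup = store.begin()
setup.put("checking", 100)
setup.put("savings", 50)
setup.commit()

t1, t2 = store.begin(), store.begin()
t1.put("checking", 90)
t1.put("savings", 60)
t1.commit()                                # the multi-row update becomes visible at once

seen_by_t2 = t2.get("checking")            # t2 still reads from its own snapshot
t2.put("checking", 0)
try:
    t2.commit()
    t2_outcome = "committed"
except ConflictError:
    t2_outcome = "aborted"
```

Because writes are buffered and applied only after the conflict check passes, a failed or rolled-back transaction leaves no partial state behind, which is the property slide 31 relies on when it says a failure does not corrupt existing data.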

36 Customer Performance Benchmarks. Typically 10x price/performance improvement. Per-customer results (customer names not captured in the transcription): speed 3-7x, 20x, and 7x faster; price/performance 10-20x, 10x, and 30x lower.

37 Applications, BI / SQL tool support via ODBC/JDBC

38 Splice Machine Safe Journey Process
Initial Overview (1 day): Splice Machine overview; set the stage for the Rapid Assessment.
Rapid Assessment (5 days, including prep): half-day workshop; assess Splice Machine fit; identify target use cases; risk assessment of use cases; agree upon success criteria.
Proof of Concept (2 weeks): prove the client use case on a Splice Machine hosted environment; benchmark using customer queries and schema, on customer data or generated data that resembles customer data.
Pilot Project (3-6 weeks): identify a paid pilot use case with limited change management impact; install Splice Machine on the client environment; deploy the use case/application on client data; prove Splice Machine against key requirements.
Enterprise Implementation (3-10 months): Kickstart; Requirements; Design/Dev; QA Test; Cutover; Hypercare.

39 Safe Journey Enterprise Implementation Stages
Kickstart: packaged 2-week program to get a new client off to a strong start on a solid foundation. Incorporates Splice Architecture & Development courses, a Risk Assessment Workshop, and an Implementation Blueprint.
Requirements: establish a clear functional and performance requirements document. Can be a refresh only if the project is a port of an existing app to Splice.
Design/Dev: based on the Agile method. The phase is divided into 2-week sprints. Stories covering a set of required capabilities are assigned to each developer. A design doc is created, code is written, and unit tests are written and executed until they pass.
QA Test: the QA test period includes a performance test, an end-to-end system integration test, and a user acceptance test. Depending on the scale of the project, there may be multiple iterations of each test with break/fix cycles in between.
Parallel Ops (optional): used when an existing system is being ported to Splice Machine from another database. The new Splice Machine-based system runs side by side with the old system for a period of time.
Cutover: formal period in which the Splice-based solution goes live and the pre-existing system is deprecated.
Hypercare (optional): period of onsite support during cutover and for a period immediately following go-live.

40 Common Risks and Mitigation Strategies
Data migration. Risk: Clients are typically migrating very large data sets to Splice Machine. Issues with the migration of certain data types, such as dates, can waste a lot of time reloading large amounts of data. Solution: First migrate a small subset of tables that contain all required data types. Ensure these migrate successfully before migrating the entire database.
Changes to source schema during implementation. Risk: Changes to the schema of the source database during the course of the implementation will lead to a significant amount of rework and reloading of data, adding unplanned time to the project. Solution: All stakeholders agree up front to freeze the schema as of an agreed upon date prior to the Design/Development stage.
Stored procedure conversion. Risk: Stored procedures need to be converted from the original language (e.g., PL/SQL) to Java. Complex stored procedures may include significant amounts of procedural code as well as multiple SQL statements. Solution: Carefully review the function and design of SPs to be
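For the data migration risk on slide 40, choosing "a small subset of tables that contain all required data types" is a set-cover problem. The sketch below uses a greedy heuristic, which is an assumption of this example rather than a method described in the deck; the `pilot_tables` helper and the sample catalog are hypothetical. In practice the input would come from the source database's catalog views.

```python
def pilot_tables(table_types):
    """Greedily pick tables until every data type in the schema is covered.

    table_types maps a table name to the set of column data types it uses.
    """
    needed = set().union(*table_types.values())
    chosen = []
    while needed:
        # Take the table covering the most still-uncovered types;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(table_types), key=lambda t: len(table_types[t] & needed))
        chosen.append(best)
        needed -= table_types[best]
    return chosen


# Hypothetical source schema, for illustration only.
catalog = {
    "customers": {"INTEGER", "VARCHAR", "DATE"},
    "orders": {"INTEGER", "DATE", "DECIMAL", "TIMESTAMP"},
    "audit_log": {"INTEGER", "TIMESTAMP", "CLOB"},
    "countries": {"VARCHAR"},
}
pilot = pilot_tables(catalog)
```

Greedy selection does not always return the smallest possible subset, but it is usually close, and a trial load of these tables exercises every data type before the full migration starts.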

41 Common Risks and Mitigation Strategies
SQL compatibility. Risk: Even though Splice Machine conforms to the ANSI 99+ SQL standard, virtually every database has unique syntax, and some queries may need to be modified. Additionally, SQL generated by packaged applications may not be modifiable. Solution: Formally review SQL syntax during the Requirements phase. Modify relevant queries during the Design/Dev phase. If a query is not modifiable, an enhancement request for Splice Machine to support the required syntax out of the box may be needed.
Indexing. Risk: Proper indexing is usually important to maximize the performance of Splice Machine. Splice Machine indexes are likely to differ from the indexes required for a traditional RDBMS. Solution: Ensure that query performance SLAs are clearly defined in the Requirements phase. Incorporate proper index design early in the Design/Dev phase. Assume some iteration will be required to achieve the optimal indexes.
Hadoop knowledge. Risk: Project stakeholders often have limited knowledge of Hadoop and the distributed computing paradigm. This can lead to confusion about the Splice Machine value proposition and the advantages of moving to a scale-out architecture. Solution: Include the Splice Machine Kickoff Program at the beginning of the implementation project. This includes essential training on Hadoop and related fundamental concepts critical to realizing value from a Splice Machine deployment.

42 Summary. The only Hadoop RDBMS: power operational applications on Hadoop. Affordable, scale-out: commodity hardware. Elastic: easy to expand or scale back. Transactional: real-time updates and ACID transactions. ANSI SQL: leverage existing SQL code, tools, and skills. Flexible: support operational and analytical workloads. 10x better price/performance.

43 Questions? Bob Baran Senior Sales Engineer May 12, 2015


Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012 Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team [email protected] @rob1lancaster Organizer of Chicago

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Actian SQL in Hadoop Buyer s Guide

Actian SQL in Hadoop Buyer s Guide Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop

More information

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Introduction to Apache Cassandra

Introduction to Apache Cassandra Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating

More information

Big Data Success Step 1: Get the Technology Right

Big Data Success Step 1: Get the Technology Right Big Data Success Step 1: Get the Technology Right TOM MATIJEVIC Director, Business Development ANDY MCNALIS Director, Data Management & Integration MetaScale is a subsidiary of Sears Holdings Corporation

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper Offload Enterprise Data Warehouse (EDW) to Big Data Lake Oracle Exadata, Teradata, Netezza and SQL Server Ample White Paper EDW (Enterprise Data Warehouse) Offloads The EDW (Enterprise Data Warehouse)

More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information

Building Your Big Data Team

Building Your Big Data Team Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.

More information

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to outline our

More information

Presenters: Luke Dougherty & Steve Crabb

Presenters: Luke Dougherty & Steve Crabb Presenters: Luke Dougherty & Steve Crabb About Keylink Keylink Technology is Syncsort s partner for Australia & New Zealand. Our Customers: www.keylink.net.au 2 ETL is THE best use case for Hadoop. ShanH

More information

Using distributed technologies to analyze Big Data

Using distributed technologies to analyze Big Data Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/

More information

Cisco IT Hadoop Journey

Cisco IT Hadoop Journey Cisco IT Hadoop Journey Srini Desikan, Program Manager IT 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases

More information

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, [email protected]

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, [email protected] Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache

More information

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

BIG DATA APPLIANCES. July 23, TDWI. R Sathyanarayana. Enterprise Information Management & Analytics Practice EMC Consulting

BIG DATA APPLIANCES. July 23, TDWI. R Sathyanarayana. Enterprise Information Management & Analytics Practice EMC Consulting BIG DATA APPLIANCES July 23, TDWI R Sathyanarayana Enterprise Information Management & Analytics Practice EMC Consulting 1 Big data are datasets that grow so large that they become awkward to work with

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop Frank C. Fillmore, Jr. The Fillmore Group, Inc. Session Code: E13 Wed, May 06, 2015 (02:15 PM - 03:15 PM) Platform: Cross-platform Objectives

More information

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/

More information

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor The research leading to these results has received funding from the European Union's Seventh Framework

More information

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets

More information

The Principles of the Business Data Lake

The Principles of the Business Data Lake The Principles of the Business Data Lake The Business Data Lake Culture eats Strategy for Breakfast, so said Peter Drucker, elegantly making the point that the hardest thing to change in any organization

More information

Dell In-Memory Appliance for Cloudera Enterprise

Dell In-Memory Appliance for Cloudera Enterprise Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert [email protected]/

More information

Oracle Database: SQL and PL/SQL Fundamentals NEW

Oracle Database: SQL and PL/SQL Fundamentals NEW Oracle University Contact Us: + 38516306373 Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training delivers the

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

EMC/Greenplum Driving the Future of Data Warehousing and Analytics

EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC 2010 Forum Series 1 Greenplum Becomes the Foundation of EMC s Data Computing Division E M C A CQ U I R E S G R E E N P L U M Greenplum,

More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! [email protected]

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Business Intelligence for Big Data

Business Intelligence for Big Data Business Intelligence for Big Data Will Gorman, Vice President, Engineering May, 2011 2010, Pentaho. All Rights Reserved. www.pentaho.com. What is BI? Business Intelligence = reports, dashboards, analysis,

More information

Roadmap Talend : découvrez les futures fonctionnalités de Talend

Roadmap Talend : découvrez les futures fonctionnalités de Talend Roadmap Talend : découvrez les futures fonctionnalités de Talend Cédric Carbone Talend Connect 9 octobre 2014 Talend 2014 1 Connecting the Data-Driven Enterprise Talend 2014 2 Agenda Agenda Why a Unified

More information

Performance and Scalability Overview

Performance and Scalability Overview Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics platform. PENTAHO PERFORMANCE ENGINEERING

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

More Data in Less Time

More Data in Less Time More Data in Less Time Leveraging Cloudera CDH as an Operational Data Store Daniel Tydecks, Systems Engineering DACH & CE Goals of an Operational Data Store Load Data Sources Traditional Architecture Operational

More information

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager [email protected]

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

How to Choose Between Hadoop, NoSQL and RDBMS

How to Choose Between Hadoop, NoSQL and RDBMS How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A

More information