Deep Quick-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco

Size: px
Start display at page:

Download "Deep Quick-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco"

Transcription

1 Deep Quick-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco

2 About the Speaker Mark Rittman, Co-Founder of Rittman Mead Oracle ACE Director, specialising in Oracle BI&DW 14 Years Experience with Oracle Technology Regular columnist for Oracle Magazine Author of two Oracle Press Oracle BI books Oracle Business Intelligence Developers Guide Oracle Exalytics Revealed Writer for Rittman Mead Blog : mark.rittman@rittmanmead.com Twitter

3 Traditional Data Warehouse / BI Architectures Three-layer architecture - staging, foundation and access/performance All three layers stored in a relational database (Oracle) ETL used to move data from layer-to-layer Traditional Relational Data Warehouse Data Load Staging Foundation / ODS Performance / Dimensional Direct Read BI Tool (OBIEE) with metadata layer Traditional structured data sources Data Load Data Load ETL ETL Data Load OLAP / In-Memory Tool with data load into own database

4 Introducing Hadoop A new approach to data processing and data storage Rather than a small number of large, powerful servers, it spreads processing over large numbers of small, cheap, redundant servers Spreads the data you re processing over lots of distributed nodes Has scheduling/workload process that sends Job Tracker parts of a job to each of the nodes - a bit like Oracle Parallel Execution And does the processing where the data sits - a bit like Exadata storage servers Shared-nothing architecture Low-cost and highly horizontal scalable Task Tracker Task Tracker Task Tracker Task Tracker Data Node Data Node Task Tracker Task Tracker

5 Hadoop Tenets : Simplified Distributed Processing Hadoop, through MapReduce, breaks processing down into simple stages Map : select the columns and values you re interested in, pass through as key/value pairs Reduce : aggregate the results Most ETL jobs can be broken down into filtering, projecting and aggregating Hadoop then automatically runs job on cluster Share-nothing small chunks of work Run the job on the node where the data is Handle faults etc Gather the results back in Mapper Filter, Project Reducer Aggregate Mapper Filter, Project Reducer Aggregate Mapper Filter, Project Output One HDFS file per reducer, in a directory

6 Moving Data In, Around and Out of Hadoop Three stages to Hadoop ETL work, with dedicated Apache / other tools Load : receive files in batch, or in real-time (logs, events) Transform : process & transform data to answer questions Store / Export : store in structured form, or export to RDBMS using Sqoop RDBMS Imports Real-Time Logs / Events Loading Stage! Processing Stage! Store / Export Stage! File Exports RDBMS Exports File / Unstructured Imports

7 ETL Offloading Special use-case : offloading low-value, simple ETL work to a Hadoop cluster Receiving, aggregating, filtering and pre-processing data for an RDBMS data warehouse Potentially free-up high-value Exadata / RBDMS servers for analytic work

8 Core Apache Hadoop Tools Apache Hadoop, including MapReduce and HDFS Scaleable, fault-tolerant file storage for Hadoop Parallel programming framework for Hadoop Apache Hive SQL abstraction layer over HDFS Perform set-based ETL within Hadoop Apache Pig, Spark Dataflow-type languages over HDFS, Hive etc Extensible through UDFs, streaming etc Apache Flume, Apache Sqoop, Apache Kafka Real-time and batch loading into HDFS Modular, fault-tolerant, wide source/target coverage

9 Hive as the Hadoop SQL Access Layer MapReduce jobs are typically written in Java, but Hive can make this simpler Hive is a query environment over Hadoop/MapReduce to support SQL-like queries Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically creates MapReduce jobs against data previously loaded into the Hive HDFS tables Approach used by ODI and OBIEE to gain access to Hadoop data Allows Hadoop data to be accessed just like any other data source (sort of...)

10 Oracle s Big Data Products Oracle Big Data Appliance - Engineered System for Big Data Acquisition and Processing Cloudera Distribution of Hadoop Cloudera Manager Open-source R Oracle NoSQL Database Oracle Enterprise Linux + Oracle JVM New - Oracle Big Data SQL Oracle Big Data Connectors Oracle Loader for Hadoop (Hadoop > Oracle RBDMS) Oracle Direct Connector for HDFS (HFDS > Oracle RBDMS) Oracle R Advanced Analytics for Hadoop Oracle Data Integrator 12c

11 Oracle Loader for Hadoop Oracle technology for accessing Hadoop data, and loading it into an Oracle database Pushes data transformation, heavy lifting to the Hadoop cluster, using MapReduce Direct-path loads into Oracle Database, partitioned and non-partitioned Online and offline loads Key technology for fast load of Hadoop results into Oracle DB

12 Oracle Direct Connector for HDFS Enables HDFS as a data-source for Oracle Database external tables Effectively provides Oracle SQL access over HDFS Supports data query, or import into Oracle DB Treat HDFS-stored files in the same way as regular files But with HDFS s low-cost and fault-tolerance

13 Oracle R Advanced Analytics for Hadoop Add-in to R that extends capability to Hadoop Gives R the ability to create Map and Reduce functions Extends R data frames to include Hive tables Automatically run R functions on Hadoop by using Hive tables as source

14 Just Released - Oracle Big Data SQL Part of Oracle Big Data 4.0 (BDA-only) Also requires Oracle Database 12c, Oracle Exadata Database Machine Extends Oracle Data Dictionary to cover Hive Extends Oracle SQL and SmartScan to Hadoop Extends Oracle Security Model over Hadoop Fine-grained access control Data redaction, data masking SQL Queries Exadata Database Server SmartScan SmartScan Exadata Storage Servers Hadoop Cluster Oracle Big Data SQL

15 Bringing it All Together : Oracle Data Integrator 12c ODI provides an excellent framework for running Hadoop ETL jobs ELT approach pushes transformations down to Hadoop - leveraging power of cluster Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation Whilst still preserving RDBMS push-down Extensible to cover Pig, Spark etc Process orchestration Data quality / error handling Metadata and model-driven

16 ODI and Big Data Integration Example In this example, we ll show an end-to-end ETL process on Hadoop using ODI12c & BDA Scenario: load webserver log data into Hadoop, process enhance and aggregate, then load final summary table into Oracle Database 12c Process using Hadoop framework Leverage Big Data Connectors Metadata-based ETL development using ODI12c Real-world example

17 ETL & Data Flow through BDA System Five-step process to load, transform, aggregate and filter incoming log data Leverage ODI s capabilities where possible Make use of Hadoop power + scalability! Flume Agent Apache HTTP Server Flume Messaging TCP Port 4545 (example) 1 Flume Agent Log Files (HDFS) IKM File to Hive using RegEx SerDe hive_raw_apache_ access_log (Hive Table) posts (Hive Table) IKM Hive Control Append (Hive table join & load into target hive table) 2 3 log_entries_ and post_detail (Hive Table)! Sqoop extract categories_sql_ extract (Hive Table) IKM Hive Control Append (Hive table join & load into target hive table) hive_raw_apache_ access_log (Hive Table) 4 Geocoding IP>Country list (Hive Table) IKM Hive Transform (Hive streaming through Python script) hive_raw_apache_ access_log (Hive Table) 5 IKM File / Hive to Oracle (bulk unload to Oracle DB)

18 ETL Considerations : Using Hive vs. Regular Oracle SQL Not all join types are available in Hive - joins must be equality joins No sequences, no primary keys on tables Generally need to stage Oracle or other external data into Hive before joining to it Hive latency - not good for small microbatch-type work But other alternatives exist - Spark, Impala etc Hive is INSERT / APPEND only - no updates, deletes etc But HBase may be suitable for CRUD-type loading Don t assume that HiveQL == Oracle SQL Test assumptions before committing to platform vs.

19 Five-Step ETL Process 1. Take the incoming log files (via Flume) and load into a structured Hive table 2. Enhance data from that table to include details on authors, posts from other Hive tables 3. Join to some additional ref. data held in an Oracle database, to add author details 4. Geocode the log data, so that we have the country for each calling IP address 5. Output the data in summary form to an Oracle database

20 Using Flume to Transport Log Files to BDA Apache Flume is the standard way to transport log files from source through to target Initial use-case was webserver log files, but can transport any file from A>B Does not do data transformation, but can send to multiple targets / target types Mechanisms and checks to ensure successful transport of entries Has a concept of agents, sinks and channels Agents collect and forward log data Sinks store it in final destination Channels store log data en-route Simple configuration through INI files Handled outside of ODI12c

21 GoldenGate for Continuous Streaming to Hadoop Oracle GoldenGate is also an option, for streaming RDBMS transactions to Hadoop Leverages GoldenGate & HDFS / Hive Java APIs Sample Implementations on MOS Doc.ID (HDFS) and (Hive) Likely to be formal part of GoldenGate in future release - but usable now Can also integrate with Flume for delivery to HDFS - see MOS Doc.ID

22 1 Load Incoming Log Files into Hive Table First step in process is to load the incoming log files into a Hive table Also need to parse the log entries to extract request, date, IP address etc columns Hive table can then easily be used in downstream transformations Use IKM File to Hive (LOAD DATA) KM Source can be local files or HDFS Either load file into Hive HDFS area, or leave as external Hive table Ability to use SerDe to parse file data

23 Using IKM File to Hive to Load Web Log File Data into Hive Create mapping to load file source (single column for weblog entries) into Hive table Target Hive table should have column for incoming log row, and parsed columns

24 Specifying a SerDe to Parse Incoming Hive Data SerDe (Serializer-Deserializer) interfaces give Hive the ability to process new file formats Distributed as JAR file, gives Hive ability to parse semi-structured formats We can use the RegEx SerDe to parse the Apache CombinedLogFormat file into columns Enabled through OVERRIDE_ROW_FORMAT IKM File to Hive (LOAD DATA) KM option

25 2 Join to Additional Hive Tables, Transform using HiveQL IKM Hive to Hive Control Append can be used to perform Hive table joins, filtering, agg. etc. INSERT only, no DELETE, UPDATE etc Not all ODI12c mapping operators supported, but basic functionality works OK Use this KM to join to other Hive tables, adding more details on post, title etc Perform DISTINCT on join output, load into summary Hive table

26 Joining Hive Tables Only equi-joins supported Must use ANSI syntax More complex joins may not produce valid HiveQL (subqueries etc)

27 Filtering, Aggregating and Transforming Within Hive Aggregate (GROUP BY), DISTINCT, FILTER, EXPRESSION, JOIN, SORT etc mapping operators can be added to mapping to manipulate data Generates HiveQL functions, clauses etc

28 3 Bring in Reference Data from Oracle Database In this third step, additional reference data from Oracle Database needs to be added In theory, should be able to add Oracle-sourced datastores to mapping and join as usual But Oracle / JDBC-generic LKMs don t get work with Hive

29 Options for Importing Oracle / RDBMS Data into Hadoop Could export RBDMS data to file, and load using IKM File to Hive Oracle Big Data Connectors only export to Oracle, not import to Hadoop Best option is to use Apache Sqoop, and new IKM SQL to Hive-HBase-File knowledge module Hadoop-native, automatically runs in parallel Uses native JDBC drivers, or OraOop (for example) Bi-directional in-and-out of Hadoop to RDBMS Run from OS command-line

30 Loading RDBMS Data into Hive using Sqoop First step is to stage Oracle data into equivalent Hive table Use special LKM SQL Multi-Connect Global load knowledge module for Oracle source Passes responsibility for load (extract) to following IKM Then use IKM SQL to Hive-HBase-File (Sqoop) to load the Hive table

31 Join Oracle-Sourced Hive Table to Existing Hive Table Oracle-sourced reference data in Hive can then be joined to existing Hive table as normal Filters, aggregation operators etc can be added to mapping if required Use IKM Hive Control Append as integration KM

32 New Option - Using Oracle Big Data SQL Oracle Big Data SQL provides ability for Exadata to reference Hive tables Use feature to create join in Oracle, bringing across Hive data through ORACLE_HIVE table

33 4 Using Hive Streaming and Python for Geocoding Data Another requirement we have is to geocode the webserver log entries Allows us to aggregate page views by country Based on the fact that IP ranges can usually be attributed to specific countries Not functionality normally found in Hive etc, but can be done with add-on APIs

34 How GeoIP Geocoding Works Uses free Geocoding API and database from Maxmind Convert IP address to an integer Find which integer range our IP address sits within But Hive can t use BETWEEN in a join

35 Solution : IKM Hive Transform IKM Hive Transform can pass the output of a Hive SELECT statement through a perl, python, shell etc script to transform content Uses Hive TRANSFORM USING AS functionality hive> add file file:///tmp/add_countries.py; Added resource: file:///tmp/add_countries.py hive> select transform (hostname,request_date,post_id,title,author,category) > using 'add_countries.py' > as (hostname,request_date,post_id,title,author,category,country) > from access_per_post_categories;

36 Creating the Python Script for Hive Streaming Solution requires a Python API to be installed on all Hadoop nodes, along with geocode DB! wget python! get-pip.py pip install pygeoip! Python script then parses incoming stdin lines using tab-separation of fields, outputs same (but with extra field for the country) #!/usr/bin/python import sys sys.path.append('/usr/lib/python2.6/site-packages/') import pygeoip gi = pygeoip.geoip('/tmp/geoip.dat') for line in sys.stdin: line = line.rstrip() hostname,request_date,post_id,title,author,category = line.split('\t') country = gi.country_name_by_addr(hostname) print hostname+'\t'+request_date+'\t'+post_id+'\t'+title+'\t'+author +'\t'+country+'\t'+category

37 Setting up the Mapping Map source Hive table to target, which includes column for extra country column!!!!!!! Copy script + GeoIP.dat file to every node s /tmp directory Ensure all Python APIs and libraries are installed on each Hadoop node

38 Configuring IKM Hive Transform TRANSFORM_SCRIPT_NAME specifies name of script, and path to script TRANSFORM_SCRIPT has issues with parsing; do not use, leave blank and KM will use existing one Optional ability to specify sort and distribution columns (can be compound) Leave other options at default

39 Executing the Mapping KM automatically registers the script with Hive (which caches it on all nodes) HiveQL output then runs the contents of the first Hive table through the script, outputting results to target table

40 5 Bulk Unload Summary Data to Oracle Database Final requirement is to unload final Hive table contents to Oracle Database Several use-cases for this: Use Hadoop / BDA for ETL offloading Use analysis capabilities of BDA, but then output results to RDBMS data mart or DW Permit use of more advanced SQL query tools Share results with other applications Can use Sqoop for this, or use Oracle Big Data Connectors Fast bulk unload, or transparent Oracle access to Hive

41 IKM File/Hive to Oracle (OLH/ODCH) KM for accessing HDFS/Hive data from Oracle Either sets up ODCH connectivity, or bulk-unloads via OLH Map from HDFS or Hive source to Oracle tables (via Oracle technology in Topology)

42 Configuring the KM Physical Settings For the access table in Physical view, change LKM to LKM SQL Multi-Connect Delegates the multi-connect capabilities to the downstream node, so you can use a multiconnect IKM such as IKM File/Hive to Oracle

43 Create Package to Sequence ETL Steps Define package (or load plan) within ODI12c to orchestrate the process Call package / load plan execution from command-line, web service call, or schedule

44 Execute Overall Package Each step executed in sequence End-to-end ETL process, using ODI12c s metadata-driven development process, data quality handing, heterogenous connectivity, but Hadoop-native processing

45 Conclusions Hadoop, and the Oracle Big Data Appliance, is an excellent platform for data capture, analysis and processing Hadoop tools such as Hive, Sqoop, MapReduce and Pig provide means to process and analyse data in parallel, using languages + approach familiar to Oracle developers ODI12c provides several benefits when working with ETL and data loading on Hadoop Metadata-driven design; data quality handling; KMs to handle technical complexity Oracle Data Integrator Adapter for Hadoop provides several KMs for Hadoop sources In this presentation, we ve seen an end-to-end example of big data ETL using ODI The power of Hadoop and BDA, with the ETL orchestration of ODI12c

46 Thank You for Attending! Thank you for attending this presentation, and more information can be found at Contact us at or Look out for our book, Oracle Business Intelligence Developers Guide out now! Follow-us on Twitter or Facebook (facebook.com/rittmanmead)

47 Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco

Lesson 2 : Hadoop & NoSQL Data Loading using Hadoop Tools and ODI12c Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco

Lesson 2 : Hadoop & NoSQL Data Loading using Hadoop Tools and ODI12c Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco Lesson 2 : Hadoop & NoSQL Data Loading using Hadoop Tools and ODI12c Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco Moving Data In, Around and Out of Hadoop Three stages to Hadoop

More information

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Mark Rittman, CTO, Rittman Mead OTN EMEA Tour, May 2016 info@rittmanmead.com www.rittmanmead.com @rittmanmead About the Speaker Mark

More information

Lesson 3 : Hadoop Data Processing using ODI12c and CDH5 Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco

Lesson 3 : Hadoop Data Processing using ODI12c and CDH5 Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco Lesson 3 : Hadoop Data Processing using ODI12c and CDH5 Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco Moving Data In, Around and Out of Hadoop Three stages to Hadoop data movement,

More information

Oracle Big Data Essentials

Oracle Big Data Essentials Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 40291196 Oracle Big Data Essentials Duration: 3 Days What you will learn This Oracle Big Data Essentials training deep dives into using the

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Oracle Data Integrator for Big Data. Alex Kotopoulis Senior Principal Product Manager

Oracle Data Integrator for Big Data. Alex Kotopoulis Senior Principal Product Manager Oracle Data Integrator for Big Data Alex Kotopoulis Senior Principal Product Manager Hands on Lab - Oracle Data Integrator for Big Data Abstract: This lab will highlight to Developers, DBAs and Architects

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Cloudera Certified Developer for Apache Hadoop

Cloudera Certified Developer for Apache Hadoop Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)

How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran) Day with Development Master Class Big Data Management System DW & Big Data Global Leaders Program Jean-Pierre Dijcks Big Data Product Management Server Technologies Part 1 Part 2 Foundation and Architecture

More information

GoldenGate and ODI - A Perfect Match for Real-Time Data Warehousing

GoldenGate and ODI - A Perfect Match for Real-Time Data Warehousing GoldenGate and ODI - A Perfect Match for Real-Time Data Warehousing Michael Rainey, Principal Consultant, Rittman Mead RMOUG Training Days, February 2013 About me... Michael Rainey, Principal Consultant,

More information

Hadoop (BDA) and Oracle Technologies on BI Projects Mark Rittman, CTO, Rittman Mead Dutch Oracle Users Group, Jan 14th 2015

Hadoop (BDA) and Oracle Technologies on BI Projects Mark Rittman, CTO, Rittman Mead Dutch Oracle Users Group, Jan 14th 2015 Hadoop (BDA) and Oracle Technologies on BI Projects Mark Rittman, CTO, Rittman Mead Dutch Oracle Users Group, Jan 14th 2015 About the Speaker Mark Rittman, Co-Founder of Rittman Mead Oracle ACE Director,

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Big Data in a Relational World Presented by: Kerry Osborne JPMorgan Chase December, 2012

Big Data in a Relational World Presented by: Kerry Osborne JPMorgan Chase December, 2012 Big Data in a Relational World Presented by: Kerry Osborne JPMorgan Chase December, 2012 whoami Never Worked for Oracle Worked with Oracle DB Since 1982 (V2) Working with Exadata since early 2010 Work

More information

ITG Software Engineering

ITG Software Engineering Introduction to Apache Hadoop Course ID: Page 1 Last Updated 12/15/2014 Introduction to Apache Hadoop Course Overview: This 5 day course introduces the student to the Hadoop architecture, file system,

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

Connecting Hadoop with Oracle Database

Connecting Hadoop with Oracle Database Connecting Hadoop with Oracle Database Sharon Stephen Senior Curriculum Developer Server Technologies Curriculum The following is intended to outline our general product direction.

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights Big Data, Advanced Analytics:

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Hadoop Meets Exadata. Presented by: Kerry Osborne. DW Global Leaders Program Decemeber, 2012

Hadoop Meets Exadata. Presented by: Kerry Osborne. DW Global Leaders Program Decemeber, 2012 Hi Hadoop Meets Exadata Presented by: Kerry Osborne DW Global Leaders Program Decemeber, 2012 whoami Never Worked for Oracle Worked with Oracle DB Since 1982 (V2) Working with Exadata since early 2010

More information

MySQL and Hadoop Big Data Integration

MySQL and Hadoop Big Data Integration MySQL and Hadoop Big Data Integration Unlocking New Insight A MySQL White Paper December 2012 Table of Contents Introduction... 3 The Lifecycle of Big Data... 4 MySQL in the Big Data Lifecycle... 4 Acquire:

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Hadoop Job Oriented Training Agenda

Hadoop Job Oriented Training Agenda 1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

The Hadoop Eco System Shanghai Data Science Meetup

The Hadoop Eco System Shanghai Data Science Meetup The Hadoop Eco System Shanghai Data Science Meetup Karthik Rajasethupathy, Christian Kuka 03.11.2015 @Agora Space Overview What is this talk about? Giving an overview of the Hadoop Ecosystem and related

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

OWB Users, Enter The New ODI World

OWB Users, Enter The New ODI World OWB Users, Enter The New ODI World Kulvinder Hari Oracle Introduction Oracle Data Integrator (ODI) is a best-of-breed data integration platform focused on fast bulk data movement and handling complex data

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Integrate Master Data with Big Data using Oracle Table Access for Hadoop

Integrate Master Data with Big Data using Oracle Table Access for Hadoop Integrate Master Data with Big Data using Oracle Table Access for Hadoop Kuassi Mensah Oracle Corporation Redwood Shores, CA, USA Keywords: Hadoop, BigData, Hive SQL, Spark SQL, HCatalog, StorageHandler

More information

Oracle Big Data Fundamentals Ed 1 NEW

Oracle Big Data Fundamentals Ed 1 NEW Oracle University Contact Us: +90 212 329 6779 Oracle Big Data Fundamentals Ed 1 NEW Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big

More information

Big Data Introduction

Big Data Introduction Big Data Introduction Ralf Lange Global ISV & OEM Sales 1 Copyright 2012, Oracle and/or its affiliates. All rights Conventional infrastructure 2 Copyright 2012, Oracle and/or its affiliates. All rights

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION Oracle Data Integrator Enterprise Edition 12c delivers high-performance data movement and transformation among enterprise platforms with its open and integrated

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing

More information

OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS)

OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS) Use Data from a Hadoop Cluster with Oracle Database Hands-On Lab Lab Structure Acronyms: OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS) All files are

More information

HADOOP. Revised 10/19/2015

HADOOP. Revised 10/19/2015 HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

An Oracle White Paper February 2014. Oracle Data Integrator Performance Guide

An Oracle White Paper February 2014. Oracle Data Integrator Performance Guide An Oracle White Paper February 2014 Oracle Data Integrator Performance Guide Executive Overview... 2 INTRODUCTION... 3 UNDERSTANDING E-LT... 3 ORACLE DATA INTEGRATOR ARCHITECTURE AT RUN-TIME... 4 Sources,

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

A Scalable Data Transformation Framework using the Hadoop Ecosystem

A Scalable Data Transformation Framework using the Hadoop Ecosystem A Scalable Data Transformation Framework using the Hadoop Ecosystem Raj Nair Director Data Platform Kiru Pakkirisamy CTO AGENDA About Penton and Serendio Inc Data Processing at Penton PoC Use Case Functional

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

OBIEE 11g Data Modeling Best Practices

OBIEE 11g Data Modeling Best Practices OBIEE 11g Data Modeling Best Practices Mark Rittman, Director, Rittman Mead Oracle Open World 2010, San Francisco, September 2010 Introductions Mark Rittman, Co-Founder of Rittman Mead Oracle ACE Director,

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013

More information

Using RDBMS, NoSQL or Hadoop?

Using RDBMS, NoSQL or Hadoop? Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest

More information

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE Spring,2015 Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE Contents: Briefly About Big Data Management What is hive? Hive Architecture Working

More information

What s New with Oracle BI, Analytics and DW

What s New with Oracle BI, Analytics and DW What s New with Oracle BI, Analytics and DW Mark Rittman, CTO, Rittman Mead India Masterclass Tour 2013 About the Speaker Mark Rittman, Co-Founder of Rittman Mead Oracle ACE Director, specialising in Oracle

More information

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Architecting for the Internet of Things & Big Data

Architecting for the Internet of Things & Big Data Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014 Safe Harbor Statement The following is intended to

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science A Seminar report On Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org

More information

Business Intelligence for Big Data

Business Intelligence for Big Data Business Intelligence for Big Data Will Gorman, Vice President, Engineering May, 2011 2010, Pentaho. All Rights Reserved. www.pentaho.com. What is BI? Business Intelligence = reports, dashboards, analysis,

More information

Lofan Abrams Data Services for Big Data Session # 2987

Lofan Abrams Data Services for Big Data Session # 2987 Lofan Abrams Data Services for Big Data Session # 2987 Big Data Are you ready for blast-off? Big Data, for better or worse: 90% of world s data generated over last two years. ScienceDaily, ScienceDaily

More information

<Insert Picture Here> Big Data

<Insert Picture Here> Big Data Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved. EMC Federation Big Data Solutions 1 Introduction to data analytics Federation offering 2 Traditional Analytics! Traditional type of data analysis, sometimes called Business Intelligence! Type of analytics

More information

Integrating VoltDB with Hadoop

Integrating VoltDB with Hadoop The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig Contents Acknowledgements... 1 Introduction to Hive and Pig... 2 Setup... 2 Exercise 1 Load Avro data into HDFS... 2 Exercise 2 Define an

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

TRAINING PROGRAM ON BIGDATA/HADOOP

TRAINING PROGRAM ON BIGDATA/HADOOP Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,

More information

Oracle Data Integrator 12c New Features Overview Advancing Big Data Integration O R A C L E W H I T E P A P E R M A R C H 2 0 1 5

Oracle Data Integrator 12c New Features Overview Advancing Big Data Integration O R A C L E W H I T E P A P E R M A R C H 2 0 1 5 Oracle Data Integrator 12c New Features Overview Advancing Big Data Integration O R A C L E W H I T E P A P E R M A R C H 2 0 1 5 Table of Contents Executive Overview 1 Oracle Data Integrator 12.1.3.0.1

More information

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Practical Hadoop by Example

Practical Hadoop by Example Practical Hadoop by Example for relational database professioanals Alex Gorbachev 12-Mar-2013 New York, NY Alex Gorbachev Chief Technology Officer at Pythian Blogger OakTable Network member Oracle ACE

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Data Discovery and Systems Diagnostics with the ELK stack. Rittman Mead - BI Forum 2015, Brighton. Robin Moffatt, Principal Consultant Rittman Mead

Data Discovery and Systems Diagnostics with the ELK stack. Rittman Mead - BI Forum 2015, Brighton. Robin Moffatt, Principal Consultant Rittman Mead Data Discovery and Systems Diagnostics with the ELK stack Rittman Mead - BI Forum 2015, Brighton Robin Moffatt, Principal Consultant Rittman Mead T : +44 (0) 1273 911 268 (UK) About Me Principal Consultant

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Oracle Database 12c Plug In. Switch On. Get SMART.

Oracle Database 12c Plug In. Switch On. Get SMART. Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent. Hadoop for MySQL DBAs + 1 About me Sarah Sproehnle, Director of Educational Services @ Cloudera Spent 5 years at MySQL At Cloudera for the past 2 years sarah@cloudera.com 2 What is Hadoop? An open-source

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Putting Apache Kafka to Use!

Putting Apache Kafka to Use! Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable

More information

An Oracle White Paper June 2013. Oracle: Big Data for the Enterprise

An Oracle White Paper June 2013. Oracle: Big Data for the Enterprise An Oracle White Paper June 2013 Oracle: Big Data for the Enterprise Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure

More information

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

More information

Hadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013

Hadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013 Hadoop 101 Lars George NoSQL- Ma4ers, Cologne April 26, 2013 1 What s Ahead? Overview of Apache Hadoop (and related tools) What it is Why it s relevant How it works No prior experience needed Feel free

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information