Hadoop (BDA) and Oracle Technologies on BI Projects Mark Rittman, CTO, Rittman Mead Dutch Oracle Users Group, Jan 14th 2015

1 Hadoop (BDA) and Oracle Technologies on BI Projects Mark Rittman, CTO, Rittman Mead Dutch Oracle Users Group, Jan 14th 2015

2 About the Speaker Mark Rittman, Co-Founder of Rittman Mead Oracle ACE Director, specialising in Oracle BI&DW 14 Years Experience with Oracle Technology Regular columnist for Oracle Magazine Author of two Oracle Press Oracle BI books Oracle Business Intelligence Developers Guide Oracle Exalytics Revealed Writer for Rittman Mead Blog : mark.rittman@rittmanmead.com Twitter

3 About Rittman Mead Oracle BI and DW Gold partner Winner of five UKOUG Partner of the Year awards in including BI World leading specialist partner for technical excellence, solutions delivery and innovation in Oracle BI Approximately 80 consultants worldwide All expert in Oracle BI and DW Offices in US (Atlanta), Europe, Australia and India Skills in broad range of supporting Oracle tools: OBIEE, OBIA ODIEE Essbase, Oracle OLAP GoldenGate Endeca

4 Agenda Part 1 : The Hadoop (BDA) technical stack for Oracle BI/DW projects Why are Oracle BI/DW customers adopting Hadoop (BDA) technologies? What are the Oracle and Cloudera products being used? New Oracle products on the roadmap - Big Data Discovery, Big Data SQL futures Where do OBIEE, ODI etc fit in with these new products? Rittman Mead's development platform Part 2 : Rittman Mead Hadoop (BDA) + Oracle BI Project Experiences What is Cloudera CDH, and the BDA, like to work with? How do we approach projects and PoCs? What architecture and approach do we actually take, now? How well do OBIEE and ODI work with Hadoop and BDA? What are the emerging techs, products and architectures we see for 2015+?

5 Part 1 : The Hadoop (BDA) technical stack for Oracle BI/DW projects or How did we get here?

6 15+ Years in Oracle BI and Data Warehousing Started back in 1997 on a bank Oracle DW project Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts Went on to use Oracle Developer/2000 and Designer/2000 Our initial users queried the DW using SQL*Plus And later on, we rolled-out Discoverer/2000 to everyone else And life was fun

7 The Oracle-Centric DW Architecture Over time, this data warehouse architecture developed Added Oracle Warehouse Builder to automate and model the DW build Oracle 9i Application Server (yay!) to deliver reports and web portals Data Mining and OLAP in the database Oracle 9i for in-database ETL (and RAC) Data was typically loaded from Oracle RDBMS and EBS It was turtles Oracle all the way down

8 The State of the Art for BI & DW Was This.. Oracle Discoverer Drake - Combining Relational and OLAP Analysis for Oracle RDBMS Oracle Portal, part of Oracle 9iAS Oracle Warehouse Builder 9iAS / Paris

9 Then Came Siebel Analytics and OBIEE

10 The Oracle BI & DW World Changed Siebel Analytics replaced Oracle Discoverer Oracle Data Integrator replaced Oracle Warehouse Builder Hyperion Essbase replaced Oracle OLAP You were as likely to be loading from SQL Server as from Oracle They made us do things we didn't like to do Add a mid-tier virtual DW engine on top of the database Export data out of Oracle into an OLAP server Improve query performance using tools outside of the Oracle data warehouse It was all a bit scary Not to mention that WebLogic stuff

11 Introducing - The Oracle Reference DW Architecture Recognizing the difference between long-term storage of DW data (the foundation layer) And organizing the data for queries and easy navigation (the access + performance layer ) Also recognising where OBIEE had been game-changing - federated queries Things are good again

12 and now this happened

13

14 Today's Oracle Information Management Architecture Actionable Events Actionable Insights Actionable Information Structured Enterprise Data Input Events Event Engine Data Reservoir Data Factory Enterprise Information Store Reporting Other Data Execution Innovation Events & Data Discovery Lab Discovery Output

15 Virtualization & Query Federation Today's Layered Data Warehouse Architecture Data Sources Enterprise Performance Management Data Ingestion Access & Performance Layer Data Engines & Poly-structured sources Structured Data Sources Operational Data COTS Data Master & Ref. Data Streaming & BAM Foundation Data Layer Past, current and future interpretation of enterprise data. Structured to support agile access & navigation Immutable modelled data. Business Process Neutral form. Abstracted from business process changes Pre-built & Ad-hoc BI Assets Content Docs SMS Web & Social Media Raw Data Reservoir Immutable raw data reservoir Raw data at rest is not interpreted Information Interpretation Information Services Discovery Lab Sandboxes Project based data stores to support specific discovery objectives Rapid Development Sandboxes Project based data stored to facilitate rapid content / presentation delivery Data Science

16 The Oracle Data Warehousing Platform

17 Introducing The Data Reservoir? A reservoir is a lake that can also process and refine (your data) Wide-ranging source of low-density, lower-value data to complement the DW

18 Oracle's Big Data Products Oracle Big Data Appliance Optimized hardware for Hadoop processing Cloudera Distribution incl. Hadoop Oracle Big Data Connectors, ODI etc Oracle Big Data Connectors Oracle Big Data SQL Oracle NoSQL Database Oracle Data Integrator Oracle R Distribution OBIEE, BI Publisher and Endeca Info Discovery

19 Just Released - Oracle Big Data SQL Part of Oracle Big Data 4.0 (BDA-only) Also requires Oracle Database 12c, Oracle Exadata Database Machine More on this later SQL Queries Exadata Database Server SmartScan SmartScan Exadata Storage Servers Hadoop Cluster Oracle Big Data SQL

20 Coming Soon : Oracle Big Data Discovery Combination of Endeca Server search, analysis and visualisation capabilities with Apache Spark data munging and transformation Analyse, parse, explore and wrangle data using graphical tools and a Spark-based transformation engine Create a catalog of the data on your Hadoop cluster, then search that catalog using Endeca Server Create recommendations of other datasets, based on what you're looking at now Visualize your datasets, discover new insights

21 Coming Soon : Oracle Data Enrichment Cloud Service Cloud-based service for loading, enriching, cleansing and supplementing Hadoop data Part of the Oracle Data Integration product family Used up-stream from Big Data Discovery Aims to solve the data quality problem for Hadoop

22 Combining Oracle RDBMS with Hadoop + NoSQL High-value, high-density data goes into Oracle RDBMS Better support for fast queries, summaries, referential integrity etc Lower-value, lower-density data goes into Hadoop + NoSQL Also provides flexible schema, more agile development Successful next-generation BI+DW projects combine both - neither on their own is sufficient

23 Productising the Next-Generation IM Architecture

24 Still a Key Role for Data Integration, and BI Tools Fast, scalable low-cost / flexible-schema data capture using Hadoop + NoSQL (BDA) Long-term storage of the most important downstream data - Oracle RDBMS (Exadata) Fast analysis + business-friendly interface : OBIEE, Endeca (Exalytics), RTD etc

25 OBIEE for Enterprise Analysis Across all Data Sources Dashboards, analyses, OLAP analytics, scorecards, published reporting, mobile Presented as an integrated business semantic model Optional mid-tier query acceleration using Oracle Exalytics In-Memory Machine Access data from RDBMS, applications, Hadoop, OLAP, ADF BCs etc Business Presentation Layer (Reports, Dashboards) Enterprise Semantic Business Model In-Memory Caching Layer Application Sources Hadoop / NoSQL Sources DW / OLAP Sources

26 Bringing it All Together : Oracle Data Integrator 12c ODI provides an excellent framework for running Hadoop ETL jobs ELT approach pushes transformations down to Hadoop - leveraging power of cluster Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation Whilst still preserving RDBMS push-down Extensible to cover Pig, Spark etc Process orchestration Data quality / error handling Metadata and model-driven

27 Oracle's Product Strategy

28 Rittman Mead Hadoop (BDA) + Oracle BI Project Experiences Working with (Cloudera) Hadoop, + Hive, NoSQL, etc Working with the Oracle Big Data Appliance Typical Hadoop + BI Use-Cases How Rittman Mead approaches Hadoop + Oracle BI projects Hadoop things that keep the CIO awake at night ODI and Hadoop OBIEE and Hadoop Oracle Big Data SQL Futures - Apache Spark, Next-Generation Hive, Big Data Discovery

29 Why is Hadoop of Interest to Us? Gives us an ability to store more data, at more detail, for longer Provides a cost-effective way to analyse vast amounts of data Hadoop & NoSQL technologies can give us schema-on-read capabilities There's vast amounts of innovation in this area we can harness And it's very complementary to Oracle BI & DW

30 Oracle & Hadoop Use-Cases Use Hadoop as a low-cost, horizontally-scalable DW archive Use Hadoop, Hive and MapReduce for low-cost ETL staging Support standalone-hadoop analysis with Oracle reference data Extend the DW with new data sources, datatypes, detail-level data

31 The Killer, Tech-Focused Use Case : Data Reservoir A reservoir is a lake that can also process and refine (your data) Wide-ranging source of low-density, lower-value data to complement the DW

32 Typical Business Use Case : 360 Degree View of Cust / Process OLTP transactional data tells us what happened (in the past), but not why Common customer requirement now is to get a 360 degree view of their activity Understand what's being said about them External drivers for interest, activity Understand more about customer intent, opinions One example is to add details of social media mentions, likes, tweets and retweets etc to the transactional dataset Correlate Twitter activity with sales increases, drops Measure impact of social media strategy Gather and include textual, sentiment, contextual data from surveys, media etc

33 Initial PoC over 4-6 Weeks Focus on high-productivity data analyst tools to identify key data, insights Typically performed using R, CDH on VMs, lots of scripting, lots of client interaction Focus on the discovery phase Governance, dashboards, productionizing can come later

34 Discovery vs. Exploitation Project Phases Discovery and monetising steps in Big Data projects have different requirements Discovery phase Unbounded discovery Self-Service sandbox Wide toolset Promotion to Exploitation Commercial exploitation Narrower toolset Integration to operations Non-functional requirements Code standardisation & governance

35 Rittman Mead Development Lab 4 x 64GB VM Servers 256GB RAM across cluster 36TB Storage VMWare ESXi VCenter Additional iSCSI 6TB storage Synology DS414 NAS Demo / Free Software Installs Cloudera CDH5 Express - and BDP2.2 for Tez Oracle RDBMS, OBIEE, ODI etc Oracle Big Data Connectors Oracle EM 12cR4 vmhost2 vmhost3 vmhost4 vmhost5 BDP 2.2 Cluster 5 x nodes Oracle RDBMS LDAP OBIEE 11g VCenter 64GB RAM, 6TB Disk Core i7 4x3.6GHz VMWare ESXi 5.5 CDH 5.3 Cluster 5 x nodes Kerberos-Secured KDC ODI12c BI Apps 11g 64GB RAM, 6TB Disk Core i7 4x3.6GHz VMWare ESXi 5.5 Vigor 2830n Router VPN, DHCP etc CDH 5.2 Cluster 6 x nodes (16-32GB RAM / node) 64GB RAM, 6TB Disk Core i7 4x3.6GHz VMWare ESXi 5.5 iSCSI LUN shared VMFS cluster filesystem Synology DS414 NAS, 6TB (For testing VMWare VMotion failover, large HDFS datasets etc) 64GB RAM, 6TB Disk Core i7 4x3.6GHz VMWare ESXi 5.5 Mac Mini Server OS X Server DNS etc EM 12c R4 16GB RAM, 1TB Disk Core i7 2x3.6GHz

36 Cluster Management VMWare VSphere 5 + VCenter Server Oracle Enterprise Manager 12cR4 Cloud Control OSX Server Yosemite

37 So how well does it work?

38 Part 2 : Rittman Mead Hadoop (BDA) + Oracle BI Project Experiences

39 Typical RM Project BDA Topology Starter BDA rack, or full rack Kerberos-secured using included KDC server Integration with corporate LDAP for Cloudera Manager, Hue etc Developer access through Hue, Beeline, R Studio End-user access through OBIEE, Endeca and other tools With final datasets usually exported to Exadata or Exalytics

40 Oracle Big Data Appliance Engineered system for big data processing and analysis Optimized for enterprise Hadoop workloads 288 Intel Xeon E5 Processors 1152 GB total memory 648TB total raw storage capacity Cloudera Distribution of Hadoop Cloudera Manager Open-source R Oracle NoSQL Database Community Edition Oracle Enterprise Linux + Oracle JVM New - Oracle Big Data SQL

41 Working with Oracle Big Data Appliance Don't underestimate the value of pre-integrated - massive time-saver for client No need to integrate Big Data Connectors, ODI Agent etc with HDFS, Hive etc Single support route - raise SR with Oracle, they will route to Cloudera if needed Single patch process for whole cluster - OS, CDH etc Full access to Cloudera Enterprise features Otherwise just another CDH cluster in terms of SSH access etc We like it ;-)

42 Cloudera Distribution including Hadoop (CDH) Like Linux, you can set up your Hadoop system manually, or use a distribution Key Hadoop distributions include Cloudera CDH, Hortonworks HDP, MapR etc Cloudera CDH is the distribution Oracle use on Big Data Appliance Provides HDFS and Hadoop framework for BDA Includes Pig, Hive, Sqoop, Oozie, HBase Cloudera Impala for real-time SQL access Cloudera Manager & Hue

43 Cloudera Manager and Hue Web-based tools provided with Cloudera CDH Cloudera Manager used for cluster admin, maintenance (like Enterprise Manager) Commercial tool developed by Cloudera Not enabled by default in BigDataLite VM Hue is a developer / analyst tool for working with Pig, Hive, Sqoop, HDFS etc Open source project included in CDH

44 Working with Cloudera Hadoop (CDH) - Observations Very good product stack, enterprise-friendly, big community, can do lots with free edition Cloudera have their favoured Hadoop technologies - Spark, Kafka Also makes use of Cloudera-specific tools - Impala, Cloudera Manager etc But ignores some tools that have value - Apache Tez for example Easy for an Oracle developer to get productive with the CDH stack But beware of some immature technologies / products Hive != Oracle SQL Spark is very much an alpha product Limitations in things like LDAP integration, end-to-end security Lots of products in stack = lots of places to go to diagnose issues

45 CDH : Things That Work Well HDFS as a low-cost, flexible data store / reservoir; Hive for SQL access to structured + semi-structured HDFS data Pig, Spark, Python, R for data analysis and munging Cloudera Manager and Hue for web-based admin + dev access Hive Metastore / HCatalog HDFS Cluster Filesystem RDBMS Imports Real-Time Logs / Events File / Unstructured Imports

46 Oracle Big Data Connectors Oracle-licensed utilities to connect Hadoop to Oracle RDBMS Bulk-extract data from Hadoop to Oracle, or expose HDFS / Hive data as external tables Run R analysis and processing on Hadoop Leverage Hadoop compute resources to offload ETL and other work from Oracle RDBMS Enable Oracle SQL to access and load Hadoop data

47 Working with the Oracle Big Data Connectors Oracle Loader for Hadoop, Oracle SQL Connector for HDFS - rarely used Sqoop works both ways (Oracle>Hadoop, Hadoop>Oracle) and is good enough OSCH replaced by Oracle Big Data SQL for direct Oracle>Hive access Oracle Advanced Analytics for Hadoop has been very useful though Run MapReduce jobs from R Run R functions across Hive tables

48 Oracle R Advanced Analytics for Hadoop Key Features Run R functions on Hive Dataframes Write MapReduce functions in R

49 Initial Data Scoping & Discovery using R R is typically used at start of a big data project to get a high-level understanding of the data Can be run as R standalone, or using Oracle R Advanced Analytics for Hadoop Do basic scan of incoming dataset, get counts, determine delimiters etc Distribution of values for columns Basic graphs and data discovery Use findings to drive design of parsing logic, Hive data structures, need for data scrubbing / correcting etc
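The basic scan described on this slide (row counts, delimiter detection, value distributions) can be sketched in a few lines. The slide names R as the usual tool; Python is used here only as a stand-in, and the sample rows, the candidate delimiter list and the function names are all hypothetical.

```python
from collections import Counter

# Assumed set of delimiters worth checking; extend as needed.
CANDIDATE_DELIMITERS = [",", "\t", "|", ";"]


def guess_delimiter(lines):
    """Pick the first candidate that occurs the same non-zero number of times on every line."""
    for delim in CANDIDATE_DELIMITERS:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1 and counts != {0}:
            return delim
    return None


def profile(lines):
    """Return the delimiter, the row count and a value distribution per column."""
    delim = guess_delimiter(lines)
    if delim is None:
        raise ValueError("no consistent delimiter found")
    rows = [line.split(delim) for line in lines]
    columns = list(zip(*rows))
    return {
        "delimiter": delim,
        "row_count": len(rows),
        "distributions": [Counter(col) for col in columns],
    }


# Hypothetical sample of an incoming log extract.
sample = [
    "10.0.0.1|GET|200",
    "10.0.0.2|GET|404",
    "10.0.0.1|POST|200",
]
report = profile(sample)
```

Findings such as the delimiter and the spread of values per column are the kind of input that would then drive the parsing logic and Hive table design.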

50 Design Pattern : Discovery Lab Actionable Events Actionable Insights Actionable Information Structured Enterprise Data Input Events Event Engine Data Reservoir Data Factory Enterprise Information Store Reporting Other Data Execution Innovation Events & Data Discovery Lab Discovery Output

51 Design Pattern : Discovery Lab Specific focus on identifying commercial value for exploitation Small group of highly skilled individuals (aka Data Scientists) Iterative development approach data oriented NOT development oriented Wide range of tools and techniques applied Searching and discovering unstructured data Finding correlations and clusters Filtering, aggregating, deriving and enhancing data Data provisioned through Data Factory or own ETL Typically separate infrastructure but could also be unified Reservoir if resource managed effectively

52 For the Future - Oracle Big Data Discovery

53 Interactive Analysis & Exploration of Hadoop Data

54 Share and Collaborate on Big Data Discovery Projects

55 Typical RM Big Data Project Tools Used Data Loading Real-time via Flume Conf scripts Batch via Sqoop cmd-line exec Discovery phase Exploitation phase Data prep via R scripts, Python scripts etc Data analysis via R scripts, Python scripts, Pig, Spark etc Sharing output via Hive tables, Impala tables, HDFS files etc Data Export Batch via Sqoop cmd-line exec a.k.a. data munging a.k.a. the magic

56 Data Loading into Hadoop Default load type is real-time, streaming loads Batch / bulk loads only typically used to seed system Variety of sources including web log activity, event streams Target is typically HDFS (Hive) or HBase Data typically lands in raw state Lots of files and events, need to be filtered/aggregated Typically semi-structured (JSON, logs etc) High volume, high velocity - Which is why we use Hadoop rather than RDBMS (speed vs. ACID trade-off) Economics of Hadoop means it's often possible to archive all incoming data at detail level Real-Time Logs / Events File / Unstructured Imports Loading Stage
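As a rough illustration of the "land raw, then filter and aggregate" step for semi-structured data, the sketch below parses JSON event lines, drops anything malformed and counts one event type per page. The event fields (`type`, `page`) and the function name are assumptions made for the example, not part of the original deck; on a real cluster this work would typically run in Hive, Pig or Spark rather than in a single Python process.

```python
import json
from collections import Counter


def aggregate_events(raw_lines, wanted_type):
    """Count events of one type per page, skipping lines that are not valid JSON objects."""
    counts = Counter()
    rejected = 0
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1  # raw data is kept elsewhere; here we only skip it
            continue
        if not isinstance(event, dict):
            rejected += 1
            continue
        if event.get("type") == wanted_type:
            counts[event.get("page")] += 1
    return counts, rejected


# Hypothetical raw events as they might land from a web server.
raw = [
    '{"type": "pageview", "page": "/home"}',
    '{"type": "click", "page": "/home"}',
    "not json",
    '{"type": "pageview", "page": "/blog"}',
    '{"type": "pageview", "page": "/home"}',
]
page_counts, rejected_count = aggregate_events(raw, "pageview")
```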

57 Apache Flume : Distributed Transport for Log Activity Apache Flume is the standard way to transport log files from source through to target Initial use-case was webserver log files, but can transport any file from A>B Does not do data transformation, but can send to multiple targets / target types Mechanisms and checks to ensure successful transport of entries Has a concept of agents, sinks and channels Agents collect and forward log data Sinks store it in final destination Channels store log data en-route Simple configuration through INI files Handled outside of ODI12c
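To make the agent / channel / sink wiring concrete, here is a minimal sketch that generates a Flume agent configuration as text. Flume configurations are written as key = value properties; the component names (`r1`, `c1`, `k1`), the log path and the HDFS path below are hypothetical, and the sketch adds a source component, which the slide does not list but which Flume uses to read incoming data.

```python
def flume_agent_config(agent, log_path, hdfs_path):
    """Build a properties-style Flume config: one exec source, one memory channel, one HDFS sink."""
    lines = [
        # Declare the components that belong to this agent.
        f"{agent}.sources = r1",
        f"{agent}.channels = c1",
        f"{agent}.sinks = k1",
        # Source: tail a log file (path is an assumption for the example).
        f"{agent}.sources.r1.type = exec",
        f"{agent}.sources.r1.command = tail -F {log_path}",
        f"{agent}.sources.r1.channels = c1",
        # Channel: holds events en route between source and sink.
        f"{agent}.channels.c1.type = memory",
        # Sink: writes events to their final destination in HDFS.
        f"{agent}.sinks.k1.type = hdfs",
        f"{agent}.sinks.k1.hdfs.path = {hdfs_path}",
        f"{agent}.sinks.k1.channel = c1",
    ]
    return "\n".join(lines) + "\n"


config = flume_agent_config(
    "agent1",
    "/var/log/httpd/access_log",      # hypothetical source log
    "hdfs://namenode/flume/weblogs",  # hypothetical target directory
)
```

A memory channel is the simplest choice for a sketch; a file channel would be the more durable option where losing in-flight events matters.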

58 Apache Kafka : Reliable, Message-Based Developed by LinkedIn, designed to address Flume issues around reliability, throughput (though many of those issues have been addressed since) Designed for persistent messages as the common use case Website messages, events etc vs. log file entries Consumer (pull) rather than Producer (push) model Supports multiple consumers per message queue More complex to set up than Flume, and can use Flume as a consumer of messages But gaining popularity, especially alongside Spark Streaming
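The consumer-pull model and the "multiple consumers per message queue" point can be illustrated with a toy in-memory log. This is not the Kafka client API; the class and method names are invented for the sketch, and a real deployment would add partitions, persistence and replication.

```python
class MessageLog:
    """Toy append-only log: messages persist, and each consumer pulls from its own offset."""

    def __init__(self):
        self._messages = []
        self._offsets = {}  # consumer name -> index of next unread message

    def publish(self, message):
        self._messages.append(message)

    def poll(self, consumer, max_messages=10):
        """Return the next batch for this consumer and advance only its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


log = MessageLog()
for msg in ["m1", "m2", "m3"]:
    log.publish(msg)

first = log.poll("dashboard", max_messages=2)  # dashboard reads at its own pace
everything = log.poll("archiver")              # archiver independently sees all messages
rest = log.poll("dashboard")                   # dashboard resumes where it left off
```

Because messages stay in the log after being read, adding a second consumer does not take anything away from the first, which is the main contrast with a push-style transport.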

59 GoldenGate for Continuous Streaming to Hadoop Oracle GoldenGate is also an option, for streaming RDBMS transactions to Hadoop Leverages GoldenGate & HDFS / Hive Java APIs Sample Implementations on MOS Doc.ID (HDFS) and (Hive) Likely to be formal part of GoldenGate in future release - but usable now Can also integrate with Flume for delivery to HDFS - see MOS Doc.ID

60 NoSQL Databases Family of database types that reject tabular storage, SQL access and ACID compliance Useful as a way of landing data quickly + supporting random cell-level access by ETL process Focus is on scalability, speed and schema-on-read Oracle NoSQL Database - speed and scalability Apache HBase - speed, scalability and Hadoop MongoDB - native storage of JSON documents May or may not run on Hadoop, but associated with it Great choice for high-velocity data capture CRUD approach vs write-once/read many in HDFS

61 ODI on Hadoop - Big Data Projects Discover ETL Tools ODI provides an excellent framework for running Hadoop ETL jobs ELT approach pushes transformations down to Hadoop - leveraging power of cluster Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation Whilst still preserving RDBMS push-down Extensible to cover Pig, Spark etc Process orchestration Data quality / error handling Metadata and model-driven

62 ODI on Hadoop - How Well Does It Work? Very good for set-based processing of Hadoop data (HiveQL) Can run python, R etc scripts as procedures Brings metadata and team-based ETL development to Hadoop Process orchestration, error-handling etc Rapid innovation from the ODI Product Dev team - Spark KMs etc coming soon But requires Hadoop devs to learn ODI, or add ODI developer to the project

63 Options for Sharing Hadoop Output with Wider Audience During the discovery phase of a Hadoop project, audience are likely technical Most comfortable with data analyst tools, command-line, low-level access to the data During the exploitation phase, audience will be less technical Emphasis on graphical tools, and integration with wider reporting toolset + metadata Three main options for visualising and sharing Hadoop data 1.Coming Soon - Oracle Big Data Discovery (Endeca on Hadoop) 2.OBIEE reporting against Hadoop direct using Hive/Impala, or Oracle Big Data SQL 3.OBIEE reporting against an export of the Hadoop data, on Exalytics / RDBMS

64 Oracle Business Analytics and Big Data Sources OBIEE 11g can also make use of big data sources OBIEE supports Hive/Hadoop as a data source Oracle R Enterprise can expose R models through DB functions, columns Oracle Exalytics has InfiniBand connectivity to Oracle BDA Endeca Information Discovery can analyze unstructured and semi-structured sources Increasingly tighter-integration between OBIEE and Endeca

65 New in OBIEE : Hadoop Connectivity through Hive MapReduce jobs are typically written in Java, but Hive can make this simpler Hive is a query environment over Hadoop/MapReduce to support SQL-like queries Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically creates MapReduce jobs against data previously loaded into the Hive HDFS tables Approach used by ODI and OBIEE to gain access to Hadoop data Allows Hadoop data to be accessed just like any other data source

66 Importing Hadoop/Hive Metadata into RPD HiveODBC driver has to be installed into Windows environment, so that BI Administration tool can connect to Hive and return table metadata Import as ODBC datasource, change physical DB type to Apache Hadoop afterwards Note that OBIEE queries cannot span >1 Hive schema (no table prefixes) 2 1 3

67 OBIEE / HiveServer2 ODBC Driver Issue Most customers using BDAs are using CDH4 or CDH5 - which uses HiveServer2 OBIEE only ships/supports HiveServer1 ODBC drivers But OBIEE on Windows can use the Cloudera HiveServer2 ODBC drivers, which aren't supported by Oracle but work!

68 Dealing with Hadoop / Hive Latency Option 1 : Impala Hadoop access through Hive can be slow - due to inherent latency in Hive Hive queries use MapReduce in the background to query Hadoop Spins-up Java VM on each query Generates MapReduce job Runs and collates the answer Great for large, distributed queries but not so good for speed-of-thought dashboards

69 Dealing with Hadoop / Hive Latency Option 1 : Use Impala Hive is slow - because it's meant to be used for batch-mode queries Many companies / projects are trying to improve Hive - one of which is Cloudera Cloudera Impala is an open-source but commercially-sponsored in-memory MPP platform Replaces Hive and MapReduce in the Hadoop stack Can we use this, instead of Hive, to access Hadoop? It will need to work with OBIEE Warning - it won't be a supported data source (yet)

70 How Impala Works A replacement for Hive, but uses Hive concepts and data dictionary (metastore) MPP (Massively Parallel Processing) query engine that runs within Hadoop Uses same file formats, security, resource management as Hadoop Impala Processes queries in-memory Hadoop Accesses standard HDFS file data HDFS etc Option to use Apache AVRO, RCFile, LZO or Parquet (column-store) Designed for interactive, real-time SQL-like access to Hadoop Impala Hadoop HDFS etc BI Server Presentation Svr Impala Hadoop HDFS etc Cloudera Impala ODBC Driver Impala Hadoop HDFS etc Impala Hadoop HDFS etc Multi-Node Hadoop Cluster

71 Connecting OBIEE to Cloudera Impala Warning - unsupported source - limited testing and no support from MOS Requires Cloudera Impala ODBC drivers - Windows or Linux (RHEL etc/sles) - 32/64 bit ODBC Driver / DSN connection steps similar to Hive

72 So Does Impala Work, as a Hive Substitute? With ORDER BY disabled in DB features, it appears to But not extensively tested by me, or Oracle But it's certainly interesting Reduces 30s, 180s queries down to 1s, 10s etc Impala, or one of the competitor projects (Drill, Dremel etc) assumed to be the real-time query replacement for Hive, in time Oracle announced planned support for Impala at OOW - watch this space

73 Dealing with Hadoop / Hive Latency Option 2 : Export to Data Mart In most cases, for general reporting access, exporting into RDBMS makes sense Export Hive data from Hadoop into Oracle Data Mart or Data Warehouse Use Oracle RDBMS for high-value data analysis, full access to RDBMS optimisations Potentially use Exalytics for in-memory RDBMS access RDBMS Imports Real-Time Logs / Events Loading Stage Processing Stage Store / Export Stage File Exports RDBMS Exports File / Unstructured Imports

74 Dealing with Hadoop / Hive Latency Option 3 : Big Data SQL Preferred solution for customers with Oracle Big Data Appliance is Big Data SQL Oracle SQL Access to both relational, and Hive/NoSQL data sources Exadata-type SmartScan against Hadoop datasets Response-time equivalent to Impala or Hive on Tez No issues around HiveQL limitations Insulates end-users around differences between Oracle and Hive datasets

75 Oracle Big Data SQL Part of Oracle Big Data 4.0 (BDA-only) Also requires Oracle Database 12c, Oracle Exadata Database Machine Extends Oracle Data Dictionary to cover Hive Extends Oracle SQL and SmartScan to Hadoop Extends Oracle Security Model over Hadoop Fine-grained access control Data redaction, data masking Uses fast c-based readers where possible (vs. Hive MapReduce generation) Map Hadoop parallelism to Oracle PQ Big Data SQL engine works on top of YARN Like Spark, Tez, MR2 SmartScan Exadata Storage Servers SQL Queries Exadata Database Server Hadoop Cluster SmartScan Oracle Big Data SQL

76 View Hive Table Metadata in the Oracle Data Dictionary Oracle Database 12c with Big Data SQL option can view Hive table metadata Linked by Exadata configuration steps to one or more BDA clusters DBA_HIVE_TABLES and USER_HIVE_TABLES expose Hive metadata Oracle SQL*Developer 4.0.3, with Cloudera Hive drivers, can connect to Hive metastore
SQL> col database_name for a30
SQL> col table_name for a30
SQL> select database_name, table_name
  2  from dba_hive_tables;
DATABASE_NAME   TABLE_NAME
default         access_per_post
default         access_per_post_categories
default         access_per_post_full
default         apachelog
default         categories
default         countries
default         cust
default         hive_raw_apache_access_log

77 Big Data SQL Server Dataflow Read data from HDFS Data Node Direct-path reads C-based readers when possible Use native Hadoop classes otherwise Translate bytes to Oracle Apply SmartScan to Oracle bytes Apply filters Project columns Parse JSON/XML Score models

78 Hive Access through Oracle External Tables + Hive Driver Big Data SQL accesses Hive tables through external table mechanism ORACLE_HIVE external table type imports Hive metastore metadata ORACLE_HDFS requires metadata to be specified Access parameters cluster and tablename specify Hive table source and BDA cluster
CREATE TABLE access_per_post_categories(
  hostname varchar2(100),
  request_date varchar2(100),
  post_id varchar2(10),
  title varchar2(200),
  author varchar2(100),
  category varchar2(100),
  ip_integer number)
organization external
  (type oracle_hive
   default directory default_dir
   access parameters(com.oracle.bigdata.tablename=default.access_per_post_categories));

79 Use Rich Oracle SQL Dialect over Hadoop (Hive) Data Ranking Functions rank, dense_rank, cume_dist, percent_rank, ntile Window Aggregate Functions Avg, sum, min, max, count, variance, first_value, last_value LAG/LEAD Functions Reporting Aggregate Functions Sum, Avg, ratio_to_report Statistical Aggregates Correlation, linear regression family, covariance Linear Regression Fitting of ordinary-least-squares regression line to set of number pairs Descriptive Statistics Correlations Pearson's correlation coefficients Crosstabs Chi squared, phi coefficient Hypothesis Testing Student t-test, Binomial test Distribution Anderson-Darling test - etc.

80 Leverages Hive Metastore for Hadoop Java Access Classes As with other next-gen SQL access layers, uses common Hive metastore table metadata Provides route to underlying Hadoop data for Oracle Big Data SQL c-based SmartScan

81 Extending SmartScan, and Oracle SQL, Across All Data Brings query-offloading features of Exadata to Oracle Big Data Appliance Query across both Oracle and Hadoop sources Intelligent query optimisation applies SmartScan close to ALL data Use same SQL dialect across both sources Apply same security rules, policies, user access rights across both sources

82 Example : Using Big Data SQL to Add Dimensions to Hive Data We want to add country and post details to a Hive table containing page accesses Post and Country details are stored in Oracle RDBMS reference tables Hive Weblog Activity table Oracle Dimension lookup tables Combined output in report form

83 Create ORACLE_HIVE External Table over Hive Table Use the ORACLE_HIVE access driver type to create Oracle external table over Hive table ACCESS_PER_POST_EXTTAB now appears in Oracle data dictionary

84 Import Oracle Tables, Create RPD joining Tables Together No need to use Hive ODBC drivers - Oracle OCI connection instead No issue around HiveServer1 vs HiveServer2; also Big Data SQL handles authentication with Hadoop cluster in background, Kerberos etc Transparent to OBIEE - all appear as Oracle tables Join across schemas if required

85 Create Physical Data Model from Imported Table Metadata Join ORACLE_HIVE external table containing log data, to reference tables from Oracle DB

86 Create Business Model and Presentation Layers Map incoming physical tables into a star schema Add aggregation method for fact measures Add logical keys for logical dimension tables Remove columns from fact table that aren't measures

87 Create Initial Analyses Against Combined Dataset Create analyses using full SQL features Access to Oracle RDBMS Advanced Analytics functions through EVALUATE, EVALUATE_AGGR etc Big Data SQL SmartScan feature provides fast, ad-hoc access to Hive data, avoiding MapReduce

88 Oracle / Hive Query Federation at the RDBMS Level Oracle Big Data SQL feature (not BI Server) takes care of query federation SQL required for fact table (web log activity) access sent to Big Data SQL agent on BDA Only columns (projection) and rows (filtering) required to answer query sent back to Exadata Storage Indexes used on both Exadata Storage Servers and BDA nodes to skip block reads for irrelevant data HDFS caching used to speed-up access to commonly-used HDFS data
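The combination of storage indexes (skip blocks that cannot match), filtering and projection can be modelled in miniature. The sketch below is a conceptual toy, not how Big Data SQL is implemented: each block carries a min/max summary for one column, the scan skips blocks whose range cannot satisfy the predicate, and only the requested columns of matching rows are returned. All names and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    rows: list       # list of dicts, one per row
    min_value: int   # summary kept alongside the block, computed once at load time
    max_value: int


def make_block(rows, column):
    """Build a block and record the min/max of the indexed column."""
    values = [row[column] for row in rows]
    return Block(rows, min(values), max(values))


def scan(blocks, column, low, high, projection):
    """Return projected rows with low <= row[column] <= high, plus the number of blocks read."""
    blocks_read = 0
    result = []
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # summary proves no row can match, so the block is never read
        blocks_read += 1
        for row in block.rows:
            if low <= row[column] <= high:  # filtering
                result.append({name: row[name] for name in projection})  # projection
    return result, blocks_read


blocks = [
    make_block([{"post_id": 1, "hostname": "a"}, {"post_id": 2, "hostname": "b"}], "post_id"),
    make_block([{"post_id": 10, "hostname": "c"}, {"post_id": 11, "hostname": "d"}], "post_id"),
    make_block([{"post_id": 20, "hostname": "e"}, {"post_id": 21, "hostname": "f"}], "post_id"),
]
matching_rows, blocks_read = scan(blocks, "post_id", 10, 11, ["hostname"])
```

Here only one of the three blocks is read and only the `hostname` column travels back, which is the effect the slide describes at cluster scale.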

89 Access to Full Set of Oracle Join Types No longer restricted to HiveQL equi-joins - Big Data SQL supports all Oracle join operators Use to join Hive data (using View over external table) to an IP range country lookup table using BETWEEN join operator
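A range (BETWEEN) join of this kind looks as follows. The sketch runs against an in-memory SQLite database purely so that it is self-contained; it is a stand-in for the Oracle view over the ORACLE_HIVE external table, and the table names and sample values are hypothetical (only the `ip_integer` and `hostname` column names come from the deck's example table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the view over the Hive web log table.
    CREATE TABLE access_log (hostname TEXT, ip_integer INTEGER);
    -- Stand-in for the IP range to country lookup table.
    CREATE TABLE ip_country (ip_from INTEGER, ip_to INTEGER, country TEXT);

    INSERT INTO access_log VALUES ('host-a', 150), ('host-b', 275), ('host-c', 999);
    INSERT INTO ip_country VALUES (100, 199, 'Netherlands'), (200, 299, 'United Kingdom');
""")

# Non-equi join: each log row matches the lookup row whose range contains its IP.
rows = conn.execute("""
    SELECT a.hostname, c.country
    FROM access_log a
    JOIN ip_country c
      ON a.ip_integer BETWEEN c.ip_from AND c.ip_to
    ORDER BY a.hostname
""").fetchall()
conn.close()
```

The row for `host-c` falls outside every range and so drops out of the inner join; an outer join would keep it with a null country.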

90 Add In Time Dimension Table Enables time-series reporting; pre-req for forecasting (linear regression-type queries) Map to Date field in view over ORACLE_HIVE table Convert incoming Hive STRING field to Oracle DATE for better time-series manipulation
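The reason for converting the string field can be shown with a minimal sketch: once the value is a real date, rolling it up to the grain of a time dimension is straightforward. The timestamp format and sample values below are assumptions, since the slide does not state how the Hive field is formatted; in Oracle the conversion itself would typically be a `TO_DATE` call in the view.

```python
from collections import Counter
from datetime import datetime

ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"  # hypothetical layout of the Hive STRING


def hits_per_month(request_dates):
    """Parse string timestamps and count page hits per (year, month)."""
    parsed = [datetime.strptime(value, ASSUMED_FORMAT) for value in request_dates]
    return Counter((d.year, d.month) for d in parsed)


sample = ["2014-12-30 23:59:10", "2015-01-02 08:15:00", "2015-01-14 10:30:45"]
monthly = hits_per_month(sample)
print(monthly)
```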

91 Now Enables Time-Series Reporting Incl. Country Lookups

92 What About Oracle Big Data SQL and ODI12c? Hive, and MapReduce, are well suited to batch-type ETL jobs, but Not all join types are available in Hive - joins must be equality joins Any data from external Oracle RDBMS sources has to be staged in Hadoop before joining Limited set of HiveQL functions vs. Oracle SQL Oracle-based mappings have to import Hive data into DB before accessing it

93 Combining Oracle and Hadoop (Hive) Data in Mappings Example scenario : log data in Hadoop needs to be enriched with customer data in Oracle Hadoop (Hive) contains log activity and customer etc IDs Reference / customer data held in Oracle RDBMS How do we create a mapping that joins both datasets? movieapp_log_odistage.custid = CUSTOMER.CUSTID

94 Options for Importing Oracle / RDBMS Data into Hadoop Could export RDBMS data to file, and load using IKM File to Hive Oracle Big Data Connectors only export to Oracle, not import to Hadoop One option is to use Apache Sqoop, and new IKM SQL to Hive-HBase-File knowledge module Hadoop-native, automatically runs in parallel Uses native JDBC drivers, or OraOop (for example) Bi-directional in-and-out of Hadoop to RDBMS Join performed in Hive, using HiveQL With HiveQL limitations (only equi-joins) movieapp_log_odistage.custid = customer.custid Sqoop extract
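For orientation, the sketch below builds the sort of Sqoop command line that a staging step like this might run. It is not the code the ODI knowledge module generates; the connection string, user name, password file path and Hive table name are hypothetical, and the command is only assembled and printed, never executed.

```python
import shlex


def sqoop_import_command(jdbc_url, username, password_file, table, hive_table, mappers=4):
    """Build an argument list for importing one RDBMS table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",                 # load the extracted rows into Hive
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),   # parallel extract tasks
    ]


cmd = sqoop_import_command(
    "jdbc:oracle:thin:@//dbhost.example.com:1521/orcl",  # hypothetical database
    "movieapp",                                          # hypothetical user
    "/user/oracle/.sqoop_password",                      # hypothetical HDFS path
    "CUSTOMER",
    "customer",
)
print(shlex.join(cmd))
```

With the customer table staged in Hive this way, the join on custid is then performed in HiveQL, subject to the equi-join restriction noted on the slide.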

95 New Option - Using Oracle Big Data SQL Oracle Big Data SQL provides ability for Exadata to reference Hive tables Use feature to create join in Oracle, bringing across Hive data through ORACLE_HIVE table

96 Oracle Big Data SQL and Data Integration Gives us the ability to easily bring in Hadoop (Hive) data into Oracle-based mappings Allows us to create Hive-based mappings that use Oracle SQL for transforms, joins Faster access to Hive data for real-time ETL scenarios Through Hive, bring NoSQL and semi-structured data access to Oracle ETL projects For our scenario - join weblog + customer data in Oracle RDBMS, no need to stage in Hive

97 Using Big Data SQL in an ODI12c Mapping By default, Hive table has to be exposed as an ORACLE_HIVE external table in Oracle first Then register that Oracle external table in ODI repository + model 1 External table creation in Oracle 2 Register in ODI Model 3 Logical Mapping using just Oracle tables

98 Custom KM : LKM Hive to Oracle (Big Data SQL) ODI12c Big Data SQL example on BigDataLite VM uses a custom KM for Big Data SQL LKM Hive to Oracle (Big Data SQL) - KM code downloadable from java.net Allows Hive+Oracle joins by auto-creating ORACLE_HIVE exttab definition to enable Big Data SQL Hive table access

99 ODI12c Mapping Creates Temp Exttab, Joins to Oracle Big Data SQL Hive External Table created as temp object 1 Register in ODI Model 3 Main integration SQL routine uses regular Oracle SQL join 4 Hive table AP uses LKM Hive to Oracle (Big Data SQL) 2 IKM Oracle Insert

100 Finally What Keeps the CIO Awake at Night Security and Privacy Regulations Are we analysing and sharing data in compliance with privacy regulations? - And if we are - would customers think our use of it is ethical? Do I know if the data in my Hadoop cluster is *really* secure?

101 Hadoop Security By Default Connections between Hadoop services, and by users to services, aren't authenticated Security is fragmented : HDFS, Hive, OS user accounts, Hue, CM all separate models No single place to define security policies, groups, access rights No single tool to audit access and permissions By default, everything is open and trusted - reflects roots in academia, R&D, marketing depts

102 Secured Hadoop : Kerberos, Sentry, Data Encryption etc Available for most Hadoop distributions, part of core Hadoop Kerberos Authentication - enables service-to-service, and client-to-service authentication using MIT Kerberos or MS AD Kerberos Apache Sentry - Role-based Access Control for Hive, Impala and HDFS (CDH5.3+) Transparent at-rest HDFS encryption (CDH5.3+) Closes security loopholes, goes some way to Oracle-type data security

103 Oracle Big Data SQL : Single RDBMS/Hadoop Security Model Potential to extend Oracle security model over Hadoop (Hive) data Masking / Redaction VPD FGAC

104 Summary Hadoop and Oracle Big Data Appliance are increasingly appearing in BI+DW Projects Gives DW projects the ability to store more data, cheaper and more flexibly than before Enables non-relational (SQL) query tools and analysis techniques (R, Spark etc) Extends BI's capability to report and analyze across wider data sources Maturity varies widely in terms of tool maturity, and Oracle integration with Hadoop Trend is for Oracle to productize big data, creating tools + products around Oracle BDA We are probably at early stages - but very interesting times to be an Oracle BI+DW dev!

105 Thank You for Attending! Thank you for attending this presentation, and more information can be found at Contact us at or Look out for our book, Oracle Business Intelligence Developers Guide out now! Follow us on Twitter or Facebook (facebook.com/rittmanmead)

Deep Quick-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco

Deep Quick-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco Deep Quick-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco About the Speaker Mark Rittman, Co-Founder of Rittman Mead

More information

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Mark Rittman, CTO, Rittman Mead OTN EMEA Tour, May 2016 info@rittmanmead.com www.rittmanmead.com @rittmanmead About the Speaker Mark

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Seamless Access from Oracle Database to Your Big Data

Seamless Access from Oracle Database to Your Big Data Seamless Access from Oracle Database to Your Big Data Brian Macdonald Big Data and Analytics Specialist Oracle Enterprise Architect September 24, 2015 Agenda Hadoop and SQL access methods What is Oracle

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to outline our

More information

Introducing Oracle Exalytics In-Memory Machine

Introducing Oracle Exalytics In-Memory Machine Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)

T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) Deploying OBIEE in the Cloud: Getting Started, Deployment Scenarios and Best Practices Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco About the Speaker Mark Rittman, Co-Founder of

More information

Evolution of Information Management Architecture and Development

Evolution of Information Management Architecture and Development Evolution of Information Management Architecture and Development Stewart Bryson Chief Innovation Officer, Rittman Mead! Andrew Bond Head of Enterprise Architecture, Oracle EMEA Oracle Information Management

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Lesson 2 : Hadoop & NoSQL Data Loading using Hadoop Tools and ODI12c Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco

Lesson 2 : Hadoop & NoSQL Data Loading using Hadoop Tools and ODI12c Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco Lesson 2 : Hadoop & NoSQL Data Loading using Hadoop Tools and ODI12c Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco Moving Data In, Around and Out of Hadoop Three stages to Hadoop

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Oracle Big Data Essentials

Oracle Big Data Essentials Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 40291196 Oracle Big Data Essentials Duration: 3 Days What you will learn This Oracle Big Data Essentials training deep dives into using the

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information

<Insert Picture Here> Big Data

<Insert Picture Here> Big Data Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big

More information

What s New with Oracle BI, Analytics and DW

What s New with Oracle BI, Analytics and DW What s New with Oracle BI, Analytics and DW Mark Rittman, CTO, Rittman Mead India Masterclass Tour 2013 About the Speaker Mark Rittman, Co-Founder of Rittman Mead Oracle ACE Director, specialising in Oracle

More information

Oracle Big Data Fundamentals Ed 1 NEW

Oracle Big Data Fundamentals Ed 1 NEW Oracle University Contact Us: +90 212 329 6779 Oracle Big Data Fundamentals Ed 1 NEW Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

An Oracle White Paper June 2013. Oracle: Big Data for the Enterprise

An Oracle White Paper June 2013. Oracle: Big Data for the Enterprise An Oracle White Paper June 2013 Oracle: Big Data for the Enterprise Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure

More information

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights Big Data, Advanced Analytics:

More information

Modernizing Your Data Warehouse for Hadoop

Modernizing Your Data Warehouse for Hadoop Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking

More information

High Performance Data Management Use of Standards in Commercial Product Development

High Performance Data Management Use of Standards in Commercial Product Development v2 High Performance Data Management Use of Standards in Commercial Product Development Jay Hollingsworth: Director Oil & Gas Business Unit Standards Leadership Council Forum 28 June 2012 1 The following

More information

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Data Governance in the Hadoop Data Lake. Michael Lang May 2015 Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business

More information

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013 Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the

More information

Hadoop & SAS Data Loader for Hadoop

Hadoop & SAS Data Loader for Hadoop Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe Agenda What s Hadoop SAS Data management: Traditional In-Database In-Memory The Hadoop analytics lifecycle

More information

Are You Big Data Ready?

Are You Big Data Ready? ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture

More information

#TalendSandbox for Big Data

#TalendSandbox for Big Data Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND

More information

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

More information

Oracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle

Oracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle Oracle Big Data SQL Architectural Deep Dive Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The following is

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION Oracle Data Integrator Enterprise Edition 12c delivers high-performance data movement and transformation among enterprise platforms with its open and integrated

More information

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated

More information

Cyber Security With Big Data

Cyber Security With Big Data Cyber Security With Big Data Fast. Complete. Cost-Effec1ve. Harry J Foxwell, PhD Principal Consultant Oracle Public Sector Oct 2015 Safe Harbor Statement The following is intended to outline our general

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Big Data SQL and Query Franchising

Big Data SQL and Query Franchising Big Data SQL and Query Franchising An Architecture for Query Beyond Hadoop Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor

More information

Moving From Hadoop to Spark

Moving From Hadoop to Spark + Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee

More information

Architecting for the Internet of Things & Big Data

Architecting for the Internet of Things & Big Data Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014 Safe Harbor Statement The following is intended to

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

An Oracle White Paper October 2011. Oracle: Big Data for the Enterprise

An Oracle White Paper October 2011. Oracle: Big Data for the Enterprise An Oracle White Paper October 2011 Oracle: Big Data for the Enterprise Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5

More information

How To Create A Business Intelligence (Bi)

How To Create A Business Intelligence (Bi) Oracle Business Analytics Overview Markus Päivinen Business Analytics Country Leader, Finland May 2014 1 Presentation content What are the requirements for modern BI Trend in Business Analytics Big Data

More information

Oracle Big Data Building A Big Data Management System

Oracle Big Data Building A Big Data Management System Oracle Big Building A Big Management System Copyright 2015, Oracle and/or its affiliates. All rights reserved. Effi Psychogiou ECEMEA Big Product Director May, 2015 Safe Harbor Statement The following

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

Migrating Discoverer to OBIEE Lessons Learned. Presented By Presented By Naren Thota Infosemantics, Inc.

Migrating Discoverer to OBIEE Lessons Learned. Presented By Presented By Naren Thota Infosemantics, Inc. Migrating Discoverer to OBIEE Lessons Learned Presented By Presented By Naren Thota Infosemantics, Inc. Professional Background Partner/OBIEE Architect at Infosemantics, Inc. Experience with BI solutions

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information

TUT NoSQL Seminar (Oracle) Big Data

TUT NoSQL Seminar (Oracle) Big Data Timo Raitalaakso +358 40 848 0148 rafu@solita.fi TUT NoSQL Seminar (Oracle) Big Data 11.12.2012 Timo Raitalaakso MSc 2000 Work: Solita since 2001 Senior Database Specialist Oracle ACE 2012 Blog: http://rafudb.blogspot.com

More information

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

TE's Analytics on Hadoop and SAP HANA Using SAP Vora TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -

More information

Oracle Big Data Handbook

Oracle Big Data Handbook ORACLG Oracle Press Oracle Big Data Handbook Tom Plunkett Brian Macdonald Bruce Nelson Helen Sun Khader Mohiuddin Debra L. Harding David Segleau Gokula Mishra Mark F. Hornick Robert Stackowiak Keith Laker

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Building Your Big Data Team

Building Your Big Data Team Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now

More information

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP ESG Data Systems Architecture Big Data & Analytics as a Service Components Unstructured Data / Sparse Data of Value

More information

An Oracle BI and EPM Development Roadmap

An Oracle BI and EPM Development Roadmap An Oracle BI and EPM Development Roadmap Mark Rittman, Director, Rittman Mead UKOUG Financials SIG, September 2009 1 Who Am I? Oracle BI&W Architecture and Development Specialist Co-Founder of Rittman

More information

Birds of a Feather Session: Best Practices for TimesTen on Exalytics

Birds of a Feather Session: Best Practices for TimesTen on Exalytics Birds of a Feather Session: Best Practices for TimesTen on Exalytics Chris Jenkins Senior Director, In-Memory Technology, Oracle Antony Heljula Technical Director, Peak Indicators Ltd. Mark Rittman CTO,

More information



