Welcome! Copyright 2014 Oracle and/or its affiliates. All rights reserved.




Welcome!

WHO? Board member of OGh with BI / WA experience. Helps decide the association's activities, takes part in the organizing committee of one or more events, facilitates the SIGs, edits OGh-Visie, and maintains contact with the members.

Agenda
1. Positioning data discovery
2. Overview of the steps in the Big Data Discovery tool
3. Preparation: discussion of the possible operations
4. Demo of the BDD tool and discussion of the processing phases: Find, Explore, Transform, Discover and Publish
5. Discussion of the different roles within a project
6. Installation of the Discovery tool
7. How to get started with it quickly >> see link

Information Management Platform Reference Architecture (diagram): Data Streams, Structured Enterprise Data and Other Data feed the Event Engine, Data Reservoir, Data Factory and Enterprise Data components; Business Analytics and the Discovery Lab (with its Discovery Output) deliver Actionable Events, Actionable Insights and Actionable Information across Execution and Innovation.

Oracle Big Data Discovery. Wim Villano, Oracle

The Data Reservoir is growing: emerging sources keep feeding the Data Reservoir.

It Is Not Easy to Get Analytic Value at a Fast Enough Pace
Data uncertainty: the data is unfamiliar and overwhelming, its potential value is not obvious, and it requires significant manipulation.
Tool complexity: early Hadoop tools are only for experts, existing BI tools are not designed for Hadoop, and emerging solutions lack broad capabilities.
As a result, roughly 80% of the effort is typically spent on evaluating and preparing data, and teams are overly dependent on scarce, highly skilled resources.

This Requires a Fundamentally New Approach: a single, intuitive and visual user interface to find, explore, transform, discover and share. Find and explore big data to understand its potential; quickly transform and enrich it to make it better; and unlock big data for anyone to discover and share new value.

Oracle Big Data Discovery, the visual face of Hadoop: find, explore, transform, discover, share.

Oracle Big Data Discovery: see the potential in big data, quickly make it better, and unlock value for everyone.
Business benefits: get value faster, by rapidly turning raw data into actionable insights that are leveraged across the enterprise; and democratize value from big data, by increasing the size, diversifying the skills, and improving the efficiency of big data teams.
Technical benefits: destroy existing technical barriers, by running natively on the Hadoop cluster for maximum scalability and performance; and publish, secure and leverage, by integrating with Hadoop open standards and leveraging the unified Oracle big data ecosystem.

The Hadoop Ecosystem (standard Hadoop node): analytic and data processing tools such as Spark, MapReduce, Sqoop, MLlib, R-on-Hadoop and Hive; Hadoop management tools such as HCatalog, Oozie, YARN and ZooKeeper; all on top of HDFS.

Big Data Discovery in Hadoop: on each Hadoop node, BDD Data Processing runs alongside the Hadoop analytic and data processing tools, the Hadoop management tools and HDFS, handling provisioning and transformation of the data; on the BDD node, Studio (the visual face of Hadoop) runs on top of the Dgraph Gateway, a hybrid search-analytics database.

Data Stored in Hadoop. Example: files with JSON data in the Hadoop/NoSQL ecosystem:
{"custid":1185972,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custid":1354924,"movieid":1948,"genreid":9,"time":"2012-07-01:00:00:22","recommended":"n","activity":7}
{"custid":1083711,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custid":1234182,"movieid":11547,"genreid":44,"time":"2012-07-01:00:00:32","recommended":"y","activity":7}
{"custid":1010220,"movieid":11547,"genreid":44,"time":"2012-07-01:00:00:42","recommended":"y","activity":6}
{"custid":1143971,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custid":1253676,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custid":1351777,"movieid":608,"genreid":6,"time":"2012-07-01:00:01:03","recommended":"n","activity":7}
{"custid":1143971,"movieid":null,"genreid":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custid":1363545,"movieid":27205,"genreid":9,"time":"2012-07-01:00:01:18","recommended":"y","activity":7}
{"custid":1067283,"movieid":1124,"genreid":9,"time":"2012-07-01:00:01:26","recommended":"y","activity":7}
{"custid":1126174,"movieid":16309,"genreid":9,"time":"2012-07-01:00:01:35","recommended":"n","activity":7}
{"custid":1234182,"movieid":11547,"genreid":44,"time":"2012-07-01:00:01:39","recommended":"y","activity":7}
{"custid":1346299,"movieid":424,"genreid":1,"time":"2012-07-01:00:05:02","recommended":"y","activity":4}
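The records above are plain newline-delimited JSON, so they can be inspected directly from HDFS. A minimal PySpark sketch, not from the deck; the HDFS path is hypothetical:

from pyspark.sql import SparkSession

# Read the movie-activity log straight from HDFS; Spark infers the schema
# from the JSON records at read time (one record per line).
spark = SparkSession.builder.appName("movieapp-json").getOrCreate()
activity = spark.read.json("hdfs:///user/oracle/movieapp/activity/")

activity.printSchema()                       # the schema is inferred from the data
activity.filter("recommended = 'y'").show()  # e.g. only events that carried a recommendation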

Hadoop and Databases
Databases (schema-on-write): the schema must be created before any data can be loaded; an explicit load operation transforms the data into the database's internal structure; and new columns must be added explicitly before data for those columns can be loaded. Pros: reads are fast, and standards and governance are enforced.
Hadoop (schema-on-read): data is simply copied to the file store, with no transformation needed; a SerDe (Serializer/Deserializer) is applied at read time to extract the required columns (late binding); and new data can start flowing at any time and will appear retroactively once the SerDe is updated to parse it. Pros: loads are fast, and you gain flexibility and agility.
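A hedged sketch of schema-on-read in practice: the JSON files from the previous slide stay where they are, and a Hive external table is simply layered on top of them with a JSON SerDe, so newly arriving files show up in queries without any load step. The table name comes from the metastore slide below; the column list, SerDe class and HDFS location are illustrative and assume the hive-hcatalog JsonSerDe is on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read").enableHiveSupport().getOrCreate()

# No data is moved: the table definition just points at the existing files,
# and the SerDe parses each JSON record at query time (late binding).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS movieapp_log_json (
        custid BIGINT, movieid BIGINT, genreid INT,
        `time` STRING, recommended STRING, activity INT)
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION 'hdfs:///user/oracle/movieapp/activity/'
""")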

Hive Metastore: SQL-on-Hadoop engines share metadata, not MapReduce. Spark SQL, Hive and Impala all read their table definitions (movieapp_log_json, Tweets, avro_log) from the same Hive Metastore, which maps the DDL to Java access classes.
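Because the table definition lives in the shared metastore, any of these engines can query it. A small sketch using Spark SQL (the query itself is illustrative); Hive or Impala would run an equivalent statement against the same table definition:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shared-metastore").enableHiveSupport().getOrCreate()

# Top 10 most-viewed movies from the table registered above.
top_movies = spark.sql("""
    SELECT movieid, COUNT(*) AS views
    FROM movieapp_log_json
    WHERE movieid IS NOT NULL
    GROUP BY movieid
    ORDER BY views DESC
    LIMIT 10
""")
top_movies.show()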

Prepare and Discover. Starting from the Hive Metastore, BDD Data Processing samples, profiles and enriches the data before it is handed to Discovery, where its potential is explored. There are two ways to trigger data processing: the Big Data Discovery Command Line Interface (CLI), the preferred method for IT, data engineers, data scientists and anyone who loves CLIs; and self-service upload via BDD Studio, the preferred method for the business analyst.

Command Line Interface example: claims file >> define the Hive table (if it does not exist) >> run data processing >> result.
BDD script location:
Run script:
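A hedged sketch of the "define the Hive table" step for a claims file, assuming comma-separated records under a hypothetical HDFS directory; the column list is illustrative. Once the table exists in the Hive Metastore, the BDD data processing CLI shipped with the product is pointed at it to sample, profile and enrich the data (the script location and invocation are the details referred to on the slide).

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims-table").enableHiveSupport().getOrCreate()

# Register the raw claims file as an external Hive table so BDD data
# processing can pick it up from the metastore.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS claims (
        claim_id BIGINT, customer_id BIGINT, claim_date STRING, amount DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///user/oracle/claims/'
""")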

Self-service upload example: an Apache log file.

BDD Project Roles

Key project roles:

Business Owner
- Skills and background: deep business knowledge; aware of the business success criteria.
- Participation during the project: up to 1 week for design/detailed requirements and deployment; status and iteration reviews during development.
- Ongoing participation: as needed, to provide feedback or for additional planning.

Project Manager
- Skills and background: project delivery skills; knowledge of the customer's delivery standards.
- Participation during the project: part time for the duration of the project (1-3 days).
- Ongoing participation: none.
- Notes: typically one BDD delivery manager and one customer delivery manager.

Business Analyst
- Skills and background: understanding of key business metrics; experience configuring and interpreting charts; ability to spot data quality problems; basic statistical knowledge helpful.
- Participation during the project: near full time, participating in design and in the creation of metrics, charts and reports (typically 2-4 weeks); half time during testing and rollout.
- Ongoing participation: up to 4 hours/week reviewing site usage and creating/updating metrics based on feedback.

Data Engineer
- Skills and background: knowledge of data sources and extracts; experience building ETL pipelines; Groovy experience.
- Participation during the project: full time for the initial identification of sources, ingest and transformations (2-4 weeks); half time during testing and rollout.
- Ongoing participation: up to 1/4 time writing custom transformations or assisting with advanced transformations.

Hadoop Engineer
- Skills and background: experience with HDFS and Hive (in particular, registering data with Hive); can programmatically manipulate data; knowledge of Apache Spark helpful.
- Participation during the project: full time (installing and configuring the product, getting data into HCatalog, performing the necessary special transformations) (2-4 weeks).
- Ongoing participation: up to 1/4 time (getting new data into Hive); periodic upgrades to Hadoop components may require 1-2 days.

System Administrator
- Skills and background: technical infrastructure; usage auditing; security management.
- Participation during the project: full time during deployment activities (typically 2-4 weeks).
- Ongoing participation: up to 1 hour/week to review logs; periodic upgrades to Endeca software may require 1-2 days.

Optional roles:

Component Developer
- Skills and background: portal development experience; hands-on Java and JavaScript coding skills; CSS/Photoshop for visual styling if needed.
- Participation during the project: full time during development of a custom component (typically 1-3 weeks).
- Ongoing participation: none.

Integration Architect
- Skills and background: experience with specific point technologies (ODI, OBIEE, security systems).
- Participation during the project: full time during integration activities (varies with the specific requirements).
- Ongoing participation: none.
- Notes: could include moving data into Hadoop.

Statistician
- Skills and background: training and experience in predictive statistics, data mining or machine learning; familiarity with a statistical tool such as R; knowledge of the enterprise's practices around predictive model management and deployment.
- Participation during the project: 1/4 time during the requirements phase.
- Ongoing participation: none.

Phases and Activities, mapped against the key and optional roles above (Business Owner, Project Manager, Business Analyst, Data Engineer, Hadoop Engineer, System Administrator; Component Developer, Integration Architect, Statistician):

Design and detailed requirements:
- Refine requirements
- Identify data sources

Development iterations:
- Install and configure BDD
- Register data with Hive
- Explore key data sets
- Transform key data sets
- Functional testing
- Triage gaps from functional testing
- Build dashboards
- Performance testing
- Deploy product

Ongoing support:
- Ingest new data sources
- Maintain environments
- Write customized transformations

Installation (see http://docs.oracle.com/cd/e64107_01/bigdata.doc/install_deploy_bdd/toc.htm#about%20this%20guide)
Prerequisites: Cloudera Distribution for Hadoop 5.3.x-5.4.x or Hortonworks Data Platform 2.2.4-2.3, with Hadoop YARN, Spark, Hive and ZooKeeper.
Steps: download the software from edelivery.oracle.com; copy it to a directory on the target machine, rename and unzip it; update the configuration file (Java home, ports, YARN location, and so on); then run the orchestration script.
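A hedged sketch (not from the deck) of the "update the configuration file" step: patch a few key=value settings in the installer's configuration file before running the orchestration script. The file name, key names and values below are illustrative only; use the names documented in the installation guide linked above.

from pathlib import Path

def patch_config(path, overrides):
    """Rewrite key=value lines, substituting new values for the given keys."""
    lines = Path(path).read_text().splitlines()
    patched = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        patched.append(f"{key}={overrides[key]}" if "=" in line and key in overrides else line)
    Path(path).write_text("\n".join(patched) + "\n")

# Illustrative key names; the real ones are defined by the BDD installer.
patch_config("bdd.conf", {
    "JAVA_HOME": "/usr/java/latest",
    "STUDIO_PORT": "9003",
})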

Getting started quickly: the Oracle VM BigDataLite (OVM BDALite) 4.2.1 virtual machine.

Attention: in the VM settings, allocate more than 12,500 MB of memory.

BDD starts automatically after the check completes.

[oracle@bigdatalite ~]$ cd /u04/oracle/middleware/bdd/bdd_manager/bin/
[oracle@bigdatalite bin]$ ./bdd-admin.sh start
Enter the Weblogic Server Administrator username [default=weblogic]: weblogic
Enter the Weblogic Server Administrator password: welcome1

BDD Studio is then available at http://192.168.56.101:9003/bdd/ (log in as admin@oracle.com with password welcome1).