Data First Framework. How to Build Your Enterprise Data Hub. Luis Campos Big Data Solutions Director Oracle Europe, Middle East and Africa




Data First Framework: How to Build Your Enterprise Data Hub
Luis Campos, Big Data Solutions Director, Oracle Europe, Middle East and Africa
@luigicampos
June 2014
Copyright 2015 Oracle and/or its affiliates. All rights reserved.
No DBAs were hurt in the making of this presentation.

Why do we measure things? Ultimately, to convert Known Unknowns!

Oracle Big Data Discovery: The Visual Face of Hadoop
The Hidden Face of Spark, and Dgraph, and WebLogic, Oozie, Hue
Luis Campos, Director, Big Data Solutions EMEA
#BudapestData @luigicampos @oraclebigdata
Budapest Data Forum, June 2015

Who is the Data Scientist?
"Give me the data. Give me the computing power. I will show you the FUTURE!"

Data Science: Limits and Promises

Data Scientist Skillset (Unicorn profile)

Data Engineer + Business Analyst + Data Scientist: You need them all
Data Engineer (DBA, ETL, etc.): Representation; Reporting and summarization
Data Scientist: Extrapolation; Movement
Business Analyst: Interpretation; Prescription

Storyboard Terrorboard: The Arrival of a New Data Source
Load data from the new data source into Hadoop.
Extracting new insight from the newly added data set proves to be almost impossible due to data quality issues.
Data is very inconsistent, inaccurate, and incomplete.
Data needs to be cleansed, so data cleansing rules need to be set.
Go through many transform-evaluate iterations with product marketing until the data is in the desired format.
Too much time is spent on manual data wrangling tasks.
Focus is directed away from generating valuable insights for the business.
By the time insights can be extracted, product marketing has already moved on to the next problem.

Introducing Oracle Big Data Discovery 1.0
Explore, Analyze, Discover, Transformation, Augment

Oracle Big Data Discovery. The Visual Face of Hadoop
find, explore, transform, discover, share

Oracle Big Data Discovery. The Visual Face of Hadoop
Projects, Data Sets, Attributes, Scratchpad
find, explore, transform, discover, share

Catalog
Projects are composed of data sets.
Search and guided navigation for ease of use.
See data set summaries, user annotations, and recommendations.
Load data to Hadoop via self-service.

Explore
Visualize all attributes by type.
Sort attributes by name, information potential*, or relation.
Assess attribute statistics, data quality, and outliers.
Use the scratchpad to uncover correlations between attributes.
* Shannon entropy based algorithm
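The slide only states that information potential is based on Shannon entropy; it does not describe the actual scoring. As a minimal sketch of the underlying idea, assuming each attribute is treated as a plain list of categorical values, attributes can be ranked by their entropy in bits. The names `shannon_entropy`, `rank_by_entropy`, and the sample columns are illustrative assumptions, not part of BDD.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy, in bits, of the observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over distinct values of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def rank_by_entropy(columns):
    """Sort attribute names from highest to lowest entropy."""
    return sorted(columns, key=lambda name: shannon_entropy(columns[name]), reverse=True)


# Hypothetical sample data: one list of values per attribute.
columns = {
    "country": ["HU", "HU", "PT", "DE"],
    "constant_flag": ["Y", "Y", "Y", "Y"],
    "coin": ["H", "T", "H", "T"],
}

for name in rank_by_entropy(columns):
    print(name, round(shannon_entropy(columns[name]), 3))
```

An attribute that always holds the same value scores zero, which is why a ranking of this kind tends to push uninformative columns to the bottom of the list.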

Scratchpad (Explore): Cool and Addictive!
Graphic type changes as additional attributes are added.
Autoselects the best visualization.
Offers next-best graphics option(s).

Transform
Intuitive, user-driven data wrangling.
Extensive library of powerful data transformations and enrichments.
Preview results, undo, commit, and replay transforms.
Test on sample data, then apply to the full data set in Hadoop.
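The preview, undo, and replay workflow on this slide can be illustrated with a small recorder of row-level transforms. This is a sketch under stated assumptions, not the BDD transform engine: the `TransformScript` class, its methods, and the two helper functions are hypothetical, and rows are modeled as plain dictionaries.

```python
class TransformScript:
    """Hypothetical recorder of ordered row-level transforms (not a BDD API)."""

    def __init__(self):
        self.steps = []  # ordered (name, function) pairs

    def add(self, name, func):
        """Record a transform; each function takes a row dict and returns a row dict."""
        self.steps.append((name, func))

    def undo(self):
        """Drop the most recently added step, if any."""
        if self.steps:
            self.steps.pop()

    def replay(self, rows):
        """Apply every recorded step, in order, to copies of the given rows."""
        out = [dict(row) for row in rows]
        for _name, func in self.steps:
            out = [func(row) for row in out]
        return out

    def preview(self, sample):
        """Show the effect of the script on a sample without touching the input."""
        return self.replay(sample)


def trim_city(row):
    return {**row, "city": row["city"].strip()}


def upper_city(row):
    return {**row, "city": row["city"].upper()}


sample = [{"city": " Budapest "}]
full_data = [{"city": " Budapest "}, {"city": "Lisbon"}]

script = TransformScript()
script.add("trim", trim_city)
script.add("upper", upper_city)
print(script.preview(sample))    # inspect the result on the sample
script.undo()                    # discard the uppercase step
print(script.replay(full_data))  # apply the remaining steps to the full set
```

Keeping the script as an ordered list of steps is what makes replaying it against a larger data set straightforward once the sample result looks right.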

Oracle Big Data Discovery. The Visual Face of Hadoop
Unlock big data for everyone
find, explore, transform, discover, share

Discovery: Dashboard creation
Control over layout, filtering behavior, and metrics.
Formatting controls at project level and at component level.
D3 charts.

Discover
Join and blend data for deeper perspectives.
Compose project pages via drag and drop.
Use powerful search and guided navigation to ask questions.
See new patterns in rich, interactive data visualizations.

Share
Share projects, bookmarks, and snapshots with others.
Build galleries and tell big data stories.
Collaborate and iterate as a team.
Publish blended data to HDFS for use in other tools.

Now let's dive in

The Hadoop Ecosystem
Standard Hadoop node
Hadoop analytic and data processing tools: Spark, MapReduce, Sqoop, MLlib, R-on-Hadoop, Hive
Hadoop management tools: HCatalog, Oozie (workflow), YARN, ZooKeeper
HDFS

Big Data Discovery in Hadoop
Hadoop node: Hadoop Analytic & Data Processing, BDD Data Processing, Hadoop Management Tools, HDFS
BDD node: BDD Server Components
Diagram labels: Indexing & Transformation of Data; Management; Visual Tool; Indexing

Big Data Discovery in Detail
Hadoop node: Hadoop Analytic & Data Processing, Hadoop Management Tools, HDFS, Data Processing (Spark), Dgraph HDFS Agent
BDD node: Data Processing CLI, DP Workflows (Oozie), Hive Table Detector, Studio Visual Interface (J2EE), Dgraph Gateway (J2EE; caching + business logic), Dgraph Instance(s) (indexing), EM Plug-in, EM Agent
Diagram flow labels: Requests; Sync & Transformations; Self Service Load; Transformations
Note that although a BDD data set can be deleted by a Studio user, the Data Processing software can never delete a Hive table. Therefore, it is up to the Hive administrator to delete obsolete Hive tables.

Data Ingest: 2 Methods
Self-service upload via BDD Studio: the preferred method for the Business Analyst.
Command Line Interface (CLI): the preferred method for IT / Data Engineer / Data Scientist / anyone who loves CLIs.
Remember: BDD does not hold data, only index and metadata!

Data Ingest - Personal Data Upload
Big Data Discovery supports personal data upload in a variety of formats.
Flat files: a user can upload a personal file in the following formats:
Delimited (CSV, tab, pipe, etc.)
Excel (XLS, XLSX)

Deep Dive... Nahh, we don't have time.
Just a command line example for the amusement of hardline professionals.

Hive Table METADATA Creation via SQL
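The SQL shown on this slide did not survive in the transcript. As a stand-in, the sketch below builds a generic HiveQL `CREATE EXTERNAL TABLE` statement, which registers table metadata over delimited files already sitting in HDFS. The database, table, column names, and HDFS path are hypothetical placeholders, and the helper function is illustrative rather than anything shipped with BDD or Hive.

```python
def hive_external_table_ddl(database, table, columns, location, delimiter=","):
    """Build a HiveQL statement that defines metadata over existing delimited files.

    columns is a list of (column_name, hive_type) pairs.
    """
    column_lines = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_lines}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical example values, chosen only for illustration.
ddl = hive_external_table_ddl(
    database="default",
    table="web_logs",
    columns=[("event_time", "STRING"), ("user_id", "BIGINT"), ("url", "STRING")],
    location="/data/web_logs",
)
print(ddl)
```

Because the table is external, dropping it in Hive removes only the metadata and leaves the files in HDFS untouched.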

BDD Command Line Data Set Creation (manual)
(Only needed if the BDD Listener* is not working.)
Run manually or via cron job.
The BDD installation orchestration script defaults to a cron job.
Needs configuration to run correctly (define paths to cluster, Dgraphs, etc.).
Can be run on an individual table, a group (whitelist/blacklist), a Hive database, or all Hive tables.
* Invoking the BDD Hive Table Detector (invoked within the DP CLI script) keeps Hive databases/tables in sync with BDD data sets.
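The slide mentions running against a group of tables selected by whitelist/blacklist but does not define the semantics. One plausible reading, sketched here purely as an assumption, is that a whitelist restricts processing to the listed tables and a blacklist then removes tables from that set. The function `select_tables` and the table names are hypothetical and do not reflect how the DP CLI is configured.

```python
def select_tables(tables, whitelist=None, blacklist=None):
    """Return the tables to process, preserving input order.

    Assumed semantics: with no whitelist every table is eligible;
    blacklisted tables are always excluded.
    """
    allowed = set(tables) if whitelist is None else set(whitelist)
    blocked = set(blacklist or [])
    return [t for t in tables if t in allowed and t not in blocked]


# Hypothetical Hive table names.
hive_tables = ["sales", "web_logs", "tmp_scratch", "customers"]

print(select_tables(hive_tables))                                # all tables
print(select_tables(hive_tables, whitelist=["sales", "customers"]))
print(select_tables(hive_tables, blacklist=["tmp_scratch"]))
```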


Job Monitoring & Debugging

Language Support
Studio: English, Chinese (Simplified), German, Japanese, Korean, Portuguese (Brazilian), Spanish
Search: Arabic, Basque, Belarusian, Bosnian, Bulgarian, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian Bokmål, Norwegian Nynorsk, Persian, Polish, Portuguese, Portuguese (Brazilian), Romanian, Russian, Serbian (Cyrillic), Serbian (Latin), Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Valencian, Vietnamese
Text Enrichment (Entity Extraction, Whitelist Tagger, Sentiment, Language Detection), language lists in slide order:
English, French, German, Spanish, Italian, Portuguese (Brazilian)
English
English, Danish, German, Spanish, French, Italian, Japanese, Korean, Chinese (Simplified), Chinese (Traditional), Portuguese
English, French, German, Spanish, Italian, Portuguese (Brazilian)
All OLT 2.1 languages (50)

Authentication
By default, BDD owns authentication, based on users and assigned roles.
Users and roles may be imported via LDAP/Active Directory.
Standard user roles define a user's rights to both data sets and projects.
Additionally, global roles are defined that control a user's read/write access to Hadoop.
Single Sign-On (SSO) is supported.
The preferred method is via Oracle Access Manager, though other options, such as OpenSSO and SiteMinder, are also supported.
SSO will bypass the login portlet. (Diagram: SSO user, SSO portal.)
Support also exists for auto login hooks.

Oracle BDD 1.0
What is it for?
Intuitive visual interfaces for the entire Hadoop analytics process.
Data transformation and enrichment at scale.
Shares data with Oracle and Hadoop ecosystems (via Big Data SQL).
What is it not for?
Modeling or advanced analytics.
Reporting.
Data engines other than Hadoop/Spark (e.g. NoSQL).

Big Data Discovery Cloud Service (coming soon)

BDD Easy Deployment Partner
http://bigdatadisco.branchbird.com/bdd/web/home/index

Enjoy the rest of the conference!
Thank you
@luigicampos
www.oracle.com/bigdatadiscovery