Welcome!
WHO? Board member of OGh with BI / WA experience
- Determining the association's activities
- Taking part in the organizing committee of one or more events
- Facilitating the SIGs
- Editing OGh-Visie
- Maintaining contact with the members
Agenda
1. Positioning data discovery
2. Overview presentation of the steps in the Big Data Discovery tool
3. Preparation: discussing the possible operations
4. Demo of the BDD tool and discussion of the processing phases: Find, Explore, Transform, Discover and Publish
5. Discussion of the different roles within a project
6. Installing the Discovery tool
7. How to get started with it quickly >> see link
Information Management Platform Reference Architecture
(Diagram: Event Engine, Data Reservoir, Data Factory, Enterprise Data and Business Analytics turn Data Streams and Structured Enterprise Data into Actionable Events, Actionable Insights and Actionable Information; a Discovery Lab takes in Events & Data and Other Data and produces Discovery Output, spanning the Execution and Innovation sides.)
Oracle Big Data Discovery
Wim Villano, Oracle
The Data Reservoir is growing
(Diagram: emerging sources feeding the Data Reservoir.)
Not Easy to Get Analytic Value at a Fast Enough Pace
Data uncertainty:
- Unfamiliar and overwhelming
- Potential value not obvious
- Requires significant manipulation
Tool complexity:
- Early Hadoop tools are for experts only
- Existing BI tools were not designed for Hadoop
- Emerging solutions lack broad capabilities
80% of the effort is typically spent on evaluating and preparing data, leaving teams overly dependent on scarce, highly skilled resources.
Requires a Fundamentally New Approach
A single, intuitive, visual user interface to find, explore, transform, discover and share:
- Find and explore big data to understand its potential
- Quickly transform and enrich it to make it better
- Unlock big data for anyone to discover and share new value
Oracle Big Data Discovery: The Visual Face of Hadoop (find, explore, transform, discover, share)
Oracle Big Data Discovery: see the potential in big data quickly, make it better, and unlock value for everyone.
Business benefits:
- Get value faster: rapidly turn raw data into actionable insights, leveraged across the enterprise
- Democratize value from big data: increase the size, diversify the skills and improve the efficiency of big data teams
Technical benefits:
- Remove existing technical barriers: runs natively on the Hadoop cluster for maximum scalability and performance
- Publish, secure and leverage: integrates with Hadoop open standards and the unified Oracle big data ecosystem
The Hadoop Ecosystem: a standard Hadoop node
- Hadoop analytic & data processing tools: Spark, MapReduce, Sqoop, MLlib, R-on-Hadoop, Hive
- Hadoop management tools: HCatalog, Oozie, YARN, ZooKeeper
- Storage: HDFS
Big Data Discovery in Hadoop
- Hadoop nodes: BDD Data Processing runs alongside the Hadoop analytic & data processing tools and the Hadoop management tools, handling provisioning and transformation of data on HDFS
- BDD node: Studio (the visual face of Hadoop) and the Dgraph Gateway (a hybrid search-analytics database)
Agenda (recap): next up, 3. Preparation: discussing the possible operations
Data Stored in Hadoop. Example: files with JSON data (Hadoop/NoSQL ecosystem)
{"custid":1185972,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custid":1354924,"movieid":1948,"genreid":9,"time":"2012-07-01:00:00:22","recommended":"n","activity":7}
{"custid":1083711,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custid":1234182,"movieid":11547,"genreid":44,"time":"2012-07-01:00:00:32","recommended":"y","activity":7}
{"custid":1010220,"movieid":11547,"genreid":44,"time":"2012-07-01:00:00:42","recommended":"y","activity":6}
{"custid":1143971,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custid":1253676,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custid":1351777,"movieid":608,"genreid":6,"time":"2012-07-01:00:01:03","recommended":"n","activity":7}
{"custid":1143971,"movieid":null,"genreid":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custid":1363545,"movieid":27205,"genreid":9,"time":"2012-07-01:00:01:18","recommended":"y","activity":7}
{"custid":1067283,"movieid":1124,"genreid":9,"time":"2012-07-01:00:01:26","recommended":"y","activity":7}
{"custid":1126174,"movieid":16309,"genreid":9,"time":"2012-07-01:00:01:35","recommended":"n","activity":7}
{"custid":1234182,"movieid":11547,"genreid":44,"time":"2012-07-01:00:01:39","recommended":"y","activity":7}
{"custid":1346299,"movieid":424,"genreid":1,"time":"2012-07-01:00:05:02","recommended":"y","activity":4}
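Because Hadoop reads the schema at query time (next slide), landing files like these is just a copy into HDFS; a minimal sketch with hypothetical directory and file names:

# Create a landing directory and copy the raw JSON logs in; nothing is transformed at load time.
hdfs dfs -mkdir -p /user/oracle/movieapp/activity
hdfs dfs -put movieapp_log_json/*.json /user/oracle/movieapp/activity/
hdfs dfs -ls /user/oracle/movieapp/activity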
Hadoop and Databases
Databases (schema-on-write):
- A schema must be created before any data can be loaded
- An explicit load operation transforms the data into the database's internal structure
- New columns must be added explicitly before data for those columns can be loaded
- Pros: 1) reads are fast; 2) standards and governance
Hadoop (schema-on-read):
- Data is simply copied to the file store; no transformation is needed
- A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns (late binding)
- New data can start flowing at any time and appears retroactively once the SerDe is updated to parse it
- Pros: 1) loads are fast; 2) flexibility and agility
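To query the JSON files above in place, a Hive external table can be laid over the HDFS directory. A minimal sketch, assuming the JsonSerDe shipped with hive-hcatalog-core is on Hive's classpath; the columns follow the JSON fields shown earlier, the location is the hypothetical path from the previous example, and the table name matches the movieapp_log_json definition on the next slide:

-- movieapp_log_json.hql: schema-on-read, so this only records metadata; no data moves.
CREATE EXTERNAL TABLE IF NOT EXISTS movieapp_log_json (
  custid      INT,
  movieid     INT,
  genreid     INT,
  `time`      STRING,    -- backquoted to avoid keyword clashes
  recommended STRING,
  activity    INT)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/oracle/movieapp/activity';

Run it with: hive -f movieapp_log_json.hql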
Hive Metastore: SQL-on-Hadoop engines share metadata, not MapReduce
- Spark SQL, Hive and Impala all read the same Hive metastore
- Table definitions: movieapp_log_json, tweets, avro_log
- The metastore maps DDL to Java access classes
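Since the engines share those definitions, a table registered once is visible to all of them without re-declaration. An illustrative check from the shell (Impala shares the metadata but does not run arbitrary Hive SerDes, so a JsonSerDe-backed table would still need conversion to a natively supported format before Impala can read its data):

hive -e "SELECT COUNT(*) FROM movieapp_log_json"        # Hive execution (MapReduce on this stack)
spark-sql -e "SELECT COUNT(*) FROM movieapp_log_json"   # same definition, Spark execution
impala-shell -q "INVALIDATE METADATA movieapp_log_json" # refresh Impala's metadata cache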
Prepare and Discover: Metastore → Data Processing → Discovery (potential), with sampling, profiling and enrichment along the way.
Two ways into Big Data Discovery:
- Command Line Interface (CLI): the preferred method for IT, data engineers, data scientists and anyone who loves CLIs
- Self-service upload via BDD Studio: the preferred method for the business analyst
Command Line Interface (example: a claims file)
1. Define the Hive table (if it does not exist)
2. Run Data Processing
3. Result visible in BDD
Script location:
Run script:
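The script location and run command are left blank on the slide above, so only the shape of the flow is sketched here: the table name, columns and path are hypothetical, and the Data Processing CLI call is a placeholder whose actual location and options come from the BDD documentation:

# 1. Define the Hive table over the claims file (if it does not exist)
hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS claims (claim_id STRING, member_id STRING, amount DOUBLE)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/oracle/claims'"
# 2. Run BDD Data Processing against the new table so it is sampled, profiled,
#    enriched and indexed into the Dgraph:
#    <BDD script location>/<run script> ... claims    (placeholder; see the BDD documentation)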
Self Service (example: an Apache log file uploaded via BDD Studio)
Agenda (recap): next up, 5. Discussion of the different roles within a project
BDD Project Roles

Key project roles:

Business Owner
- Skills and background: deep business knowledge; aware of the business success criteria
- During the project: up to 1 week for design/detailed requirements and deployment; status and iteration reviews during development
- Ongoing: as needed, to provide feedback or do additional planning

Project Manager
- Skills and background: project delivery skills; knowledge of the customer's delivery standards
- During the project: part time for the duration of the project (1-3 days)
- Ongoing: none
- Note: typically 1 BDD delivery manager and 1 customer delivery manager

Business Analyst
- Skills and background: understanding of key business metrics; experience configuring and interpreting charts; ability to spot data quality problems; basic statistical knowledge helpful
- During the project: near full time, participating in design and the creation of metrics, charts and reports (typically 2-4 weeks); half time during testing and rollout
- Ongoing: up to 4 hours/week reviewing site usage and creating/updating metrics based on feedback

Data Engineer
- Skills and background: knowledge of data sources and extracts; experience building ETL pipelines; Groovy experience
- During the project: full time for the initial identification of sources, ingest and transformations (2-4 weeks); half time during testing and rollout
- Ongoing: up to 1/4 time writing custom transformations or assisting with advanced transformations

Hadoop Engineer
- Skills and background: experience with HDFS and Hive (in particular, registering data with Hive); can programmatically manipulate data; knowledge of Apache Spark helpful
- During the project: full time (installing and configuring the product, getting data into HCatalog, performing necessary special transformations) (2-4 weeks)
- Ongoing: up to 1/4 time (getting new data into Hive); periodic upgrades to Hadoop components may require 1-2 days

System Administrator
- Skills and background: technical infrastructure; usage auditing; security management
- During the project: full time during deployment activities (typically 2-4 weeks)
- Ongoing: up to 1 hour/week to review logs; periodic upgrades to Endeca software may require 1-2 days

Optional roles:

Component Developer
- Skills and background: portal development experience; hands-on Java and JavaScript coding skills; CSS/Photoshop for visual styling if needed
- During the project: full time during development of a custom component (typically 1-3 weeks)
- Ongoing: none

Integration Architect
- Skills and background: specific point-technology experience (ODI, OBIEE, security systems)
- During the project: full time during integration activities (varies with the specific requirements)
- Ongoing: none
- Note: could include moving data into Hadoop

Statistician
- Skills and background: training and experience in predictive statistics, data mining or machine learning; familiarity with a statistical tool like R; knowledge of the enterprise's practices around predictive model management and deployment
- During the project: 1/4 time during the requirements phase
- Ongoing: none
Phases and activities, mapped against the key roles (Business Owner, Project Manager, Business Analyst, Data Engineer, Hadoop Engineer, System Administrator) and optional roles (Component Developer, Integration Architect, Statistician):

Design and detailed requirements:
- Refine requirements
- Identify data sources

Development iterations:
- Install and configure BDD
- Register data with Hive
- Explore key data sets
- Transform key data sets
- Functional testing
- Triage gaps from functional testing
- Build dashboards
- Performance testing
- Deploy product

Ongoing support:
- Ingest new data sources
- Maintain environments
- Write customized transformations
Agenda (recap): next up, 6. Installing the Discovery tool
Installation
http://docs.oracle.com/cd/e64107_01/bigdata.doc/install_deploy_bdd/toc.htm#about%20this%20guide
Prerequisites:
- Cloudera Distribution for Hadoop 5.3.x-5.4.x or Hortonworks Data Platform 2.2.4-2.3
- Hadoop YARN, Spark, Hive, ZooKeeper, ...
Steps:
1. Download the software from edelivery.oracle.com
2. Copy it to the machine into a directory, rename, unzip
3. Update the configuration file (Java home, ports, YARN location, ...)
4. Run the orchestration script
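A schematic of those four steps; every file and script name below is a placeholder, as the real names and values come from the installation guide linked above:

# 1-2. Copy the download from edelivery.oracle.com to the target machine and unpack it
mkdir -p /opt/bdd && cd /opt/bdd
mv ~/V12345.zip bdd-installer.zip && unzip bdd-installer.zip   # placeholder names
# 3. Update the configuration file: JAVA_HOME, ports, YARN/Spark locations, ...
vi bdd.conf
# 4. Run the orchestration script, which provisions Studio, the Dgraph and
#    Data Processing across the cluster
./orchestration-script.sh bdd.conf                             # placeholder name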
Agenda (recap): next up, 7. How to get started with it quickly >> see link
Quick start: the OVM BigDataLite 4.2.1 image
- Attention: in the VM settings, allocate more than 12,500 MB of memory
- BDD starts automatically after the check
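If the appliance runs under VirtualBox, the memory can also be set from the command line before booting; a one-liner with a hypothetical VM name:

# VM must be powered off; 13312 MB sits comfortably above the 12,500 MB floor
VBoxManage modifyvm "BigDataLite-4.2.1" --memory 13312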
[oracle@bigdatalite ~]$ cd /u04/oracle/middleware/bdd/bdd_manager/bin/
[oracle@bigdatalite bin]$ ./bdd-admin.sh start
Enter the Weblogic Server Administrator username [default=weblogic]: weblogic
Enter the Weblogic Server Administrator password: welcome1
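The same script manages the rest of the lifecycle; assuming the standard bdd-admin.sh lifecycle commands (check your version's help output):

./bdd-admin.sh status   # report the state of the BDD components
./bdd-admin.sh stop     # shut the stack down again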
Log in at http://192.168.56.101:9003/bdd/ as admin@oracle.com / welcome1