Oracle Big Data for Dummies
Sai Janakiram Penumuru, WW Product Expert, Cloud Platforms
The Father of Microbiology: Antonie Philips van Leeuwenhoek, the first microbiologist
Sai Janakiram Penumuru
- Twelve years as Oracle DBA / Oracle Apps DBA / Cloud Technologist / Oracle ACE
- Current position: WW Product Expert, Cloud Platform - Oracle in HP
- Co-Founder & Vice President, All India Oracle Users Group (AIOUG)
- Oracle Database 12c Beta Tester
- Oracle VM SIG Leader, www.oraclevmsig.org
- Blog: www.oadba.com; www.oracle12c.info
Agenda
- A New Style of IT
- What is Big Data?
- Defining Big Data: Market Drivers and Trends
- Why Big Data Now?
- Overview of Oracle Big Data Appliance
- Oracle Big Data Implementation Steps
- Oracle Integrated Software Solution
- Demo
- The Call to Action (Next Steps)
A new era of accelerated innovation: forever changing how consumers and businesses interact, and enabling new opportunities
Every 60 seconds (2013):
- 98,000+ tweets
- 695,000 status updates
- 11 million instant messages
- 698,445 Google searches
- 168 million+ emails sent
- 1,820 TB of data created
- 217 new mobile web users
Growing Internet of Things (IoT), by 2020:
- Devices: 30 billion (1)
- Data: 40 trillion GB (2)
- Mobile apps: 10 million (3)
Pervasive connectivity, smart device expansion and an explosion of information for 8 billion people (4). A new style of IT is required for IoT solutions.
(1) IDC Directions 2013: Why the Datacenter of the Future Will Leverage a Converged Infrastructure, March 2013, Matt Eastwood; (2) and (3) IDC Predictions 2012: Competing for 2020, Document 231720, December 2011, Frank Gens; (4) http://en.wikipedia.org
What is Big Data?
What is Big Data?: Regular structured data
- Over a longer period
- Faster analysis
What is Big Data?: Micro-transactions
- 20 TB: terabytes of information generated per engine every hour
- 2: engines on a twin-engine Boeing 737
- 2.5: average duration of US flights, in hours
- 28,537: commercial flights in the sky in the United States on any given day
- 365: days in a year
- Total: 20 × 2 × 2.5 × 28,537 × 365 = 1,041,600,500 TB
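The yearly total on this slide is simply the product of the five figures. A minimal Python check of that arithmetic, using only the numbers from the slide:

```python
# Figures taken from the slide
TB_PER_ENGINE_HOUR = 20      # terabytes generated per engine per hour
ENGINES = 2                  # twin-engine Boeing 737
AVG_FLIGHT_HOURS = 2.5       # average US flight duration in hours
FLIGHTS_PER_DAY = 28_537     # commercial flights in US skies on a given day
DAYS_PER_YEAR = 365

def yearly_flight_data_tb() -> float:
    """Multiply the per-engine rate out to a full year of US flights."""
    per_flight = TB_PER_ENGINE_HOUR * ENGINES * AVG_FLIGHT_HOURS  # 100 TB per flight
    return per_flight * FLIGHTS_PER_DAY * DAYS_PER_YEAR

total = yearly_flight_data_tb()
print(f"{total:,.0f} TB per year")  # 1,041,600,500 TB per year
```

Each flight alone accounts for 100 TB, which is why the yearly figure reaches about a billion terabytes (roughly one zettabyte).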
What is Big Data?: Meaning from unstructured data (images, video, audio, social media, email, documents)
The Lost Opportunities of Big Data
- 23%: share of data that would be potentially useful IF tagged and analyzed
- 3%: share actually being tagged for Big Data value
- 0.5%: share of the digital universe that is actually being tagged, analyzed and leveraged
Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Big Data promises productivity and profit increases: +5% productivity, +6% profitability
Source: Big Data: The Management Revolution, Harvard Business Review, October 2012, Andrew McAfee & Erik Brynjolfsson.
Defining Big Data: Market Drivers and Trends
Big Data defined
"Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (1)
- The four Vs: Volume, Velocity, Variety, Value
- Information sources: CRM, SCM, ERP, transactional data, IT ops, email, texts, mobile, audio, video, images, social media, search
Big Data is no longer just a buzzword.
(1) Source: Gartner, The Importance of 'Big Data': A Definition, June 2012
Information from the Internet of Things: we have gone beyond the decimal system
Today, data scientists use yottabytes to describe how much government data the NSA or FBI hold on people altogether. In the near future, the brontobyte will be the measure used to describe the sensor data generated by the IoT.
- Megabyte (10^6)
- Gigabyte (10^9)
- Terabyte (10^12): 500 TB of new data per day are ingested into Facebook databases
- Petabyte (10^15): the CERN Large Hadron Collider generates 1 PB per second
- Exabyte (10^18): 1 EB of data is created on the internet each day, equal to 250 million DVDs' worth of information; the proposed Square Kilometer Array telescope will generate an EB of data per day
- Zettabyte (10^21): 1.3 ZB of network traffic by 2016
- Yottabyte (10^24): our digital universe today, equal to 250 trillion DVDs
- Brontobyte (10^27): our digital universe tomorrow
- Geopbyte (10^30): this will take us beyond our decimal system
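The units on this slide step up by factors of 1,000. The Python sketch below lists each one as a power of ten and converts exabytes and yottabytes to DVDs. It assumes about 4 GB per DVD. That figure is chosen because it reproduces the slide's 250-million and 250-trillion ratios. A single-layer DVD actually holds about 4.7 GB.

```python
# Decimal byte units from the slide, as powers of ten.
UNITS = {
    "megabyte": 6,
    "gigabyte": 9,
    "terabyte": 12,
    "petabyte": 15,
    "exabyte": 18,
    "zettabyte": 21,
    "yottabyte": 24,
    "brontobyte": 27,  # informal name, not an SI prefix
    "geopbyte": 30,    # informal name, not an SI prefix
}

DVD_BYTES = 4 * 10**9  # assumption: ~4 GB per DVD, matching the slide's ratios

def to_bytes(amount: int, unit: str) -> int:
    """Convert an amount in a named decimal unit to bytes."""
    return amount * 10 ** UNITS[unit]

def dvd_equivalent(amount: int, unit: str) -> int:
    """Number of ~4 GB DVDs needed to hold the given amount (integer division)."""
    return to_bytes(amount, unit) // DVD_BYTES

for name, exp in UNITS.items():
    print(f"{name:>10}: 10^{exp} bytes")

print(f"1 EB = {dvd_equivalent(1, 'exabyte'):,} DVDs")    # 1 EB = 250,000,000 DVDs
print(f"1 YB = {dvd_equivalent(1, 'yottabyte'):,} DVDs")  # 1 YB = 250,000,000,000,000 DVDs
```

With the nominal 4.7 GB capacity instead, one exabyte comes to about 213 million DVDs. The order of magnitude is unchanged.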
Second half of the chessboard: the wheat and chessboard problem
- On the entire chessboard there would be 2^64 - 1 = 18,446,744,073,709,551,615 grains of rice
- Weighing 461,168,602,000 metric tons
- A heap of rice larger than Mount Everest
- Around 1,000 times the global production of rice in 2010 (464,000,000 metric tons)
http://en.wikipedia.org/wiki/wheat_and_chessboard_problem
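The chessboard figures come from doubling one grain on each of 64 squares. The Python sketch below sums the board, then splits the total into the first 32 squares and the "second half". It converts the total to metric tons assuming about 25 mg per grain. The slide does not state a grain weight; 25 mg is the value that reproduces its tonnage.

```python
SQUARES = 64
GRAIN_MG = 25  # assumption: ~25 mg per grain, which reproduces the slide's tonnage

def grains_on_square(n: int) -> int:
    """Grains on square n (1-based): one grain, doubled on every later square."""
    return 2 ** (n - 1)

total = sum(grains_on_square(n) for n in range(1, SQUARES + 1))
first_half = sum(grains_on_square(n) for n in range(1, 33))
second_half = total - first_half

tonnes = total * GRAIN_MG / 1e9  # mg -> metric tons (1 t = 10^9 mg)
print(f"total grains: {total:,}")            # 18,446,744,073,709,551,615
print(f"first half:   {first_half:,}")       # 4,294,967,295
print(f"second half:  {second_half:,}")      # 18,446,744,069,414,584,320
print(f"weight: ~{round(tonnes, -3):,.0f} t")  # ~461,168,602,000 t
print(f"vs 2010 rice crop: ~{tonnes / 464_000_000:,.0f}x")  # ~994x
```

The first half of the board holds only about 4.3 billion grains. Almost the entire total lands on squares 33 to 64, which is the point of the "second half of the chessboard" metaphor.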
Big data is derived from a variety of sources (Volume, Velocity, Variety)
A day in the life of Big Data: an intelligent end-to-end approach delivers the right information to the right person at the right time
- Sources (volume, velocity, variety): social media, video, audio, email, texts, machine-generated data (MGD), transactional data, CRM (sales), ERP (procurement), supply chain (ops), HR, web, Word and Excel files, logs, clickstream data, images
- Levels: transactional, operational, strategic
- Outcomes: executive dashboards, enterprise search, customer interaction, predictive analytics, web engagement
Why Big Data Now?
Command of information drives increased business performance: right information, right person, right time
- Hindsight: what happened?
- Insight: what is happening now?
- Foresight: what will happen?
Inactive, in-flight and active information together lead to better, right-time decisions.
Big Data opportunities across industries and use cases: innovative analytic use cases cut across structured, unstructured and semi-structured data
- Finance: fraud detection, anti-money laundering, risk management
- Government: law enforcement, counter terrorism, traffic flow optimization
- Telecom: broadcast monitoring, churn prevention, advertising optimization
- Manufacturing: supply chain optimization, defect tracking, RFID correlation, warranty management
- Energy: weather forecasting, natural resource exploration
- Healthcare: drug development, scientific research, evidence-based medicine, healthcare outcomes analysis
- Other use cases: sentiment analysis, social CRM / network analysis, churn mitigation, brand monitoring, cross- and up-sell, loyalty & promotion analysis, web application optimization
- Horizontal use cases: marketing campaign optimization, brand management, social media analytics, pricing optimization, internal risk assessment, customer behavior analysis, revenue assurance, logistics optimization, clickstream analysis, influencer analysis, IT infrastructure analysis, legal discovery, equipment monitoring, enterprise search
Sources: IDC: 2012 Worldwide Big Data Technology and Services Forecast: 2011-2015; Gartner: 2012 Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016
Overview of Oracle Big Data Appliance
Oracle Big Data Appliance
BDA Full Rack: Hadoop + Oracle NoSQL Database
- Node1: first NameNode, DataNode, ZooKeeper, failover controller, Balancer, Puppet master & agent, NoSQL
- Node2: second NameNode, DataNode, ZooKeeper, failover controller, MySQL backup server, Puppet agent, NoSQL Admin
- Node3: JobTracker, DataNode, ZooKeeper, CM server, ODI agent, MySQL primary server, Hue, Hive, Beeswax, Puppet agent, NoSQL
- Node4 to Node18: DataNode, TaskTracker, Cloudera Manager agents, Puppet agents, NoSQL
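To make the rack layout easy to query, the role assignment can be held in a small data structure. The Python sketch below transcribes the slide's roles with shortened service names. It can answer questions such as which nodes host a NameNode. The dictionary format and the helper names `rack_layout` and `nodes_running` are hypothetical illustrations, not a BDA configuration file format.

```python
# Service layout transcribed from the slide (BDA full rack).
# Hypothetical in-memory format for illustration only.
SPECIAL_NODES = {
    1: ["NameNode (first)", "DataNode", "ZooKeeper", "Failover Controller",
        "Balancer", "Puppet Master", "Puppet Agent", "NoSQL"],
    2: ["NameNode (second)", "DataNode", "ZooKeeper", "Failover Controller",
        "MySQL Backup Server", "Puppet Agent", "NoSQL Admin"],
    3: ["JobTracker", "DataNode", "ZooKeeper", "Cloudera Manager Server", "ODI Agent",
        "MySQL Primary Server", "Hue", "Hive", "Beeswax", "Puppet Agent", "NoSQL"],
}
WORKER_SERVICES = ["DataNode", "TaskTracker", "Cloudera Manager Agent", "Puppet Agent", "NoSQL"]

def rack_layout(nodes: int = 18) -> dict[int, list[str]]:
    """Nodes 1-3 carry the master roles; every other node runs the worker services."""
    return {n: SPECIAL_NODES.get(n, WORKER_SERVICES) for n in range(1, nodes + 1)}

def nodes_running(layout: dict[int, list[str]], prefix: str) -> list[int]:
    """Return the node numbers hosting any service whose name starts with `prefix`."""
    return [n for n, services in layout.items()
            if any(s.startswith(prefix) for s in services)]

layout = rack_layout()
print(len(layout))                          # 18
print(nodes_running(layout, "NameNode"))    # [1, 2]
print(nodes_running(layout, "ZooKeeper"))   # [1, 2, 3]
print(nodes_running(layout, "TaskTracker")) # [4, 5, ..., 18]
```

Keeping the master roles as explicit entries and the workers as one shared list mirrors the slide: only the first three nodes differ, and the remaining fifteen are identical.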
Software Layout (* optional configuration)
Oracle Big Data Implementation Steps
Implementation Step 1: Acquire
- Collect tweets into HDFS using the Twitter API
- Specify search terms; terms may be categorized
- Bulk collect or stream
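The Twitter API call itself is not reproduced here. As a hedged sketch of the "specify and categorize search terms" part, the Python function below works on tweet records that have already been collected. It tags each record with the categories whose terms it mentions and writes the matches as JSON Lines. The category names, search terms and record fields (`id`, `user`, `text`) are hypothetical.

```python
import json

# Hypothetical categorized search terms for an airline sentiment project.
SEARCH_TERMS = {
    "baggage": ["lost bag", "luggage", "baggage"],
    "delay": ["delayed", "late", "cancelled"],
    "service": ["crew", "staff", "service"],
}

def categorize(text: str, terms: dict[str, list[str]]) -> list[str]:
    """Return every category with a term in the text (simple case-insensitive substring match)."""
    lowered = text.lower()
    return [cat for cat, words in terms.items() if any(w in lowered for w in words)]

def to_jsonl(tweets: list[dict], terms: dict[str, list[str]]) -> str:
    """Keep tweets that match at least one category, tagged with those categories."""
    lines = []
    for tweet in tweets:
        cats = categorize(tweet["text"], terms)
        if cats:
            lines.append(json.dumps({**tweet, "categories": cats}))
    return "\n".join(lines)

sample = [
    {"id": 1, "user": "alice", "text": "Flight delayed again and my luggage is gone"},
    {"id": 2, "user": "bob", "text": "Nice weather today"},
]
print(to_jsonl(sample, SEARCH_TERMS))  # one line: tweet 1 tagged ["baggage", "delay"]
```

Saved to a local file, the output could then be copied into HDFS with `hdfs dfs -put tweets.jsonl <hdfs-dir>`. The file name and target directory are placeholders. Substring matching is deliberately naive here; a term like "late" would also match words such as "chocolate".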
Implementation Step 2: Organize
- MapReduce jobs compute sentiment
- Summarize by airline, category, time and user
- Load into Oracle Database using Oracle SQL Connector for HDFS
- Map Twitter ID to customer
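In this step, the MapReduce jobs run on the cluster. To show the map, shuffle and reduce shape of "compute sentiment, then summarize by airline and category", here is a single-process Python sketch. The word lexicon, the scoring rule (sum of word scores) and the record fields are hypothetical stand-ins for whatever sentiment model the real jobs use.

```python
from collections import defaultdict

# Hypothetical word-level sentiment lexicon.
LEXICON = {"great": 1, "love": 1, "thanks": 1, "delayed": -1, "lost": -1, "rude": -1}

def map_tweet(tweet: dict):
    """Map step: emit ((airline, category), score) for one tweet."""
    words = (w.strip(".,!?") for w in tweet["text"].lower().split())
    score = sum(LEXICON.get(w, 0) for w in words)
    for category in tweet["categories"]:
        yield (tweet["airline"], category), score

def shuffle(pairs):
    """Shuffle step: group all emitted scores by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_scores(groups):
    """Reduce step: summarize each key as (tweet count, average score)."""
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in groups.items()}

tweets = [
    {"airline": "AirA", "categories": ["delay"], "text": "Delayed again, crew was rude!"},
    {"airline": "AirA", "categories": ["delay"], "text": "Slightly delayed but thanks"},
    {"airline": "AirB", "categories": ["service"], "text": "Love the crew, great flight"},
]
pairs = (pair for t in tweets for pair in map_tweet(t))
summary = reduce_scores(shuffle(pairs))
print(summary)  # {('AirA', 'delay'): (2, -1.0), ('AirB', 'service'): (1, 2.0)}
```

Summarized rows like these are what the slide then loads into Oracle Database through Oracle SQL Connector for HDFS. On a real cluster, the map and reduce functions would run in parallel across the DataNodes rather than in one process.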
Implementation Step 3: Analyze
Combine other customer data sets to provide a deeper level of analysis, such as social and economic importance.
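Once Twitter IDs are mapped to customers, sentiment can be joined with other customer data. The Python sketch below shows that join and flags customers with negative sentiment. It ranks them by annual spend as one possible proxy for "economic importance". The customer fields (`tier`, `annual_spend`), the handles and the ranking rule are hypothetical illustrations.

```python
# Hypothetical per-user sentiment (Twitter handle -> average score).
user_sentiment = {"alice": -2.0, "bob": 1.0, "carol": -1.0}

# Hypothetical mapping of Twitter handles to customer IDs.
twitter_to_customer = {"alice": "C100", "bob": "C200", "carol": "C300"}

# Hypothetical customer master data.
customers = {
    "C100": {"tier": "gold", "annual_spend": 12_000},
    "C200": {"tier": "silver", "annual_spend": 3_000},
    "C300": {"tier": "gold", "annual_spend": 20_000},
}

def at_risk_customers(threshold: float = 0.0) -> list[tuple[str, str, float, int]]:
    """Join sentiment to customer data and list unhappy customers, highest spend first."""
    rows = []
    for handle, score in user_sentiment.items():
        customer_id = twitter_to_customer.get(handle)
        if customer_id is None or score >= threshold:
            continue  # unknown handle, or sentiment not negative enough to flag
        info = customers[customer_id]
        rows.append((customer_id, info["tier"], score, info["annual_spend"]))
    # Highest annual spend first, so economically important complaints surface at the top.
    return sorted(rows, key=lambda row: row[3], reverse=True)

for row in at_risk_customers():
    print(row)
# ('C300', 'gold', -1.0, 20000)
# ('C100', 'gold', -2.0, 12000)
```

If the sentiment summaries already live in Oracle Database, this join would more naturally be a SQL query there. Python is used here only to keep the sketch self-contained.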
Implementation Step 4: Visualize
Visualize social and economic importance using Oracle Database analytics and OBIEE dashboards to drive decision making.
Question 2: How many nodes are in a BDA Full Rack?
1. 4 nodes
2. 6 nodes
3. 16 nodes
4. 18 nodes
Oracle Integrated Software Solution
Oracle Big Data Appliance Software Overview: shows the relationships among the tools and identifies the tasks that they perform
Oracle Engineered Systems (figure): Big Data Appliance, Exadata and Oracle BI positioned by data variety (from unstructured, schema-less data to schema-based data) and information density, across the Acquire, Organize and Analyze stages
Oracle Big Data Appliance Usage Model (figure): Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics, connected by InfiniBand, across the Stream, Acquire, Organize, and Analyze & Visualize stages
Demo
- How to get the Oracle Big Data Lite VM
- Start/stop services
- Software walkthrough
Oracle Big Data Lite Virtual Machine
Prepare your host system:
- Minimum 8 GB of real memory
- ~38 GB of disk space to download and install (this includes the 11 GB zipped .ova file and ~26 GB for the imported image)
- Download and install 7-Zip
- Download and install Oracle VM VirtualBox (version 4.3 and above is supported)
- Download the Big Data Lite files from OTN: http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
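Before downloading the VM, the host can be checked against the minimums on this slide. The Python sketch below reads free disk space with the standard library's `shutil.disk_usage`. Installed RAM has no portable standard-library call, so the script does not detect it; the value in the script is an assumption you fill in.

```python
import shutil

MIN_RAM_GB = 8     # from the slide: minimum 8 GB of real memory
MIN_DISK_GB = 38   # from the slide: ~11 GB zipped .ova + ~26 GB imported image

def check_host(ram_gb: float, disk_gb: float) -> list[str]:
    """Return a list of unmet prerequisites (an empty list means the host looks ready)."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"need {MIN_RAM_GB} GB RAM, have {ram_gb:g} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"need ~{MIN_DISK_GB} GB free disk, have {disk_gb:.1f} GB")
    return problems

def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem holding `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9

if __name__ == "__main__":
    ram = 16  # assumption: replace with your host's installed RAM in GB (not auto-detected)
    issues = check_host(ram, free_disk_gb())
    print("host looks ready" if not issues else "\n".join(issues))
```

Run the script from the directory (or drive) where you plan to store the download, because `free_disk_gb` measures the filesystem that holds the current directory.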
Installation Steps
1. Prepare your host system.
2. After all zip files are downloaded, extract them with 7-Zip.
3. Start Oracle VM VirtualBox Manager and import the appliance.
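The extract-and-import steps can also be scripted from the command line. The Python sketch below builds the two commands: 7-Zip's `7z x` to extract, and VirtualBox's `VBoxManage import` as the command-line counterpart of importing the appliance in the GUI. By default, it only prints them. The archive and .ova file names are hypothetical placeholders, and the script assumes both tools are on `PATH`.

```python
import shutil
import subprocess

# Hypothetical file names: substitute the archive parts and .ova file you actually downloaded.
FIRST_ARCHIVE_PART = "BigDataLite.7z.001"
OVA_FILE = "BigDataLite.ova"

def install_commands(archive: str, ova: str) -> list[list[str]]:
    """Build the extract and import commands for the installation steps."""
    return [
        ["7z", "x", archive],           # 7-Zip: extract; opening the .001 part joins a split archive
        ["VBoxManage", "import", ova],  # command-line counterpart of File > Import Appliance
    ]

def run_install(dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, assuming both tools are on PATH."""
    for cmd in install_commands(FIRST_ARCHIVE_PART, OVA_FILE):
        if dry_run:
            print("would run:", " ".join(cmd))
            continue
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)

run_install()  # dry run: only prints the two commands
```

The dry-run default is a deliberate choice: it lets you confirm the file names before anything is extracted or imported. Calling `run_install(dry_run=False)` executes the commands.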
What's Next
Harness the power of Big Data: the time is now
- It requires a cultural shift, enterprise-wide
- Build an information-driven corporate culture
Using information effectively to understand your customers' needs and preferences, and aligning your business with them, is the key to creating a competitive advantage in today's customer-empowered environment.
- oracle.com/bigdata
- twitter.com/oracleanalytics
- sai.hp@hp.com
- www.facebook.com/oraclebusinessanalytics
- www.linkedin.com/groups/bigdata-oracle-4124374
- blogs.oracle.com/bigdata
- www.youtube.com/user/oracle/videos?query=big+data