Big Data and Analytics in Government Nov 29, 2012 Mark Johnson Director, Engineered Systems Program 2
Agenda What Big Data Is Government Big Data Use Cases Building a Complete Information Solution Conclusion 3
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remain at the sole discretion of Oracle. 4 Copyright 4 2011 2012, Oracle Oracle Corporation and/or its Proprietary affiliates. All and rights Confidential reserved.
Big Data Summary 5
Big Data Defined Big data: techniques and technologies that enable organizations to effectively and economically analyze all of their data 6
90% Of the World s Data Has Been Created in the Last 2 Years 7
And That Data is Estimated to Grow 50X 50X 2020 By 8
Social Data Generated Every Minute 571 New Websites 695,000 Status updates 510,040 2,000,000 Comments Search Queries 204,166,667 Emails 9
Big Data Buzz Why big data is a big deal InfoWorld 9/1/11 Keeping Afloat in a Sea of 'Big Data ITBusinessEdge 9/6/11 The challenge and opportunity of big data McKinsey Quarterly 5/11 Getting a Handle on Big Data with Hadoop Businessweek-9/7/11 Ten reasons why Big Data will change the travel industry Tnooz -8/15/11 The promise of Big Data Intelligent Utility-8/28/11 10
What is Big Data? 11
What Makes it Big Data? VOLUME Very large quantities of data VELOCITY Extremely fast streams of data VARIETY Wide range of datatype characteristics 12
IS HARD TO FIND VOLUME VELOCITY VARIETY 13
Big Data - Tools Social Data Hadoop (Map/Reduce) 14
A Map/Reduce Pipeline INPUT MAP MAP MAP MAP MAP SHUFFLE /SORT REDUCE REDUCE REDUCE OUTPUT Hurricane (2) Sandy (1) Hurricane (1) Flooding (2) Sandy (1) Water(1) Flooding (1) Flooding (3) Hurricane (3) Sandy (2) Water (1) 15
Big Data Tools Add to Existing Toolsets New ways of processing data we couldn t access before 16
Government Big Data Use Cases 17
Big Data in Public Sector Fraud Prevention Revenue Management Constituent Sentiment Threat Identification Economic Analysis Healthcare Regulatory Compliance, Licensing & Law Enforcement Open Government Maintenance & Utilities 18
Predictive Policing Where is a violent crime likely to occur? Use Case Weather - recent and forecasted Retrieved from the National Weather Service during nightly ETL processes Contact Cards 911 Calls Incidents Arrests Day of Week Date of Month 19
Big Data to detect Sales Tax Fraud Potential revenue losses approach $2.8B/year in one state Use Case Supply Data Credit Info Labor Costs Tax Databases 20
Big Data for Building Audits Targeting over-occupied buildings Use Case Neighborhood Socioeconomic Status Foreclosure Proceedings Tax Delinquency Year of Construction New York City improved audit rates from 13% to over 70% 21
Big Data in Healthcare Find relationship between gene to cancer interaction Use Case Cross-referenced the relationships between 17000 genes and five major cancer types across 20 million medical publication abstracts Cross-referenced genes from 60Million patients and mirna for a simulated 900Million population. Understanding additional layers of the pathways these genes operate in and the drugs that target them is expected to help researchers in their work 22
Policy Implications of Big Data in Government What information can/should the government collect & aggregate? How is that information (and the resultant data) protected? How are errors identified & corrected? How is data collected by private companies accessed? Which government agencies can use the data? Etc 23
Building a Unified Information Management Solution 24
BIG DATA LIFECYCLE NEW REQUIREMENTS DECIDE ACQUIRE New Tools for Acquiring & Organizing Information Easy Integration of New Infrastructure In-memory Processing ANALYZE ORGANIZE Advanced Analytics 25
Footprint in Most Organizations Today Historic Source of Truth Data Warehouse / Data Marts Reporting, Query and Analysis Tools Business Intelligence Tools ERP, CRM & Other Transactional Apps 26
Goal: Make an Informed Recommendation Structured Sources Unstructured Sources Constituent Analytics Constituent History Constituent Behavior Sentiment & Influence Real-Time Recommendations Job Analytics Position Inventory + Channel Impact Job Placement Analytics Search Promotions 27
Using all Data to Understand Constituents Data Warehouse Employment Site Website Logs & Data NoSQL DB 28
Discovering Valuable Data Knowledge Discovery Engine Structured Data Warehouse Unstructured Semi-structured High Volume Distributed File System Website Logs & Data NoSQL DB 29
Information Discovery Engine Overview Searc h Guided Navigatio n Analytics 30
Information Discovery Case Study 31
Information Discovery Case Study 32
Valuable Data Found Now Store it Securely Persistent Data Store for All Data of Value Knowledge Discovery Engine Data Warehouse Discoveries High Volume Distributed File System MapReduce code separates valued data, then sent to via specialized adapters to Data Warehouse Website Logs & Data NoSQL DB 33
Deploy Widely Available Reports & Analytics Persistent Data Store for All Data of Value + In-DB Analytics Knowledge Discovery Engine Data Warehouse BI Tools and Dashboards Enterprise-class for reporting & analysis High Volume Distributed File System MapReduce code separates valued data, then sent to via specialized adapters to Data Warehouse Website Logs & Data NoSQL DB 34
Feed the Recommendation Engine Persistent Data Store for All Data of Value + In-DB Analytics Knowledge Discovery Engine Data Warehouse BI Tools and Dashboards Update Website Recommendations High Volume Distributed File System MapReduce code separates valued data, then sent to via specialized adapters to Data Warehouse Real-Time Analytics and Recommendations Website Logs & Data NoSQL DB 35
Make Well-Tuned Real-Time Recommendations Persistent Data Store for All Data of Value + In-DB Analytics Knowledge Discovery Engine Data Warehouse BI Tools and Dashboards High Volume Distributed File System MapReduce code separates valued data, then sent to via specialized adapters to Data Warehouse Location & User Profile Real-Time Analytics and Recommendations Recommend Website Logs & Data NoSQL DB 36
Summary 37
Comprehensive Data Reference Architecture Information Analysis Data Discovery Statistical Analysis Reporting Data Mining In-DB MapReduce Descriptive Analytics Predictive Analytics (In-Database) Semantic Analysis Dashboards Text Mining & Sentiment Analysis Mobile BI Spatial Information Provisioning Big Data Processing & Discovery Massive Unstructured Data Big Data & Stream Processing Processing Information Discovery Data Conversion Operational Database Data Warehouse Massive Unstructured & Structured Data Access, Conversion, and Storage Data Sources Distributed File Systems NoSQL Faceted Unstructured Data Streams Relational Spatial/Relational Real-Time and/or Batch Information Acquisition (Ingestion) Original Data Sources 38 Documents Multimedia Web & Social Media SMS Machine-Generated
How do I Start? Think Big and Start Small Find a question or requirement that your organization has been wrestling with responding to Establish a clear scope Try for a quick win Use tools to flatten the initial learning curve Setting up a Hadoop cluster is a specialized skill set Scale up as necessary 39
Mark.A.Johnson@Oracle.com 40
41