Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes
John Haddad, Senior Director Product Marketing, Informatica
Digital Government Institute's Government Big Data Conference, October 9, Washington, DC
Regulatory Data, Climate Data, Web Logs, Social Data, Sensor Data, Energy Consumption, GPS, Insurance Claims, EMR, Flight Data, Network Monitoring
Big Data is a balancing act
Keeping the lights on (KOL)
Complying with regulations
Reducing costs
Adopting new technologies
Acquiring new analytic skills
Increasing agility
How do you get value from Big Data?
[Diagram labels: Agency Goals; Analyst, Data Scientist, Developer, Business; Prioritize Goals, Generate Insights, Validate Hypothesis, Make Operational, Take Action; Big Data Supply Chain: Acquire & Store, Refine & Enrich, Explore & Curate, Distribute & Manage; Big Data; Data Management & Analytic Systems; Business Value]
The Big Data Journey
Data Warehouse Optimization: Optimize infrastructure for performance, cost, & scalability
Managed Data Lake: A single place to manage the supply and demand of data
Real-time Operational Intelligence: Proactively respond to threats and opportunities in real time
[Diagram labels: Healthcare, Security, Transportation, Treasury, Energy, Public Safety; axis from IT driven to Business driven]
Data Warehouse Optimization
[Diagram labels: EHR; Business Intelligence / Data Warehouse; Web & App Log Files; EMR; Claims Data; Reduce IT Costs; ERP; Batch Load; Near Real-Time; Increase Op Efficiencies; Changed Data; Staging; Data Integration; Data Quality]
Managed Data Lake
[Diagram labels: EHR; Business Intelligence / Data Warehouse; Patient / Provider Master; Visualization / Analytics; Web & App Log Files; EMR; Claims Data; Master Data Management; Reduce IT Costs; ERP; Healthcare & Patient Forums; Social Data / Signals; Patient / Provider Mobile Devices; Batch Load; Real-Time Ingestion; Changed Data; Staging; Sandbox; Reservoir; Data Integration; Data Matching; Near Real-Time; Pub / Sub; Increase Op Efficiencies; Improve Fraud Detection; Reduce Readmissions; RFID, Patient Monitoring; Data Quality; Data Security; Improve Outcomes]
Real-time Operational Intelligence
[Diagram labels: EHR; Business Intelligence / Data Warehouse; Patient / Provider Master; Visualization / Analytics; Web & App Log Files; EMR; Claims Data; Master Data Management; Reduce IT Costs; ERP; Healthcare & Patient Forums; Social Data / Signals; Patient / Provider Mobile Devices; Batch Load; Real-Time Ingestion; Changed Data; Staging; Sandbox; Reservoir; Data Integration; Data Matching; Event Based Processing; Near Real-Time; Pub / Sub; Real-Time Delivery; Increase Op Efficiencies; Improve Fraud Detection; Reduce Readmissions; RFID, Patient Monitoring; Data Quality; Data Security; Streaming Analytics; Improve Outcomes]
Cyber Security
[Diagram labels: Business Intelligence / Data Warehouse; Person of Interest Master; Visualization / Analytics; Access Monitors, Honeypots; System & Network Monitors, Log Files; Master Data Management; Reduce IT Costs; Social Data / Signals; RDBMS, Flat Files; OSINT (Security Bulletins, Internet Events); DoD/Intel Security Messages & Alerts; Batch Load; Real-Time Ingestion; Changed Data; Staging; Sandbox; Reservoir; Data Integration; Data Quality; Data Matching; Data Security; Event Based Processing; Streaming Analytics; Near Real-Time; Pub / Sub; Real-Time Delivery; Increase Op Efficiencies; Stop & Predict Cyber Threats; Share Threat Information]
Transportation
[Diagram labels: Service Records; Business Intelligence / Data Warehouse; Person of Interest Master; Visualization / Analytics; Image & Video; Master Data Management; Reduce IT Costs; GPS; Scheduled Routes; Weather & Climate; Batch Load; Real-Time Ingestion; Changed Data; Near Real-Time; Pub / Sub; Real-Time Delivery; Optimize Routes; Reduce Delays & Disruptions; Social Data / Signals; Sensors & Radar; Staging; Sandbox; Reservoir; Data Integration; Data Quality; Data Matching; Data Security; Event Based Processing; Streaming Analytics; Improve Public Safety; Reduce Fuel Consumption]
Does your data platform support Big Data requirements?
DEPLOY your data pipeline (i.e., access, integrate, and prepare data) from pilot to production quickly
STAFF projects with affordable and readily available skills (e.g., analytics, ETL, data quality)
ADOPT new Big Data technologies (e.g., Hadoop, NoSQL, IoT) without major disruption to your production environment
TRUST (i.e., certify, secure, master) your data to make the right decisions faster with minimal risk
REAL TIME processing (i.e., ingest, correlate, alert) to proactively respond to business situations (e.g., events, threats, opportunities)
Do you have the right skills?
Enterprise Architect, Data Steward, Data Analyst, Data Scientist, Business Analyst, Data Engineer, Domain Expert, Application Developer, ETL Developer, Data Architect, Database Admin, Solution Architect
Qualifications for a Data Scientist (source: job posting on Dice.com)
A background in data mining, machine learning and distributed computing is desired
Bachelor's Degree or Master's Degree in a quantitative discipline such as Mathematics, Statistics, Finance, Accounting, Economics, Operational Research or a related discipline
6 plus years experience in a decision support analytic function is a must
4 plus years of quantitative analysis experience
Knowledge of Hadoop, Pig, Hive and MapReduce
8 plus years experience in a decision support analytic function
Experience on a Hadoop Platform
Experience with Python, Perl or other scripting language
Familiarity with object-oriented programming concepts
Experience in Java or C++ is a plus
Demonstrated proficiency with statistical computing languages such as R, MATLAB, etc.
Experience with integrating large-scale heterogeneous datasets
Expertise with statistical research techniques, including modeling, data mining, clustering and segmentation
Strong analytical and problem solving skills
Excellent interpersonal skills and ability to communicate effectively with third parties and internal staff at all levels of the organization
Excellent organization and time management skills
A proven record as a team player plus the ability to work independently given general direction
Knowledge of streamline program analysis procedures in SQL, SAS, R, Pig, Python, Apache Mahout or other chosen languages
Ability to create, deploy, maintain and refine decision management models
Perform study and discovery of new data sources or new uses for existing data sources
Participate in the design and implementation of statistical data quality procedures
Interpret and implement data findings creatively in a variety of formats
Ability to work closely across an array of various teams and organizations in the company to champion Big Data technologies and advanced analytics
Ability to work with cross-departmental teams to define metrics, guidelines and strategies for effective use of algorithms and data
Increase Productivity
Ensure Trust
Provide Self-Service
Increase Intelligence