Are You Ready for Big Data?
Jim Gallo, National Director, Business Analytics
February 11, 2013
Agenda
- What is Big Data?
- How do you leverage Big Data in your company?
- How do you prepare for a Big Data initiative?
- Summary
What is Big Data?
What is Big Data?
"Big data" is high-volume, -velocity, -variety and -veracity information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
- Volume: terabytes (TB) to zettabytes (ZB)
- Velocity: streaming and large-volume data movement
- Variety: relational and non-relational data types
- Veracity: managing the reliability and predictability of inherently imprecise data types
Example sources: Twitter, Facebook, RFID, machine data, monitors, relational data, video, click streams, trades & transactions, identity, geospatial, text
Analytics: measure and analyze; model, predict and score
What might a Big Data platform look like?
- Data Warehouse
- Hadoop
- BI / Reporting
- Information Integration
- Stream Computing
- Content Analytics
- Functional Apps
- Exploration / Visualization
- Industry Apps
- Instrumentation Analytics
- Predictive Analytics
What is Hadoop?
- Open source software project
- Distributed processing of large data sets
- Leverages clusters of commodity servers
- Scales from a single server to thousands of machines
- High degree of fault tolerance: detects and handles failures at the application layer
What are the benefits of Hadoop?
- Scalable
  - New nodes can be added as needed
  - Nodes can be added without changing data formats, how data is loaded, how jobs are written, or the applications
- Cost-effective
  - Massively parallel computing on commodity servers
  - Sizeable decrease in the cost per terabyte of storage
- Fault-tolerant
  - When a node fails, work is redirected to another copy of the data
  - Processing continues
- Flexible
  - Schema-less: can absorb any type of data, structured or not, from any number of sources
  - Data from multiple sources can be joined and aggregated in arbitrary ways
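The "Flexible" benefit is often described as schema on read: raw data is stored as it arrives, and structure is applied only when a job reads it. The Python sketch below illustrates the idea on two tiny in-memory sources, one JSON and one CSV, that are joined and aggregated at read time. The data, field names, and function names are all hypothetical, and plain single-process Python stands in for what would be a distributed job on a real cluster.

```python
import csv
import io
import json

# Hypothetical raw inputs, stored exactly as they arrived: no schema is
# enforced when the data is written.
RAW_CLICKS = (
    '{"user": "u1", "page": "/home"}\n'
    '{"user": "u2", "page": "/cart"}\n'
    '{"user": "u1", "page": "/cart"}\n'
)
RAW_CUSTOMERS = "user,region\nu1,Ohio\nu2,Texas\n"


def read_clicks(text: str) -> list[dict]:
    """Apply structure at read time: parse each line as a JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_customers(text: str) -> dict[str, str]:
    """Apply a different structure to a different source: CSV keyed by user."""
    return {row["user"]: row["region"] for row in csv.DictReader(io.StringIO(text))}


def clicks_per_region(clicks: list[dict], customers: dict[str, str]) -> dict[str, int]:
    """Join the two sources on user and aggregate click counts by region."""
    counts: dict[str, int] = {}
    for click in clicks:
        region = customers.get(click["user"], "unknown")
        counts[region] = counts.get(region, 0) + 1
    return counts


if __name__ == "__main__":
    print(clicks_per_region(read_clicks(RAW_CLICKS), read_customers(RAW_CUSTOMERS)))
    # {'Ohio': 2, 'Texas': 1}
```

Neither source had to be converted to a common format before it was stored; each reader decides how to interpret its own raw text.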
What are the key components of Hadoop?
- Core: MapReduce and the Hadoop Distributed File System (HDFS)
- Related Apache projects: Pig, Hive, ZooKeeper
What is MapReduce?
A programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster.
- "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. Each worker node processes its smaller problem and passes the answer back to its master node.
- "Reduce" step: The master node then collects the answers to all the sub-problems and combines them to form the output: the answer to the problem it was originally trying to solve.
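To make the two steps concrete, here is a minimal single-process sketch in Python of the classic word-count job. It is not Hadoop's API: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and the input "splits" are plain strings standing in for the pieces a real cluster would hand to separate worker nodes. It also makes explicit the shuffle (group-by-key) phase that the framework performs between map and reduce.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(split: str) -> Iterator[tuple[str, int]]:
    """Map step: turn one input split into intermediate (word, 1) pairs."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group every intermediate value by key, as the framework does between steps."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values for one key into a single answer."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster each split would be mapped on a different worker node.
    intermediate = (pair for split in splits for pair in map_words(split))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data in motion"]))
    # {'big': 2, 'data': 2, 'is': 1, 'in': 1, 'motion': 1}
```

Because each call to `map_words` depends only on its own split, and each call to `reduce_counts` only on one key's values, both steps can run in parallel across as many machines as there are splits or keys.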
What is HDFS?
Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster, with each block stored on several nodes. Because the map and reduce functions can then be executed on smaller subsets of larger data sets, close to where those blocks are stored, HDFS provides the scalability needed for big data processing.
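The sketch below models block storage in a few lines of Python so the splitting and the "redirect work to another copy" behavior are visible. It is a toy under stated assumptions, not the HDFS API: `ToyCluster` and its methods are hypothetical, the block size is a few bytes instead of the 64 MB or 128 MB HDFS defaults, and replicas are placed round-robin, whereas real HDFS uses rack-aware placement tracked by the NameNode. The replication factor of 3 matches HDFS's default.

```python
from dataclasses import dataclass, field

# Toy parameters for illustration only.
BLOCK_SIZE = 4    # bytes per block (real HDFS blocks are typically 64 MB or 128 MB)
REPLICATION = 3   # copies kept of each block (also HDFS's default)


@dataclass
class ToyCluster:
    """Hypothetical model of block storage; not the HDFS API."""
    nodes: list[str]
    # block id -> nodes holding a replica; contents kept in one dict for simplicity
    placement: dict[int, list[str]] = field(default_factory=dict)
    contents: dict[int, bytes] = field(default_factory=dict)
    failed: set[str] = field(default_factory=set)

    def store(self, data: bytes) -> list[int]:
        """Split data into fixed-size blocks and place replicas round-robin."""
        ids = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = len(self.contents)
            self.contents[block_id] = data[offset:offset + BLOCK_SIZE]
            copies = min(REPLICATION, len(self.nodes))
            start = block_id % len(self.nodes)
            self.placement[block_id] = [
                self.nodes[(start + k) % len(self.nodes)] for k in range(copies)
            ]
            ids.append(block_id)
        return ids

    def read_block(self, block_id: int) -> tuple[str, bytes]:
        """Read from the first live replica, skipping failed nodes."""
        for node in self.placement[block_id]:
            if node not in self.failed:
                return node, self.contents[block_id]
        raise OSError(f"all replicas of block {block_id} are unavailable")


if __name__ == "__main__":
    cluster = ToyCluster(nodes=["node1", "node2", "node3", "node4"])
    ids = cluster.store(b"hello big data")   # 14 bytes -> 4 blocks
    cluster.failed.add("node1")              # simulate a node failure
    print(cluster.read_block(0))             # ('node2', b'hell')
    print(b"".join(cluster.read_block(i)[1] for i in ids))  # b'hello big data'
```

Losing one node does not lose any block, because every block still has live replicas elsewhere; a read simply moves to the next copy.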
What are Pig and Hive?
- Pig
  - Developed at Yahoo!
  - Programming language (Pig Latin)
  - Designed to handle any kind of data
- Hive
  - Developed at Facebook
  - Hive Query Language (HiveQL), similar to standard SQL
  - Allows anyone already fluent in SQL to leverage the Hadoop platform more quickly
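To show why SQL fluency transfers, the sketch below asks a word-frequency question declaratively with one `GROUP BY` query. It uses Python's built-in `sqlite3` purely as a stand-in for Hive: the `words` table and its contents are hypothetical, and the query is plain SQL rather than a HiveQL-specific feature. In Hive of this era, a query like this was typically compiled into MapReduce jobs; SQLite simply evaluates it locally.

```python
import sqlite3

# sqlite3 stands in for Hive here; the table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [(w,) for w in "big data is big data in motion".split()],
)

# The SQL user states *what* to compute (a count per word); an engine like
# Hive decides *how*, e.g. map: emit each word; reduce: count per word.
rows = conn.execute(
    "SELECT word, COUNT(*) AS n FROM words GROUP BY word ORDER BY n DESC, word"
).fetchall()
conn.close()

print(rows)  # [('big', 2), ('data', 2), ('in', 1), ('is', 1), ('motion', 1)]
```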
What is ZooKeeper?
- Provides centralized infrastructure and services that enable synchronization across a cluster
- Maintains common objects needed in large cluster environments, such as:
  - Configuration information
  - A hierarchical namespace
- Applications can leverage these services to coordinate distributed processing across large clusters
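ZooKeeper stores its data in a tree of nodes (znodes) addressed by slash-separated paths, much like a file system, and supports conditional updates based on a node's version number. The toy Python class below mimics those two ideas in memory so the "hierarchical namespace" and "synchronization" bullets have something concrete behind them. `ToyNamespace`, its methods, and the `batch_size` setting are hypothetical; this is not the ZooKeeper client API and has no networking, sessions, or watches.

```python
class ToyNamespace:
    """Hypothetical in-memory tree of slash-separated paths; not the ZooKeeper client API."""

    def __init__(self) -> None:
        # path -> (value, version); the root always exists
        self._nodes: dict[str, tuple[bytes, int]] = {"/": (b"", 0)}

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, value: bytes = b"") -> None:
        """Add a node under an existing parent, like a file in a directory tree."""
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        if self._parent(path) not in self._nodes:
            raise KeyError(f"parent of {path} does not exist")
        self._nodes[path] = (value, 0)

    def get(self, path: str) -> tuple[bytes, int]:
        """Return the node's value together with its current version."""
        return self._nodes[path]

    def set(self, path: str, value: bytes, expected_version: int) -> int:
        """Conditional update: fails if another client changed the node first."""
        _, version = self._nodes[path]
        if version != expected_version:
            raise ValueError(f"{path} is at version {version}, not {expected_version}")
        self._nodes[path] = (value, version + 1)
        return version + 1

    def children(self, path: str) -> list[str]:
        """List the names of a node's direct children."""
        return sorted(
            p.rsplit("/", 1)[1]
            for p in self._nodes
            if p != "/" and self._parent(p) == path
        )


if __name__ == "__main__":
    ns = ToyNamespace()
    ns.create("/app")
    ns.create("/app/config", b"batch_size=64")  # hypothetical shared setting
    ns.create("/app/workers")
    print(ns.children("/app"))                   # ['config', 'workers']
    _, version = ns.get("/app/config")
    ns.set("/app/config", b"batch_size=128", version)  # succeeds; version is now 1
    try:
        ns.set("/app/config", b"batch_size=32", version)  # stale version 0
    except ValueError as err:
        print("rejected:", err)
```

The version check is what lets many processes share one piece of configuration safely: a writer working from stale data is rejected instead of silently overwriting a newer value.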
What does a Big Data platform do?
- Analyze a Variety of Information: novel analytics on a broad set of mixed information that could not be analyzed before
- Analyze Information in Motion: streaming data analysis; large-volume data bursts and ad hoc analysis
- Analyze Extreme Volumes of Information: cost-efficiently process and analyze petabytes of information; manage and analyze high volumes of structured, relational data
- Discover and Experiment: ad hoc analytics, data discovery and experimentation
- Manage and Plan: enforce data structure, integrity and control to ensure consistency for repeatable queries
How does a Big Data platform fit?
[Diagram: Traditional Sources and New Sources, Enterprise Integration, Data Warehouse, Big Data Platform]
Is the approach the same?
- Traditional approach: structured and repeatable analysis
  - Business users determine what questions to ask
  - IT structures the data to answer the questions
  - Examples: monthly sales reports, profitability analysis, customer surveys
- Big Data approach: iterative and exploratory analysis
  - IT delivers a platform to enable creative discovery
  - Business users explore what questions could be asked
  - Examples: brand sentiment, product strategy, maximum asset utilization
Leveraging Big Data
What can you do with Big Data?
- Analyze Information in Motion: smart grid management, multimodal surveillance, real-time promotions, cyber security, ICU monitoring, options trading, click-stream analysis, call detail record (CDR) processing, IT log analysis, RFID tracking and analysis
- Analyze Extreme Volumes of Information: transaction analysis to create insight-based product/service offerings, fraud monitoring and detection, risk modeling and management, social media/sentiment analysis, environmental analysis
- Manage and Plan: operational analytics, BI reporting, planning and forecasting analysis, predictive analysis
- Analyze a Variety of Information: social media/sentiment analysis, geospatial analysis, brand strategy, scientific research, epidemic early warning system, market analysis, video analysis, audio analysis
- Discovery and Experimentation: sentiment analysis, brand strategy, scientific research, ad hoc analysis, model development, hypothesis testing, transaction analysis to create insight-based product/service offerings
What are some use cases?
- Fraud Detection and Modeling
- 360-Degree View of the Customer
- Smart Grid / Smarter Utilities
- Cyber Security
- Email and Call Center Transcript Analysis
- Risk Modeling & Management
- Call Detail Record Analysis
- Threat Detection / Multi-modal Surveillance
- RFID Tracking and Analysis
- Geo-marketing
What are some analytics examples?
- Financial Services: improved risk decisions; customer sentiment analysis; AML (anti-money laundering)
- Transportation: weather and traffic impact on logistics and fuel consumption
- Call Centers: voice-to-text for understanding customer behavior
- Telecommunications: operations and failure analysis from device, sensor, and GPS inputs
- Utilities: weather impact analysis on power generation; smart meter data analysis
- IT: transaction log analysis for multiple transactional systems
- E-Commerce: Internet behavior and buying patterns; digital asset piracy
- Multi-channel Integration: integrated customer behavior modeling
What are some streaming analytics examples?
- Transportation: intelligent traffic management
- Manufacturing: process control for microchip fabrication
- Natural Systems: wildfire management; water management
- Health & Life Sciences: neonatal ICU monitoring; epidemic early warning system; remote healthcare monitoring
- Telephony: CDR processing; social analysis; churn prediction; geomapping
- Stock Market: impact of weather on securities prices; market analysis at ultra-low latencies
- Law Enforcement, Defense & Cyber Security: real-time multimodal surveillance; situational awareness; cyber security detection
- Fraud Prevention: detecting multi-party fraud; real-time fraud prevention
- e-Science: space weather prediction; detection of transient events; genomics research
- Other: smart grid; text analysis; who's talking to whom?
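Most of these examples share one pattern: each new reading is judged against a short window of recent history, and only that window is kept in memory. The Python sketch below shows the pattern as sliding-window anomaly detection with a simple standard-deviation threshold. The technique, the window size, the threshold, and the heart-rate numbers are illustrative assumptions, not a method from this presentation and not a clinical algorithm; `flag_anomalies` is a hypothetical name.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator


def flag_anomalies(
    readings: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[tuple[int, float]]:
    """Yield (index, value) for readings far outside the recent sliding window.

    Each reading is compared with the mean of the previous `window` readings;
    it is flagged when it lies more than `threshold` standard deviations away.
    Only the last `window` values are kept, so memory stays constant no matter
    how long the stream runs.
    """
    recent: deque = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip the test when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        recent.append(value)


if __name__ == "__main__":
    # Hypothetical heart-rate stream (beats per minute) with one spike.
    heart_rate = [72, 74, 73, 75, 74, 73, 140, 74, 72]
    print(list(flag_anomalies(heart_rate)))  # [(6, 140)]
```

Because it is a generator, the function can consume an unbounded stream and emit alerts as they occur. Note that a flagged value still enters the window here, which widens the spread for the next few readings; a real system would have to decide how to treat outliers.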
To what extent is Big Data being adopted?
Three out of four organizations have big data activities underway (48% planning plus 28% in pilot or implementation, 76% in total), and roughly one in four (28%) are either in pilot or production.
- Early days of the big data era: almost half of all organizations surveyed report active discussions about big data plans; big data has moved out of IT and into business discussions
- Getting underway: more than a quarter of organizations have active big data pilots or implementations; tapping into big data is becoming real
- Acceleration ahead: the number of active pilots underway suggests big data implementations will rise exponentially in the next few years; once foundational technologies are installed, use spreads quickly across the organization
Survey results:
- 24%: have not begun big data activities
- 48%: planning big data activities
- 28%: pilot and implementation of big data activities
Source: IBM Institute for Business Value and Saïd Business School, University of Oxford, 2012
What are some trends in Big Data adoption?
Improving the customer experience by better understanding behaviors drives almost half of all active big data efforts.
Source: IBM Institute for Business Value and Saïd Business School, University of Oxford, 2012
Preparing for a Big Data Initiative
Five Practical Questions
What do you want to know?
- Business objectives: improved decision-making; better business performance
- Needs, Postulates, Questions, Results
- Example results: improved customer satisfaction; increased profit margin; expanded social awareness
Big Data or lots of data?
Is there a data source?
Candidate sources and analyses: surveys, Twitter, LinkedIn, Foursquare, sentiment analysis, demographics, sales, geospatial, identity, facial recognition, predictive analytics, license plate recognition, effectiveness, site behavior & experience, ad campaigns, Facebook, blogs, competitors, weather, RFID, monitors, machine data, trades & transactions, display media
Is it worth it?
- Labor
- Options
- ROI
- Sourcing
- Hardware & Software
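One conventional way to frame "Is it worth it?" is the standard return-on-investment ratio. Treating labor and hardware & software as the cost terms, with sourcing and options as choices that change those terms, is an illustrative assumption rather than a method taken from this presentation.

```latex
\mathrm{ROI} = \frac{B - C}{C},
\qquad C = C_{\text{labor}} + C_{\text{hardware \& software}}
```

Here $B$ is the expected business benefit over the evaluation period and $C$ the total cost; the initiative pays for itself when $\mathrm{ROI} > 0$, that is, when $B > C$.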
Will it work?
- Model, Predict and Score
- Options
- Resources (internal & external)
- Measure and Analyze
- Intranet & Extranet
- Time & Money
Summary
Summary
- Big Data
  - High-volume, -velocity, -variety and -veracity information assets
  - Cost-effective, innovative forms of information processing
  - Enhanced insight and decision making
- Features and Functions
  - Analyze a variety of information
  - Analyze information in motion
  - Analyze extreme volumes of information
  - Discover and experiment
  - Manage and plan
- Be Pragmatic
  - Business-driven
  - Provable ROI
  - Proof of concept
  - Not for everyone
- Uses
  - Wide applicability
  - Cross-industry
  - Iterative and exploratory
  - Complementary to BI/DW
For More Information
Jim Gallo
National Director, Business Analytics
Information Control Corporation
jgallo@iccohio.com
(614) 523-3070 x192