Big Data for Big Value @ Intel. Moty Fania, PE, Big Data Analytics; Assaf Araki, Sr. Architect, Big Data Analytics
Advanced Analytics team @ Intel IT: corporate ownership of advanced analytics. Team charter: solve strategic, high-value business-unit problems; leverage analytics to grow Intel's revenue. Specialized in Big Data and Machine Learning. Skills: Software Engineering, Decision Science and Business Acumen.
Harnessing Analytics: transform data into actionable knowledge and actionable insights. We are drowning in data, but starving for knowledge.
Copyright 2013, Intel Corporation. All rights reserved.
The Challenge: datasets that are unmanageable using traditional technologies across capture, storage, search, sharing, analytics and visualization. Big Data requires a new approach, driven by a growing need to derive meaning from previously unexplored data. (Adapted from Forrester.)
Intel's IT Strategy for Big Data. Priority: embrace Big Data; form an enterprise Big Data Analytics Competency Center. Build: implement an internal, cost-effective big data platform and, in parallel, build the necessary skill set. Approach: systematically apply big data analytics across Intel to solve high-value problems -> 5-6-10. Business Value: the value of our Big Data efforts was about USD 100M in 2012. We expect that figure to grow 10x by 2014.
Proving the Value of Big Data Analytics. Manufacturing: decrease manufacturing costs by personalizing unit testing using historical data (test time reduction, yield improvements); $100M. Chip Validation: optimize the chip validation process to cut product time-to-market (coverage, bug handling, content optimization); cut TTM by 25%.
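To make "personalizing unit testing using historical data" concrete, here is a minimal, hypothetical sketch of history-driven test selection. It assumes each test has a record of runs and failures gathered from units comparable to the one under test (for example, the same product and lot); tests that are well observed and have essentially never failed become candidates to skip, which is one way test time could be reduced. The names `TestHistory` and `select_tests`, and both thresholds, are illustrative assumptions, not the method used at Intel.

```python
from dataclasses import dataclass


@dataclass
class TestHistory:
    # Hypothetical record: historical outcomes of one test on comparable units.
    name: str
    runs: int
    failures: int

    @property
    def failure_rate(self) -> float:
        # A test with no history is treated as always worth running.
        return self.failures / self.runs if self.runs else 1.0


def select_tests(history: list[TestHistory], threshold: float = 1e-4,
                 min_runs: int = 1000) -> list[str]:
    """Return the names of tests to keep; skip well-observed, near-zero-failure tests."""
    keep = []
    for t in history:
        well_observed = t.runs >= min_runs
        if well_observed and t.failure_rate < threshold:
            continue  # historically clean: skipping it saves test time
        keep.append(t.name)
    return keep


history = [
    TestHistory("leakage", runs=50_000, failures=0),
    TestHistory("timing", runs=50_000, failures=40),
    TestHistory("new_cache_test", runs=10, failures=0),
]
print(select_tests(history))  # ['timing', 'new_cache_test']
```

The `min_runs` guard keeps rarely executed tests in the plan, so a test is only skipped once there is enough history to trust its low failure rate.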
Proving the Value of Big Data: Other Examples. Advanced Threats & Malware Detection: uses big data technologies and statistical models to detect anomalous patterns of malicious activity. Sales & Marketing: drive customer engagements based on analytics leveraging internal and external information; prioritize new customer engagements (Who?), optimize the offering (What?), improve triggering (When?). Context-Aware Recommendation Engine: a generic, context-aware recommendation system developed for Telmap and now leveraged by other use cases.
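As a simple illustration of "statistical models to detect anomalous patterns", the sketch below flags observations that lie more than a chosen number of standard deviations from a baseline (a z-score test). It assumes per-host activity counts, such as hourly outbound connections, with a clean baseline window; the function name and threshold are hypothetical, and production threat detection would use richer models than this.

```python
from statistics import mean, stdev


def flag_anomalies(baseline: list[float], observed: list[float],
                   z_threshold: float = 3.0) -> list[int]:
    """Return indices in `observed` whose z-score against `baseline` exceeds the threshold."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 baseline points
    if sigma == 0:
        # Perfectly constant baseline: any deviation at all is unusual.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) / sigma > z_threshold]


# Hypothetical hourly outbound-connection counts for one host.
baseline = [10, 12, 11, 9, 10, 11, 12, 10]
observed = [11, 13, 55, 10]
print(flag_anomalies(baseline, observed))  # [2]
```

Scoring new observations against a separate baseline window, rather than against a window that includes them, keeps a single large spike from inflating the standard deviation and hiding itself.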
Big Data Analytics Challenges
Big Data Challenges: Analytic Platform Limitations. Not all platforms support code execution (e.g. R, Java, C). Most platforms are specialized for a specific purpose: storage structure (key-value, document, relational, etc.), mixed processing loads (batch vs. real time), and data loading into the DBMS (batch vs. streaming). Solutions are immature (lacking features, security, HA and multi-tenancy).
[Diagram: Big Data Analytics Platform; labels: Offline, Operation, Source, Prediction Model Builder, Prediction Model, Query]
Analytics Algorithms Challenges. Task characteristics: state dependency, distributed learning, CPU- and IO-intensive, possibly real-time processing. Algorithm limitations, the Distribution Curse: most algorithms are written sequentially, so a change in the data scientist's mindset is needed; there is no cross-platform code; we can't leverage most R packages (~4,000).
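The "change in mindset" can be shown with a statistic as small as mean and variance. A sequential loop over all the data does not distribute; instead, each partition computes a small partial aggregate (count, sum, sum of squares), and the partials are merged with an associative operation, the map/combine pattern used by MapReduce and Spark. This is a minimal sketch: partitions are simulated with plain lists, and the function names are illustrative.

```python
from functools import reduce


def partial_stats(partition):
    """Map step: summarize one partition independently of all the others."""
    return (len(partition), sum(partition), sum(x * x for x in partition))


def merge(a, b):
    """Reduce step: associative and commutative, so partitions can merge in any order."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def distributed_mean_var(partitions):
    """Population mean and variance computed from per-partition aggregates."""
    n, s, ss = reduce(merge, map(partial_stats, partitions))
    mean = s / n
    var = ss / n - mean * mean
    return mean, var


# Three "partitions", as they might sit on three different nodes.
print(distributed_mean_var([[1, 2, 3], [4, 5], [6]]))  # approx (3.5, 2.9167)
```

The sum-of-squares formula is the simplest to show but can lose precision on large values; a pairwise-merge variant of Welford's algorithm is the numerically safer choice in practice.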
Solution: A Two-Layer Hybrid Architecture. Offline (raw data -> algorithm -> underlying patterns): crunch raw data into meaningful patterns that do not tend to change dramatically; run on a scalable platform (Hadoop) to gain scalability. Online (prediction): compute predictions using the computed model and the latest data; use the latest user data and the underlying patterns to compute user predictions on demand, and use the latest feedback from the DB for real-time prediction.
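A minimal, hypothetical sketch of the two-layer split, using item co-occurrence as a stand-in model. The offline function, which in this architecture would run as a batch job on Hadoop, turns raw sessions into slowly changing patterns. The online function runs on demand and combines those precomputed patterns with the user's latest activity. The co-occurrence model, the function names, and the navigation-style item names are all assumptions for illustration, not the actual recommendation engine.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_model(sessions):
    """Offline layer: count how often each pair of items appears in the same session."""
    cooc = defaultdict(Counter)
    for items in sessions:
        for a, b in combinations(sorted(set(items)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def predict(model, recent_items, k=2):
    """Online layer: score items by co-occurrence with the user's latest items."""
    scores = Counter()
    for item in recent_items:
        scores.update(model.get(item, {}))  # unknown items contribute nothing
    for item in recent_items:
        scores.pop(item, None)  # do not recommend what the user just used
    return [item for item, _ in scores.most_common(k)]


# Offline: runs periodically over the raw history.
sessions = [["maps", "traffic", "fuel"], ["maps", "traffic"],
            ["maps", "parking"], ["traffic", "fuel"]]
model = build_model(sessions)

# Online: runs per request with the latest user data.
print(predict(model, ["maps", "fuel"]))  # ['traffic', 'parking']
```

The expensive pass over all sessions happens only offline; the online step touches just the few model rows for the user's recent items, which is what keeps on-demand prediction cheap.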
Notable trends: Hadoop 2.x (YARN).
Notable trends: In-memory processing. Berkeley Data Analytics Stack (BDAS): distributed RAM processing; batch, interactive and stream processing in one stack. Hardware figures from the slide diagram: 128-512 GB RAM, 40-60 GB/s, 16 cores.
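A back-of-envelope calculation shows why in-memory processing suits iterative and interactive workloads. It assumes the slide's figures describe a single node, 128-512 GB of RAM and 40-60 GB/s of memory bandwidth; that reading is an assumption, and real throughput will be lower than the peak bandwidth.

```python
def scan_seconds(ram_gb: float, bandwidth_gb_s: float) -> float:
    """Time to stream the whole in-memory working set once at the given bandwidth."""
    return ram_gb / bandwidth_gb_s


# Bounds of the assumed per-node ranges.
for ram in (128, 512):
    for bw in (40, 60):
        print(f"{ram} GB at {bw} GB/s: {scan_seconds(ram, bw):.1f} s")
```

Under these assumptions, one full pass over even the largest working set takes between about 2 and 13 seconds per node, so an algorithm that needs many passes over the same data can afford them when the data stays in RAM.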
Intel-BGU Hadoop Lab. A joint effort of Intel and the Information System Engineering department. The cluster has ~200 TB of storage and is installed with Hadoop 2.x and Spark. The lab focuses on developing new distributed algorithms for ML. Impact on research: allows researchers to mine larger datasets than before and to develop more complex, distributed algorithms. Impact on curriculum: runs a master's course on mining massive datasets, which focuses on implementing distributed machine-learning algorithms.
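As an example of the kind of distributed machine-learning algorithm such a course implements, here is a minimal sketch of data-parallel batch gradient descent for a one-variable linear model, y ≈ w·x + b. Each partition computes its partial gradient independently (the map), the partials are summed (the reduce), and the driver updates the model. Partitions are simulated with lists; on the lab's cluster each partial gradient would be a separate Hadoop or Spark task. The model, learning rate, and epoch count are illustrative assumptions.

```python
def partial_gradient(partition, w, b):
    """Map step: summed squared-error gradients over one partition, plus its size."""
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(partition)


def train(partitions, lr=0.05, epochs=500):
    """Driver: combine partial gradients each epoch and take one gradient step."""
    w = b = 0.0
    for _ in range(epochs):
        parts = [partial_gradient(p, w, b) for p in partitions]  # parallelizable
        gw = sum(p[0] for p in parts)
        gb = sum(p[1] for p in parts)
        n = sum(p[2] for p in parts)
        w -= lr * gw / n  # mean-squared-error gradient step
        b -= lr * gb / n
    return w, b


# Points on y = 2x + 1, split across three hypothetical nodes.
partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)], [(4, 9), (5, 11)]]
w, b = train(partitions)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Only the small `(gw, gb, n)` tuples travel between workers and driver each epoch; the data itself stays where it is, which is what makes the approach scale.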
Summary. We at Intel leverage Big Data analytics to systematically solve high-value business problems across Intel that couldn't be addressed effectively in the past. Big Data analytics offers high value but has its own challenges. Notable trends: Hadoop 2.0 and in-memory technologies. The new Intel-BGU Hadoop Lab will support research and enable a new curriculum.
Q&A
Intel Confidential Do Not Forward www.intel.com/it
Backup