Big Data for Big Value @ Intel



Transcription:

Big Data for Big Value @ Intel
Moty Fania, PE, Big Data Analytics
Assaf Araki, Sr. Arch., Big Data Analytics

Advanced Analytics team @ Intel IT
- Corporate ownership of advanced analytics
- Team charter:
  - Solve strategic, high-value business-unit problems
  - Leverage analytics to grow Intel's revenue
- Specialized in big data and machine learning
- Skills: software engineering, decision science, and business acumen

Harnessing Analytics
- Transform data into actionable knowledge (actionable insights)
- "We are drowning in data, but starving for knowledge"

The Challenge
- Datasets that are unmanageable using traditional technologies
- Big data requires a new approach
  [Diagram labels: Capture, Storage, Search, Sharing, Analytics, Visualization. Adapted from Forrester.]
- Growing need to derive meaning from previously unexplored data
Copyright 2013, Intel Corporation. All rights reserved.

Intel's IT Strategy for Big Data
- Priority: Embrace big data. Form an enterprise Big Data Analytics Competency Center.
- Build: Implement an internal, cost-effective big data platform and, in parallel, build the necessary skill set.
- Approach: Systematically apply big data analytics across Intel to solve high-value problems -> 5-6-10.
- Business value: The value of our big data efforts was about USD 100M in 2012. We expect that figure to grow 10x by 2014.

Proving the Value of Big Data Analytics
- Manufacturing: Decrease manufacturing costs by personalizing each unit's testing using its historical data
  - Test time reduction
  - Yield improvements
- Chip validation: Optimize the chip validation process to cut product time-to-market (TTM)
  - Coverage
  - Bug handling
  - Content optimization
- Slide callouts: $100M; cut TTM by 25%

Proving the Value of Big Data: Other Examples
- Advanced threats and malware detection: Uses big data technologies and statistical models to detect anomalous patterns of malicious activity.
- Sales and marketing: Drive customer engagements based on analytics that leverage internal and external information
  - Prioritize new customer engagements (Who?)
  - Optimize the offering (What?)
  - Improve triggering (When?)
- Context-aware recommendation engine: A generic, context-aware recommendation system developed for Telmap and now leveraged by other use cases.
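The slides do not say which statistical models the threat-detection work uses. As a rough illustration only, the simplest form of "detect anomalous patterns" is a z-score check that flags values far from the mean. The function name `find_anomalies` and the threshold of 3 standard deviations are assumptions made for this sketch, not details from the talk.

```python
from statistics import mean, pstdev
from typing import List


def find_anomalies(values: List[float], threshold: float = 3.0) -> List[int]:
    """Return indices of values more than `threshold` standard deviations from the mean.

    Hypothetical helper for illustration; a production detector would use
    richer models and run over far larger data.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Example: twenty ordinary readings followed by one extreme reading.
readings = [10.0] * 20 + [100.0]
suspicious = find_anomalies(readings)
```

With the sample data, only the last reading is flagged.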

Big Data Analytics Challenges

Big Data Challenges: Analytic Platform Limitations
- Not all platforms support code execution (e.g. R, Java, C)
- Most platforms are specialized for a specific purpose
  - Storage structure (key-value, document, relational, etc.)
  - Mixed processing loads (batch vs. real time)
  - Data load into the DBMS (batch vs. streaming)
- Solutions are immature (lack of features, security, HA, and multi-tenancy)
[Diagram: Big Data Analytics Platform. Labels: Offline, Operation, Source, Prediction Model Builder, Prediction Model, Query]

Analytics Algorithms Challenges
- Task characteristics: state dependency, distributed learning, CPU and I/O intensive, possibly real-time processing
- Algorithm limitations: the Distribution Curse
  - Most algorithms are written sequentially
  - A change in the data scientist's mindset is needed
  - No cross-platform code
  - Can't leverage most R packages (~4000)
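To make the "written sequentially" point concrete, here is a minimal sketch of the change in mindset: variance is usually computed in one pass over all the data, but it can be restructured into a per-partition summary step plus a merge step, which is the shape a Hadoop or Spark job needs. The names `Partial`, `summarize`, `merge`, and `distributed_variance` are hypothetical, and plain Python lists stand in for cluster partitions.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List


@dataclass(frozen=True)
class Partial:
    """Mergeable summary of one partition: count, mean, and M2 (sum of squared deviations)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def summarize(partition: Iterable[float]) -> Partial:
    """'Map' step: one sequential pass over a single partition (Welford's update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in partition:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Partial(n, mean, m2)


def merge(a: Partial, b: Partial) -> Partial:
    """'Reduce' step: combine two summaries without revisiting the raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Partial(n, mean, m2)


def distributed_variance(partitions: List[List[float]]) -> float:
    """Population variance computed only from per-partition summaries."""
    total = reduce(merge, map(summarize, partitions), Partial())
    return total.m2 / total.n if total.n else 0.0


# Three "nodes", each holding part of the data.
result = distributed_variance([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]])
```

Because `merge` is associative, the summaries can be combined in any order, which is what lets the work spread across machines. Many algorithms have no such simple decomposition, which is the difficulty the slide describes.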

Solution: A Two-Layer Hybrid Architecture
- Offline: Crunch raw data into meaningful patterns that do not tend to change dramatically
  - Raw data -> algorithm -> underlying patterns
  - Run on a scalable platform (Hadoop) to gain scalability
- Online: Use the latest user data and the underlying patterns to compute user predictions on demand
  - Compute the prediction using the computed model and the latest data
  - Use the latest feedback for real-time prediction
[Diagram labels: Prediction, DB]
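The two layers can be sketched in a few lines. This is an illustration under stated assumptions, not Intel's implementation: the event shape `(user_id, category, rating)`, the functions `build_offline_model` and `predict_online`, and the blending `weight` are all hypothetical. The offline function stands in for a batch job that would run on Hadoop, and the online function stands in for a low-latency service that reads the precomputed model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape for illustration: (user_id, category, rating)
Event = Tuple[str, str, float]


def build_offline_model(raw_events: Iterable[Event]) -> Dict[str, float]:
    """Offline layer: reduce raw history to slow-changing patterns.

    The 'pattern' here is just the average rating per category.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _user, category, rating in raw_events:
        totals[category] += rating
        counts[category] += 1
    return {c: totals[c] / counts[c] for c in totals}


def predict_online(
    model: Dict[str, float],
    latest_feedback: List[Tuple[str, float]],
    category: str,
    weight: float = 0.5,
) -> float:
    """Online layer: blend the precomputed pattern with the user's latest feedback."""
    baseline = model.get(category, 0.0)
    recent = [r for c, r in latest_feedback if c == category]
    if not recent:
        return baseline  # no fresh signal, fall back to the offline pattern
    user_avg = sum(recent) / len(recent)
    return (1 - weight) * baseline + weight * user_avg


# Offline: run occasionally over the full history.
history = [("u1", "maps", 4.0), ("u2", "maps", 2.0), ("u1", "news", 5.0)]
model = build_offline_model(history)

# Online: run per request with whatever feedback just arrived.
score = predict_online(model, [("maps", 5.0)], "maps")
```

The split keeps the expensive pass over raw data out of the request path, so the online step only touches a small model and a handful of recent events.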

Noticeable trends: Hadoop 2.x - YARN

Noticeable trends: In-Memory
- Berkeley Data Analytics Stack (BDAS)
- Distributed RAM processing
- Batch, interactive, and stream processing in one stack
- Figures shown on the slide: 128-512GB, 40-60GB/s, 16 cores

Intel-BGU Hadoop Lab
- Joint effort of Intel and the Information Systems Engineering department
- The cluster has ~200TB of storage
- Installed with Hadoop 2.x and Spark
- Focused on developing new distributed algorithms for machine learning (ML)
- Impact:
  - Research: Allows researchers to mine larger datasets than before and develop more complex, distributed algorithms
  - Curriculum: Runs a master's course on mining massive datasets that focuses on implementing distributed machine-learning algorithms

Summary
- At Intel, we leverage big data analytics to systematically solve high-value business problems across the company that couldn't be addressed effectively in the past
- Big data analytics offers high value but has its own challenges
- Notable trends: Hadoop 2.0 and in-memory technologies
- The new Intel-BGU Hadoop Lab will support research and enable a new curriculum

Q&A

Intel Confidential Do Not Forward www.intel.com/it

Backup