IBM Big Data Platform



Similar documents
IBM Data Warehousing and Analytics Portfolio Summary

Are You Ready for Big Data?

IBM Big Data Platform

Are You Ready for Big Data?

IBM InfoSphere BigInsights Enterprise Edition

Big Data and Trusted Information

Luncheon Webinar Series May 13, 2013

Exploiting Data at Rest and Data in Motion with a Big Data Platform

IBM System x reference architecture solutions for big data

Analytics and the Context Multiplier

Hadoop and ecosystem * 本 文 中 的 言 论 仅 代 表 作 者 个 人 观 点 * 本 文 中 的 一 些 图 例 来 自 于 互 联 网. Information Management. Information Management IBM CDL Lab

IBM BigInsights for Apache Hadoop

A brief introduction of IBM s work around Hadoop - BigInsights

Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada

Big Data System and Architecture

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Transforming Government with Big Data and Analytics

Einsatz von Big Data Lösungen. Dipl.Ing.Wolfgang Nimführ, Information Agenda Executive Conultant, Big Data Tiger Team IBM Softare Group Europe

BAO & Big Data Overview Applied to Real-time Campaign GSE. Joel Viale Telecom Solutions Lab Solution Architect. Telecom Solutions Lab

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

BIG DATA TRENDS AND TECHNOLOGIES

Please give me your feedback

Beyond Watson: The Business Implications of Big Data

Oracle Big Data SQL Technical Update

IBM accelerators for big data

Einsatzfelder von IBM PureData Systems und Ihre Vorteile.

White Paper: Datameer s User-Focused Big Data Solutions

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Big Data Management and Security

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

Unified Batch & Stream Processing Platform

The 4 Pillars of Technosoft s Big Data Practice

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Constructing a Data Lake: Hadoop and Oracle Database United!

Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com

Beyond the Single View with IBM InfoSphere

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Data processing goes big

Transforming the Telecoms Business using Big Data and Analytics

How the oil and gas industry can gain value from Big Data?

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Big Data on Microsoft Platform

White Paper. Unified Data Integration Across Big Data Platforms

Unified Data Integration Across Big Data Platforms

IBM Netezza High Capacity Appliance

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Data Refinery with Big Data Aspects

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Hadoop Ecosystem B Y R A H I M A.

IBM Solution Framework for Lifecycle Management of Research Data IBM Corporation

Big Data Strategies with IMS

Bringing Big Data to People

Navigating the Big Data infrastructure layer Helena Schwenk

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

BIG DATA APPLICATIONS

Apache Hadoop: The Big Data Refinery

BIG DATA What it is and how to use?

Search and Real-Time Analytics on Big Data

Big Data Analytics: Where is it Going and How Can it Be Taught at the Undergraduate Level?

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

W H I T E P A P E R. Architecting A Big Data Platform for Analytics INTELLIGENT BUSINESS STRATEGIES

Microsoft Big Data Solutions. Anar Taghiyev P-TSP

How To Handle Big Data With A Data Scientist

Big Data for the JVM developer. Costin Leau,

IBM Big Data in Government

Big Data & Analytics. Counterparty Credit Risk Management. Big Data in Risk Analytics

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

BIG DATA SOLUTION DATA SHEET

HDP Hadoop From concept to deployment.

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Processing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems

BIG Data Analytics Move to Competitive Advantage

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH

Hadoop. Sunday, November 25, 12

Big Data Can Drive the Business and IT to Evolve and Adapt

In-Memory Analytics for Big Data

BIRT in the World of Big Data

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

A Brief Outline on Bigdata Hadoop

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

The Future of Data Management

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data?

Transcription:

Mike Winer IBM Information Management IBM Big Data Platform

The big data opportunity Extracting insight from an immense volume, variety and velocity of data, in a timely and cost-effective manner. Variety: Velocity: Volume: Manage the complexity of multiple relational and nonrelational data types and schemas Streaming data and large volume data movement Scale from terabytes to zettabytes 1

What is BIG DATA? All kinds of data Large volumes Valuable insight, but difficult to extract Often extremely time sensitive

What makes big data technology different? Jobs distributed across affordable hardware. Manages and analyzes all kinds of data. Analyzes data in native format. 3

Where do Enterprises get Stuck with Big Data? Unwillingness to Change Shortage Skilled Resources One-Size-Fits-All Lack Executive Support Vague Goals Processing Bottlenecks The key to success in a Big Data world: Organizational change to embrace exploratory analytics

IBM Big Data Platform Components InfoSphere BigInsights Hadoop-based low latency analytics for variety and volume Hadoop Information Integration Stream Computing InfoSphere Information Server InfoSphere Streams Low Latency Analytics for streaming data High volume data integration and transformation MPP Data Warehouse IBM InfoSphere Warehouse IBM Netezza High Capacity Appliance Large volume structured data analytics Queryable Archive Structured Data IBM Netezza 1000 BI+Ad Hoc Analytics Structured Data IBM Smart Analytics System Operational Analytics on Structured Data

What does a Big Data platform do? Analyze a Variety of Information Novel analytics on a broad set of mixed information that could not be analyzed before Analyze Information in Motion Streaming data analysis Large volume data bursts & ad-hoc analysis Analyze Extreme Volumes of Information Cost-efficiently process and analyze petabytes of information Manage & analyze high volumes of structured, relational data Discover & Experiment Ad-hoc analytics, data discovery & experimentation Manage & Plan Enforce data structure, integrity and control to ensure consistency for repeatable queries

IBM s unique strengths in Big Data Big Data in Real-Time Ingest, analyze and act on massive volumes of streaming data Faster AND more cost-effective for specific use cases. (10x volume of data on the same hardware.) Fit for purpose analytics Analyzes a variety of data types, in their native format For example: text, geospatial, time series, video, audio & more Enterprise Class High performance warehouse software and appliances Ease of use with end users, admin and development UIs Open source enhanced for reliability, performance and security Integration Integration into your IM architecture Pre-integrated analytic applications

InfoSphere BigInsights

BigInsights delivers a comprehensive Hadoopbased big data solution. Volume Variety Terabytes, petabytes, even exabytes All kinds of data All kinds of analytics Store Analyze Explore Velocity Analyze data in... Hours instead of days Days instead of weeks Traditional / Non-traditional data sources 9 2009 IBM Corporation

BigInsights provides unique business value when: Data volumes cannot be cost effectively managed using existing technologies. Analyzing larger volumes of data can provide better results. Mining insights from non-relational data types. Exploring data to understand it s potential value to the business. 10 2009 IBM Corporation

What makes BigInsights special? + IBM Innovation Scalable New nodes can be added on the fly. Affordable Massively parallel computing on commodity servers Flexible Hadoop is schema-less, and can absorb any type of data. Fault Tolerant Through MapReduce software framework Performance & reliability Adaptive MapReduce, Compression, BigIndex, Flexible Scheduler Analytic Accelerators Productivity Accelerators Web-based UIs Tools to leverage existing skills End-user visualization Enterprise Integration To extend & enrich your information supply chain. 2009 IBM Corporation

InfoSphere BigInsights Hadoop-Based Big Data Solution Hadoop A set of capabilities to cost-effectively analyze a wide variety and large volume of information to gain insights that were not previously possible. Ability to analyze data in its native format, without imposing a schema/structure, to enable fast ad-hoc analysis. The big data platform builds on the Apache Hadoop project and incorporates latest technology from IBM research. Use Cases Analyze a Variety of Information Novel analytics on a broad set of mixed information that could not be analyzed before Analyze Extreme Volumes of Information Cost-efficiently process and analyze petabytes of information Discovery and Experimentation Quick and easy sandbox to explore data and determine its value

Hadoop Explained Hadoop computation model Data stored in a distributed file system spanning many inexpensive computers Bring function to the data Distribute application to the compute resources where the data is stored Scalable to thousands of nodes and petabytes of data Hadoop Data Nodes public static class TokenizerMapper public static class TokenizerMapper extends Mapper<Object,Text,Text,IntWritable> { extends Mapper<Object,Text,Text,IntWritable> { private final static IntWritable private final static IntWritable one = new IntWritable(1); one = new IntWritable(1); private Text word = new Text(); private Text word = new Text(); public void map(object key, Text val, Context public void map(object key, Text val, Context StringTokenizer itr = StringTokenizer itr = new StringTokenizer(val.toString()); new StringTokenizer(val.toString()); while (itr.hasmoretokens()) { while (itr.hasmoretokens()) { word.set(itr.nexttoken()); word.set(itr.nexttoken()); context.write(word, one); context.write(word, one); } } } } } } public static class IntSumReducer public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWrita extends Reducer<Text,IntWritable,Text,IntWrita private IntWritable result = new IntWritable(); private IntWritable result = new IntWritable(); public void reduce(text key, public void reduce(text key, Iterable<IntWritable> val, Context context){ Iterable<IntWritable> val, Context context){ int sum = 0; int sum = 0; for (IntWritable v : val) { for (IntWritable v : val) { sum += v.get(); sum += v.get(); 1. Map Phase (break job into small parts) Distribute map tasks to cluster...... (transfer interim output for final processing) 3. Reduce Phase MapReduce Application Shuffle Result Set 2. Shuffle (boil all output down to a single result set) Return a single result set

InfoSphere BigInsights A Full Hadoop Stack Analytics ML Analytics Text Analytics BigSheets Interface Management Console Application Zookeeper IBM LZO Compression Pig Hive Jaql MapReduce AdaptiveMR BigIndex Oozie Lucene Avro (browser-based) Development Tooling (Eclipse plug-ins) REST API (for applications) Storage HBase HDFS GPFS-SNC Data Sources/ Connectors Streams Netezza BoardReader Cognos Data Stage DB2 CSV / XML / JSON R Flume JDBC Web Crawler SPSS

How Text Analytics Works Football World Cup 2010, the Dutch team distinguished themselves well, losing to the Spanish champions 1-0 in the Final. Early in the second half, Netherlands striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casilas made the save. Winger Andres Iniesta scored for Spain for the win. World Cup 2010 Highlights Arjen Robben Iker Casilas Andres Iniesta Striker Keeper Winger Netherlands Spain Spain

What is Text Analytics? Parsing Facts Deep understanding of text Text Text Analytics Analytics Social media Call center records Logs SEC filings

InfoSphere Streams

InfoSphere Streams delivers Real Time Analytic Processing A Platform to Run In-Motion Analytics on BIG Data Real time delivery ICU Monitoring Environment Monitoring Volume Terabytes per second Petabytes per day Algo Trading Powerful Analytics Cyber Security Government / Law enforcement Smart Grid Telco churn predict Variety All kinds of data All kinds of analytics Millions of events per second Microsecond Latency Velocity Insights in microseconds Traditional / Non-traditional data sources 18 2009 IBM Corporation

Big Data in Real-Time with InfoSphere Streams Filter / Sample Modify Annotate Fuse Classify 19 2009 IBM Corporation

Streams provides unique business value when: It would be too expensive to store before analyzing. Data fusion across multiple, disparate streams brings advantage. True real-time data analysis can provide better business outcomes. Ability to run multiple analytic models or applications against the same data. 20 2009 IBM Corporation

How it works: Streams Processing Language (SPL) built for Streaming apps: Reusable operators Rapid application development Continuous pipeline processing Compile groups of operators into single processes: Efficient use of cores Distributed execution Very fast data exchange Can be automatic or tuned Scaled with push of a button Easy to manage: Automatic placement Extend apps incrementally w/out downtime Multi-user / multiple applications Easy to extend: Built in adapters Custom developments with C++ and Java Any type of data: Can handle virtually any data type. Productivity Layer Developer Productivity Admin Productivity End User Productivity Acceleration Layer Analytic Accelerators Solution Accelerators Big Data Management Layer Performance & Reliability Components Data-at-Rest Data-in-Motion Dynamic analysis: Programmatically change topology at runtime Create new subscriptions Create new port properties Flexible & high performance transport: Ultra low latency High data rates 21 2009 IBM Corporation

Accelerators to speed and simplify development Telco Finance Smarter energy Data mining Public transportation... with more on the way. Over 100 samples applications User Defined Toolkits Standard Toolkit >300 functions & operators 22 2009 IBM Corporation

InfoSphere Streams Stream Computing Big Data Solution Stream Computing A platform for managing information streams and applying big data analytics to data in motion. Analyze a wide variety of data in its native format, at a massive volume and scale (terabytes per second). Use Cases Analyze Information in Motion Low-latency analysis of streaming information Analyze Extreme Volumes of Information in Motion Terabytes per second, petabytes per day. Analyze a Variety of Information Analyze a variety of data in its native format streaming audio, video, spatial, among others

IBM InfoSphere Streams v2.0 Agile Development Environment Distributed Runtime Environment Sophisticated Analytics with Toolkits & Adapters Front Office 3.0 Eclipse IDE Streams Live Graph Streams Debugger Clustered runtime for near-limitless capacity RHEL v5.3 and above x86 multicore hardware InfiniBand support Database Mining Financial Standard Internet Big Data (HDFS) Text User-defined toolkits

IBM Big Data Platform

What Could You do with a Big Data Platform? Analyze a Variety of Information Novel analytics on a broad set of mixed information that could not be analyzed before Analyze Information in Motion Streaming data analysis Large volume data bursts & ad-hoc analysis Analyze Extreme Volumes of Information Cost-efficiently process and analyze petabytes of information Manage & analyze high volumes of structured, relational data Discover & Experiment Ad-hoc analytics, data discovery & experimentation Manage & Plan Enforce data structure, integrity and control to ensure consistency for repeatable queries

THINK