Taming the Beast of Big Data



Transcription:

Taming the Beast of Big Data
Jeff Zakrzewski, Vice President, Sogeti USA
Local Touch, Global Reach

Agenda
What is Big Data?
Some Sources of Big Data
Approaches to Big Data
The Hadoop Buzz
Vertical Perspective
Vendor Perspective
Role of the Future
Q & A

What is Big Data?

What is Big Data?

Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time.

As much as 80% of the world's data is now in unstructured formats and is created and held on the web. This data is increasingly associated with genuine Cloud-based services used outside Enterprise IT. The part of Big Data that relates to the expected explosive growth and creation of new value is the unstructured data, mostly arising from these external sources.

Data sets are growing at a staggering pace: they are expected to grow by 100% every year for at least the next 5 years. Most of this data is unstructured or semi-structured, generated by servers, network devices, social media, and distributed sensors.

Big Data refers to such data because the volume (petabytes and exabytes), the type (semi-structured and unstructured, distributed), and the speed of growth (exponential) make traditional data storage and analytics tools insufficient and cost-prohibitive. An entirely new set of processing and analytic systems is required for Big Data, with Apache Hadoop being one example of a Big Data processing system that has gained significant popularity and acceptance.

According to a recent McKinsey Big Data report, Big Data can provide up to $300 billion in annual value to the US healthcare industry and can increase US retail operating margins by up to 60%. It's no surprise that Big Data analytics is quickly becoming a critical priority for large enterprises across all verticals.
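To make the growth claim concrete, the short sketch below compounds a 100% annual growth rate over five years. The function name and the starting size of 1 unit are illustrative assumptions, not figures from the talk.

```python
def projected_size(initial, annual_growth, years):
    """Size after compounding annual_growth (1.0 means +100%) for `years` years."""
    return initial * (1 + annual_growth) ** years


# Relative size of a data set that doubles every year, starting from 1 unit.
for year in range(1, 6):
    print(year, projected_size(1, 1.0, year))
```

Five consecutive years of doubling give a 32-fold increase over the starting volume, which is why capacity planned for today's data is quickly outgrown.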

Big Data V3 Characteristics

The usual big data characteristics are:
Volume: there is a lot of data to be analyzed and/or the analysis is extremely intense; either way, a lot of hardware is needed.
Variety: the data is not organized into simple, regular patterns as in a table; rather, text, images, and highly varied structures, or structures unknown in advance, are typical.
Velocity: the data comes into the data management system rapidly and often requires quick analysis or decision making.

Big Data Trend Overview: Drivers
Volume, variety, velocity, and complexity of incoming data streams
Growth of the Internet of Things results in an explosion of new data
Commoditization of inexpensive terabyte-scale storage hardware is making storage less costly, so why not store it?
Enterprises increasingly need to store non-traditional and unstructured data in a way that is easily queried
Desire to integrate all the data into a single source
The power of compression

Big Data Trend Overview: Challenges
Data comes from many different sources (enterprise apps, web, search, video, mobile, social conversations, and sensors)
All of this information has been getting increasingly difficult to store in traditional relational databases and even data warehouses
Unstructured or semi-structured text is difficult to query. How does one query a table with a billion rows?
Culture, skills, and business processes
Conceptual data modeling
Data quality management

Big Data Trend Overview: Implications
Emerging capabilities to process vast quantities of structured and unstructured data are bringing about changes in technology and business landscapes
As data sets get bigger and the time allotted to their processing shrinks, look for ever more innovative technology to help organizations glean the insights they'll need to face an increasingly data-driven future

Have you processed your Yottabyte today?

With the advent of big data comes even bigger storage capacity: now we can deal in yottabytes! The National Security Agency (NSA) is already building a gigantic supercomputer to process this gigantic amount of information in the biggest spy center ever (bigger than 17 football fields). The million-square-foot center will be more than five times the size of the US Capitol and will be able to sift through literally all electronic communications all over the world.

The Utah-based facility, which can process yottabytes (a quadrillion gigabytes) of data according to the Gizmodo technology blog, is designed to intercept, decipher, analyze, and store vast swaths of the world's communications as they zap down from satellites and zip through the underground and undersea cables of international, foreign, and domestic networks. It will be the centerpiece for the Global Information Grid and is set to go live in September 2013.

Big Data: The Byte Scale

The file size conversion table below shows the relationship between the file storage sizes that computers use. Binary calculations are based on units of 1,024, and decimal calculations are based on units of 1,000.

File size measures the size of a computer file. Typically it is measured in bytes with a prefix. The actual amount of disk space consumed by the file depends on the file system. The maximum file size a file system supports depends on the number of bits reserved to store size information and on the total size of the file system. For example, with FAT32, a single file must be smaller than 4 GiB.

Name       Symbol  Binary  Decimal  Number of Bytes                    Equal to
kilobyte   KB      2^10    10^3     1,024                              1,024 bytes
megabyte   MB      2^20    10^6     1,048,576                          1,024 KB
gigabyte   GB      2^30    10^9     1,073,741,824                      1,024 MB
terabyte   TB      2^40    10^12    1,099,511,627,776                  1,024 GB
petabyte   PB      2^50    10^15    1,125,899,906,842,624              1,024 TB
exabyte    EB      2^60    10^18    1,152,921,504,606,846,976          1,024 PB
zettabyte  ZB      2^70    10^21    1,180,591,620,717,411,303,424      1,024 EB
yottabyte  YB      2^80    10^24    1,208,925,819,614,629,174,706,176  1,024 ZB
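The byte scale can be regenerated with a few lines of code, which also shows how far the binary and decimal interpretations drift apart as the prefixes grow. This is a minimal sketch; the names `PREFIXES`, `unit_sizes`, and `fits_fat32` are illustrative and not part of any library.

```python
# Prefixes in increasing order; each step multiplies by 1,024 (binary) or 1,000 (decimal).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def unit_sizes(prefix):
    """Return (binary_bytes, decimal_bytes) for one prefix, e.g. 'giga'."""
    step = PREFIXES.index(prefix) + 1
    return 2 ** (10 * step), 10 ** (3 * step)


def fits_fat32(size_bytes):
    """FAT32 requires a single file to be strictly smaller than 4 GiB."""
    return size_bytes < 4 * 2 ** 30


for prefix in PREFIXES:
    binary, decimal = unit_sizes(prefix)
    gap_percent = (binary / decimal - 1) * 100
    print(f"{prefix}byte: binary {binary:,} bytes, decimal {decimal:,} bytes, gap {gap_percent:.1f}%")
```

The gap between the two conventions is 2.4% at the kilobyte and grows to about 20.9% at the yottabyte, so it matters which convention a storage figure uses once volumes reach the petabyte range and beyond.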

Some Sources of Big Data

A Connected World

An Explosion in Data in Recent History!
1.8 billion RFID tags in 2005
4 billion RFID tags in 2009
30 billion RFID tags in 2010
Over 2.3 billion Internet users
24 petabytes of data processed in a single day
Billions of financial transactions daily: TBs of data!
6 billion mobile phones worldwide
100s of millions of videos: 10s of petabytes of data
World Data Centre for Climate: 220 terabytes of Web data, 9 petabytes of additional data
Twitter processes 12 terabytes of data every day (230 million tweets)
Facebook processes 25 terabytes of data every day
The Human Genome Project: fully mapped in 2003; petabytes of data; ~1 GB per human, non-compressed

What do we do with all of this data?

The Challenge: Bring Together a Large Volume and Variety of Data to Find New Insights
Analyzing a variety of data at enormous volumes
Insights on streaming data
Large volume structured data analysis
Multi-channel customer sentiment and experience analysis
Detect life-threatening conditions at hospitals in time to intervene
Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
Make risk decisions based on real-time transactional data
Identify criminals and threats from disparate video, audio, and data feeds

Approaches to Big Data

The Big Data Approach: Information Sources Drive Creative Discovery
Business and IT identify the information sources available
New insights drive integration to traditional technology
IT delivers a platform that enables creative exploration of all available data and content
Business determines what questions to ask by exploring the data and relationships

Big Data Enterprise Data Platform
Manage Big Data from the instant it enters the enterprise
High fidelity: no changes to original format
Available for new uses, analyses, and integrations
Govern: Quality, Lifecycle Management, Security, Privacy
Diagram labels: Business Analytic Applications and Solutions; Big Data Applications; Operational Data Store; Big Data Platform; Big Data Solutions; Big Data User Environment; Client and Partner Solutions; Warehouse and Appliances; Developers; End Users; Admin.; Big Data Enterprise Engine; Traditional data sources; Streaming analytics; Internet-scale analytics; Source data (Web, sensors, logs, media, etc.)

Data Processing and Analytics: The Old Way

Traditionally, data processing for analytic purposes has followed a fairly static blueprint. Through the regular course of business, enterprises create modest amounts of structured data with stable data models via enterprise applications like CRM, ERP, and financial systems. Data integration tools extract, transform, and load the data from enterprise applications and transactional databases to a staging area, where data quality and data normalization (hopefully) occur and the data is modeled into neat rows and tables. The modeled, cleansed data is then loaded into an enterprise data warehouse. This routine runs on a scheduled basis, usually daily or weekly, sometimes more frequently.

Figure: Traditional Data Processing/Analytics (Source: Wikibon 2011)
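The extract, transform, and load routine described above can be sketched in a few lines. This is a minimal illustration only: the sample rows, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the enterprise data warehouse.

```python
import sqlite3

# Hypothetical rows as they might come out of a CRM export (illustrative data only).
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "date": "2012-03-01"},
    {"customer": "BOB", "amount": "80", "date": "2012-03-01"},
    {"customer": "", "amount": "15.00", "date": "2012-03-02"},      # missing name
    {"customer": "alice", "amount": "n/a", "date": "2012-03-02"},   # unparseable amount
]


def extract():
    """Pull raw records from the source system (here, a static list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Staging step: normalize names, parse amounts, drop rows that fail quality checks."""
    clean = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # amount is not a number
        if not name:
            continue  # customer name is missing
        clean.append((name, amount, row["date"]))
    return clean


def load(conn, rows):
    """Write the modeled, cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """One scheduled run of the pipeline: extract, then transform, then load."""
    load(conn, transform(extract()))


conn = sqlite3.connect(":memory:")
run_etl(conn)
print(conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall())
```

Note that the schema is fixed before any data is loaded and rows that do not fit are discarded. That is the property the Big Data approach relaxes by keeping the original data for later exploration.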

Big Data Analytics Complements the DW

Transactional Big-data projects cannot use Hadoop, as it is not real-time. For transactional systems that do not need a database with ACID guarantees, NoSQL databases can be used, though there are constraints such as weak consistency guarantees (e.g., eventual consistency) or transactions restricted to a single data item. For big-data transactional SQL databases that need ACID guarantees, the choices are limited. Traditional scale-up databases are usually too costly for very large-scale deployment and don't scale out very well. Most social media databases have had to hand-craft solutions. Recently a new breed of scale-out SQL databases has emerged, such as Clustrix, with architectures that move the processing next to the data (in the same way as Hadoop). These allow greater scale-out capability. This area is growing extremely fast, with many new entrants into the market expected over the next few years.

Note: ACID stands for atomicity, consistency, isolation, durability.

Merging Traditional and Big Data Approaches

Traditional Approach: Structured & Repeatable Analysis
Business users determine what question to ask
IT structures the data to answer that question
Monthly sales reports, profitability analysis, customer surveys

Big Data Approach: Iterative & Exploratory Analysis
IT delivers a platform to enable creative discovery
Business explores what questions could be asked
Brand sentiment, product strategy, maximum asset utilization, preventative care

Data flow and Processes Compared

Enterprise Integration
Trusted Information & Governance: companies need to govern what comes in, and the insights that come out
Data management: insights from Big Data must be incorporated into the warehouse
Diagram labels: Data Warehouse, Enterprise Integration, Big Data Platform, Traditional Sources, New Sources


Big data and Hadoop

What is Hadoop?

The best-known technology used for Big Data is Hadoop. It was inspired by Google's publications on MapReduce, GoogleFS, and BigTable. Because Hadoop can be hosted on commodity hardware (usually Intel PCs running Linux, with one or two CPUs and a few TB of HDD, without any RAID replication technology), it allows organizations to store huge quantities of data (petabytes or even more) at very low cost compared to SAN systems.

Hadoop is an open-source implementation of Google's MapReduce framework. It is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation: http://hadoop.apache.org/.

The Hadoop brand contains many different tools. Two of them are core parts of Hadoop:

Hadoop Distributed File System (HDFS) is a virtual file system that looks like any other file system, except that when you move a file onto HDFS, the file is split into many smaller pieces (blocks), and each piece is replicated and stored on 3 servers (the usual default, which may be customized) for fault tolerance.
Hadoop MapReduce is a way to split every request into smaller requests that are sent to many small servers, allowing a truly scalable use of CPU power.
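The MapReduce idea of splitting one request into many smaller ones can be illustrated with the classic word count. The sketch below is a single-process imitation written in plain Python. It does not use the Hadoop API (real Hadoop jobs are typically written against its Java interfaces), and the function names are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key so each key is reduced in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all the values seen for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    pairs = []
    for chunk in chunks:  # on a cluster, each chunk would be mapped on a different server
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data is big", "data about data"]))
```

Because each map call sees only its own chunk and each reduce call sees only one key, the work can be spread over as many servers as there are chunks and keys. That independence is what makes the approach scale on commodity hardware.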

How does Hadoop help? What problems can Hadoop solve?

The Hadoop framework is used by major players including Google, Yahoo, IBM, eBay, LinkedIn, and Facebook, largely for applications involving search engines and advertising. The preferred operating systems are Windows and Linux, but Hadoop can also work with BSD and OS X. Hadoop was originally the name of a stuffed toy elephant belonging to a child of the framework's creator, Doug Cutting.

Mike Olson (Cloudera): "The Hadoop platform was designed to solve problems where you have a lot of data, perhaps a mixture of complex and structured data, and it doesn't fit nicely into tables. It's for situations where you want to run analytics that are deep and computationally extensive, like clustering and targeting. That's exactly what Google was doing when it was indexing the web and examining user behavior to improve performance algorithms. Hadoop applies to a bunch of markets. In finance, if you want to do accurate portfolio evaluation and risk analysis, you can build sophisticated models that are hard to jam into a database engine. But Hadoop can handle it. In online retail, if you want to deliver better search answers to your customers so they're more likely to buy the thing you show them, that sort of problem is well addressed by the platform Google built. Those are just a few examples."

What does the Hadoop architecture look like?
Figure: Hadoop Internal Software Architecture

Enterprise Hadoop Vendors

The free open source application, Apache Hadoop, is available for enterprise IT departments to download, use, and change however they wish. But for many business users, the need for support and technical expertise often outweighs the lure of free do-it-yourself applications, especially when critical IT systems are at stake. That's where supported, enterprise-ready versions of Hadoop can be a better, more realistic option.

Here is a sampling of some of the major commercial vendors that can help your company get started with Hadoop. Some offer on-premises software packages; others sell Hadoop in the cloud. There are also some Hadoop database appliances beginning to appear, including the recently announced joint effort by Oracle and Cloudera.

Amazon Web Services runs Amazon Elastic MapReduce, a hosted Hadoop framework running on Amazon's Elastic Compute Cloud and its Simple Storage Service
The Cloudera Enterprise subscription service
The Datameer Analytics Solution using Hadoop
The DataStax Enterprise Hadoop software
Greenplum, a Division of EMC, offers Greenplum HD Enterprise-Ready Apache Hadoop
The Hortonworks Data Platform
BigInsights, an unstructured-data cloud service from IBM based on Hadoop
Karmasphere Analyst, a toolkit to help produce data using Hadoop
MapR provides an enterprise-ready M5 edition of its Hadoop software

This list features only some of the many vendors offering enterprise Hadoop products and services today. The number of vendors is constantly growing as Hadoop gains steady traction in the data marketplace.

WHY HADOOP?

Hadoop
Open source platform supporting large-scale parallel processing (1000s of servers)
Massive-scale distributed file system (petabytes of data)

Customer Requirements
Very affordable, scalable storage (petabytes)
Want to store complete transaction data
Flexible schema: new datasets with new schema created regularly
Scalable, flexible analytics: generation of models of fraudulent card usage
Job fault-tolerance

Hadoop Benefits
We showed that jobs that took multiple weeks were reduced to hours with Hadoop
Fundamentally change what they are able to do

Vendor Perspective

Big Data Vendor Landscape

Big Data Market

The Big Data market is on the verge of a rapid growth spurt that will see it top the $50 billion mark worldwide within the next five years. As of early 2012, the Big Data market stands at just over $5 billion based on related software, hardware, and services revenue. Increased interest in and awareness of the power of Big Data and related analytic capabilities to gain competitive advantage and to improve operational efficiencies, coupled with developments in the technologies and services that make Big Data a practical reality, will result in a super-charged CAGR of 58% between now and 2017.
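The quoted growth rate can be checked against the two market figures. The sketch below uses the standard compound annual growth rate formula and assumes round values of $5 billion in 2012 and $50 billion in 2017; the slide says "just over $5 billion" and "top the $50 billion mark", so the inputs are approximations.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate that turns start into end."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumed round figures in billions of US dollars over five years (2012 to 2017).
rate = cagr(5.0, 50.0, 5)
print(f"{rate:.1%}")
```

A tenfold increase over five years works out to roughly 58.5% a year, which agrees with the 58% CAGR quoted above.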

Big Data Market Forecast

Big Data is the new definitive source of competitive advantage across all industries. For those organizations that understand and embrace the new reality of Big Data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless. Below is Wikibon's five-year forecast for the Big Data market as a whole. Source: Wikibon 2012

Big Data Pure-Play Vendors Annual Revenue
Below is a worldwide revenue breakdown of the top Big Data pure-play vendors as of February 2012. Source: Wikibon 2012

Big Data Pure-Play Vendors Market Share
Below is a breakdown of market share among the pure-play segment of the Big Data market. Source: Wikibon 2012

Components of Big-data Processing

Big-data projects have a number of different layers of abstraction, from abstraction of the data through to running analytics against the abstracted data. Figure 1 shows the common components of analytical Big-data and their relationship to each other. The higher-level components help make big data projects easier and more productive. Hadoop is often at the center of Big-data projects, but it is not a prerequisite.

Figure 1: Analytical Big-data Components (Source: Wikibon 2011)

The Forrester Wave: Enterprise Hadoop Solutions, Q1 2012

The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.

Cloudera

IBM Big Data Platform
Diagram labels: IBM Big Data Solutions; Client and Partner Solutions; InfoSphere Information Server; Marketing (IBM Unica); Content Analytics (ECM); Business Analytics (Cognos & SPSS); Warehouse Appliance (IBM Netezza); Master Data Management (InfoSphere MDM); Data Warehouse (InfoSphere Warehouse); Database (DB2); Data Growth Management (InfoSphere Optim)
Big Data Accelerators: Text, Statistics, Financial, Geospatial, Acoustic, Image/Video, Mining, Times Series, Mathematical
Connectors, Applications, Blueprints
Big Data Enterprise Engines: InfoSphere Streams, InfoSphere BigInsights
Productivity Tools and Optimization; Workload Management and Optimization; Consumability and Management Tools
Open Source Foundation Components: Eclipse, Oozie, Hadoop, HBase, Pig, Lucene, Jaql

BigInsights Platform and Roadmap
Legend: IBM unique value; IBM differentiating value; IBM complementary value; Open source
Themes: Performance, Manageability, Consumability, Analytics, Integration
Diagram labels: BigInsights Enterprise Console; DBs; Crawlers; Streams; Data Explorer; Application Flows; Dashboards/Reports; Administration; DBA; Analyst; Programmer; BigInsights Enterprise Engine; DataStage; DB2; Netezza; JMS; HTTP; Web & Application logs; Analytics (machine learning, text); Languages (Jaql, Pig, Hive, HBase); Workflow orchestration; Map-reduce (Hadoop); File system (GPFS+, HDFS); Indexing; Workload Prioritization; SPSS; Cognos; Unica

IBM PureSystems

Oracle's Big Data Solutions

Oracle's Big Data Appliance and Exadata

Microsoft BI Connectivity to Hadoop

Microsoft Big Data stack

The Microsoft Big Data Solution

The Informatica Approach
Diagram labels: Data Warehouse, Data Migration, Test Data Management & Archiving, Data Consolidation, Master Data Management, Data Synchronization, B2B Data Exchange, SWIFT, NACHA, HIPAA, Cloud Computing, Application, Database, Unstructured, Partner Data

Informatica Big Data Unleashed

EMC Greenplum's MPP Shared-Nothing Architecture

Pentaho and DataStax

Pentaho and DataStax will offer the first Cassandra-based big data analytics solution that combines the highly scalable, low-latency performance of Cassandra with Kettle's visual interface for high-performance data extract, transformation, and load, as well as integrated reporting, visualization, and interactive analysis capabilities. This will make it easier for developers and data scientists to operationalize, integrate, and analyze both big data and traditional data sources.

DataStax Cassandra Enterprise
DataStax Enterprise: real-time, analytic, and search capabilities in one integrated big data platform

Vertical Perspective

Enhancing Fraud Detection for Banks and Credit Card Companies
Scenario: Build up-to-date models from transactional data to feed real-time risk-scoring systems for fraud detection.
Requirement: Analyze volumes of data with response times that are not possible today. Apply analytic models to the individual client, not just the client segment.
Benefits: Detect transaction fraud in progress; allow fraud models to be updated in hours rather than weeks.

Social Media Analysis for Products, Services and Brands
Scenario: Monitor data from various sources such as blogs, boards, news feeds, tweets, and social media for information pertinent to brand and products, as well as competitors.
Requirement: Extract and aggregate relevant topics and relationships, discover patterns, and reveal up-and-coming topics and trends.
Benefits: Brand management for marketing campaigns; brand protection for ad placement networks.

Store Clustering Analysis in the Retail Industry
Scenario: A retailer with a large number of stores needs to understand cluster patterns of shoppers.
Requirement: Use shopping patterns for multiple characteristics, like location, income, and family size, for better product placement.
Benefits: Store-specific clustering of products; clustering specific types of products by location.
Diagram labels: Age Range, Education, Income, Children, Assets, Urbanicity

Healthcare and Energy Industry
Healthcare: IBM Stream Computing for Smarter Healthcare. InfoSphere Streams-based analytics can alert hospital staff to impending life-threatening infections in premature infants up to 24 hours earlier than current practices.
Energy: Vestas Wind Systems uses IBM big data analytics software and powerful IBM systems to improve wind turbine placement for optimal energy output.

Big Data Value Potential Index

The Role of the Future

Data Science and the Data Scientist

Big Data: Some References
Forrester: The Forrester Wave: Enterprise Hadoop Solutions, Q1 2012
IBM Software: Big Data and Data Management
IBM Systems: Big Data
IBM: Big Data and Better Business Outcomes - A Strategic Foundation for Analytics
International Data Corporation (IDC)
Oracle: Big Data
McKinsey Global Institute
Microsoft: Big Data
EMC Greenplum: Big Data
Cloudera.com
Hadoop.com
Wikibon: Big Data
Wikipedia: Big Data

Q & A

Prize & Thank you!