Big Data Analytics - Between Aspiration and Reality (Zwischen Wunsch und Realität)




Big Data Analytics - Between Aspiration and Reality
Dr. Wolfgang Rother
IBM Deutschland GmbH, Nahmitzer Damm 12, 12277 Berlin
Email: wrother@de.ibm.com

Agenda
  About data
  Paradigm shift
  Apache Hadoop
  A simple text analytics example
  IBM Watson
  Big Data is not just Hadoop
  Further Big Data analytics examples
  Why infrastructure matters
  Between aspiration and reality

How Big is the Internet of Things?

6/30/2014

A major gas and electric utility has 10 million meters. They read the meters once a month. Now, they are installing 10 million smart meters. They read the smart meters every 15 minutes = 350 billion transactions a year.

The Big Data Conundrum
  The percentage of available data an enterprise can analyze is decreasing.
  This means enterprises are getting more naive over time.
  Chart labels: data AVAILABLE to an organization; data an organization can PROCESS.
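The smart meter figure above (10 million meters, one reading every 15 minutes, about 350 billion transactions a year) can be checked with a few lines of arithmetic. This is only a sketch; the function name is made up for illustration, and a 365-day year is assumed.

```python
# Readings per year for a meter fleet, assuming a 365-day year.
METERS = 10_000_000  # meter count taken from the slide


def readings_per_year(meters: int, readings_per_day: int) -> int:
    """Total readings per year for the whole fleet (hypothetical helper)."""
    return meters * readings_per_day * 365


monthly = METERS * 12                                # one reading per month
quarter_hourly = readings_per_year(METERS, 24 * 4)   # one reading every 15 minutes

print(f"monthly readings:   {monthly:,} per year")
print(f"15-minute readings: {quarter_hourly:,} per year")
print(f"growth factor:      {quarter_hourly // monthly:,}x")
```

The 15-minute interval comes out at roughly 350 billion readings per year, which agrees with the number on the slide.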

The Four V's
  Volume: use greater amounts of data
  Variety: use more types of data
  Velocity: use data more quickly
  Veracity: use uncertain data

Big Data is All Data and All Paradigms
  Data sources: Transactional & Application Data, Machine Data, Social Data, Enterprise Content
  Diagram labels: Volume, Velocity, Variety, Veracity; Structured, Unstructured; Throughput, Ingestion

PARADIGM SHIFT

How is Big Data transforming the way organizations analyze information and generate actionable insights?

Paradigm shifts enabled by big data: leverage more of the data being captured
  TRADITIONAL APPROACH: analyze small subsets of information (the analyzed information is only a part of all available information)
  BIG DATA APPROACH: analyze all information (all available information is analyzed)

Paradigm shifts enabled by big data: reduce the effort required to leverage data
  TRADITIONAL APPROACH: a small amount of carefully organized information; carefully cleanse information before any analysis
  BIG DATA APPROACH: a large amount of messy information; analyze information as is, cleanse as needed

Paradigm shifts enabled by big data: data leads the way, and sometimes correlations are good enough
  TRADITIONAL APPROACH: start with a hypothesis and test it against selected data (diagram labels: Hypothesis, Question, Data, Answer)
  BIG DATA APPROACH: explore all data and identify correlations (diagram labels: Data, Exploration, Insight, Correlation)

Paradigm shifts enabled by big data: leverage data as it is captured
  TRADITIONAL APPROACH: analyze data after it's been processed and landed in a warehouse or mart (flow: Data, Repository, Analysis, Insight)
  BIG DATA APPROACH: analyze data in motion as it's generated, in real time (flow: Data, Analysis, Insight)

APACHE HADOOP

It's easy to forget just how big the data really is!
  Datasets are vast
    Facebook daily logs: ~60 TB
    1,000 genomes project: ~200 TB
    Google web index: ~10+ PB
  Storage is cheap
    Cost of a commodity 1 TB drive: ~$50
  A terabyte is still a lot of data!
    Time to read 1 TB from a single disk: ~6 hours @ 50 MB/second
  As data gets big, traditional approaches no longer work
    Distributed systems are the only way to scale

What is Hadoop?
  Apache Hadoop = free, open source framework for data-intensive applications
    Inspired by Google technologies (MapReduce, GFS)
    Well suited to batch-oriented, read-intensive applications
    Originally built to address scalability problems of Nutch, an open source Web search technology
  Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner
    CPU + disks of a commodity box = Hadoop node
    Boxes can be combined into clusters
    New nodes can be added as needed without changing
      data formats
      how data is loaded
      how jobs are written
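The "~6 hours to read 1 TB at 50 MB/second" estimate above follows directly from dividing size by throughput. The sketch below reproduces it and shows why spreading the read over many disks helps. It assumes decimal units (1 TB = 10^12 bytes); the count of 100 disks is an arbitrary illustrative value, not a figure from the slides.

```python
TB = 10**12  # bytes, decimal units assumed
MB = 10**6   # bytes


def read_hours(size_bytes: int, bytes_per_second: int, disks: int = 1) -> float:
    """Hours needed to read size_bytes when the work is spread evenly over `disks` disks."""
    return size_bytes / (bytes_per_second * disks) / 3600


single_disk = read_hours(TB, 50 * MB)              # one disk, as on the slide
hundred_disks = read_hours(TB, 50 * MB, disks=100) # hypothetical 100-disk cluster

print(f"1 disk:    {single_disk:.2f} hours")
print(f"100 disks: {hundred_disks * 60:.1f} minutes")
```

One disk needs about 5.6 hours, which the slide rounds to 6. Reading in parallel from 100 disks would bring the same scan down to a few minutes, provided the data is already distributed across them.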

How files are stored: HDFS
  Key ideas:
    Divide big files into blocks and store the blocks randomly across the cluster.
    Provide an API to ask: where are the pieces of this file?
    => Programs can be shipped to the nodes for parallel distributed processing.
  Diagram: a logical file is divided into blocks 1 to 4; each block appears three times across the cluster nodes.

HDFS stores data across multiple nodes
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfsdesign.html
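The HDFS key ideas (split a file into blocks, spread copies over the cluster, and answer "where are the pieces of this file?") can be imitated in a toy model. This is a sketch under stated assumptions, not the HDFS implementation: the function names, node names and tiny block size are invented for illustration, and real HDFS uses a rack-aware placement policy rather than purely random placement.

```python
import random


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a logical file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int, seed: int = 0) -> dict[int, list[str]]:
    """Pick `replication` distinct nodes at random for every block (toy policy)."""
    rng = random.Random(seed)
    return {block: rng.sample(nodes, replication) for block in range(num_blocks)}


data = b"x" * 1000                       # stand-in for a large file
blocks = split_into_blocks(data, 256)    # toy block size; real blocks are far larger
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(len(blocks), nodes, replication=3)

# "Where are the pieces of this file?"
for block_id, holders in placement.items():
    print(f"block {block_id}: {holders}")

# With three replicas on four nodes, losing any single node leaves every block readable.
survives_node1_failure = all(
    any(node != "node1" for node in holders) for holders in placement.values()
)
print("readable without node1:", survives_node1_failure)
```

The lookup table is what lets a scheduler ship a program to a node that already holds the block, instead of moving the data to the program.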

HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfsdesign.html

How Files are Processed: MapReduce
  Common pattern in data processing: apply a function, then aggregate
    grep "World Cup" *.txt | wc -l
  The user simply writes two pieces of code: a mapper and a reducer.
  Mapper code executes on every split of every file.
  Reducer consumes and aggregates the mapper outputs.
  The Hadoop MR framework takes care of the rest (resource allocation, scheduling, coordination, temporary storage of intermediate results, storage of the final result on HDFS).
  Diagram: a logical file is divided into splits 1, 2 and 3; each split is processed by a Map task on the cluster, and a Reduce task combines the map outputs into the result.

Logical MapReduce Example: Word Count

  map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
      EmitIntermediate(w, "1");

  reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
      result += ParseInt(v);
    Emit(AsString(result));

  Content of input documents:
    Document 1: Hello World Bye World
    Document 2: Hello IBM
  Map 1 emits: <Hello, 1> <World, 1> <Bye, 1> <World, 1>
  Map 2 emits: <Hello, 1> <IBM, 1>
  Reduce (final output): <Bye, 1> <IBM, 1> <Hello, 2> <World, 2>

WordCount
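The word count logic above can be tried out locally without a cluster. The following is a minimal single-process sketch in Python, not Hadoop code: `map_fn`, `reduce_fn` and `run_word_count` are names chosen for this illustration, and the grouping step stands in for the shuffle that the framework would normally perform between map and reduce.

```python
from collections import defaultdict


def map_fn(doc_name: str, contents: str):
    """Mapper: emit (word, 1) for every word in the document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: list[int]):
    """Reducer: add up all counts emitted for one word."""
    return word, sum(counts)


def run_word_count(documents: dict[str, str]) -> dict[str, int]:
    # Shuffle stand-in: group every intermediate value by its key.
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            grouped[word].append(count)
    return dict(reduce_fn(word, counts) for word, counts in sorted(grouped.items()))


docs = {"doc1": "Hello World Bye World", "doc2": "Hello IBM"}
result = run_word_count(docs)
print(result)
```

For the two input documents from the slide, the counts match the reduce output shown there: Bye 1, Hello 2, IBM 1, World 2.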

So What Does This Result In?
  Easy to scale
  Fault tolerant and self-healing
  Data agnostic
  Extremely flexible
  BUT you need programming skills

A SIMPLE TEXT ANALYSIS EXAMPLE
From a bachelor's thesis in business informatics, FH Brandenburg

Use Case: IBM quarterly reports
  Goal: solve a Big Data text analysis problem without expert help or special training
  Environment:
    IBM POWER 7R2 server
    RHEL 6.2
    IBM InfoSphere BigInsights 2.0
  Procedure:
    Load press releases with a web crawler
    First processing in BigSheets
    Develop text analysis scripts
    Apply the scripts

BigInsights Enterprise Edition
  Legend: Open Source; IBM; Optional IBM and partner offerings
  Analytics and discovery: Apps (Web Crawler, Boardreader, DB export, DB import, Ad hoc query, Machine learning, Data processing, Distrib file copy, ...); BigSheets; Text processing engine and library; Accelerator for social data analysis; Accelerator for machine data analysis
  Infrastructure: Jaql, Pig, Hive, HBase, HCatalog, ZooKeeper, Oozie, Lucene, MapReduce, Adaptive MapReduce, HDFS, GPFS (EAP), Integrated installer, Enhanced security, Text compression, Indexing, Flexible scheduler
  Administrative and development tools:
    Web console: monitor cluster health, jobs, etc.; add / remove nodes; start / stop services; inspect job status; inspect workflow status; deploy applications; launch apps / jobs; work with the distributed file system; work with the spreadsheet interface; support for a REST-based API; ...
    Eclipse tools: text analytics; MapReduce programming; Jaql, Hive, Pig development; BigSheets plug-in development; Oozie workflow generation
  Connectivity and integration: JDBC, Sqoop, Flume, DB2, Netezza, R, Streams, Data Explorer, Guardium, DataStage, Cognos BI

BigInsights and Text Analytics
  Distills structured information from unstructured text
    Sentiment analysis
    Consumer behavior
    Illegal or suspicious activities
  Parses text and detects meaning with annotators
  Understands the context in which the text is analyzed
  Features pre-built extractors for names, addresses, phone numbers, etc.
  Built-in support for English, Spanish, French, German, Portuguese, Dutch, Japanese, Chinese
  Example of unstructured text (document, email, etc.): "Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win."
  Output: classification and insight

Web Crawler
  The web crawler is intuitive to use
  Time-consuming, depending on the broadband connection
  Runtime of more than 3 days

Use Case: First processing in BigSheets
  The web crawler delivered more than 17,000 press releases.
  After filtering, only 65 quarterly reports remained.
  Within the newly created workbook, the first step was to extract all HTML pages that contain the terms "quarter" and "results".

Text Analytics Tooling
  AQL Editor
  Result Viewer
  Runtime Explain
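The filter step described above (keep only pages that contain both "quarter" and "results") was done in a BigSheets workbook. As a rough analogue outside BigSheets, the same selection can be sketched in plain Python. The page contents and URLs below are invented placeholders, and `mentions_all` is a hypothetical helper, not a BigSheets function.

```python
def mentions_all(text: str, terms: tuple[str, ...]) -> bool:
    """True if every term occurs in the text, ignoring case."""
    lowered = text.lower()
    return all(term in lowered for term in terms)


# Made-up stand-ins for crawled HTML pages.
pages = {
    "example.com/press/1": "<html>Example Corp reports third-quarter results</html>",
    "example.com/press/2": "<html>Example Corp opens a new research lab</html>",
    "example.com/press/3": "<html>Fourth-Quarter and full-year Results announced</html>",
}

quarterly_reports = [
    url for url, html in pages.items() if mentions_all(html, ("quarter", "results"))
]
print(quarterly_reports)
```

A plain substring test like this is deliberately crude; it only narrows the crawl to candidate pages, which is the role the filter plays in the use case.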

Use Case: Developing an AQL text analysis script

  create view content as
    extract regex /Start Whitespace.* End Whitespace/
      on D.text as text
    from Document D;

Developing the AQL text analysis script

Use Case: Developing the text analysis script
  A further 8 views were necessary to extract revenue by region, year and quarter.

Applying the text analysis scripts
  America? Q4?
  Note: information is not always complete!
  Investigating causes vs. effects?
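The idea behind the extraction views (pull quarter, year and revenue out of free text, and accept that some fields will be missing) can be illustrated with ordinary regular expressions. This is a Python sketch, not AQL, and it does not reproduce the thesis scripts: the patterns, the field names and both sample sentences are assumptions made up for this example.

```python
import re

# Illustrative patterns; real press releases would need more variants.
QUARTER = re.compile(r"\b(first|second|third|fourth)[- ]quarter\b", re.IGNORECASE)
YEAR = re.compile(r"\b(20\d{2})\b")
REVENUE = re.compile(r"revenues? of \$(\d+(?:\.\d+)?) billion", re.IGNORECASE)


def extract(text: str) -> dict:
    """Return the first match for each field, or None when the text does not state it."""
    def first(pattern):
        match = pattern.search(text)
        return match.group(1) if match else None

    return {
        "quarter": first(QUARTER),
        "year": first(YEAR),
        "revenue_billion_usd": first(REVENUE),
    }


# Invented sentences, not taken from any real report.
complete = extract("Example Corp announced fourth-quarter 2012 results with revenues of $1.5 billion.")
incomplete = extract("Example Corp announced second-quarter results.")
print(complete)
print(incomplete)
```

The second record comes back with empty year and revenue fields, which mirrors the remark on the slide that the information is not always complete.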

IBM WATSON

IBM Watson answers a grand challenge
  Can we design a computing system that rivals a human's ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real time?

2011: Taking on Jeopardy!
  Chess
    A finite, mathematically well-defined search space
    Large but limited number of moves and states
    Everything explicit, unambiguous mathematical rules
  Human language
    Ambiguous, contextual and implicit
    Grounded only in human cognition
    Seemingly infinite number of ways to express the same meaning

Keyword search
  Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.
  Passage: In May, Craig arrived in India after he celebrated his anniversary in Portugal.
  Keyword matching:
    celebrated <-> celebrated
    In May 1898 <-> In May
    400th anniversary <-> anniversary
    Portugal <-> in Portugal
    arrival in India <-> arrived in India
    explorer <-> Craig

Finding Deeper Evidence
  Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.
  Passage: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.
  Search far and wide; explore many hypotheses; find and judge evidence; many inference algorithms
  Evidence matching:
    celebrated, Portugal: no direct match in the passage
    In May 1898, 400th anniversary <-> 27th May 1498 (temporal reasoning)
    arrival in <-> landed in (statistical paraphrasing)
    India <-> Kappad Beach (geospatial reasoning)
    explorer <-> Vasco da Gama

Watson won Jeopardy, but
  THE AMERICAN DREAM: Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them.
    Correct answer: the People. Watson's answer: No One.
  MILESTONES: In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean".
    Correct answer: Apollo 11 moon landing. Watson's answer: the Big Bang.
  FATHERLY NICKNAMES: This Frenchman was "The Father of Bacteriology".
    Correct answer: Louis Pasteur. Watson's answer: How Tasty Was My Little Frenchman.
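The contrast between keyword matching and deeper evidence in the explorer example can be made concrete with a small experiment. The sketch below is not how Watson or DeepQA works internally; it only assumes a naive keyword-overlap score (with a hand-picked stopword list) and a single temporal check, both invented for illustration.

```python
import re

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "of", "this", "in", "a", "he", "his", "after", "s", "on"}


def keywords(text: str) -> set[str]:
    """Lowercased alphanumeric tokens without stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap(question: str, passage: str) -> int:
    """Naive evidence score: number of shared keywords."""
    return len(keywords(question) & keywords(passage))


def year_in(text: str):
    """First four-digit year starting with 1 found in the text, or None."""
    match = re.search(r"\b(1\d{3})\b", text)
    return int(match.group(1)) if match else None


question = "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India."
passage_craig = "In May, Craig arrived in India after he celebrated his anniversary in Portugal."
passage_gama = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."

score_craig = overlap(question, passage_craig)
score_gama = overlap(question, passage_gama)

# Temporal reasoning: a 400th anniversary celebrated in 1898 implies an event in 1498.
implied_year = 1898 - 400
gama_matches_date = year_in(passage_gama) == implied_year
craig_matches_date = year_in(passage_craig) == implied_year

print("keyword overlap:", score_craig, "vs", score_gama)
print("date consistent:", craig_matches_date, "vs", gama_matches_date)
```

Keyword overlap alone favors the misleading Craig passage, while the date check supports only the Vasco da Gama passage. That is the point of the slide: surface matches have to be weighed against other kinds of evidence.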

Watson Workload Optimized System in 2011
  90 x IBM Power 750 servers (1)
  2880 POWER7 cores
  POWER7 3.55 GHz chip
  500 GB per second on-chip bandwidth
  10 Gb Ethernet network
  16 terabytes of memory
  20 terabytes of disk storage
  Can operate at 80 teraflops
  Runs IBM DeepQA software
  Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open source components
  SUSE Linux, performance-optimized to exploit POWER7 systems
  10 racks include servers, networking, shared disk system, cluster controllers

  (1) Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010.

What's for Watson?
  Healthcare and life sciences: diagnostic assistance, evidence-based collaborative medicine
  Technical support: help desk, call centers
  Enterprise knowledge management and business intelligence
  Government: citizen services

  "In healthcare, we talk about turning data into knowledge. That's really what Watson does."
  Joe Jasinski, Program Director, IBM Healthcare and Life Sciences Research

BIG DATA IS NOT JUST HADOOP

Without analytics, Big Data is just a sack full of data
  MYTH: Big Data is only about MORE data
  MYTH: Big Data = Hadoop... and you're done
  MYTH: Big Data replaces everything that already exists: death to the RDBMS, and no governance at all
  MYTH: NoSQL = no SQL... never
  MYTH: Big Data is unstructured data and only good for sentiment analysis

How are leading companies transforming their data and analytics environment?
  Big Data ≠ Hadoop
  "There's a belief that if you want big data, you need to go out and buy Hadoop and then you're pretty much set. People shouldn't get ideas about turning off their relational systems and replacing them with Hadoop. As we start thinking about big data from the perspective of business needs, we're realizing that Hadoop isn't always the best tool for everything we need to do, and that using the wrong tool can sometimes be painful."
  Ken Rudin, Head of Analytics at Facebook

Big Data is about more than just Hadoop
  Data may be structured, unstructured, static, in-flight (or all of the above)
  Data at rest
    Huge volumes of data on disk
    Structured or semi-structured
    May or may not have schemas
    Too large for traditional tools
    Need to process in place
  Data in motion
    In-flight, frequently not stored
    Tremendous velocity, high bandwidth
    Diverse data sources
    Frequently unstructured or semi-structured
    Ultra-low-latency processing required

InfoSphere Streams delivers analytics for data in motion
  Real-time delivery
  Scale-out architecture for massive linear scalability
  Sophisticated analytics with pre-built toolkits & accelerators
  Comprehensive development tools to build applications with minimal learning
  Millions of events per second; microsecond latency; powerful analytics
  Example applications: ICU monitoring, environment monitoring, algorithmic trading, cyber security, government / law enforcement, telco churn prediction, smart grid
  Traditional and non-traditional data sources: video, audio, networks, social media, etc.
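"Data in motion" means each value is processed as it arrives and then discarded, rather than stored first and analyzed later. A minimal way to picture this is a rolling statistic over a bounded window. The sketch below is plain Python and has nothing to do with the InfoSphere Streams API; the function name and the sample readings are made up for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values after each new arrival.

    Only `window` values are kept in memory, so the input may be unbounded.
    """
    recent = deque(maxlen=window)
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]   # this value is about to fall out of the window
        recent.append(value)
        total += value
        yield total / len(recent)


# Invented sensor readings standing in for a live feed.
readings = [1.0, 2.0, 3.0, 4.0]
means = list(rolling_mean(readings, window=2))
print(means)
```

Because the function is a generator, a result is available immediately after every reading, which is the property that matters for use cases such as monitoring.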

New Architecture to Leverage All Data and Analytics
  Inputs: Data in Motion, Data at Rest, Data in Many Forms
  Streams / Real-time Analytics: Video/Audio, Network/Sensor, Entity Analytics, Predictive
  Information Ingestion and Operational Information: Stream Processing, Data Integration, Master Data
  Landing Area, Analytics Zone and Archive: Raw Data, Structured Data, Text Analytics, Data Mining, Entity Analytics, Machine Learning
  Exploration, Integrated Warehouse, and Mart Zones: Discovery, Deep Reflection, Operational, Predictive
  Consumers: Intelligence Analysis, Decision Management, BI and Predictive Analytics, Navigation and Discovery
  Foundation: Information Governance, Security and Business Continuity

How are leading companies transforming their data and analytics environment?
Big Data landing zone eco-system: Watson Foundations
  Data types: machine and sensor data; image and video; enterprise content; transaction and application data; social data; third-party data
  Zones: real-time processing & analytics; operational systems; exploration, landing and archive; trusted data; deep analytics & modeling; reporting & interactive analysis
  Actionable insight: decision management; predictive analytics and modeling; reporting, analysis, content analytics; discovery and exploration
  Foundation: Information Integration & Governance
  1. More than Hadoop
     Greater resiliency and recoverability
     Advanced workload management & multi-tenancy
     Enhanced, flexible storage management (GPFS)
     Enhanced data access (BigSQL, Search)
     Analytics accelerators & visualization
     Enterprise-ready security framework
  2. Data in Motion
     Enterprise-class stream processing & analytics
  3. Analytics Everywhere
     Richest set of analytics capabilities
     Ability to analyze data in place
  4. Governance Everywhere
     Complete integration & governance capabilities
     Ability to govern all data wherever it is
  5. Complete Portfolio
     End-to-end capabilities to address all needs
     Ability to grow and address future needs
     Remains open to work with existing investments

Why SQL on Hadoop?
  Hadoop stores large volumes and varieties of data
  SQL gets information and insight out of Hadoop
  SQL leverages existing IT skills, resulting in quicker time to value and lower cost

SQL on Hadoop and Hive
  Hadoop can process data of any kind (as long as it's splittable, etc.)
  A very common scenario:
    Tabular data
    Programs that query the data
  Java Hadoop APIs are the wrong tool for this
    Too low level, steep learning curve
    Require strong programming expertise
  Universally accepted solution: SQL
  Enter Hive...
    1. Impose relational structure on plain files
    2. Translate SELECT statements to MapReduce jobs
    3. Hide all the low-level details
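To see why a SELECT statement can be turned into a MapReduce job at all, it helps to write one aggregation both ways. The sketch below hand-translates a query of the form `SELECT region, SUM(amount) FROM sales GROUP BY region` into a map step and a reduce step in Python. It is a conceptual illustration only: the table, its rows and the function names are invented, and Hive's actual query planning is considerably more involved.

```python
from collections import defaultdict

# Invented rows of a hypothetical "sales" table: (region, amount).
sales = [("EMEA", 10.0), ("Americas", 5.0), ("EMEA", 2.5)]


def map_row(row: tuple[str, float]):
    """Map step: emit the GROUP BY column as key and the aggregated column as value."""
    region, amount = row
    yield region, amount


def reduce_group(region: str, amounts: list[float]):
    """Reduce step: apply the aggregate function (here SUM) to one group."""
    return region, sum(amounts)


def group_by_sum(rows) -> dict[str, float]:
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            groups[key].append(value)   # stands in for the shuffle phase
    return dict(reduce_group(key, values) for key, values in groups.items())


totals = group_by_sum(sales)
print(totals)
```

The GROUP BY column becomes the map output key and the aggregate becomes the reducer, which is the kind of low-level detail a SQL layer hides from the person writing the query.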

Big SQL 3.0
  Comprehensive SQL functionality
    IBM SQL/PL support, including
      stored procedures (SQL bodied and external)
      functions (SQL bodied and external)
    IBM Data Server JDBC and ODBC drivers
  Leverages advanced IBM SQL compiler/runtime
    High-performance native (C++) runtime
  Replaces Map/Reduce
    Advanced message passing runtime
    Data flows between nodes without requiring intermediate results to be persisted
    Continuously running daemons
    Advanced workload management allows resources to remain constrained
    Low latency, high throughput
  Diagram: SQL-based application -> IBM data server client -> Big SQL Engine (SQL MPP run-time) -> data sources (CSV, Seq, Parquet, RC, Avro, ORC, JSON, Custom) within InfoSphere BigInsights

Big R
  End-to-end integration of R into IBM BigInsights
  1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm
  2. Scale out R
     Partitioning of large data ("divide")
     Parallel cluster execution of pushed-down R code ("conquer")
     All of this from within the R environment (Jaql and Map/Reduce are hidden from you)
     Almost any R package can run in this environment
  3. Scalable machine learning
     A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R
  Diagram: R clients with R packages pull data (summaries) to the R client, or push R functions right onto the data; on the cluster side: data sources, Scalable Statistics Engine, Embedded R Execution with R packages

Why are names difficult?
  There are no consistent standards for names. Some countries mandate certain standards, but they differ from country to country, and most countries have no standards.
  Names can contain a variety of OPTIONAL information that can make the same name appear very differently.
  Examples:
    Ben Al Haden (Anglo)
    Bin Al-Hadin (son of somebody who came from the city of Hadin)
    Bin Al Hadin (son of Hadin)
    Bint Ali Hadin
    Renato Loffreda Mancinelli = Renato Mancinelli <> Renato Loffreda (using the Anglo rules)

IBM InfoSphere Identity Insight Solutions
  Commercially available identity analytics and relationship detection software
  Identity Insight: 3 key functionalities
    Who is who? No matter how hard they try to hide
    Who knows who? The infamous hiding behind the innocuous
    Who does what? Alerts you when bad guys do bad things
  Entity Analytics is a methodical process of detecting like and related entities across large, sparse, and disparate collections of data, both new and old, internal and external, using advanced techniques to establish connections that are not obvious.

BIG DATA ANALYTICS EXAMPLES

Predictive Maintenance at Union Pacific
  Predictive analytics help Union Pacific to predict certain derailments days or even weeks before they are likely to occur. Using thermometers and acoustic and visual sensors on the underside of each of its rail carriages, the company can detect and analyse imminent problems with tracks and wheels. To transmit all the data across the vast rail system, it has deployed a fibre optic communications network throughout. Although a train derailment does not have to be a large accident, small errors can result in vast delays, and with 3,350 trains operational on any given day this can become very expensive.

Smarter Farming
  Claas agricultural machinery: agricultural machinery manufacturers are meanwhile working on networking machines and data, and on data-mining strategies. Soil data, yield data, consumption data, weather data: they become the raw material of a comprehensive expert system. Experts call this Agriculture 4.0 (Landwirtschaft 4.0), a parallel to Industry 4.0, in which machines and workpieces communicate with each other. Claas calls it 365FarmNet and is successfully bringing competitors onto this first universal management platform as well.

Retail
  Luxottica applies statistical methods to a behavioral model in order to segment and score customers across identities.
    10% improvement in marketing effectiveness
    100 million customers can be down-selected to the highest-value individuals
    Target individual customers based on unique preferences and histories
  Solution components: Customer Intelligence Appliance Software; Twin Fin 12 PDA; IBM Campaign; IBM Enterprise Marketing Operations
  Business challenge: Luxottica, the eyewear giant with nearly 100 million customers in eight house brands on the company's numerous websites and in retail stores, generates massive amounts of data, the majority of which was housed and managed by outside data and marketing vendors. Lacking a holistic understanding and view of the customers, marketers struggled to nurture customer relationships, seize cross-sell and up-sell opportunities, personalize campaigns and acquire new customers during the shopping process.
  The smarter solution: after a successful proof of concept, the company is deploying an advanced Customer Intelligence analytics appliance, built on a high-performance platform that integrates online and physical customer data from multiple sources. The resulting 360-degree omni-channel customer view will not only help the retailer identify its most profitable sales channels, but also segment, track and score customers down to the persona level based on thousands of behavioral attributes, and refine and personalize marketing campaigns.
  "The results of the POC were eye-opening, revealing unprecedented and actionable insight into omni-channel customers we had never seen or analyzed before." Chief Digital Officer

Optimizing capital investments based on double-digit petabyte analysis
  Model the weather to optimize placement of turbines, maximizing power generation for their client and longevity (warranty optimization)
  Needed more data in richer models (adding hundreds of variables)
  Perspective: if you were to replay the Vestas Wind library, you would be sitting down to watch 70 years of TV in HD
  http://www.youtube.com/watch?v=z4xka4qye5i

Neonatal Care
  http://www.youtube.com/watch?v=cc8uv3tcsfg
  InfoSphere Streams: low-latency analytics for streaming data
  Multiple devices are attached to the baby or humidicrib
  Medical devices output via serial port in a range of formats
  Indicative readings are recorded on paper every 30 or 60 minutes
  Cost of care per baby is approx. $100-150K, not including morbidity-related care

We eat more sweets when it rains
  Weather-dependent sales forecasts for a large bakery
  Self-learning control loop: purchasing behavior, data mining, highly precise sales forecasting models
  Pinpoint weather forecast for every branch
  Results:
    Improved product and service availability
    30% fewer returns
    Saves 2-3 working hours per week and branch
    More precise production planning
    Waste avoidance, environmental protection

Optimizing deployment planning: Sixt car rental
  Diagram labels: Location A, Location B, Location C, Location D; Modeller; optimized deployment planning
  Inputs: customer behavior, vehicle bookings, vehicle availability
  No-show prediction
  No-show customers make deployment planning more difficult
  Overbooking for better utilization
  Avoid idle vehicles
  No intervention in processes or infrastructure required

Prevention for repeat and persistent offenders
  Criminalistic-Criminological Research Unit of the Hessian State Criminal Police Office (Hessisches Landeskriminalamt)
  Full survey of the biographies of repeat and persistent offenders
  Cluster analysis
  Actionable knowledge
  Derivation of suitable measures
  Prevention

The 5 Key Use Cases
  Big Data Exploration: find, visualize, understand all big data to improve decision making
  Enhanced 360° View of the Customer: extend existing customer views by incorporating additional internal and external information sources
  Security/Intelligence Extension: lower risk, detect fraud and monitor cyber security in real time
  Operations Analysis: analyze a variety of machine data for improved business results
  Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency

We can take the same use cases further with big data solutions
  Financial Services: fraud detection; risk management; 360° view of the customer
  Utilities: weather impact analysis on power generation; transmission monitoring; smart grid management
  Transportation: weather and traffic impact on logistics and fuel consumption
  Health & Life Sciences: epidemic early warning system; ICU monitoring; remote healthcare monitoring
  IT: transition log analysis for multiple transactional systems; cybersecurity
  Retail: 360° view of the customer; click-stream analysis; real-time promotions
  Telecommunications: CDR processing; churn prediction; geomapping / marketing; network monitoring
  Law Enforcement: real-time multimodal surveillance; situational awareness; cyber security detection

WHY INFRASTRUCTURE MATTERS

Access Matters
  To get new levels of visibility into customers and operations
  Infrastructure must enable shared and secured access to all relevant data, no matter its type or where it resides.
Speed Matters
  To accelerate insights in real time at the point of impact
  Infrastructure must build intelligence into operational events and transactions.
Availability Matters
  To consistently deliver insights to the people and processes that need them
  Infrastructure must maximize the availability of information and insights at the point of impact.

Challenges for Big Data analytics projects
BETWEEN ASPIRATION AND REALITY

QUESTIONS?