Big Data Analytics - Between Wish and Reality (original title: Big Data Analytics - Zwischen Wunsch und Realität)

Dr. Wolfgang Rother, IBM Deutschland GmbH, Nahmitzer Damm, Berlin, wrother@de.ibm.com

Agenda
About data. Paradigm shift. Apache Hadoop. A simple example of text analytics. IBM Watson. Big data is not only Hadoop. Further big data analytics examples. Why infrastructure matters. Between wish and reality.

How Big is the Internet of Things?

6/30/2014

A major gas and electric utility has 10 million meters. They read the meters once a month. Now, they are installing smart meters. The 10 million smart meters read every 15 minutes = 350 billion transactions a year.

The Big Data Conundrum
The percentage of available data an enterprise can analyze is decreasing. This means enterprises are getting more naive over time.
[Figure: data AVAILABLE to an organization vs. data an organization can PROCESS]
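The transaction figure on the smart-meter slide can be checked with a few lines of arithmetic. The sketch below assumes that one reading counts as one transaction and that a year has 365 days; the function and variable names are illustrative, not from the slides.

```python
def readings_per_year(meters: int, interval_minutes: int) -> int:
    """Number of meter readings per 365-day year at a fixed read interval."""
    readings_per_day = (24 * 60) // interval_minutes
    return meters * readings_per_day * 365


METERS = 10_000_000

monthly = METERS * 12                    # one reading per meter per month
smart = readings_per_year(METERS, 15)    # one reading per meter every 15 minutes

print(f"monthly reads:   {monthly:,} per year")
print(f"15-minute reads: {smart:,} per year")
print(f"growth factor:   {smart / monthly:,.0f}x")
```

Under these assumptions the 15-minute schedule yields about 350 billion readings a year, which matches the figure on the slide.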

The Four V's
Volume: use greater amounts of data. Variety: use more types of data. Velocity: use data more quickly. Veracity: use uncertain data.

Big Data is All Data and All Paradigms
Transactional & application data: volume, structured, throughput.
Machine data: velocity, structured, ingestion.
Social data: variety, unstructured, veracity.
Enterprise content: variety, unstructured, volume.

PARADIGM SHIFT

How is Big Data transforming the way organizations analyze information and generate actionable insights?

Paradigm shifts enabled by big data: leverage more of the data being captured.
Traditional approach: analyze small subsets of information.
Big data approach: analyze all information.
[Figure: under the traditional approach the analyzed information is a small part of all available information; under the big data approach all available information is analyzed]

Paradigm shifts enabled by big data: reduce the effort required to leverage data.
Traditional approach: a small amount of carefully organized information. Carefully cleanse information before any analysis.
Big data approach: a large amount of messy information. Analyze information as is, cleanse as needed.

Paradigm shifts enabled by big data: data leads the way, and sometimes correlations are good enough.
Traditional approach: start with a hypothesis and test it against selected data.
Big data approach: explore all data and identify correlations.
[Figure labels: Hypothesis, Question, Data, Exploration, Answer, Data, Insight, Correlation]

Paradigm shifts enabled by big data: leverage data as it is captured.
Traditional approach: analyze data after it's been processed and landed in a warehouse or mart.
Big data approach: analyze data in motion as it's generated, in real time.
[Figure labels: Data, Analysis, Data, Repository, Analysis, Insight, Insight]

APACHE HADOOP

It's easy to forget just how big the data really is!
Datasets are vast: Facebook daily logs ~60 TB; 1,000 Genomes Project ~200 TB; Google web index ~10+ PB.
Storage is cheap: a commodity 1 TB drive costs ~$50.
A terabyte is still a lot of data! Time to read 1 TB from a single disk: ~6 hours at 50 MB/second.
As data gets big, traditional approaches no longer work. Distributed systems are the only way to scale.

What is Hadoop?
Apache Hadoop = free, open source framework for data-intensive applications.
Inspired by Google technologies (MapReduce, GFS).
Well suited to batch-oriented, read-intensive applications.
Originally built to address scalability problems of Nutch, an open source Web search technology.
Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner.
CPU + disks of a commodity box = Hadoop node. Boxes can be combined into clusters.
New nodes can be added as needed without changing data formats, how data is loaded, or how jobs are written.
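The single-disk read time quoted in the slide on data sizes follows directly from the drive throughput. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and, for the multi-disk case, perfectly even parallel reads with no coordination overhead, which is an idealization.

```python
def hours_to_read(size_bytes: float, bytes_per_second: float, disks: int = 1) -> float:
    """Idealized time in hours to scan a dataset spread evenly over `disks` drives."""
    seconds = size_bytes / (bytes_per_second * disks)
    return seconds / 3600


TB = 10**12
MB = 10**6

single = hours_to_read(1 * TB, 50 * MB)               # one disk
parallel = hours_to_read(1 * TB, 50 * MB, disks=100)  # 100 disks read in parallel

print(f"1 disk:    {single:.1f} hours")
print(f"100 disks: {parallel * 60:.1f} minutes")
```

One disk needs roughly five and a half hours, while a hundred disks reading in parallel would need a few minutes. This is the motivation for spreading data over many commodity nodes.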

How files are stored: HDFS
Key ideas:
Divide big files into blocks and store the blocks randomly across the cluster.
Provide an API to ask: where are the pieces of this file?
=> Programs can be shipped to nodes for parallel distributed processing.
[Figure: a logical file split into blocks that are spread over the cluster]

HDFS stores data across multiple nodes.
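The key ideas on the HDFS slide (split into blocks, scatter them, ask where the pieces are) can be sketched with a toy in-memory model. This is not the HDFS API: the class `ToyCluster`, its methods, the tiny block size and the `replicas` parameter are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyCluster:
    """In-memory stand-in for a cluster; each node is just a dict of blocks."""

    def __init__(self, node_names, replicas=2, seed=0):
        self.nodes = {name: {} for name in node_names}  # node -> {(file, idx): block}
        self.index = defaultdict(list)                  # file -> [nodes holding block i]
        self.replicas = replicas
        self.rng = random.Random(seed)

    def put(self, filename: str, data: bytes, block_size: int) -> None:
        for idx, block in enumerate(split_into_blocks(data, block_size)):
            # Pick distinct nodes at random for every copy of this block.
            holders = self.rng.sample(sorted(self.nodes), self.replicas)
            for node in holders:
                self.nodes[node][(filename, idx)] = block
            self.index[filename].append(holders)

    def locate(self, filename: str) -> list[list[str]]:
        """Answer the question: where are the pieces of this file?"""
        return self.index[filename]

    def read(self, filename: str) -> bytes:
        return b"".join(
            self.nodes[holders[0]][(filename, idx)]
            for idx, holders in enumerate(self.index[filename])
        )


payload = b"Hello World Bye World Hello IBM"
cluster = ToyCluster(["n1", "n2", "n3", "n4"])
cluster.put("log.txt", payload, block_size=8)
print(cluster.locate("log.txt"))
print(cluster.read("log.txt") == payload)
```

Because `locate` reveals which nodes hold each block, a scheduler could send the processing code to those nodes instead of moving the data, which is the point of the last bullet on the slide.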

HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.

How Files are Processed: MapReduce
Common pattern in data processing: apply a function, then aggregate.
    grep "World Cup" *.txt | wc -l
The user simply writes two pieces of code: a mapper and a reducer.
Mapper code executes on every split of every file.
The reducer consumes and aggregates the mapper outputs.
The Hadoop MR framework takes care of the rest (resource allocation, scheduling, coordination, temping of intermediate results, storage of the final result on HDFS).
[Figure: a logical file is divided into splits across the cluster; several map tasks feed a reduce task that produces the result, here 3]
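The grep-and-count pipeline on the MapReduce slide maps naturally onto the two roles. The sketch below runs in a single process and only mimics the pattern; the split contents are made up for illustration, and real Hadoop jobs are written against the Hadoop APIs rather than like this.

```python
from functools import reduce


def mapper(split: str, pattern: str) -> int:
    """The grep step: count the lines of one split that contain the pattern."""
    return sum(1 for line in split.splitlines() if pattern in line)


def reducer(left: int, right: int) -> int:
    """The wc -l step: combine partial counts."""
    return left + right


# Hypothetical file splits; on a cluster each would live on a different node.
splits = [
    "World Cup final tonight\nweather report\n",
    "stock prices\nWorld Cup tickets sold out\nWorld Cup schedule\n",
    "local news\n",
]

partial_counts = [mapper(split, "World Cup") for split in splits]
total = reduce(reducer, partial_counts, 0)
print(partial_counts, total)
```

Each mapper call depends only on its own split, so the calls could run on different nodes at the same time; only the small partial counts have to travel to the reducer.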

Logical MapReduce Example: Word Count

    map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
            EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            result += ParseInt(v);
        Emit(AsString(result));

Content of input documents: "Hello World Bye World" and "Hello IBM".
Map 1 emits: <Hello, 1> <World, 1> <Bye, 1> <World, 1>
Map 2 emits: <Hello, 1> <IBM, 1>
Reduce (final output): <Bye, 1> <IBM, 1> <Hello, 2> <World, 2>

WordCount
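The word count pseudocode can be made runnable in a few lines. The following is a single-process sketch of the map, shuffle and reduce phases using the two input documents from the slide; the function names and the explicit `shuffle` step are illustrative, since on Hadoop the framework performs the grouping.

```python
from collections import defaultdict


def map_words(doc_name: str, contents: str):
    """Map phase: emit (word, 1) for every word in one document."""
    for word in contents.split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: list[int]):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


documents = {"doc1": "Hello World Bye World", "doc2": "Hello IBM"}

intermediate = [
    pair for name, text in documents.items() for pair in map_words(name, text)
]
result = dict(reduce_counts(word, counts) for word, counts in shuffle(intermediate).items())
print(sorted(result.items()))
```

The result agrees with the final output listed on the slide: Bye and IBM once each, Hello and World twice each.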

So What Does This Result In?
Easy to scale. Fault tolerant and self-healing. Data agnostic. Extremely flexible.
BUT you need programming skills.

A SIMPLE EXAMPLE OF TEXT ANALYSIS
From a bachelor's thesis in business informatics (Wirtschaftsinformatik), FH Brandenburg

Use Case: IBM quarterly reports
Goal: solve a big data text analysis problem without expert help or special training.
Environment: IBM POWER 7R2 server, RHEL 6.2, IBM InfoSphere BigInsights 2.0.
Procedure: load press releases with the web crawler; first processing in BigSheets; develop text analysis scripts; apply the scripts.

BigInsights Enterprise Edition
[Figure: architecture diagram; legend: open source, IBM, optional IBM and partner offerings]
Analytics and discovery: BigSheets, text processing engine and library, accelerator for social data analysis, accelerator for machine data analysis.
Apps: Web Crawler, Boardreader, DB export, DB import, ad hoc query, distributed file copy, machine learning, data processing, ...
Infrastructure: integrated installer, text compression, enhanced security, indexing, flexible scheduler, adaptive MapReduce, Jaql, HBase, HCatalog, Pig, Hive, MapReduce, HDFS, ZooKeeper, Oozie, Lucene, GPFS (EAP).
Administrative and development tools, web console: monitor cluster health, jobs, etc.; add / remove nodes; start / stop services; inspect job status; inspect workflow status; deploy applications; launch apps / jobs; work with the distributed file system; work with the spreadsheet interface; support for a REST-based API; ...
Administrative and development tools, Eclipse tools: text analytics; MapReduce programming; Jaql, Hive, Pig development; BigSheets plug-in development; Oozie workflow generation.
Connectivity and integration: JDBC, Sqoop, DB2, Netezza, R, Streams, Flume, Data Explorer, Guardium, DataStage, Cognos BI.

BigInsights and Text Analytics
Distills structured information from unstructured text: sentiment analysis, consumer behavior, illegal or suspicious activities.
Parses text and detects meaning with annotators.
Understands the context in which the text is analyzed.
Features pre-built extractors for names, addresses, phone numbers, etc.
Built-in support for English, Spanish, French, German, Portuguese, Dutch, Japanese, Chinese.
[Figure: unstructured text (document, email, etc.) is turned into classification and insight. Sample text: "Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win."]

Web Crawler
The web crawler is intuitive to use.
It is time-consuming, depending on the broadband connection.
Runtime: over 3 days.

Use Case: first processing in BigSheets
The web crawler delivered over [...] press releases.
After filtering, only 65 quarterly reports remained.
Within the workbook that was created, all HTML pages containing the terms "quarter" and "results" were extracted first.

Text Analytics Tooling
AQL Editor, Result Viewer, Runtime Explain.
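The first filtering step, keeping only pages that contain both terms, was done in BigSheets in the original work. As a rough equivalent, the sketch below applies the same rule in plain Python; the page names and contents are hypothetical, and the case-insensitive substring match is an assumption about how the terms were matched.

```python
def mentions_all(text: str, terms: tuple[str, ...]) -> bool:
    """True if every term occurs in the text, ignoring case."""
    lowered = text.lower()
    return all(term in lowered for term in terms)


# Hypothetical crawled pages, for illustration only.
pages = {
    "press-001.html": "<h1>Company reports fourth-quarter results</h1>",
    "press-002.html": "<h1>Company opens new research lab</h1>",
    "press-003.html": "<h1>Results of the annual developer survey</h1>",
}

quarterly = [name for name, html in pages.items() if mentions_all(html, ("quarter", "results"))]
print(quarterly)
```

A substring rule like this is deliberately crude: it matches "quarterly" and "fourth-quarter" alike, which is convenient here but would also admit unrelated pages that happen to use both words.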

Use Case: developing an AQL text analysis script

    create view content as
      extract regex /Start Whitespace.* End Whitespace/
        on D.text as text
      from Document D;

Developing the AQL text analysis script
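For readers who do not know AQL, the view above applies a regular expression to the text of each document and keeps the matching spans. The Python sketch below loosely mirrors that idea; it is not AQL and does not reproduce AQL semantics. The sample sentence and the quarter pattern are made up for illustration.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A matched region of a document: character offsets plus the covered text."""
    begin: int
    end: int
    text: str


def extract_regex(pattern: str, document_text: str) -> list[Span]:
    """Return every non-overlapping match of `pattern` as a Span."""
    return [
        Span(m.start(), m.end(), m.group(0))
        for m in re.finditer(pattern, document_text)
    ]


# Hypothetical document text, not taken from a real quarterly report.
doc = "Revenue in Q4 2012 rose. Revenue in Q1 2013 was flat."
quarters = extract_regex(r"Q[1-4] \d{4}", doc)

for span in quarters:
    print(span.begin, span.end, span.text)
```

Keeping the offsets alongside the text makes it possible to combine several extractions later, for example to pair a quarter with a revenue figure found in the same sentence.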

Use Case: developing the text analysis script
Eight further views were needed to extract revenue by region, year and quarter.

Applying the text analysis scripts
America? Q4?
Note: the information is not always complete! Investigating causes vs. effects?

IBM WATSON

IBM Watson answers a grand challenge
Can we design a computing system that rivals a human's ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real time?

2011: Taking on Jeopardy!
Chess: a finite, mathematically well-defined search space. Large but limited number of moves and states. Everything explicit, unambiguous mathematical rules.
Human language: ambiguous, contextual and implicit. Grounded only in human cognition. Seemingly infinite number of ways to express the same meaning.

Keyword search
Clue: "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India."
Passage: "In May, Craig arrived in India after he celebrated his anniversary in Portugal."
Keyword matching: "celebrated" matches "celebrated"; "In May 1898" matches "In May"; "400th anniversary" matches "anniversary"; "Portugal" matches "in Portugal"; "arrival in India" matches "arrived in India".
Keyword matching alone would therefore identify the explorer as Craig.
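The weakness of keyword matching on the Jeopardy! clue can be reproduced with a simple word-overlap score. The sketch below is a toy, not the method Watson uses: the tokenizer, the small stop-word list and the scoring by shared words are assumptions chosen for illustration. The second passage is the Vasco da Gama sentence that appears later in the deck.

```python
import re

STOP_WORDS = {"in", "on", "the", "of", "this", "a", "he", "his", "after", "s"}


def keywords(text: str) -> set[str]:
    """Lower-cased word tokens with a few stop words removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}


def shared_keywords(clue: str, passage: str) -> set[str]:
    return keywords(clue) & keywords(passage)


clue = ("In May 1898 Portugal celebrated the 400th anniversary "
        "of this explorer's arrival in India.")
misleading = "In May, Craig arrived in India after he celebrated his anniversary in Portugal."
relevant = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."

print(sorted(shared_keywords(clue, misleading)))
print(sorted(shared_keywords(clue, relevant)))
```

The misleading passage shares five keywords with the clue, while the passage that actually contains the answer shares only one. Ranking by overlap alone would pick the wrong passage.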

Finding Deeper Evidence
Clue: "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India."
Passage: "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."
Search far and wide. Explore many hypotheses. Find and judge evidence. Many inference algorithms.
Temporal reasoning: "May 1898" and "400th anniversary" point to "27th May 1498".
Statistical paraphrasing: "arrival in" corresponds to "landed in".
Geospatial reasoning: "India" corresponds to "Kappad Beach".
Result: the explorer is Vasco da Gama.

Watson won Jeopardy, but...
THE AMERICAN DREAM: Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them. Correct response: the People. Watson's response: No One.
MILESTONES: In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean". Correct response: Apollo 11 moon landing. Watson's response: the Big Bang.
FATHERLY NICKNAMES: This Frenchman was "The Father of Bacteriology". Correct response: Louis Pasteur. Watson's response: How Tasty Was My Little Frenchman.
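The temporal reasoning step on the evidence slide amounts to a subtraction: the year of the celebration minus the anniversary number gives the year of the event. The sketch below is a toy stand-in for that single inference, with hand-written regular expressions; it is an illustration only and says nothing about how DeepQA implements temporal reasoning.

```python
import re
from typing import Optional


def implied_event_year(clue: str) -> Optional[int]:
    """Derive the year of the original event from 'YEAR ... Nth anniversary'."""
    year = re.search(r"\b(1\d{3}|20\d{2})\b", clue)
    nth = re.search(r"\b(\d+)(?:st|nd|rd|th) anniversary\b", clue)
    if not (year and nth):
        return None
    return int(year.group(1)) - int(nth.group(1))


clue = ("In May 1898 Portugal celebrated the 400th anniversary "
        "of this explorer's arrival in India.")
passage = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."

event_year = implied_event_year(clue)
print(event_year, str(event_year) in passage)
```

The derived year appears in the Vasco da Gama passage even though the passage and the clue share almost no keywords, which is the kind of evidence pure keyword matching misses.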

Watson Workload Optimized System
[...] x IBM Power servers (see note).
2880 POWER7 cores.
POWER [...] GHz chip.
500 GB per second on-chip bandwidth.
10 Gb Ethernet network.
16 terabytes of memory.
20 terabytes of disk storage.
Can operate at 80 teraflops.
Runs IBM DeepQA software.
Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open source components.
SUSE Linux, performance-optimized to exploit POWER7 systems.
10 racks include servers, networking, shared disk system, cluster controllers.
Note: the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb [...].

What's for Watson?
Healthcare and life sciences: diagnostic assistance, evidence-based collaborative medicine.
Technical support: help desk, call centers.
Enterprise knowledge management and business intelligence.
Government: citizen services.
"In healthcare, we talk about turning data into knowledge. That's really what Watson does." (Joe Jasinski, Program Director, IBM Healthcare and Life Sciences Research)

BIG DATA IS NOT ONLY HADOOP

Without analytics, big data is just a sack full of data
MYTH: Big data is only about MORE data.
MYTH: Big data = Hadoop... done.
MYTH: Big data replaces everything that exists; death to the RDBMS and no governance at all.
MYTH: NoSQL = no SQL... never.
MYTH: Big data is unstructured data and only good for sentiment analysis.

How are leading companies transforming their data and analytics environment?
Big Data ≠ Hadoop
"There's a belief that if you want big data, you need to go out and buy Hadoop and then you're pretty much set. People shouldn't get ideas about turning off their relational systems and replacing them with Hadoop. As we start thinking about big data from the perspective of business needs, we're realizing that Hadoop isn't always the best tool for everything we need to do, and that using the wrong tool can sometimes be painful." (Ken Rudin, Head of Analytics at Facebook)

Big Data is about more than just Hadoop
Data may be structured, unstructured, static, in flight (or all of the above).
Data at rest: huge volumes of data on disk. Structured or semi-structured. May or may not have schemas. Too large for traditional tools. Need to process in place.
Data in motion: in flight, frequently not stored. Tremendous velocity, high bandwidth. Diverse data sources. Frequently unstructured or semi-structured. Ultra-low-latency processing required.

InfoSphere Streams delivers analytics for data in motion
Real-time delivery. Millions of events per second. Microsecond latency. Powerful analytics.
Scale-out architecture for massive linear scalability.
Sophisticated analytics with pre-built toolkits & accelerators.
Comprehensive development tools to build applications with minimal learning.
Traditional and non-traditional data sources: video, audio, networks, social media, etc.
Example applications: ICU monitoring, environment monitoring, algorithmic trading, cyber security, government / law enforcement, telco churn prediction, smart grid.
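The difference between data at rest and data in motion can be illustrated with a generator that inspects each reading as it arrives and keeps only a small window in memory. The sketch below is plain Python, not the InfoSphere Streams programming model; the readings, window size and threshold are hypothetical values chosen for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings after each new reading."""
    recent = deque(maxlen=window)   # older readings fall out automatically
    for value in readings:
        recent.append(value)
        yield sum(recent) / len(recent)


def alerts(readings: Iterable[float], window: int, threshold: float):
    """Yield (position, mean) whenever the rolling mean exceeds the threshold."""
    for position, mean in enumerate(rolling_mean(readings, window)):
        if mean > threshold:
            yield position, mean


# Hypothetical heart-rate readings from a monitoring device.
heart_rate = [120, 122, 125, 160, 170, 175, 130]
triggered = [position for position, _ in alerts(heart_rate, window=3, threshold=150)]
print(triggered)
```

Nothing is stored beyond the last three readings, and an alert can be raised as soon as the window mean crosses the threshold, instead of after the data has been landed and queried.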

New Architecture to Leverage All Data and Analytics
[Figure: reference architecture. Data in many forms flows through data in motion (Streams) and data at rest; information governance, security and business continuity underlie all zones.]
Information ingestion and operational information: stream processing, data integration, master data.
Real-time analytics: video/audio, network/sensor, entity analytics, predictive.
Landing area, analytics zone and archive: raw data, structured data, text analytics, data mining, entity analytics, machine learning.
Exploration, integrated warehouse, and mart zones: discovery, deep reflection, operational, predictive.
Consumers: intelligence analysis, decision management, BI and predictive analytics, navigation and discovery.

How are leading companies transforming their data and analytics environment?
Big data landing zone ecosystem: Watson Foundations
[Figure: data types (machine and sensor data, image and video, enterprise content, transaction and application data, social data, third-party data) feed operational systems; real-time processing & analytics; exploration, landing and archive; trusted data; deep analytics & modeling; and reporting & interactive analysis, on top of information integration & governance. Actionable insight: decision management; predictive analytics and modeling; reporting, analysis, content analytics; discovery and exploration.]
More than Hadoop: greater resiliency and recoverability; advanced workload management & multi-tenancy; enhanced, flexible storage management (GPFS); enhanced data access (BigSQL, Search); analytics accelerators & visualization; enterprise-ready security framework.
Data in motion: enterprise-class stream processing & analytics.
Analytics everywhere: richest set of analytics capabilities; ability to analyze data in place.
Governance everywhere: complete integration & governance capabilities; ability to govern all data wherever it is.
Complete portfolio: end-to-end capabilities to address all needs; ability to grow and address future needs; remains open to work with existing investments.

Why SQL on Hadoop?
Hadoop stores large volumes and varieties of data.
SQL gets information and insight out of Hadoop.
SQL leverages existing IT skills, resulting in quicker time to value and lower cost.

SQL on Hadoop and Hive
Hadoop can process data of any kind (as long as it's splittable, etc.).
A very common scenario: tabular data and programs that query the data.
The Java Hadoop APIs are the wrong tool for this: too low level, steep learning curve, require strong programming expertise.
Universally accepted solution: SQL.
Enter Hive...
1. Impose relational structure on plain files.
2. Translate SELECT statements to MapReduce jobs.
3. Hide all the low-level details.
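The three steps listed for Hive can be sketched by hand for one simple query. The example below imposes a schema on plain comma-separated lines, evaluates a GROUP BY as a map phase and a reduce phase, and checks the result against the same query run in SQLite. It illustrates the idea only: it is not Hive's query planner, and the table layout and the sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical plain-text records: region, year, quarter, revenue.
LINES = [
    "Americas,2013,Q4,10",
    "Europe,2013,Q4,7",
    "Americas,2014,Q1,9",
    "Asia,2014,Q1,5",
]


def parse(line: str) -> dict:
    """Step 1: impose relational structure on a plain line."""
    region, year, quarter, revenue = line.split(",")
    return {"region": region, "year": int(year), "quarter": quarter, "revenue": int(revenue)}


def map_phase(lines):
    """Step 2a: the SELECT list and GROUP BY key become (key, value) pairs."""
    for line in lines:
        row = parse(line)
        yield row["region"], row["revenue"]


def reduce_phase(pairs) -> dict:
    """Step 2b: SUM() becomes a per-key aggregation."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


mr_result = reduce_phase(map_phase(LINES))

# Step 3: the user only ever writes the SQL; here SQLite serves as the reference.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, quarter TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                 [tuple(parse(line).values()) for line in LINES])
sql_result = dict(conn.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region"))

print(mr_result == sql_result, mr_result)
```

Both paths produce the same totals per region, which is the appeal of SQL on Hadoop: the person asking the question writes one SELECT statement and never sees the map and reduce functions.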

Big SQL 3.0
Comprehensive SQL functionality: IBM SQL/PL support, including stored procedures (SQL bodied and external) and functions (SQL bodied and external). IBM Data Server JDBC and ODBC drivers.
Leverages the advanced IBM SQL compiler/runtime.
High-performance native (C++) runtime.
Replaces Map/Reduce: advanced message-passing runtime; data flows between nodes without requiring intermediate results to be persisted; continuously running daemons; advanced workload management allows resources to remain constrained.
Low latency, high throughput.
[Figure: an SQL-based application uses the IBM data server client to reach the Big SQL engine (SQL MPP runtime) inside InfoSphere BigInsights. Data sources: CSV, Seq, Parquet, RC, Avro, ORC, JSON, custom.]

Big R
End-to-end integration of R into IBM BigInsights.
1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm. Pull data (summaries) to the R client.
2. Scale out R: partitioning of large data ("divide"); parallel cluster execution of pushed-down R code ("conquer"). All of this from within the R environment (Jaql and Map/Reduce are hidden from you). Almost any R package can run in this environment.
3. Scalable machine learning: a scalable statistics engine that provides canned algorithms and the ability to author new ones, all via R. Or, push R functions right onto the data.
[Figure labels: R clients, R packages, data sources, scalable statistics engine, embedded R execution]

Why are names difficult?
There are no consistent standards for names. Some countries mandate certain standards, but they differ from country to country, and most countries have no standards.
Names can contain a variety of OPTIONAL information that can make the same name appear very different.
Examples: Ben Al Haden (Anglo); Bin Al-Hadin (son of somebody who came from the city of Hadin); Bin Al Hadin (son of Hadin); Bint Ali Hadin.
Renato Loffreda Mancinelli = Renato Mancinelli <> Renato Loffreda, using the Anglo rules.

IBM InfoSphere Identity Insight Solutions
Commercially available identity analytics and relationship detection software.
Identity Insight, 3 key functionalities:
Who is who? No matter how hard they try to hide.
Who knows who? The infamous hiding behind the innocuous.
Who does what? Alerts you when bad guys do bad things.
Entity analytics is a methodical process of detecting like and related entities across large, sparse, and disparate collections of data, both new and old, internal and external, using advanced techniques to establish connections that are not obvious.

BIG DATA ANALYTICS EXAMPLES

Predictive maintenance at Union Pacific
Predictive analytics help Union Pacific to predict certain derailments days or even weeks before they are likely to occur. Using thermometers and acoustic and visual sensors on the underside of each of its rail carriages, the company can detect and analyse imminent problems with tracks and wheels. So that all the data can be transmitted across the vast rail system, it has deployed a fibre optic communications network throughout that system. Although a train derailment does not have to be a large accident, small errors can result in vast delays, and with [...] trains operational on any given day this can become very expensive.

Smarter Farming
Claas Landmaschinen: agricultural machinery manufacturers are meanwhile working on networking machines and data and on data mining strategies. Soil data, yield data, consumption data, weather data: they become the raw material of a comprehensive expert system. Experts call this Agriculture 4.0 (Landwirtschaft 4.0), a parallel to Industry 4.0, in which machines and workpieces communicate with each other. Claas calls it 365FarmNet and is successfully bringing competitors onto this first universal management platform as well.

Retail
Luxottica applies statistical methods to a behavioral model in order to segment and score customers across identities.
10% improvement in marketing effectiveness.
100 million customers can be down-selected to the highest-value individuals.
Target individual customers based on unique preferences and histories.
Solution components: Customer Intelligence Appliance Software, Twin Fin 12 PDA, IBM Campaign, IBM Enterprise Marketing Operations.
Business challenge: Luxottica, the eyewear giant with nearly 100 million customers in eight house brands on the company's numerous websites and in retail stores, generates massive amounts of data, the majority of which was housed and managed by outside data and marketing vendors. Lacking a holistic understanding and view of the customers, marketers struggled to nurture customer relationships, seize cross-sell and up-sell opportunities, personalize campaigns and acquire new customers during the shopping process.
The smarter solution: after a successful proof of concept, the company is deploying an advanced Customer Intelligence analytics appliance, built on a high-performance platform that integrates online and physical customer data from multiple sources. The resulting 360-degree omni-channel customer view will not only help the retailer identify its most profitable sales channels, but also segment, track and score customers down to the persona level based on thousands of behavioral attributes, and refine and personalize marketing campaigns.
"The results of the POC were eye-opening, revealing unprecedented and actionable insight into omni-channel customers we had never seen or analyzed before." (Chief Digital Officer)

Optimizing capital investments based on double-digit petabyte analysis
Model the weather to optimize the placement of turbines, maximizing power generation for their client and longevity (warranty optimization).
Needed more data in richer models (adding hundreds of variables).
Perspective: if you were to replay the Vestas Wind library, you would be sitting down to watch 70 years of TV in HD.

Neonatal care
InfoSphere Streams: low-latency analytics for streaming data.
Multiple devices are attached to the baby or humidicrib.
Medical devices output via serial port in a range of formats.
Indicative readings are recorded on paper every 30 or 60 minutes.
Cost of care per baby is approx. $[...]K, not including morbidity-related care.

We eat more sweets when it rains
Weather-dependent sales forecasts for a large bakery.
Self-learning control loop: purchasing behavior, data mining, highly precise sales forecasting models, a pinpoint weather forecast for every branch.
Results: improved product and service availability; 30% fewer returns; 2-3 working hours saved per week and branch; more precise production planning; waste avoidance and environmental protection.

Optimizing deployment planning
Sixt car rental.
[Figure: locations A, B, C and D feed a modeller that produces optimized deployment planning]
Inputs: customer behavior, vehicle bookings, vehicle availability. Output: prediction of no-shows.
No-show customers make deployment planning difficult.
Overbooking for better utilization; avoid idle vehicles.
No intervention in processes or infrastructure required.

Prevention for repeat and high-intensity offenders
Criminalistic-Criminological Research Unit (Kriminalistisch-Kriminologische Forschungsstelle) of the Hessian State Criminal Police Office (Hessisches Landeskriminalamt).
Full survey of the biographies of repeat and high-intensity offenders, then cluster analysis, then actionable knowledge, then derivation of suitable measures, then prevention.

The 5 Key Use Cases
Big data exploration: find, visualize, understand all big data to improve decision making.
Enhanced 360° view of the customer: extend existing customer views by incorporating additional internal and external information sources.
Security/intelligence extension: lower risk, detect fraud and monitor cyber security in real time.
Operations analysis: analyze a variety of machine data for improved business results.
Data warehouse augmentation: integrate big data and data warehouse capabilities to increase operational efficiency.

We can take the same use cases further with big data solutions
Financial services: fraud detection, risk management, 360° view of the customer.
Utilities: weather impact analysis on power generation, transmission monitoring, smart grid management.
Transportation: weather and traffic impact on logistics and fuel consumption.
Health & life sciences: epidemic early warning system, ICU monitoring, remote healthcare monitoring.
IT: transition log analysis for multiple transactional systems, cybersecurity.
Retail: 360° view of the customer, click-stream analysis, real-time promotions.
Telecommunications: CDR processing, churn prediction, geomapping / marketing, network monitoring.
Law enforcement: real-time multimodal surveillance, situational awareness, cyber security detection.

WHY INFRASTRUCTURE MATTERS

Access matters: to get new levels of visibility into customers and operations. Infrastructure must enable shared and secured access to all relevant data, no matter its type or where it resides.
Speed matters: to accelerate insights in real time at the point of impact. Infrastructure must build intelligence into operational events and transactions.
Availability matters: to consistently deliver insights to the people and processes that need them. Infrastructure must maximize the availability of information and insights at the point of impact.

Challenges for big data analytics projects
BETWEEN WISH AND REALITY

QUESTIONS?


More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

Big Data and the new trends for BI and Analytics Juha Teljo Business Intelligence and Predictive Solutions Executive IBM Europe

Big Data and the new trends for BI and Analytics Juha Teljo Business Intelligence and Predictive Solutions Executive IBM Europe Big Data and the new trends for BI and Analytics Juha Teljo Business Intelligence and Predictive Solutions Executive IBM Europe 2012 IBM Corporation The Mega Trends Cloud Mobile Social Analytics 2014 International

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

IBM Data Warehousing and Analytics Portfolio Summary

IBM Data Warehousing and Analytics Portfolio Summary IBM Information Management IBM Data Warehousing and Analytics Portfolio Summary Information Management Mike McCarthy IBM Corporation mmccart1@us.ibm.com IBM Information Management Portfolio Current Data

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Data Management in der Ära von Big Data

Data Management in der Ära von Big Data Data Management in der Ära von Big Data Eine neue Generation der Geschwindigkeit und Effizienz Harald Gröger IBM Big Data hgroeger@de.ibm.com 2013 IBM Corporation IBM Big Data Strategie: Analyse näher

More information

IBM System x reference architecture solutions for big data

IBM System x reference architecture solutions for big data IBM System x reference architecture solutions for big data Easy-to-implement hardware, software and services for analyzing data at rest and data in motion Highlights Accelerates time-to-value with scalable,

More information

Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada

Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada What is big data? Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada 1 2011 IBM Corporation Agenda The world is changing What

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Big Data Strategies with IMS

Big Data Strategies with IMS Big Data Strategies with IMS #16103 Richard Tran IMS Development richtran@us.ibm.com Insert Custom Session QR if Desired. Agenda Big Data in an Information Driven economy Why start with System z IMS strategies

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved. Big Data Analytics 1 Priority Discussion Topics What are the most compelling business drivers behind big data analytics? Do you have or expect to have data scientists on your staff, and what will be their

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Big Data overview. Livio Ventura. SICS Software week, Sept 23-25 Cloud and Big Data Day

Big Data overview. Livio Ventura. SICS Software week, Sept 23-25 Cloud and Big Data Day Big Data overview SICS Software week, Sept 23-25 Cloud and Big Data Day Livio Ventura Big Data European Industry Leader for Telco, Energy and Utilities and Digital Media Agenda some data on Data Big Data

More information

Industry Impact of Big Data in the Cloud: An IBM Perspective

Industry Impact of Big Data in the Cloud: An IBM Perspective Industry Impact of Big Data in the Cloud: An IBM Perspective Inhi Cho Suh IBM Software Group, Information Management Vice President, Product Management and Strategy email: inhicho@us.ibm.com twitter: @inhicho

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

How to Leverage Big Data in the Cloud to Gain Competitive Advantage

How to Leverage Big Data in the Cloud to Gain Competitive Advantage How to Leverage Big Data in the Cloud to Gain Competitive Advantage James Kobielus, IBM Big Data Evangelist Editor-in-Chief, IBM Data Magazine Senior Program Director, Product Marketing, Big Data Analytics

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale

More information

IBM Software Hadoop in the cloud

IBM Software Hadoop in the cloud IBM Software Hadoop in the cloud Leverage big data analytics easily and cost-effectively with IBM InfoSphere 1 2 3 4 5 Introduction Cloud and analytics: The new growth engine Enhancing Hadoop in the cloud

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Big Data System and Architecture

Big Data System and Architecture CHANGE, a 2012 DAC workshop 2nd International Workshop on Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments Moscone Center, San Francisco, California, June 3, 2012 Big Data System and

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Modernizing Your Data Warehouse for Hadoop

Modernizing Your Data Warehouse for Hadoop Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

How To Create A Data Science System

How To Create A Data Science System Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Navigating the Big Data infrastructure layer Helena Schwenk

Navigating the Big Data infrastructure layer Helena Schwenk mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Deploying Big Data to the Cloud: Roadmap for Success

Deploying Big Data to the Cloud: Roadmap for Success Deploying Big Data to the Cloud: Roadmap for Success James Kobielus Chair, CSCC Big Data in the Cloud Working Group IBM Big Data Evangelist. IBM Data Magazine, Editor-in- Chief. IBM Senior Program Director,

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information

Big Data & Analytics. The. Deal. About. Jacob Büchler jbuechler@dk.ibm.com Cand. Polit. IBM Denmark, Solution Exec. 2013 IBM Corporation

Big Data & Analytics. The. Deal. About. Jacob Büchler jbuechler@dk.ibm.com Cand. Polit. IBM Denmark, Solution Exec. 2013 IBM Corporation The Big Data & Analytics Deal About Jacob Büchler jbuechler@dk.ibm.com Cand. Polit. IBM Denmark, Solution Exec. 1 Big Data is All Data from Everywhere Big Data Is Becoming The Next Natural Resource We

More information

Demystifying Big Data Government Agencies & The Big Data Phenomenon

Demystifying Big Data Government Agencies & The Big Data Phenomenon Demystifying Big Data Government Agencies & The Big Data Phenomenon Today s Discussion If you only remember four things 1 Intensifying business challenges coupled with an explosion in data have pushed

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to outline our

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Big Data & Analytics Heute & Morgen

Big Data & Analytics Heute & Morgen Big Data & Analytics Heute & Morgen Dipl.Ing.Wolfgang Nimführ Business Development Executive Big Data Analytics Watson Ambassador IBM Analytics Group Europe 2015 IBM Corporation The World of Big Data &

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland

P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland IBM Center of Excellence for Data Science, Cognitive

More information

I/O Considerations in Big Data Analytics

I/O Considerations in Big Data Analytics Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

A New Era Of Analytic

A New Era Of Analytic Penang egovernment Seminar 2014 A New Era Of Analytic Megat Anuar Idris Head, Project Delivery, Business Analytics & Big Data Agenda Overview of Big Data Case Studies on Big Data Big Data Technology Readiness

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

How To Use Big Data To Help A Retailer

How To Use Big Data To Help A Retailer IBM Software Big Data Retail Capitalizing on the power of big data for retail Adopt new approaches to keep customers engaged, maintain a competitive edge and maximize profitability 2 Capitalizing on the

More information

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out Big Data Challenges and Success Factors Deloitte Analytics Your data, inside out Big Data refers to the set of problems and subsequent technologies developed to solve them that are hard or expensive to

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

HP Vertica. Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop. Helmut Schmitt Sales Manager DACH

HP Vertica. Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop. Helmut Schmitt Sales Manager DACH HP Vertica Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop Helmut Schmitt Sales Manager DACH Big Data is a Massive Disruptor 2 A 100 fold multiplication in the amount of data is a 10,000

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache

More information

HadoopTM Analytics DDN

HadoopTM Analytics DDN DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information