Big Data projects and use cases. Claus Samuelsen IBM Analytics, Europe

Size: px
Start display at page:

Download "Big Data projects and use cases. Claus Samuelsen IBM Analytics, Europe csa@dk.ibm.com"

Transcription

1 Big projects and use cases Caus Samuesen IBM Anaytics, Europe

2 IBM Sofware Overview of BigInsights IBM BigInsights Scientist Free Quick Start (non production): IBM Open Patform BigInsights Anayst, Scientist features Community support Text Anaytics IBM BigInsights Anayst Industry standard SQL (Big SQL) Spreadsheet-stye too (BigSheets) Machine Learning on Big R IBM BigInsights Enterprise Management Big R (R support) Big SQL POSIX Distributed Fiesystem BigSheets Muti-workoad, muti-tenant scheduing... IBM Open Patform with Apache Hadoop* (, YARN, MapReduce, Ambari, Hbase, Hive, Oozie, Parquet, Parquet Format, Pig, Snappy, Sor, Spark, Sqoop, Zookeeper, Open JDK, Knox, Sider) *IBM Open Patform with Apache Hadoop is a 100% open source Apache Hadoop distribution. IBM wi incude the Open Patform common kerne once avaiabe IBM Corporation

3 IBM Big SQL Runs 100% of the queries Other environments require significant effort at scae Key points With Impaa and Hive, many queries needed to be re-written, some significanty Owing to various restrictions, some queries coud not be re-written or faied at run-time Re-writing queries in a benchmark scenario where resuts are known is one thing doing this against rea databases in production is another Resuts for 10TB scae shown here IBM Corporation

4 Hadoop-DS benchmark Singe user 10TB Big SQL is 3.6x faster than Impaa and 5.4x faster than Hive 0.13 for singe query stream using 46 common queries Based on IBM interna tests comparing BigInsights Big SQL, Coudera Impaa and Hortonworks Hive (current versions avaiabe as of 9/01/2014) running on identica hardware. The test workoad was based on the atest revision of the TPC-DS benchmark specification at 10TB data size. Successfu executions measure the abiity to execute queries a) directy from the specification without modification, b) after simpe modifications, c) after extensive query rewrites. A minor modifications are either permitted by the TPC-DS benchmark specification or are of a simiar nature. A queries were reviewed and attested by a TPC certified auditor. Deveopment effort measured time required by a skied SQL deveoper famiiar with each system to modify queries so they wi execute correcty. Performance test measured scaed query throughput per hour of 4 concurrent users executing a common subset of 46 queries across a 3 systems at 10TB data size. Resuts may not be typica and wi vary based on actua workoad, configuration, appications, queries and other variabes in a production environment. Coudera, the Coudera ogo, Coudera Impaa are trademarks of Coudera. Hortonworks, the Hortonworks ogo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries IBM Corporation

5 Big Projects Stock Trade Anaysis Positive side effects of drugs Log Fie Root Cause Anaysis CRM anaysis 360 Degree Customer View Ontoogies Gamers Behaviour Document cassification Weather Anaysis Roaming Log Anaysis Sensitive Access Connected Cars Tax Fraud Investigation Historica Archive Research Warehouse Augmentation DNA sequencing 2009 IBM Corporation

6 Warehouse Augmentation Banking Industry Fraud Anaysis The customer wanted to impement two different kinds of fraud anaysis: Transaction fraud and Socia Engeneering fraud. Probem: Existing data warehouse does not aow for ong running jobs Extending the data warehouse has a huge cost 2009 IBM Corporation

7 Warehouse Augmentation Banking Industry Fraud Anaysis Soution: Moving data to IBM BigInsights reduces the cost significanty No imitations on ong running jobs Obtaining the data from the various sources is the most time consuming process Using BigSQL we can run the same queries in Hadoop as in the traditiona warehouse With BigSQL customer can connect using their standard JDBC/ODBC based SQL toos IBM Corporation

8 Document Cassification Insurrance Industry Automatic cassification Probem: Insurance documents are not standardized. They are typicay free form documents written as e-mais, MS Words etc. Incoming documents are not cassified, and are therefore often sent to wrong department or wrong person, thus resuting in unacceptabe ong processing time IBM Corporation

9 Document Cassification Soution: Using BigInsights Text Anaytics new documents can be cassified automatic. Customer had described what was the characteristics of the different casses the the documents had to be put into. Using these descriptions we coud in three weeks impements the rues in BigInsights to a degree that satisfied the customer IBM Corporation

10 IBM big data An IBM Proof of Technoogy IBM big data IBM big data IBM big data IBM big data IBM big data IBM big data THINK IBM big data IBM big data IBM big data 2013 IBM Corporation

11 IBM Software Distinguishing characteristics Appication Portabiity & Integration Performance shared with Hadoop ecosystem Comprehensive fie format support Superior enabement of IBM and Third Party software Modern MPP runtime Powerfu SQL query rewriter Cost based optimizer Optimized for concurrent user throughput Resuts not constrained by memory Rich SQL Comprehensive SQL Support IBM SQL PL compatibiity Extensive Anaytic Functions 11 Federation Enterprise Features Distributed requests to mutipe data sources within a singe SQL statement Main data sources supported: DB2 LUW, Teradata, Orace, Netezza, Informix, SQL Server Advanced security/auditing Resource and workoad management Sef tuning memory management Comprehensive monitoring 2014 IBM Corporation

12 IBM Software Big SQL Behind the scenes Big SQL is derived from an existing IBM shared-nothing RDBMS A very mature MPP architecture Aready understands distributed joins and optimization Behavior is sufficienty different Certain SQL constructs are disabed Traditiona data warehouse partitioning is unavaiabe New SQL constructs introduced On the surface, porting a shared nothing RDBMS to a shared nothing custer (Hadoop) seems easy, but database partition database partition database partition database partition Traditiona Distributed RBMS Architecture IBM Corporation

13 IBM Software Architecture Overview base Service Big SQL Scheduer Big SQL Master Hive Metastore DDL Big SQL Worker Native I/O Java I/O HBase Temp 13 Big SQL Worker Node Native I/O MR Task Tracker Java I/O HBase Other Service Temp Big SQL Worker Node Native I/O MR Task Tracker Java I/O HBase Other Service Temp Node MR Task Tracker Other Service 2014 IBM Corporation

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Which SQL Engine Leads the Herd?

Which SQL Engine Leads the Herd? October 2014 Which SQL Engine Leads the Herd? A Comparison of three leading SQL-on-Hadoop Implementations for compatibility, performance and scalability Which SQL Engine Leads the Herd? 2 Contents Executive

More information

Blistering Fast SQL Access to Hadoop using. IBM BigInsights 3.0 with Big SQL 3.0

Blistering Fast SQL Access to Hadoop using. IBM BigInsights 3.0 with Big SQL 3.0 Blistering Fast SQL Access to Hadoop using IBM BigInsights 3.0 with Big SQL 3.0 SQL-over-Hadoop implementations are ready to execute OLAP complex query workloads at a fraction of the cost of traditional

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Bringing Big Data to People

Bringing Big Data to People Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Big Data and Hadoop. Module 1: Introduction to Big Data and Hadoop. Module 2: Hadoop Distributed File System. Module 3: MapReduce

Big Data and Hadoop. Module 1: Introduction to Big Data and Hadoop. Module 2: Hadoop Distributed File System. Module 3: MapReduce Big Data and Hadoop Module 1: Introduction to Big Data and Hadoop Learn about Big Data and the shortcomings of the prevailing solutions for Big Data issues. You will also get to know, how Hadoop eradicates

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache

More information

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

More information

Upcoming Announcements

Upcoming Announcements Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Peers Techno log ies Pv t. L td. HADOOP

Peers Techno log ies Pv t. L td. HADOOP Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and

More information

Introduction to Big Data Training

Introduction to Big Data Training Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

More information

Integrating Apache Spark with an Enterprise Data Warehouse

Integrating Apache Spark with an Enterprise Data Warehouse Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software

More information

ITG Software Engineering

ITG Software Engineering Introduction to Apache Hadoop Course ID: Page 1 Last Updated 12/15/2014 Introduction to Apache Hadoop Course Overview: This 5 day course introduces the student to the Hadoop architecture, file system,

More information

Complete Java Classes Hadoop Syllabus Contact No: 8888022204

Complete Java Classes Hadoop Syllabus Contact No: 8888022204 1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop

Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop Modern Data Architecture with Enterprise Apache Hadoop Hortonworks. We do Hadoop. Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Our Mission: Enable your Modern Data Architecture

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

QUEST meeting Big Data Analytics

QUEST meeting Big Data Analytics QUEST meeting Big Data Analytics Peter Hughes Business Solutions Consultant SAS Australia/New Zealand Copyright 2015, SAS Institute Inc. All rights reserved. Big Data Analytics WHERE WE ARE NOW 2005 2007

More information

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Big Data Realities Hadoop in the Enterprise Architecture

Big Data Realities Hadoop in the Enterprise Architecture Big Data Realities Hadoop in the Enterprise Architecture Paul Phillips Director, EMEA, Hortonworks pphillips@hortonworks.com +44 (0)777 444 3857 Hortonworks Inc. 2012 Page 1 Agenda The Growth of Enterprise

More information

I/O Considerations in Big Data Analytics

I/O Considerations in Big Data Analytics Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very

More information

BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview

BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM An Overview Contents Contents... 1 BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM... 1 Program Overview... 4 Curriculum... 5 Module 1: Big Data: Hadoop

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Cognizant Interactive. Digital Marketing & Analytics(DMA) Practice. 2012, Cognizant

Cognizant Interactive. Digital Marketing & Analytics(DMA) Practice. 2012, Cognizant Cognizant Interactive Digita Marketing & Anaytics(DMA) Practice 2012, Cognizant About DMA group Cognizant Interactive provides innovative soutions for design, content, earning, digita marketing and anaytics

More information

Planning your OpenStack Cloud. Tom Fifield tom@openstack.org @TomFifield

Planning your OpenStack Cloud. Tom Fifield tom@openstack.org @TomFifield Panning your OpenStack Coud Tom Fified tom@openstack.org @TomFified Introduction Software Engineering Partice Physics Buiding Couds OpenStack Community Manager Much of this presentation is based on the

More information

#TalendSandbox for Big Data

#TalendSandbox for Big Data Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

IBM BigInsights for Apache Hadoop

IBM BigInsights for Apache Hadoop IBM BigInsights for Apache Hadoop Efficiently manage and mine big data for valuable insights Highlights: Enterprise-ready Apache Hadoop based platform for data processing, warehousing and analytics Advanced

More information

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? The Big Data Buzz big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information

WHITE PAPER BEsT PRAcTIcEs: PusHIng ExcEl BEyond ITs limits WITH InfoRmATIon optimization

WHITE PAPER BEsT PRAcTIcEs: PusHIng ExcEl BEyond ITs limits WITH InfoRmATIon optimization Best Practices: Pushing Exce Beyond Its Limits with Information Optimization WHITE Best Practices: Pushing Exce Beyond Its Limits with Information Optimization Executive Overview Microsoft Exce is the

More information

Five Keys to Big Data Audit and Protection WHITEPAPER

Five Keys to Big Data Audit and Protection WHITEPAPER 1 Introduction Driven by the promise of uncovering valuable insights that enable better, fine-tuned decision making, many businesses are planning if not already making substantial investments in big data

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

MySQL and Hadoop. Percona Live 2014 Chris Schneider

MySQL and Hadoop. Percona Live 2014 Chris Schneider MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Informatica PowerCenter

Informatica PowerCenter Brochure Informatica PowerCenter Benefits Support better business decisions with the right information at the right time Acceerate projects in days vs. months with improved staff productivity and coaboration

More information

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics

More information

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform

More information

BIG DATA HADOOP TRAINING

BIG DATA HADOOP TRAINING BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)

More information

Technology and Consulting - Newsletter 1. IBM. July 2013

Technology and Consulting - Newsletter 1. IBM. July 2013 Technoogy and Consuting - Newsetter Juy 2013 Wecome to Latitude Executive Consuting s atest newsetter, reviewing recent marketpace activity. The newsetter focuses on the Technoogy and Consuting sectors,

More information

Fundamentals Curriculum HAWQ

Fundamentals Curriculum HAWQ Fundamentals Curriculum Pivotal Hadoop 2.1 HAWQ Education Services zdata Inc. 660 4th St. Ste. 176 San Francisco, CA 94107 t. 415.890.5764 zdatainc.com Pivotal Hadoop & HAWQ Fundamentals Course Description

More information

Modernizing Your Data Warehouse for Hadoop

Modernizing Your Data Warehouse for Hadoop Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking

More information

SQL on NoSQL (and all of the data) With Apache Drill

SQL on NoSQL (and all of the data) With Apache Drill SQL on NoSQL (and all of the data) With Apache Drill Richard Shaw Solutions Architect @aggress Who What Where NoSQL DB Very Nice People Open Source Distributed Storage & Compute Platform (up to 1000s of

More information

Lexmark ESF Applications Guide

Lexmark ESF Applications Guide Lexmark ESF Appications Guide Hep your customers bring out the fu potentia of their Lexmark soutions-enabed singe-function and mutifunction printers Lexmark Appications have been designed to hep businesses

More information

Moving From Hadoop to Spark

Moving From Hadoop to Spark + Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

Communicating with the Elephant in the Data Center

Communicating with the Elephant in the Data Center Communicating with the Elephant in the Data Center Who am I? Instructor Consultant Opensource Advocate http://www.laubersoltions.com sml@laubersolutions.com Twitter: @laubersm Freenode: laubersm Outline

More information

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Bellevue, WA Legal disclaimer The information in this

More information

Big Data Infrastructure at Spotify

Big Data Infrastructure at Spotify Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure June 12, 2013 2 Agenda Let s talk about Data Infrastructure, how we did it, what we learned and how we ve failed Some Context

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

alllearn Microsoft with We are XMA. Call us now on 0115 846 4900 Visit www.xma.co.uk/alllearn Email alllearn@xma.co.uk Follow us @WeareXMA

alllearn Microsoft with We are XMA. Call us now on 0115 846 4900 Visit www.xma.co.uk/alllearn Email alllearn@xma.co.uk Follow us @WeareXMA alearn with Microsoft We are XMA. Ca us now on 0115 846 4900 Visit www.xma.co.uk/aearn Emai alearn@xma.co.uk Foow us @WeareXMA Introduction Use our 'steps to alearn' framework to ensure you cover a bases...

More information

IBM Big Data Platform

IBM Big Data Platform IBM Big Data Platform Turning big data into smarter decisions Stefan Söderlund. IBM kundarkitekt, Försvarsmakten Sesam vår-seminarie Big Data, Bigga byte kräver Pigga Hertz! May 16, 2013 By 2015, 80% of

More information

We are XMA and Viglen.

We are XMA and Viglen. alearn with Microsoft 16pp 21.07_Layout 1 22/12/2014 10:49 Page 1 FRONT COVER alearn with Microsoft We are XMA and Vigen. Ca us now on 0115 846 4900 Visit www.xma.co.uk/aearn Emai alearn@xma.co.uk Foow

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

HADOOP. Revised 10/19/2015

HADOOP. Revised 10/19/2015 HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...

More information

and Hadoop Technology

and Hadoop Technology SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute

More information

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ... ..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,

More information

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Hadoop Job Oriented Training Agenda

Hadoop Job Oriented Training Agenda 1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module

More information

EDS-Unigraphics MIS DataBroker Architecture

EDS-Unigraphics MIS DataBroker Architecture EDS-Unigraphics MIS DataBroker Architecture Jeff Greiner Bob Woodridge October 9,1996 Topics UG/MIS Probem Domain Requirements for New Architecture Seection of Java Deveoping Java Based Intranet Soutions

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

Chapter 3: e-business Integration Patterns

Chapter 3: e-business Integration Patterns Chapter 3: e-business Integration Patterns Page 1 of 9 Chapter 3: e-business Integration Patterns "Consistency is the ast refuge of the unimaginative." Oscar Wide In This Chapter What Are Integration Patterns?

More information

BIG DATA & HADOOP DEVELOPER TRAINING & CERTIFICATION

BIG DATA & HADOOP DEVELOPER TRAINING & CERTIFICATION FACT SHEET BIG DATA & HADOOP DEVELOPER TRAINING & CERTIFICATION BIGDATA & HADOOP CLASS ROOM SESSION GreyCampus provides Classroom sessions for Big Data & Hadoop Developer Certification. This course will

More information

Human Capital & Human Resources Certificate Programs

Human Capital & Human Resources Certificate Programs MANAGEMENT CONCEPTS Human Capita & Human Resources Certificate Programs Programs to deveop functiona and strategic skis in: Human Capita // Human Resources ENROLL TODAY! Contract Hoder Contract GS-02F-0010J

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

WINMAG Graphics Management System

WINMAG Graphics Management System SECTION 10: page 1 Section 10: by Honeywe WINMAG Graphics Management System Contents What is WINMAG? WINMAG Text and Graphics WINMAG Text Ony Scenarios Fire/Emergency Management of Fauts & Disabement Historic

More information

Red Hat Enterprise Linux is open, scalable, and flexible

Red Hat Enterprise Linux is open, scalable, and flexible CHOOSING AN ENTERPRISE PLATFORM FOR BIG DATA Red Hat Enterprise Linux is open, scalable, and flexible TECHNOLOGY OVERVIEW 10 things your operating system should deliver for big data 1) Open source project

More information

Bringing the Power of SAS to Hadoop. White Paper

Bringing the Power of SAS to Hadoop. White Paper White Paper Bringing the Power of SAS to Hadoop Combine SAS World-Class Analytic Strength with Hadoop s Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities Contents Introduction... 1 What

More information

Dell In-Memory Appliance for Cloudera Enterprise

Dell In-Memory Appliance for Cloudera Enterprise Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/

More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

Enhanced continuous, real-time detection, alarming and analysis of partial discharge events

Enhanced continuous, real-time detection, alarming and analysis of partial discharge events DMS PDMG-RH DMS PDMG-RH Partia discharge monitor for GIS Partia discharge monitor for GIS Enhanced continuous, rea-time detection, aarming and anaysis of partia discharge events Unrivaed PDM feature set

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

Advanced ColdFusion 4.0 Application Development - 3 - Server Clustering Using Bright Tiger

Advanced ColdFusion 4.0 Application Development - 3 - Server Clustering Using Bright Tiger Advanced CodFusion 4.0 Appication Deveopment - CH 3 - Server Custering Using Bri.. Page 1 of 7 [Figures are not incuded in this sampe chapter] Advanced CodFusion 4.0 Appication Deveopment - 3 - Server

More information

IT@Intel How Intel IT Successfully Migrated to Cloudera Apache Hadoop*

IT@Intel How Intel IT Successfully Migrated to Cloudera Apache Hadoop* White Paper April 2015 IT@Intel How Intel IT Successfully Migrated to Cloudera Apache Hadoop* From our original experience with Apache Hadoop software, Intel IT identified new opportunities to reduce IT

More information