Big Data: Making Sense of it all!
- Bertram Barker
- 10 years ago
Transcription
1 Big Data: Making Sense of it all! Jamie Engesser [email protected] Page 1
2 Data Driven Business? Facts not Intuition! Data driven decisions are better decisions; it's as simple as that. Using big data enables managers to decide on the basis of evidence rather than intuition. For that reason it has the potential to revolutionize management. Harvard Business Review October Page 2
3 Web giants proved the ROI in data products applying data science to large amounts of data Prediction of click through rates Netflix: 75% of streaming video results from recommendations Amazon: 35% of product sales come from product recommendations Page 3
4 Page 4
5 Page 5
6 Big Data: Optimize Outcomes at Scale Sports Intelligence Finance Advertising Fraud Retail / Wholesale Manufacturing Healthcare Education Government optimize optimize optimize optimize optimize optimize optimize optimize optimize optimize Championships Detection Algorithms Performance Prevention Inventory turns Supply chains Patient outcomes Learning outcomes Citizen services Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation. Page 6
7 Analytics started with basic transaction history Megabytes ERP Order detail Trade record Payment record Increasing Data Variety and Complexity
8 then we added customer information Gigabytes CRM Segmentation Megabytes ERP Purchase detail Purchase record Payment record Customer Touches Support Contacts Offer details Increasing Data Variety and Complexity
9 and the web started to impact Terabytes WEB Web logs A/B testing Gigabytes Megabytes CRM ERP Behavioral Targeting Dynamic Pricing Segmentation Search Marketing Customer Touches Affiliate Networks Purchase detail Support Contacts Dynamic Funnels Purchase record Payment record Offer details Offer history Increasing Data Variety and Complexity
10 Big Data: Organizational Game Changer Petabytes BIG DATA Mobile Web Sentiment User Click Stream Transactions + Interactions + Observations = BIG DATA SMS/MMS Speech to Text Terabytes Gigabytes Megabytes WEB Social Interactions & Feeds Web logs Spatial & GPS Coordinates A/B testing Sensors / RFID / Devices Behavioral Targeting CRM Business Data Feeds Dynamic Pricing Segmentation External Demographics Search Marketing Customer Touches ERP User Generated Content Affiliate Networks Purchase detail Support Contacts HD Video, Audio, Images Dynamic Funnels Purchase record Payment record Offer details Offer history Product/Service Logs Increasing Data Variety and Complexity Page 10
11 A little history: it's 2005
12 A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform : Yahoo! creates team under E14 to work on Hadoop Focus on INNOVATION Enterprise Hadoop 2007: Yahoo team extends focus to operations to support multiple projects & growing clusters Focus on OPERATIONS 2011: Hortonworks created to focus on Enterprise Hadoop. Starts with 24 key Hadoop engineers from Yahoo STABILITY Page 16
13 Processing Storage Apache Hadoop: Center of Big Data Strategy Open Source data management with scale-out storage & distributed processing HDFS Distributed across nodes Natively redundant Name node tracks locations Map Reduce Splits a task across processors near the data & assembles results Self-Healing, High Bandwidth Clustered Storage Key Characteristics Scalable Efficiently store and process petabytes of data Linear scale driven by additional processing and storage Reliable Redundant storage Failover across nodes and racks Flexible Store all types of data in any format Apply schema on analysis and sharing of the data Economical Use commodity hardware Open source software guards against vendor lock-in Page 17
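The MapReduce idea on this slide (split a task, process each piece, assemble the results) can be illustrated with a small single-process sketch. This is not Hadoop code: the function names and the sample blocks are assumptions made for illustration, and a real cluster would run the map step on the nodes that hold each HDFS block.

```python
from collections import defaultdict
from itertools import chain


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(blocks):
    """Run map over every block independently, then shuffle and reduce."""
    mapped = chain.from_iterable(map_phase(block) for block in blocks)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: each string stands in for one block stored on a node.
blocks = ["big data big insight", "data driven decisions", "big decisions"]
counts = word_count(blocks)
print(counts)
```

Because each call to `map_phase` only needs its own block, the map work can be spread across machines; only the grouped intermediate pairs have to move over the network.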
14 Hadoop is more? Page 18
15 Data Services for Full Data Lifecycle HADOOP CORE PLATFORM SERVICES FLUME SQOOP DATA SERVICES PIG HIVE HCATALOG WEBHDFS Distributed Storage & Processing Enterprise Readiness HBASE Provide data services to store, process & access data in many ways Unique Focus Areas: Apache HCatalog Metadata services for consistent table access to Hadoop data Apache Hive Explore & process Hadoop data via SQL & ODBC-compliant BI tools Apache HBase NoSQL database for Hadoop WebHDFS Access Hadoop files via scalable REST API Talend Open Studio for Big Data Graphical data integration tools Page 20
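As a rough illustration of the WebHDFS point above, the sketch below only builds request URLs in the `/webhdfs/v1/<path>?op=...` form used by the WebHDFS REST API; it does not contact a cluster. The host name, port and file path are placeholders, and the helper `webhdfs_url` is a hypothetical name, not part of any Hadoop client library.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an HDFS path and operation.

    `host`, `port` and `path` are caller-supplied assumptions; the
    /webhdfs/v1 prefix and the `op` query parameter follow the WebHDFS API.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Placeholder NameNode address and file; adjust for a real cluster.
open_url = webhdfs_url("namenode.example.com", 50070, "/data/logs/part-0000", "OPEN")
list_url = webhdfs_url("namenode.example.com", 50070, "/data/logs", "LISTSTATUS")
print(open_url)
print(list_url)
```

Any HTTP client can then issue a GET against such a URL, which is what makes file access possible from tools that have no Hadoop libraries installed.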
16 Metadata Service & Table-level Abstractions Apache HCatalog provides flexible metadata services across tools and external access Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive) Accessibility: share data as tables in and out of HDFS Availability: enables flexible, thin-client access via REST API Raw Hadoop data Inconsistent, unknown Tool specific access HCatalog Table access Aligned metadata REST API Shared table and schema management opens the platform Page 21
17 Operational Services for Ease of Use OPERATIONAL SERVICES AMBARI OOZIE DATA SERVICES Store, Process and Access Data Include complete operational services for productive operations & management HADOOP CORE PLATFORM SERVICES Distributed Storage & Processing Enterprise Readiness Unique Focus Area: Apache Ambari: Provision, manage & monitor a cluster; complete REST APIs to integrate with existing operational tools; job & task visualizer to diagnose issues Page 22
18 New Ambari Features Job Diagnostics Visualize and troubleshoot Hadoop job execution and performance Cluster History View historical job execution & performance Instant Insight View health of Core Hadoop (HDFS, MapReduce) and related projects Cluster Navigation Quick link buttons jump into namenode web UI for a server Apache Ambari Dashboard REST interface provides external access to Ambari for existing tools. Facilitates integration with Microsoft System Center and Teradata Viewpoint Page 23
19 Deployable Across a Range of Options OPERATIONAL SERVICES Manage & Operate at Scale DATA SERVICES Store, Process and Access Data Hadoop allows you to deploy seamlessly across any deployment option HADOOP CORE PLATFORM SERVICES Distributed Storage & Processing Enterprise Readiness Linux & Windows Azure, Rackspace & other clouds Virtual platforms Big data appliances HORTONWORKS DATA PLATFORM (HDP) OS Cloud VM Appliance Page 24
20 Enterprise Hadoop Distribution OPERATIONAL SERVICES Manage AMBARI & Operate at Scale OOZIE HADOOP CORE FLUME SQOOP DATA SERVICES PIG Store, HIVE Process and Access Data HCATALOG HBASE WEBHDFS Distributed MAP REDUCE Storage HDFS & Processing YARN (in 2.0) Hortonworks Data Platform (HDP) Enterprise Hadoop The ONLY 100% open source and complete distribution PLATFORM SERVICES Enterprise Readiness: HA, DR, Snapshots, Security, HORTONWORKS DATA PLATFORM (HDP) OS Cloud VM Appliance Enterprise grade, proven and tested at scale Ecosystem endorsed to ensure interoperability Page 25
21 Where does it fit in the enterprise? Page 26
22 DATA SOURCES DATA SYSTEMS APPLICATIONS Existing Data Architecture Business Analytics Custom Applications Enterprise Applications DEV & DATA TOOLS BUILD & TEST OPERATIONAL TOOLS RDBMS EDW MPP TRADITIONAL REPOS MANAGE & MONITOR OLTP, POS SYSTEMS Traditional Sources (RDBMS, OLTP, OLAP) Page 27
23 DATA SOURCES DATA SYSTEMS APPLICATIONS Existing Data Architecture Business Analytics Custom Applications Enterprise Applications DEV & DATA TOOLS BUILD & TEST RDBMS EDW MPP TRADITIONAL REPOS? OPERATIONAL TOOLS MANAGE & MONITOR OLTP, POS SYSTEMS Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, , sensor data, social media) Page 28
24 DATA SOURCES DATA SYSTEMS APPLICATIONS An Emerging Data Architecture Business Analytics Custom Applications Enterprise Applications DEV & DATA TOOLS BUILD & TEST OPERATIONAL TOOLS RDBMS EDW MPP TRADITIONAL REPOS MANAGE & MONITOR OLTP, POS SYSTEMS Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, , sensor data, social media) MOBILE DATA Page 29
25 DATA SOURCES DATA SYSTEMS APPLICATIONS Interoperating With Your Tools Microsoft Applications DEV & DATA TOOLS OPERATIONAL TOOLS TRADITIONAL REPOS Viewpoint OLTP, POS SYSTEMS Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, , sensor data, social media) MOBILE DATA Page 30
26 Data Refinement Page 31
27 DATA SOURCES DATA SYSTEMS APPLICATIONS Operational Data Refinery Refine Explore Enric h Business Analytics Custom Applications Enterprise Applications Collect data and apply a known algorithm to it in trusted operational process RDBMS EDW MPP TRADITIONAL REPOS Traditional Sources (RDBMS, OLTP, OLAP) 3 New Sources (web logs, , sensor data, social media) Capture Capture all data New and Traditional 2 3 Process Parse, cleanse, apply structure & transform Exchange Push to existing data warehouse for use with existing analytic tools Page 32
28 Challenges with a Traditional ETL Platform Data discarded due to cost and/or performance No visibility into transactional data Incapable/high complexity when dealing with loosely structured data -Doesn't scale linearly. -License Costs High -Lot of time spent understanding source and defining destination data structures -High latency between data generation and availability Page 33
29 HDP Based ETL Platform -Provides data for use with minimum delay and latency -Enables real time capture of source data -Store raw transactional data -Store 7+ years of data with no archiving -Data Lineage: Store intermediate stages of data -Becomes a powerful analytics platform -Support for any type of data: structured/ unstructured -Linearly scalable on commodity hardware -Massively parallel storage and compute -Data warehouse can focus less on storage & transformation and more on analytics Page 34
30 Key Capability in Hadoop: Late binding With traditional ETL, structure must be agreed upon far in advance and is difficult to change. MACHINE GENERATED Store Transformed Data WEB LOGS, CLICK STREAMS OLTP ETL Server Data Mart / EDW Client Apps With Hadoop, capture all data, structure data as business needs evolve. MACHINE GENERATED HORTONWORKS DATA PLATFORM Dynamically Apply Transformations WEB LOGS, CLICK STREAMS OPERATIONAL SERVICES HADOOP CORE DATA SERVICES OLTP Hortonworks HDP Data Mart / EDW Client Apps Page 35
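Late binding (often called schema-on-read) can be sketched in a few lines: raw records are kept exactly as captured, and each consumer applies its own structure when it reads them. The event format, field names and the helper `read_with_schema` below are assumptions for illustration only, not part of Hadoop or HDP.

```python
import json

# Raw events stored as captured; nothing was dropped or reshaped on ingest.
RAW_EVENTS = [
    '{"ts": "2012-10-01T12:00:00", "user": "u1", "url": "/home", "ms": 120}',
    '{"ts": "2012-10-01T12:00:05", "user": "u2", "url": "/cart", "ms": 340, "ref": "ad"}',
    "not json at all",
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) at read time.

    Lines that cannot be parsed are skipped; fields missing from a record
    come back as None instead of failing the whole read.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }


# Two different "views" over the same raw data, defined after capture.
latency_view = list(read_with_schema(RAW_EVENTS, {"url": str, "ms": int}))
referral_view = list(read_with_schema(RAW_EVENTS, {"user": str, "ref": str}))
print(latency_view)
print(referral_view)
```

The second view uses a field (`ref`) that nobody planned for when the first events were stored, which is the situation the slide contrasts with a fixed, up-front ETL structure.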
31 Data Exploration Page 36
32 DATA SOURCES DATA SYSTEMS APPLICATIONS Big Data Exploration & Visualization Refine Explore Enrich Business Analytics Custom Applications Enterprise Applications Collect data and perform iterative investigation for value 3 1 Capture Capture all data RDBMS EDW MPP TRADITIONAL REPOS 2 2 Process Parse, cleanse, apply structure & transform 1 3 Exchange Explore and visualize with analytics tools supporting Hadoop Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, , sensor data, social media) Page 37
33 Visualization Tooling Robust visualization and business tooling Ensures scalability when working with large datasets Native Excel support Web browser support Mobile support Page 38
34 Data science is a natural next step after business intelligence Value Data Science Prediction Dashboards Reports Score-cards Discovery Affinity Analysis Outlier Detection Clustering Recommendation Regression Classification Refine Explore Enrich Business Intelligence: measure & count; simple analytics Data Science: discovery & prediction; complex analytics; data product Page 39
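To make the step from counting to prediction concrete, here is a minimal affinity-analysis sketch: it counts how often items appear together and uses those counts for a naive recommendation. The baskets and the function names are invented for illustration; production recommenders use far richer models and data.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def recommend(item, baskets, top_n=2):
    """Return the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical purchase history.
baskets = [
    ["tent", "stove", "lamp"],
    ["tent", "stove"],
    ["tent", "lamp"],
    ["stove", "fuel"],
    ["tent", "stove", "fuel"],
]
suggestions = recommend("tent", baskets)
print(suggestions)
```

The pair counts are plain business-intelligence style measurement; turning them into a ranked suggestion is the small step toward the "data product" the slide describes.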
35 Application Enrichment Page 40
36 DATA SOURCES DATA SYSTEMS APPLICATIONS Application Enrichment Refine Explore Enrich Custom Applications Enterprise Applications Collect data, analyze and present salient results for online apps 3 1 Capture Capture all data RDBMS EDW MPP TRADITIONAL REPOS NOSQL 2 2 Process Parse, cleanse, apply structure & transform 1 3 Exchange Incorporate data directly into applications Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, , sensor data, social media) Page 41
37 Key use-cases in Finance/Insurance Trading Analysis: How do I predict trading trends based on market sentiment? How do I test algorithms against years vs days of data? Fraud detection: Detect illegal credit card activity and alert bank/consumer Detect illegal insurance claims Customer risk profiling: How likely is this customer to pay back his mortgage? How likely is this customer to get sick? Internal fraud detection (compliance): Is this employee accessing financial information they are not allowed to access? Page 42
38 HDP Reference Architecture Organize/ Create Metadata Publish Event Signal Data Transformation Transform & Aggregate Publish Exchange Extract & Load Explore Visualize Analyze Report Page 43
39 From Community to the Enterprise Page 44
40 What is a Hadoop Distribution? A complementary set of open source technologies that make up a complete data platform Templeton WebHDFS Sqoop Flume HCatalog Pig Hive HBase MapReduce HDFS Ambari Oozie ZooKeeper HA Tested and pre-packaged to ease installation and usage Collects the right versions of the components that all have different release cycles and ensures they work together Page 45
41 Apache Community Leadership Apache Pig Apache HBase Apache Hive Test & Patch Other Apache Projects Apache Hadoop Design & Develop Apache HCatalog Release Apache Ambari We have noticed more activity over the last year from Hortonworks engineers on building out Apache Hadoop's more innovative features. These include YARN, Ambari and HCatalog. - Jeff Kelly: Wikibon Apache Software Foundation Guiding Principles Release early & often Transparency, respect, meritocracy Key Roles held by Hortonworkers VP & PMC Members Arun Murthy (Hadoop), Daniel Dai (Pig), Mahadev Konar (Zookeeper) Release Managers Matt Foley (Hadoop 1.x), Arun Murthy (Hadoop 2.x), Ashutosh Chauhan (Hive), Daniel Dai (Pig), Alan Gates (HCatalog), Mahadev Konar (Ambari) Committers (We can all be Contributors) 54 across all Hadoop-related projects Page 46
42 Hortonworks Process for Enterprise Hadoop Upstream Community Projects Downstream Enterprise Product Virtuous cycle when development & fixed issues done upstream & stable project releases flow downstream Fixed Issues Integrate & Test Apache Pig Apache Hive Test & Patch Apache Hadoop Design & Develop Release Design & Develop Stable Project Releases Hortonworks Data Platform Package & Certify Apache HBase Other Apache Projects Apache HCatalog Apache Ambari Distribute No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects Page 47
43 Hadoop Evolution Starts at the Core Over 1800 Contributors Very fast moving and improving Driving next generation Hadoop YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery 420k+ lines authored since 2006 More than twice nearest contributor Deeply integrating w/ecosystem Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA) Creating deeply engineered solutions (ex. Teradata big data appliance) All Apache, NO holdbacks 100% of code contributed to Apache Page 48
44 Where is Enterprise Apache Hadoop Going? Hive Interactive Query Platform Services Ambari Manage & Operate OPERATIONAL SERVICES DATA SERVICES HBase Online Data Replication, Mirroring, Snapshots, HADOOP CORE Data Services Secure Access PLATFORM SERVICES HORTONWORKS DATA PLATFORM (HDP) Data Movement In support of Refine, Explore, Enrich Operational Services Biz Continuity Manageability, Security, Page 49
45 Becoming Data Driven Page 50
46 Path to Becoming Big Data Driven Key Considerations for a Data Driven Business 1. Large web properties were born this way; you may have to adapt a strategy 2. Start with a project tied to a key objective or KPI. Don't OVER engineer 3. Make sure your Big Data strategy fits your organization and grow it over time 4. Don't do big data just to do big data; you can get lost in all that data. Simply put, because of big data, managers can measure, and hence know, radically more about their businesses, and directly translate that knowledge into improved decision making & performance. - Erik Brynjolfsson and Andrew McAfee
47 Your Fastest On-ramp to Enterprise Hadoop! The Sandbox lets you experience Apache Hadoop from the convenience of your own laptop: no data center, no cloud and no internet connection needed! The Hortonworks Sandbox is: A free download: A complete, self-contained virtual machine with Apache Hadoop pre-configured A personal, portable and standalone Hadoop environment A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop Page 52
48 Thank You! Questions & Answers Page 53
Big Data Realities Hadoop in the Enterprise Architecture
Big Data Realities Hadoop in the Enterprise Architecture Paul Phillips Director, EMEA, Hortonworks [email protected] +44 (0)777 444 3857 Hortonworks Inc. 2012 Page 1 Agenda The Growth of Enterprise
Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks
Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013
HDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
The Evolving Apache Hadoop Eco-System
The Evolving Apache Hadoop Eco-System What it means for Big Data Analytics and Storage Sanjay Radia Architect/Founder, Hortonworks Inc. All Rights Reserved Page 1 Outline Hadoop and Big Data Analytics
Upcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC [email protected] Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
Apache Hadoop's Role in Your Big Data Architecture
Apache Hadoop's Role in Your Big Data Architecture Chris Harris EMEA, Hortonworks [email protected] Twi
Modernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist [email protected] O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
Comprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
A Modern Data Architecture with Apache Hadoop
Modern Data Architecture with Apache Hadoop Talend Big Data Presented by Hortonworks and Talend Executive Summary Apache Hadoop didn't disrupt the datacenter, the data did. Shortly after Corporate IT functions
Modern Data Architecture for Predictive Analytics
Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters
Stinger Initiative: Introduction
Stinger Initiative: Introduction Interactive Query on Hadoop Chris Harris E-Mail : [email protected] Twitter : cj_harris5 Page 1 The World of Data is Changing Data Explosion 1 Zettabyte (ZB) = 1
#TalendSandbox for Big Data
Evaluation von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND
Big Data 101 Webinar
Big Data 101 Webinar A Functional Introduction Today's Presenters: Paul S. Barth, PhD, Managing Partner Prithwi Thakuria, Big Data Practice Lead NewVantage Partners An Introduction Structured Semi Structured
Bringing Big Data to People
Bringing Big Data to People Microsoft's modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
Please give me your feedback
Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &
Apache Hadoop Patterns of Use
Community Driven Apache Hadoop Apache Hadoop Patterns of Use April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Big Data: Apache Hadoop Use Distilled There certainly is no shortage of hype when
Apache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
Using Tableau Software with Hortonworks Data Platform
Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data
Community Driven Apache Hadoop. Apache Hadoop Basics. May 2013. 2013 Hortonworks Inc. http://www.hortonworks.com
Community Driven Apache Hadoop Apache Hadoop Basics May 2013 2013 Hortonworks Inc. http://www.hortonworks.com Big Data A big shift is occurring. Today, the enterprise collects more data than ever before,
Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,
Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Bellevue, WA Legal disclaimer The information in this
The Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team [email protected] @rob1lancaster Organizer of Chicago
SAP and Hortonworks Reference Architecture
SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical
Hadoop, the Data Lake, and a New World of Analytics
Hadoop, the Data Lake, and a New World of Analytics Hortonworks. We do Hadoop. Spring 2014 Version 1.0 Page 1 Hortonworks Inc. 2014 Traditional Data Architecture Pressured 2.8 ZB in 2012 85% from New Data
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
The Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
BIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
Evolution from Big Data to Smart Data
Evolution from Big Data to Smart Data Information is Exploding 120 HOURS VIDEO UPLOADED TO YOUTUBE 50,000 APPS DOWNLOADED 204 MILLION E-MAILS EVERY MINUTE EVERY DAY Intel Corporation 2015 The Data is Changing
End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
Data Integration Checklist
The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media
Talend Big Data. Delivering instant value from all your data. Talend 2014 1
Talend Big Data Delivering instant value from all your data Talend 2014 1 I may say that this is the greatest factor: the way in which the expedition is equipped. Roald Amundsen race to the south pole,
Data Governance in the Hadoop Data Lake. Michael Lang May 2015
Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales
SAP Database Strategy Overview. Uwe Grigoleit September 2013
SAP base Strategy Overview Uwe Grigoleit September 2013 SAP s In-Memory and management Strategy Big- in Business-Context: Are you harnessing the opportunity? Mobile Transactions Things Things Instant Messages
Getting Started Practical Input For Your Roadmap
Getting Started Practical Input For Your Roadmap Mike Ferguson Managing Director, Intelligent Business Strategies BA4ALL Big Data & Analytics Insight Conference Stockholm, May 2015 About Mike Ferguson
AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale
Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015
Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform
Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
Testing Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP Business Analytics for All Amsterdam - 2015 Value of Big Data is Being Recognized Executives beginning to see the path from data insights to revenue
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Teradata Unified Big Data Architecture
Teradata Unified Big Data Architecture Agenda Recap the challenges of Big Analytics The 2 analytical gaps for most enterprises Teradata Unified Data Architecture - How we bridge the gaps - The 3 core elements
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014
5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
Native Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
Microsoft SQL Server 2012 with Hadoop
Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the
Harnessing big data with Hortonworks Data Platform and Red Hat JBoss Data Virtualization
Harnessing big data with Hortonworks Data Platform and Red Hat JBoss Data Virtualization Kimberly Palko, Product Manager Red Hat JBoss Doug Reid, Director Partner Product Management Hortonworks Cojan van
Talend Roadmap: Discover Talend's Upcoming Features
Talend Roadmap: Discover Talend's Upcoming Features Cédric Carbone Talend Connect October 9, 2014 Talend 2014 1 Connecting the Data-Driven Enterprise Talend 2014 2 Agenda Why a Unified
Hadoop Job Oriented Training Agenda
Hadoop Job Oriented Training Agenda Kapil CK Module 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
Information Architecture
The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to
Qsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
Data Management in Transition - Building an Enterprise Data Hub with
Data Management in Transition - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees
Big data for the Masses The Unique Challenge of Big Data Integration
Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...
Holistic Data Management
Holistic Data Management for Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
Azure Data Lake Analytics
Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data
WHITE PAPER. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
WHITE PAPER Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE
BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE Current technology for Big Data allows organizations to dramatically improve return on investment (ROI) from their existing data warehouse environment.
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools
Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
HADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
Safe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
Hadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google's distributed
Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
Large scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
Big Data
Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connect to Hadoop? Customer Case
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
The Next Wave of Data Management. Is Big Data The New Normal?
The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd's 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83–84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21–22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104–105 Application architecture, big data analytics
The Inside Scoop on Hadoop
The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. @OrionGM The Inside Scoop
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connect to Hadoop?
Saving Millions through Data Warehouse Offloading to Hadoop. Jack Norris, CMO MapR Technologies. MapR Technologies. All rights reserved.
Saving Millions through Data Warehouse Offloading to Hadoop Jack Norris, CMO MapR Technologies MapR Technologies. All rights reserved. MapR Technologies Overview Open, enterprise-grade distribution for
Chase Wu New Jersey Institute of Technology
CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Institute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
Microsoft Big Data. Solution Brief
Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World's Data... 3 Any Data,
TE's Analytics on Hadoop and SAP HANA Using SAP Vora
TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -
Hortonworks Data Platform for Hadoop and SAP HANA
Hortonworks Data Platform for Hadoop and SAP HANA Prasad illapani, Big Data & SAP HANA- Product Management & Strategy SAP Labs LLC., Bellevue, WA Bob Page, VP Partner Products, Hortonworks Inc. Palo Alto,
Integrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
Big Data and Your Data Warehouse Philip Russom
Big Data and Your Data Warehouse Philip Russom TDWI Research Director for Data Management April 5, 2012 Sponsor Speakers Philip Russom Research Director, Data Management, TDWI Peter Jeffcock Director,
TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP
Pythian White Paper TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP ABSTRACT As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify
Teradata's Big Data Technology Strategy & Roadmap
Teradata's Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any
Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
