An Introduction to Big Data, Apache Hadoop, and Cloudera
1 An Introduction to Big Data, Apache Hadoop, and Cloudera. Ian Wrigley, Curriculum Manager, Cloudera
2 The Motivation for Hadoop
3 Large-Scale Computation. Traditionally, computation has been processor-bound: relatively small amounts of data, with a significant amount of complex processing performed on that data. For decades, the primary push was to increase the computing power of a single machine: faster processors, more RAM.
4 The Data Explosion. [Chart: gigabytes of data created, in billions, structured vs. unstructured.] 1.8 trillion gigabytes of data was created in 2011, more than 90% of it unstructured; approximately 500 quadrillion files; the total doubles every two years. Source: IDC 2011
5 Current Database Solutions. [Chart: gigabytes of data created, in billions, structured vs. unstructured.] Current database solutions are designed for structured data, to answer known questions quickly. Schemas dictate form and context; they are difficult to adapt to new data types and new questions, and expensive at petabyte scale.
6 Why Use Hadoop? Move beyond rigid legacy frameworks: Hadoop handles any data type, in any quantity (structured or unstructured, schema or no schema, high volume or low volume, all kinds of analyses). Hadoop grows with your business: proven at petabyte scale, capacity and performance grow simultaneously, and it leverages commodity hardware to control costs. Hadoop is 100% Apache licensed and open source: no vendor lock-in, community development, and a rich ecosystem of related projects. Hadoop helps you derive the complete value of all your data: it drives revenue by extracting value from data that was previously out of reach, and controls costs by storing data more affordably than any other platform.
7 The Origins of Hadoop (timeline): Doug Cutting creates an open source web crawler project; Google publishes its MapReduce and GFS papers; Doug Cutting creates the open source MapReduce and HDFS project, Hadoop; Yahoo! runs a 4,000-node Hadoop cluster; Hadoop wins the Terabyte Sort benchmark; SQL support for Hadoop launches; Cloudera releases CDH and Cloudera Enterprise.
8 Core Hadoop: HDFS. Self-healing, high-bandwidth clustered storage: HDFS breaks incoming files into blocks and stores them redundantly across the cluster.
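The block-and-replica idea can be made concrete with a short sketch. This is a minimal Python illustration of the scheme the slide describes, assuming a toy block size, hypothetical node names, and naive round-robin placement; none of this reflects HDFS internals.

```python
# Toy illustration of HDFS-style storage (NOT HDFS internals):
# split a file into fixed-size blocks, then store each block on
# several nodes so the loss of one machine loses no data.
BLOCK_SIZE = 4    # bytes, for illustration; HDFS defaults are 64-128 MB
REPLICATION = 3   # HDFS's default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Break a byte string into fixed-size blocks (last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (naive round-robin)."""
    return {i: [nodes[(i + r) % len(nodes)] for r in range(replication)]
            for i in range(len(blocks))}

blocks = split_into_blocks(b"hello hadoop!")
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
print(blocks)        # → [b'hell', b'o ha', b'doop', b'!']
print(placement[0])  # → ['node1', 'node2', 'node3']
```

Because every block lives on three distinct nodes, any single node can fail and each block still has two surviving copies, which is the redundancy the slide calls "self-healing".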
9 Core Hadoop: MapReduce. A distributed processing framework: MapReduce processes large jobs in parallel across many nodes and combines the results.
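The programming model can be sketched without a cluster. The following is a plain-Python illustration of the MapReduce pattern (word count is a hypothetical job; this is not the Hadoop API): a map function emits key/value pairs, the framework sorts them by key (the "shuffle"), and a reduce function combines each key's values.

```python
# Word count in the MapReduce style, in pure Python (no Hadoop required).
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in an input line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum the counts emitted for one word."""
    return (word, sum(counts))

def run_job(lines):
    # Shuffle/sort: gather all intermediate pairs and sort them by key,
    # as the framework does between the map and reduce phases.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(k, (c for _, c in group))
                for k, group in groupby(pairs, key=itemgetter(0)))

print(run_job(["the quick brown fox", "the lazy dog"]))
# → {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

On a real cluster the map and reduce calls run on many machines at once, each working on the blocks of data stored locally; the structure of the job is the same.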
10 Hadoop and Databases: you need both. Relational databases are best used for: OLAP with sub-second response (<1 sec), ACID, 100% SQL compliance. Hadoop is best used for: structured or unstructured data (flexibility), scalability of storage and compute, complex data processing.
11 Typical Datacenter Architecture (diagram): enterprise web site, business intelligence apps, interactive database, data export, OLAP load, data warehouse (Oracle, SAP...).
12 Adding Hadoop to the Mix (diagram): enterprise web site, dynamic OLAP queries, business intelligence apps, interactive database, with new data flowing into Hadoop alongside the data warehouse (Oracle, SAP...) and producing recommendations, etc.
13 Why Cloudera?
14 Cloudera leads in: customers and users (in banking, mobile services, defense & intelligence, media and retail, organizations depend on Cloudera more than all other Hadoop systems combined); integrated partners (across hardware, platforms, software, services, database and business intelligence (BI)); training and certification (developers, administrators and managers trained on six continents since 2009); nodes under management; open source contributions; and data science for developers, administrators and managers.
15 Experienced and Proven Across Hundreds of Deployments. Copyright Cloudera. All rights reserved. Not to be reproduced without prior written consent.
16 The Only Vendor With a Complete Solution. Cloudera's Distribution Including Apache Hadoop (CDH): a Big Data storage, processing and analytics platform based on Apache Hadoop, 100% open source (storage, computation, access, integration). Cloudera Enterprise 4.0 / Cloudera Manager: an end-to-end management application for the deployment and operation of CDH (deployment, configuration, monitoring, diagnostics and reporting). Production Support: our team of experts on call to help you meet your Service Level Agreements (SLAs) (issue resolution, escalation processes, optimization, knowledge base). Partner Ecosystem: 250+ partners across hardware, software, platforms and services. Cloudera University: equipping the Big Data workforce, 12,000+ trained. Professional Services: use case discovery, pilots, process & team development.
17 Solving Problems with Hadoop
18 Eight Common Hadoop-able Problems: 1. Modeling true risk; 2. Customer churn analysis; 3. Recommendation engine; 4. PoS transaction analysis; 5. Analyzing network data to predict failure; 6. Threat analysis; 7. Search quality; 8. Data sandbox.
19 1. Modeling True Risk. Challenge: how much risk exposure does an organization really have with each customer? The answer spans multiple sources of data, and cuts across lines of business. Solution with Hadoop: source and aggregate disparate data sources to build a complete data picture (e.g., credit card records, call recordings, chat sessions, emails, banking activity); structure and analyze the data (e.g., graph pattern analysis). Typical industry: Financial Services (banks, insurance companies).
20 2. Customer Churn Analysis. Challenge: why is an organization really losing customers? Data on the contributing factors comes from different sources. Solution with Hadoop: rapidly build a behavioral model from disparate data sources; structure and analyze with Hadoop (traversing, graph patterns). Typical industry: Financial Services.
21 3. Recommendation Engine/Ad Targeting. Challenge: using user data to predict which products to recommend. Solution with Hadoop: a batch processing framework allows execution in parallel over large datasets; collaborative filtering collects taste information from many users to predict what similar users will like. Typical industry: Ecommerce, Manufacturing, Retail.
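Collaborative filtering can be illustrated in miniature. This sketch uses hypothetical users and items and an arbitrarily chosen Jaccard set similarity; production recommenders are far more sophisticated, but the core idea, scoring unseen items by the similarity of the users who liked them, is the same.

```python
# Toy user-based collaborative filtering (illustrative data, not a real system).
def jaccard(a, b):
    """Similarity of two users = overlap of the item sets they liked."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(target, likes):
    """Score items the target hasn't seen by the similarity of users who liked them."""
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        sim = jaccard(likes[target], items)
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

likes = {
    "alice": {"book_a", "book_b"},
    "bob":   {"book_a", "book_b", "book_c"},
    "carol": {"book_d"},
}
print(recommend("alice", likes))  # 'book_c' ranks first: bob shares alice's taste
```

Items liked only by dissimilar users (carol's `book_d` here) score zero and trail the list. At scale, the pairwise similarity computation is exactly the kind of job the batch framework parallelizes.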
22 4. Point of Sale Transaction Analysis. Challenge: analyzing Point of Sale (PoS) data to target promotions and manage operations; sources are complex and data volumes grow across chains of stores and other sources. Solution with Hadoop: a batch processing framework allows execution in parallel over large datasets; pattern recognition over multiple data sources to predict demand. Typical industry: Retail.
23 5. Analyzing Network Data to Predict Failure. Challenge: analyzing real-time data series from a network of sensors; calculating average frequency is extremely tedious because of the need to analyze terabytes of data. Solution with Hadoop: take the computation to the data; expand from simple scans to more complex data mining; better understand how the network reacts to fluctuations; discrete anomalies may, in fact, be interconnected; identify leading indicators of component failure. Typical industry: Utilities, Telecommunications, Data Centers.
24 6. Threat Analysis/Trade Surveillance. Challenge: detecting threats in the form of fraudulent activity or attacks; large data volumes are involved; like looking for a needle in a haystack. Solution with Hadoop: parallel processing over huge datasets; pattern recognition to identify anomalies, i.e., threats. Typical industry: Security, Financial Services; in general, spam fighting and click fraud.
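As a toy illustration of pattern recognition for anomalies, here is a simple z-score filter over hypothetical transaction amounts. Real threat-analysis systems use much richer models, but the needle-in-a-haystack principle, flagging values that deviate sharply from the norm, is the same.

```python
# Toy anomaly detection: flag values lying unusually far from the mean.
# Data and threshold are illustrative, not from any real system.
from statistics import mean, stdev

def anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

amounts = [12.0, 15.5, 11.2, 14.8, 13.1, 950.0, 12.9]
print(anomalies(amounts))  # → [950.0]
```

With billions of records, computing these statistics and scanning for outliers is a natural fit for parallel processing over the full dataset rather than a sample.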
25 7. Search Quality. Challenge: providing meaningful search results in real time. Solution with Hadoop: analyzing search attempts in conjunction with structured data; pattern recognition over the browsing patterns of users performing searches in different categories. Typical industry: Web, Ecommerce.
26 8. Data Sandbox. Challenge: a data deluge; you don't know what to do with the data or what analysis to run. Solution with Hadoop: dump all the data into an HDFS cluster; use Hadoop to start trying out different analyses on the data; spot patterns to derive value from the data. Typical industry: common across all industries.
27 Orbitz: Major Online Travel Booking Service. Challenge: Orbitz performs millions of searches and transactions daily, which leads to hundreds of gigabytes of log data every day. Not all of that data has value (some is logged for historic reasons), but much is quite valuable, and Orbitz wants to capture even more data. Solution with Hadoop: Hadoop provides Orbitz with efficient, economical, scalable, and reliable storage and processing of these large amounts of data, and places no constraints on how the data is processed.
28 Before Hadoop: Orbitz's data warehouse contains a full archive of all transactions (every booking, refund, cancellation, etc.). Non-transactional data (e.g., searches) was thrown away because it was uneconomical to store. [Diagram: transactional data (e.g., bookings) flows into the data warehouse; non-transactional data is discarded.]
29 After Hadoop: Hadoop was deployed in late 2009/early 2010 to begin collecting this non-transactional data, and Orbitz has been using CDH since then with great success. Much of this non-transactional data is contained in web analytics logs. [Diagram: non-transactional data (e.g., searches) flows into Hadoop; transactional data (e.g., bookings) flows into the data warehouse.]
30 What Now? Access to this non-transactional data enables a number of applications: optimizing hotel search (e.g., optimizing hotel ranking to show consumers hotels more closely matching their preferences); user-specific product recommendations; web page performance tracking; analyses to optimize search result cache performance; user segment analysis, which can drive personalization. There was lots of press coverage in June 2012: the company discovered that people using Macs are willing to spend 30% more on hotels than PC users, so Mac users are now presented with pricier hotels first in the list.
31 Major Bank. Background: 100M customers; data: 2.5B records/month (card transactions, home loans, auto loans, etc.); data volume growing by hundreds of TB/year. Needs to incorporate non-transactional data as well: web clicks, check images, voice data. Uses Hadoop to model credit risk, detect fraud, and manage capital.
32 Financial Regulatory Body. Stringent data reliability requirements: must store seven years of data; 850TB of data collected from every Wall Street trade each year; data volumes growing at 40% each year. Replacing EMC Greenplum + SAN with CDH; the goal is to store data from years two to seven in Hadoop, with 5PB of data in Hadoop by the end of 2013. Cost savings are predicted to be tens of millions of dollars, and application performance testing is showing speed gains of 20x in some cases.
33 Leading North American Retailer. Storing 400TB of data in a CDH cluster; capturing and analyzing data on individual customers and SKUs across 4,000 stores. Using Hadoop for: loyalty programs and personal pricing; fraud detection; supply chain management; and pricing overstocked items for clearance.
34 Digital Media Company. Needs to quickly and reliably process high-volume clickstream and pageview data; experienced database bottlenecks and reliability issues. Now using CDH: a cluster of just 20 nodes ingesting 75 million clickstream, page view, and user profile events (15GB of data) per day; processes 430 million records from six million users in 11 minutes. The alternative solution would have required 10x more investment in database software and high-end servers.
35 Leader in Real-Time Technology. Hundreds of customers need unique views of the data. Was using Netezza: unable to run more than 2-3 big jobs per day, and too expensive to scale. Now using CDH: processing hundreds of jobs concurrently, at gigabytes per hour per job, over 10TB of data per day; moving data between CDH, Netezza, and other systems.
36 Netflix. Before Hadoop: nightly processing of logs, imported into a database for analysis/BI; as data volume grew, it took more than 24 hours to process and load a day's worth of logs. Today, an hourly Hadoop job processes the logs, making the data available for analysis/BI more quickly; currently ingesting approximately 1TB of data per day.
37 Hadoop as Cheap Storage. Yahoo: before Hadoop, $1 million for 10TB of storage; with Hadoop, $1 million for 1PB of storage. Another large company: before Hadoop, $5 million to store data in Oracle; with Hadoop, $240K to store the data in HDFS. Facebook: uses Hadoop as unified storage.
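A quick back-of-the-envelope check on the Yahoo figures, using the slide's round numbers and taking 1 PB as 1,000 TB:

```python
# Cost per terabyte before and after Hadoop, from the slide's round numbers.
before = 1_000_000 / 10     # $/TB when $1M buys 10 TB
after = 1_000_000 / 1_000   # $/TB when $1M buys 1 PB (1,000 TB)
print(before, after, before / after)  # → 100000.0 1000.0 100.0
```

That is, storage cost per terabyte drops by roughly a factor of 100.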
38 Hadoop Jobs
39 The Roles People Play: System Administrators, Developers, Analysts, Data Stewards.
40 System Administrators. Required skills: strong Linux skills; networking knowledge; understanding of hardware. Job responsibilities: install, configure and upgrade Hadoop software; manage hardware components; monitor the cluster; integrate with other systems (e.g., Flume and Sqoop).
41 Developers. Required skills: strong Java or other programming skills; understanding of MapReduce and algorithms. Job responsibilities: write, package and deploy MapReduce programs; optimize MapReduce jobs and Hive/Pig programs.
42 Data Analyst/Business Analyst. Required skills: SQL; understanding of data mining. Job responsibilities: extract intelligence from the data; write Hive and/or Pig programs.
43 Data Steward. Required skills: data modeling and ETL skills. Job responsibilities: cataloging the data (analogous to a librarian for books); managing the data lifecycle and retention; data quality control with SLAs.
44 Combining Roles. System Administrator + Data Steward is analogous to a DBA. Required skills: data modeling and ETL; scripting skills; strong Linux administration skills. Job responsibilities: manage the data lifecycle and retention; data quality control with SLAs; install, configure and upgrade Hadoop software; manage hardware components; monitor the cluster; integrate with other systems (e.g., Flume and Sqoop).
45 Finding the Right People. Hiring Hadoop experts: strong Hadoop skills are scarce and expensive; try Hadoop User Groups. Key words for developers: MapReduce, Cloudera Certified Developer for Apache Hadoop (CCDH). Key words for system admins: distributed systems (e.g., Teradata, Red Hat Cluster), Linux, Cloudera Certified Administrator for Apache Hadoop (CCAH). Consider cross-training, especially for system administrators and data librarians.
46 Cloudera's Academic Partnership Program
47 Cloudera's Academic Partnerships: Overview. Cloudera's Academic Partnerships (CAP) are an essential component of Cloudera's strategy to provide comprehensive Apache Hadoop training to current and future data professionals. The program is designed to be a mutually beneficial relationship: universities are enabled to deliver new and relevant areas of study to their students, and Cloudera is able to help fill the demand for qualified data professionals to help the market continue its explosive growth. With CDH and Cloudera Manager available for free, along with our curriculum and Virtual Machine, we provide universities the foundation to start experimenting with Hadoop and developing expertise among their students.
48 Cloudera's Academic Partnerships: Goals. Introduce students to Apache Hadoop. Provide students and instructors with quality course materials and virtual machine images to complete hands-on labs. Grant a 50% discount on certification costs to students associated with the program who are interested in attempting Cloudera's industry-leading Hadoop certification exams (it is highly recommended that they take the class and attempt the certification exam). Allow academic institutions options to augment their degree program requirements.
49 Cloudera's Academic Partnerships: Financial Overview. Cloudera does not currently charge Academic Partners for usage of the training materials; this is a program designed solely to facilitate students' learning of an emerging technology. Our reward is helping the industry grow, and ideally the exposure to Cloudera is a positive one which will be remembered when the students we serve today are making decisions for their businesses tomorrow. Instructors who deliver the Cloudera courses are eligible for a 50% discount on commercial training courses delivered by Cloudera; we want to make sure the people leading the classes have the skill set to help their students be successful. Normally we provide universities with courses focused on the roles of Hadoop Developer or Administrator.
50 Ian Wrigley
Virtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
Data Center Evolu.on and the Cloud. Paul A. Strassmann George Mason University November 5, 2008, 7:20 to 10:00 PM
Data Center Evolu.on and the Cloud Paul A. Strassmann George Mason University November 5, 2008, 7:20 to 10:00 PM 1 Hardware Evolu.on 2 Where is hardware going? x86 con(nues to move upstream Massive compute
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
Getting Started with Hadoop. Raanan Dagan Paul Tibaldi
Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop
MySQL and Hadoop. Percona Live 2014 Chris Schneider
MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets
Information Builders Mission & Value Proposition
Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns
Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries
Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries Ho Wing Leong, ASEAN 1 Cloudera company snapshot Founded Company Employees Today World Class Support Mission CriQcal 2008,
Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
Ten Common Hadoopable Problems
XML Impacting the Enterprise Tapping into the Power of XML: Five Success Stories Ten Common Hadoopable Problems Real-World Hadoop Use Cases Ten Common Hadoopable Problems Real-World Hadoop Use Cases Table
Big Data Success Step 1: Get the Technology Right
Big Data Success Step 1: Get the Technology Right TOM MATIJEVIC Director, Business Development ANDY MCNALIS Director, Data Management & Integration MetaScale is a subsidiary of Sears Holdings Corporation
Big Data Course Highlights
Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like
Big Data for Investment Research Management
IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable
Hadoop Introduction. 2012 coreservlets.com and Dima May. 2012 coreservlets.com and Dima May
2012 coreservlets.com and Dima May Hadoop Introduction Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized Hadoop training courses (onsite
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily
Big Data and Industrial Internet
Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University [email protected] 16.6-2015
BIRT in the World of Big Data
BIRT in the World of Big Data David Rosenbacher VP Sales Engineering Actuate Corporation 2013 Actuate Customer Days Today s Agenda and Goals Introduction to Big Data Compare with Regular Data Common Approaches
White Paper: Evaluating Big Data Analytical Capabilities For Government Use
CTOlabs.com White Paper: Evaluating Big Data Analytical Capabilities For Government Use March 2012 A White Paper providing context and guidance you can use Inside: The Big Data Tool Landscape Big Data
Apache Hadoop: Past, Present, and Future
The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer [email protected], twitter: @awadallah Hadoop Past
