# Not a part of 1Z0-061 or 1Z0-144 Certification test, but very important technology in BIG DATA Analysis

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "# Not a part of 1Z0-061 or 1Z0-144 Certification test, but very important technology in BIG DATA Analysis"

Transcription

1 Section 9 : Case Study # Objectives of this Session The Motivation For Hadoop What problems exist with traditional large-scale computing systems What requirements an alternative approach should have How Hadoop addresses those requirements Hadoop: Basic Concepts What Is Hadoop? The Hadoop Distributed File System (HDFS) How Google MapReduce Algorithm works Anatomy of a Hadoop Cluster Who uses Hadoop? db.suven.net # Not a part of 1Z0-061 or 1Z0-144 Certification test, but very important technology in BIG DATA Analysis compiled by Rocky Jagtiani Tech Head for 1

2 Objectives of this Session contd Hadoop Solutions The most common problems Hadoop can solve The types of analytics often performed with Hadoop Where the data comes from? The benefits of analyzing data with Hadoop How some real-world companies use Hadoop Hadoop Ecosystem Cloudera Software (All Open-Source) compiled by Rocky Jagtiani Tech Head for 2

3 The Motivation For Hadoop compiled by Rocky Jagtiani Tech Head for 3

4 * MPI: Message Passing Interface PVM: Parallel Virtual Machine compiled by Rocky Jagtiani Tech Head for 4

5 Major Problem compiled by Rocky Jagtiani Tech Head for 5

6 1 GB = 1000 MB, 1 TB = 1000 GB, 1 PT = 1000 TB, 1 Exabyte = 1000 PT PT => petabyte, TB => terabyte compiled by Rocky Jagtiani Tech Head for 6

7 compiled by Rocky Jagtiani Tech Head for 7

8 The Motivation For Hadoop compiled by Rocky Jagtiani Tech Head for 8

9 1. 2. compiled by Rocky Jagtiani Tech Head for 9

10 compiled by Rocky Jagtiani Tech Head for

11 Hadoop History compiled by Rocky Jagtiani Tech Head for 11

12 Core Hadoop Concepts compiled by Rocky Jagtiani Tech Head for 12

13 Hadoop Components compiled by Rocky Jagtiani Tech Head for 13

14 HDFS compiled by Rocky Jagtiani Tech Head for 14

15 HDFS Concepts compiled by Rocky Jagtiani Tech Head for 15

16 HDFS : How Files Are Stored? compiled by Rocky Jagtiani Tech Head for 16

17 How Files Are Stored: Example compiled by Rocky Jagtiani Tech Head for 17

18 IMP : How MapReduce Work? compiled by Rocky Jagtiani Tech Head for 18

19 MapReduce: The Mapper compiled by Rocky Jagtiani Tech Head for 19

20 Example : compiled by Rocky Jagtiani Tech Head for 20

21 compiled by Rocky Jagtiani Tech Head for 21

22 compiled by Rocky Jagtiani Tech Head for 22

23 compiled by Rocky Jagtiani Tech Head for 23

24 compiled by Rocky Jagtiani Tech Head for 24

25 Anatomy of a Hadoop Cluster : compiled by Rocky Jagtiani Tech Head for 25

26 compiled by Rocky Jagtiani Tech Head for 26

27 compiled by Rocky Jagtiani Tech Head for 27

28 Who uses Hadoop? compiled by Rocky Jagtiani Tech Head for 28

29 Hadoop Solutions compiled by Rocky Jagtiani Tech Head for 29

30 A compiled by Rocky Jagtiani Tech Head for 30

31 B What is Problem if the data is coming? compiled by Rocky Jagtiani Tech Head for 31

32 C compiled by Rocky Jagtiani Tech Head for 32

33 D The most common problems Hadoop can solve : We understand how each problem is solved using Hadoop in brief compiled by Rocky Jagtiani Tech Head for 33

34 compiled by Rocky Jagtiani Tech Head for 34

35 compiled by Rocky Jagtiani Tech Head for 35

36 compiled by Rocky Jagtiani Tech Head for 36

37 compiled by Rocky Jagtiani Tech Head for 37

38 compiled by Rocky Jagtiani Tech Head for 38

39 compiled by Rocky Jagtiani Tech Head for 39

40 compiled by Rocky Jagtiani Tech Head for 40

41 compiled by Rocky Jagtiani Tech Head for 41

42 E How some real-world companies use Hadoop compiled by Rocky Jagtiani Tech Head for 42

43 Hadoop Ecosystem compiled by Rocky Jagtiani Tech Head for 43

44 Cloudera Software (All Open-Source) compiled by Rocky Jagtiani Tech Head for 44

45 Conclusion : *enterprise data warehouse (EDW) compiled by Rocky Jagtiani Tech Head for 45

46 Questions 1) Input to mapper is "Google is one of the richest companies " "one who works with the Google is technical expert " what will be the out put after reducing? compiled by Rocky Jagtiani Tech Head for 46

47 2) Input to mapper is "Cat is eating milk" "Cat is very sweet and she likes milk" "milk is in bottle" what will be the out put after reducing? compiled by Rocky Jagtiani Tech Head for 47

48 3) Input to mapper is "Dollar is national currency for USA" "Rupee is national currency for India" "Dollar is ahead of Rupee in economy" "India is developing country" what will be the out put after Mapping? compiled by Rocky Jagtiani Tech Head for 48

49 what will be the out put after shuffling? what will be the out put after reducing? compiled by Rocky Jagtiani Tech Head for 49

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT

More information

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics

More information

The Power of Pentaho and Hadoop in Action. Demonstrating MapReduce Performance at Scale

The Power of Pentaho and Hadoop in Action. Demonstrating MapReduce Performance at Scale The Power of Pentaho and Hadoop in Action Demonstrating MapReduce Performance at Scale Introduction Over the last few years, Big Data has gone from a tech buzzword to a value generator for many organizations.

More information

ENGINE(S) BEHIND BI. Sam Tawfik sam.tawfik@teradata.com @teradata_sam

ENGINE(S) BEHIND BI. Sam Tawfik sam.tawfik@teradata.com @teradata_sam ENGINE(S) BEHIND BI Sam Tawfik sam.tawfik@teradata.com @teradata_sam Transactions and Interactions BIG DATA 2 Transaction Interaction Data Growth 10 24 10 21 10 18 10 15 10 12 10 9 Yottabyte Zettabyte

More information

Use of Hadoop File System for Nuclear Physics Analyses in STAR

Use of Hadoop File System for Nuclear Physics Analyses in STAR 1 Use of Hadoop File System for Nuclear Physics Analyses in STAR EVAN SANGALINE UC DAVIS Motivations 2 Data storage a key component of analysis requirements Transmission and storage across diverse resources

More information

Big Data Rethink Algos and Architecture. Scott Marsh Manager R&D Personal Lines Auto Pricing

Big Data Rethink Algos and Architecture. Scott Marsh Manager R&D Personal Lines Auto Pricing Big Data Rethink Algos and Architecture Scott Marsh Manager R&D Personal Lines Auto Pricing Agenda History Map Reduce Algorithms History Google talks about their solutions to their problems Map Reduce:

More information

Privacy-Preserving Data Mining

Privacy-Preserving Data Mining Privacy-Preserving Data Mining Parallel Homomorphic Encryption Mandana Bozorgi Computer Science Department Las Vegas, NV, USA Mandana.Bozorgi@unlv.edu Yoohwan Kim, Ph.D., CISSP, CISA, CEH, CPT Computer

More information

Journal of Environmental Science, Computer Science and Engineering & Technology

Journal of Environmental Science, Computer Science and Engineering & Technology JECET; March 2015-May 2015; Sec. B; Vol.4.No.2, 202-209. E-ISSN: 2278 179X Journal of Environmental Science, Computer Science and Engineering & Technology An International Peer Review E-3 Journal of Sciences

More information

A Performance Analysis of Distributed Indexing using Terrier

A Performance Analysis of Distributed Indexing using Terrier A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search

More information

Hadoop and Map-reduce computing

Hadoop and Map-reduce computing Hadoop and Map-reduce computing 1 Introduction This activity contains a great deal of background information and detailed instructions so that you can refer to it later for further activities and homework.

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

BIG DATA USING HADOOP

BIG DATA USING HADOOP + Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Big Data. Lyle Ungar, University of Pennsylvania

Big Data. Lyle Ungar, University of Pennsylvania Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -

More information

Yu Xu Pekka Kostamaa Like Gao. Presented By: Sushma Ajjampur Jagadeesh

Yu Xu Pekka Kostamaa Like Gao. Presented By: Sushma Ajjampur Jagadeesh Yu Xu Pekka Kostamaa Like Gao Presented By: Sushma Ajjampur Jagadeesh Introduction Teradata s parallel DBMS can hold data sets ranging from few terabytes to multiple petabytes. Due to explosive data volume

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

The Trials and Tribulations and ultimate success of parallelisation using Hadoop within the SCAPE project

The Trials and Tribulations and ultimate success of parallelisation using Hadoop within the SCAPE project The Trials and Tribulations and ultimate success of parallelisation using Hadoop within the SCAPE project Alastair Duncan STFC Pre Coffee talk STFC July 2014 SCAPE Scalable Preservation Environments The

More information

The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence

The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence The Rise of Industrial Big Data Brian Courtney General Manager Industrial Data Intelligence Agenda Introduction Big Data for the industrial sector Case in point: Big data saves millions at GE Energy Seeking

More information

ITG Software Engineering

ITG Software Engineering Introduction to Cloudera Course ID: Page 1 Last Updated 12/15/2014 Introduction to Cloudera Course : This 5 day course introduces the student to the Hadoop architecture, file system, and the Hadoop Ecosystem.

More information

Hadoop vs Apache Spark

Hadoop vs Apache Spark Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

Big Data and Analytics: A Conceptual Overview. Mike Park Erik Hoel

Big Data and Analytics: A Conceptual Overview. Mike Park Erik Hoel Big Data and Analytics: A Conceptual Overview Mike Park Erik Hoel In this technical workshop This presentation is for anyone that uses ArcGIS and is interested in analyzing large amounts of data We will

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

MapReduce and Hadoop Distributed File System V I J A Y R A O

MapReduce and Hadoop Distributed File System V I J A Y R A O MapReduce and Hadoop Distributed File System 1 V I J A Y R A O The Context: Big-data Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009) Google collects 270PB data in a month (2007), 20000PB

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Mining Large Datasets: Case of Mining Graph Data in the Cloud

Mining Large Datasets: Case of Mining Graph Data in the Cloud Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large

More information

Cloud Computing: MapReduce and Hadoop

Cloud Computing: MapReduce and Hadoop Cloud Computing: MapReduce and Hadoop June 2010 Marcel Kunze, Research Group Cloud Computing KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association www.kit.edu

More information

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily

More information

MapReduce and Hadoop Distributed File System

MapReduce and Hadoop Distributed File System MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) bina@buffalo.edu http://www.cse.buffalo.edu/faculty/bina Partially

More information

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, Ebay,

More information

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

What are Hadoop and MapReduce and how did we get here?

What are Hadoop and MapReduce and how did we get here? What are Hadoop and MapReduce and how did we get here? Term Big Data coined in 2005 by Roger Magoulas of O Reilly Media But as the idea of big data sets evolved on the Web, organizations began to wonder

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Machine Learning and Cloud Computing. trends, issues, solutions. EGI-InSPIRE RI-261323

Machine Learning and Cloud Computing. trends, issues, solutions. EGI-InSPIRE RI-261323 Machine Learning and Cloud Computing trends, issues, solutions Daniel Pop HOST Workshop 2012 Future plans // Tools and methods Develop software package(s)/libraries for scalable, intelligent algorithms

More information

TUT NoSQL Seminar (Oracle) Big Data

TUT NoSQL Seminar (Oracle) Big Data Timo Raitalaakso +358 40 848 0148 rafu@solita.fi TUT NoSQL Seminar (Oracle) Big Data 11.12.2012 Timo Raitalaakso MSc 2000 Work: Solita since 2001 Senior Database Specialist Oracle ACE 2012 Blog: http://rafudb.blogspot.com

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Attaching Cloud Storage to a Campus Grid Using Parrot, Chirp, and Hadoop

Attaching Cloud Storage to a Campus Grid Using Parrot, Chirp, and Hadoop Attaching Cloud Storage to a Campus Grid Using Parrot, Chirp, and Hadoop Patrick Donnelly, Peter Bui, Douglas Thain Computer Science and Engineering University of Notre Dame pdonnel3@nd.edu pbui@nd.edu

More information

Introduction to Apache Hadoop

Introduction to Apache Hadoop Introduction to Apache Hadoop 201203 01-1 Chapter 1 Introduction 01-2 Introduction Course Logistics About Apache Hadoop About Cloudera Conclusion 01-3 Logistics! Course start and end time! Breaks! Restrooms

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensie Computing Uniersity of Florida, CISE Department Prof. Daisy Zhe Wang Map/Reduce: Simplified Data Processing on Large Clusters Parallel/Distributed

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Large-Scale Data Processing

Large-Scale Data Processing Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

More information

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012 Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of

More information

Laurence Liew General Manager, APAC. Economics Is Driving Big Data Analytics to the Cloud

Laurence Liew General Manager, APAC. Economics Is Driving Big Data Analytics to the Cloud Laurence Liew General Manager, APAC Economics Is Driving Big Data Analytics to the Cloud Big Data 101 The Analytics Stack Economics of Big Data Convergence of the 3 forces Big Data Analytics in the Cloud

More information

Lambda Architecture. CSCI 5828: Foundations of Software Engineering Lecture 29 12/09/2014

Lambda Architecture. CSCI 5828: Foundations of Software Engineering Lecture 29 12/09/2014 Lambda Architecture CSCI 5828: Foundations of Software Engineering Lecture 29 12/09/2014 1 Goals Cover the material in Chapter 8 of the Concurrency Textbook The Lambda Architecture Batch Layer MapReduce

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Tap into Hadoop and Other No SQL Sources

Tap into Hadoop and Other No SQL Sources Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data

More information

BigMemory and Hadoop: Powering the Real-time Intelligent Enterprise

BigMemory and Hadoop: Powering the Real-time Intelligent Enterprise WHITE PAPER and Hadoop: Powering the Real-time Intelligent Enterprise BIGMEMORY: IN-MEMORY DATA MANAGEMENT FOR THE REAL-TIME ENTERPRISE Terracotta is the solution of choice for enterprises seeking the

More information

Hadoop Cluster Applications

Hadoop Cluster Applications Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday

More information

What is Hadoop? The webinar will begin at 3pm

What is Hadoop? The webinar will begin at 3pm What is Hadoop? The webinar will begin at 3pm You now have a menu in the top right corner of your screen. The red button with a white arrow allows you to expand and contract the webinar menu, in which

More information

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

More information

Big Data Big Deal? Salford Systems www.salford-systems.com

Big Data Big Deal? Salford Systems www.salford-systems.com Big Data Big Deal? Salford Systems www.salford-systems.com 2015 Copyright Salford Systems 2010-2015 Big Data Is The New In Thing Google trends as of September 24, 2015 Difficult to read trade press without

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

Oracle Big Data Fundamentals Ed 1

Oracle Big Data Fundamentals Ed 1 Oracle University Contact Us: 001-855-844-3881 Oracle Big Data Fundamentals Ed 1 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big Data

More information

Introduction to Hadoop

Introduction to Hadoop Introduction to Hadoop Miles Osborne School of Informatics University of Edinburgh miles@inf.ed.ac.uk October 28, 2010 Miles Osborne Introduction to Hadoop 1 Background Hadoop Programming Model Examples

More information

Quantcast Petabyte Storage at Half Price with QFS!

Quantcast Petabyte Storage at Half Price with QFS! 9-131 Quantcast Petabyte Storage at Half Price with QFS Presented by Silvius Rus, Director, Big Data Platforms September 2013 Quantcast File System (QFS) A high performance alternative to the Hadoop Distributed

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Hadoop-BAM and SeqPig

Hadoop-BAM and SeqPig Hadoop-BAM and SeqPig Keijo Heljanko 1, André Schumacher 1,2, Ridvan Döngelci 1, Luca Pireddu 3, Matti Niemenmaa 1, Aleksi Kallio 4, Eija Korpelainen 4, and Gianluigi Zanetti 3 1 Department of Computer

More information

Hadoop-based Open Source ediscovery: FreeEed. (Easy as popcorn)

Hadoop-based Open Source ediscovery: FreeEed. (Easy as popcorn) + Hadoop-based Open Source ediscovery: FreeEed (Easy as popcorn) + Hello! 2 Sujee Maniyam & Mark Kerzner Founders @ Elephant Scale consulting and training around Hadoop, Big Data technologies Enterprise

More information

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? 可 以 跟 資 料 庫 結 合 嘛? Can Hadoop work with Databases? 開 發 者 們 有 聽 到

More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume

More information

Oracle Big Data Fundamentals Ed 1 NEW

Oracle Big Data Fundamentals Ed 1 NEW Oracle University Contact Us: +90 212 329 6779 Oracle Big Data Fundamentals Ed 1 NEW Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big

More information

The Best Database for Hadoop

The Best Database for Hadoop The Best Database for Hadoop Justin Makeig, Director, Product Management, MarkLogic April 9, 2013 Disclaimer Forward-looking Statements All statements describing future releases and capabilities, estimated

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment. Sanjay Kulhari, Jian Wen UC Riverside

Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment. Sanjay Kulhari, Jian Wen UC Riverside Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment Sanjay Kulhari, Jian Wen UC Riverside Team Sanjay Kulhari M.S. student, CS U C Riverside Jian Wen Ph.D. student, CS U

More information

A Professional Big Data Master s Program to train Computational Specialists

A Professional Big Data Master s Program to train Computational Specialists A Professional Big Data Master s Program to train Computational Specialists Anoop Sarkar, Fred Popowich, Alexandra Fedorova! School of Computing Science! Education for Employable Graduates: Critical Questions

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

QUEST meeting Big Data Analytics

QUEST meeting Big Data Analytics QUEST meeting Big Data Analytics Peter Hughes Business Solutions Consultant SAS Australia/New Zealand Copyright 2015, SAS Institute Inc. All rights reserved. Big Data Analytics WHERE WE ARE NOW 2005 2007

More information

High Performance Computing with Hadoop WV HPC Summer Institute 2014

High Performance Computing with Hadoop WV HPC Summer Institute 2014 High Performance Computing with Hadoop WV HPC Summer Institute 2014 E. James Harner Director of Data Science Department of Statistics West Virginia University June 18, 2014 Outline Introduction Hadoop

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

Big Data Weather Analytics Using Hadoop

Big Data Weather Analytics Using Hadoop Big Data Weather Analytics Using Hadoop Veershetty Dagade #1 Mahesh Lagali #2 Supriya Avadhani #3 Priya Kalekar #4 Professor, Computer science and Engineering Department, Jain College of Engineering, Belgaum,

More information

From Internet Data Centers to Data Centers in the Cloud

From Internet Data Centers to Data Centers in the Cloud From Internet Data Centers to Data Centers in the Cloud This case study is a short extract from a keynote address given to the Doctoral Symposium at Middleware 2009 by Lucy Cherkasova of HP Research Labs

More information

From GWS to MapReduce: Google s Cloud Technology in the Early Days

From GWS to MapReduce: Google s Cloud Technology in the Early Days Large-Scale Distributed Systems From GWS to MapReduce: Google s Cloud Technology in the Early Days Part II: MapReduce in a Datacenter COMP6511A Spring 2014 HKUST Lin Gu lingu@ieee.org MapReduce/Hadoop

More information

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method Improving Data Processing Speed in Big Data Analytics Using HDFS Method M.R.Sundarakumar Assistant Professor, Department Of Computer Science and Engineering, R.V College of Engineering, Bangalore, India

More information

The Hadoop Implementation. Thomas Zimmermann Philipp Berger

The Hadoop Implementation. Thomas Zimmermann Philipp Berger Link Analysis goes MapReduce The Hadoop Implementation Thomas Zimmermann Philipp Berger Flashback 2 Overview 3 1. Pre- / Postprocessing 2. Our Jobs 3. Evaluation Overview 4 1. Pre- / Postprocessing 2.

More information

Spark: Cluster Computing with Working Sets

Spark: Cluster Computing with Working Sets Spark: Cluster Computing with Working Sets Outline Why? Mesos Resilient Distributed Dataset Spark & Scala Examples Uses Why? MapReduce deficiencies: Standard Dataflows are Acyclic Prevents Iterative Jobs

More information

Apache Hadoop, Big Data, and You. November 18, 2009

Apache Hadoop, Big Data, and You.  November 18, 2009 Apache Hadoop, Big Data, and You Philip Zeyliger philip@cloudera.com @philz42 @cloudera November 18, 2009 Hi there! Software Engineer Worked at I work on stuff... Outline Why should you care? (Intro) Challenging

More information

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies

More information

Reduction of Data at Namenode in HDFS using harballing Technique

Reduction of Data at Namenode in HDFS using harballing Technique Reduction of Data at Namenode in HDFS using harballing Technique Vaibhav Gopal Korat, Kumar Swamy Pamu vgkorat@gmail.com swamy.uncis@gmail.com Abstract HDFS stands for the Hadoop Distributed File System.

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

MapReduce, Hadoop and Amazon AWS

MapReduce, Hadoop and Amazon AWS MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Big Data: Study in Structured and Unstructured Data

Big Data: Study in Structured and Unstructured Data Big Data: Study in Structured and Unstructured Data Motashim Rasool 1, Wasim Khan 2 mail2motashim@gmail.com, khanwasim051@gmail.com Abstract With the overlay of digital world, Information is available

More information

An Introduction to MOHAMMAD REZA KARIMI DASTJERDI SPRING

An Introduction to MOHAMMAD REZA KARIMI DASTJERDI SPRING An Introduction to MOHAMMAD REZA KARIMI DASTJERDI SPRING 2015 1 Table Of Contents Introduction Problems with RDBMs What is Hadoop? Who use Hadoop? Job Positions History Hadoop Distributions Hadoop Ecosystem

More information

Application and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10

Application and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Application and practice of parallel cloud computing in ISP Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Outline Mass data management problem Applications of parallel cloud computing in ISPs

More information

RDMA-based Big Data Analytic

RDMA-based Big Data Analytic RDMA-based Big Data Analytic Gilad Shainer Technion, March 2014 The InfiniBand Architecture Industry standard defined by the InfiniBand Trade Association Defines System Area Network architecture Comprehensive

More information