- Gary Bradley
- 8 years ago
Transcription
1 MapReduce on
2 Big Data
6 Map / Reduce
8 Hadoop Hello world - Word count
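The word-count idea can be sketched without a cluster. The snippet below is a minimal illustration in base R, not the code from the slide: the three input lines and the variable names are made up for the example, and the shuffle step that Hadoop performs between map and reduce is imitated with `split()`.

```r
# Word count written as explicit map, shuffle, and reduce phases (base R only).
# The input lines are hypothetical sample data.
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map: split each line into words and emit one (word, 1) pair per word.
words <- unlist(strsplit(tolower(lines), "[^a-z]+"))
words <- words[nzchar(words)]
pairs <- data.frame(key = words, value = 1L, stringsAsFactors = FALSE)

# Shuffle: group the emitted values by key, as Hadoop does between phases.
grouped <- split(pairs$value, pairs$key)

# Reduce: sum the values collected for each key.
counts <- sapply(grouped, sum)
print(counts)
```

On Hadoop the map and reduce steps run on different nodes and the framework does the grouping; here everything runs in one R process, so the sketch only shows the data flow.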
9 Hadoop Ecosystem
10
- rmr: functions providing Hadoop MapReduce functionality in R
- rhdfs: functions providing file management of the HDFS from within R
- rhbase: functions providing database management for the HBase distributed database from within R
- plyrmr (new): higher-level plyr-like data processing for structured data, powered by rmr
11
library(rhdfs)
Loading required package: rJava
HADOOP_CMD=/usr/bin/hadoop
Be sure to run hdfs.init()
hdfs.init()
hdfs.ls("pig_out")
  permission owner group size modtime file
1 -rw-r--r-- brokaa linga_admin :46 /user/brokaa/pig_out/_success
2 drwx--x--x brokaa linga_admin :46 /user/brokaa/pig_out/_logs
3 -rw-r--r-- brokaa linga_admin :46 /user/brokaa/pig_out/partm
hdfs.stat("pig_out/part-m-00000")
  perms isdir block replication owner group size modtime path
1 rw-r--r-- FALSE brokaa linga_admin : 15:45 pig_out/part-m
pig_out = hdfs.cat("pig_out/part-m-00000")
pig_out[1:4]
[1] ""
[2] "PROJECT GUTENBERG ETEXT OF A MIDSUMMER NIGHT'S DREAM BY SHAKESPEARE"
[3] "PG HAS MULTIPLE EDITIONS OF WILLIAM SHAKESPEARE'S COMPLETE WORKS"
[4] ""
12 MapReduce without Hadoop
# Generate some numbers
small.ints = 1:10
cat(small.ints)
1 2 3 4 5 6 7 8 9 10
# Map
sapply(small.ints, function(x) x^2)
 [1]   1   4   9  16  25  36  49  64  81 100
# Reduce
sum(sapply(small.ints, function(x) x^2))
[1] 385
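The same computation can be written with base R's `Map()` and `Reduce()`, which makes the two phases explicit. This is a small sketch under the assumption that only base R is available; the names `squares`, `total`, and `partial` are introduced here for illustration.

```r
small.ints <- 1:10

# Map: apply the function to every element independently.
# Each call needs only its own input, so the calls could run on different nodes.
squares <- Map(function(x) x^2, small.ints)

# Reduce: fold the mapped values together pairwise with `+`.
total <- Reduce(`+`, squares)
print(total)

# accumulate = TRUE keeps the running partial sums, one per input element.
partial <- Reduce(`+`, squares, accumulate = TRUE)
print(partial)
```

Because `+` is associative, the partial sums can be computed in any grouping and still give the same total, which is the property a distributed reduce relies on.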
13 Map only, No Reduce yet
library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
ints = to.dfs(1:10)
squares = mapreduce(
  input=ints,
  map=function(k,v) cbind(v, v^2)
)
from.dfs(squares)
$key
NULL
$val
       v
 [1,]  1   1
 [2,]  2   4
 [3,]  3   9
 [4,]  4  16
 [5,]  5  25
 [6,]  6  36
 [7,]  7  49
 [8,]  8  64
 [9,]  9  81
[10,] 10 100
14
packagejobjar: [/tmp/rtmpr9tys5/rmr-local-env62ee3e53886e, /tmp/rtmpr9tys5/rmr-global-env62ee751996c, /tmp/rtmpr9tys5/rmr-streaming-map62ee231197ff, /tmp/hadoop-brokaa/hadoop-unjar /] [] /tmp/streamjob jar tmpdir=null
13/11/21 06:18:23 WARN mapred.jobclient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/11/21 06:18:24 INFO mapred.fileinputformat: Total input paths to process : 1
13/11/21 06:18:24 INFO streaming.streamjob: getlocaldirs(): [/tmp/hadoop-brokaa/mapred/local]
13/11/21 06:18:24 INFO streaming.streamjob: Running job: job_ _
13/11/21 06:18:24 INFO streaming.streamjob: To kill this job, run:
13/11/21 06:18:24 INFO streaming.streamjob: UNDEF/bin/hadoop job -Dmapred.job.tracker=name-0-1.local:8021 -kill job_ _
13/11/21 06:18:24 INFO streaming.streamjob: Tracking URL: /jobdetails.jsp?jobid=job_ _
13/11/21 06:18:25 INFO streaming.streamjob: map 0% reduce 0%
13/11/21 06:18:34 INFO streaming.streamjob: map 50% reduce 0%
13/11/21 06:18:36 INFO streaming.streamjob: map 100% reduce 0%
13/11/21 06:18:38 INFO streaming.streamjob: map 100% reduce 100%
13/11/21 06:18:38 INFO streaming.streamjob: Job complete: job_ _
13/11/21 06:18:38 INFO streaming.streamjob: Output: /tmp/rtmpr9tys5/file62ee1f1f4715
15 MapReduce in Action
input.size = 10000
input.ga = to.dfs(cbind(1:input.size, rnorm(input.size)))
group = function(x) x %% 10
aggregate = function(x) sum(x)
result = mapreduce(
  input.ga,
  map = function(k, v) keyval(group(v[,1]), v[,2]),
  reduce = function(k, vv) keyval(k, aggregate(vv)),
  combine = TRUE
)
from.dfs(result)
$key
[1]
$val
[1]
[10]
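The job on this slide groups row ids by their last digit and sums the matching random values. Setting `combine = TRUE` is safe here because summation can be applied to partial results and then to the partials. The sketch below checks that property locally in base R, without rmr2 or a cluster; the seed, the two-chunk split, and the names `local.result`, `chunk1`, `chunk2`, and `combined` are assumptions made for the illustration.

```r
set.seed(42)  # arbitrary seed so the illustration is reproducible
input.size <- 10000
input <- cbind(1:input.size, rnorm(input.size))
group <- function(x) x %% 10

# Map emits (group(id), value); reduce sums each group.
# tapply() does the grouping and the summing in one local step.
local.result <- tapply(input[, 2], group(input[, 1]), sum)

# Combiner idea: sum within each chunk first, then add the partial sums.
half <- input.size / 2
first <- 1:half
second <- (half + 1):input.size
chunk1 <- tapply(input[first, 2], group(input[first, 1]), sum)
chunk2 <- tapply(input[second, 2], group(input[second, 1]), sum)
combined <- chunk1 + chunk2

# Both routes should agree up to floating-point rounding.
print(all.equal(as.numeric(local.result), as.numeric(combined)))
```

A reducer that is not associative and commutative, such as a median, would not pass this check, and a combiner should not be enabled for it.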
16
packagejobjar: [/tmp/rtmpr9tys5/rmr-local-env62ee790bc164, /tmp/rtmpr9tys5/rmr-global-env62ee4e9d9a75, /tmp/rtmpr9tys5/rmr-streaming-map62ee10105eb4, /tmp/rtmpr9tys5/rmr-streaming-reduce62ee6a9746ba, /tmp/rtmpr9tys5/rmr-streaming-combine62ee5a41c721, /tmp/hadoop-brokaa/hadoop-unjar /] [] /tmp/streamjob jar tmpdir=null
13/11/21 06:31:54 WARN mapred.jobclient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/11/21 06:31:54 INFO mapred.fileinputformat: Total input paths to process : 1
13/11/21 06:31:55 INFO streaming.streamjob: getlocaldirs(): [/tmp/hadoop-brokaa/mapred/local]
13/11/21 06:31:55 INFO streaming.streamjob: Running job: job_ _
13/11/21 06:31:55 INFO streaming.streamjob: To kill this job, run:
13/11/21 06:31:55 INFO streaming.streamjob: UNDEF/bin/hadoop job -Dmapred.job.tracker=name-0-1.local:8021 -kill job_ _
13/11/21 06:31:55 INFO streaming.streamjob: Tracking URL: /jobdetails.jsp?jobid=job_ _
13/11/21 06:31:56 INFO streaming.streamjob: map 0% reduce 0%
13/11/21 06:32:07 INFO streaming.streamjob: map 50% reduce 0%
13/11/21 06:32:12 INFO streaming.streamjob: map 100% reduce 0%
13/11/21 06:32:24 INFO streaming.streamjob: map 100% reduce 11%
13/11/21 06:32:25 INFO streaming.streamjob: map 100% reduce 33%
13/11/21 06:32:26 INFO streaming.streamjob: map 100% reduce 52%
13/11/21 06:32:27 INFO streaming.streamjob: map 100% reduce 70%
13/11/21 06:32:28 INFO streaming.streamjob: map 100% reduce 86%
13/11/21 06:32:29 INFO streaming.streamjob: map 100% reduce 100%
13/11/21 06:32:31 INFO streaming.streamjob: Job complete: job_ _
13/11/21 06:32:31 INFO streaming.streamjob: Output: /tmp/rtmpr9tys5/file62ee21f87721