To reduce or not to reduce, that is the question
- Tracey Sparks
- 10 years ago
1 Running jobs on the Hadoop cluster

For part 1 of assignment 8, you should have gotten the word counting example from class compiling. To start with, let's walk through the process of running code you've written on the Hadoop cluster.

1. Get the code compiled

- If you're working on a lab machine: Eclipse already compiles the code for you, so you just need to cd into the bin directory of the project directory within the Eclipse workspace folder where the WordCount code resides. If you don't have separate src and bin directories [1], then just cd into the base of the project directory.

- If you're working on your own computer:

  * Go to the src directory of the project directory within the Eclipse workspace folder where the WordCount code resides. If you don't have separate src and bin directories [1], then just cd into the base of the project directory.

  * Use scp (or whatever approach you normally use to copy files to basin) to copy all of the source files, in their package directory structure (in this case, demos), to a directory on basin.

[1] There's an option to select this when you set up a new project. In general, it's a good idea to do this.
  * Log in to basin and then cd to the folder where you copied the source files. Compile your code from the command line:

        javac -cp /usr/local/hadoop/hadoop-core-<version>.jar:. demos/*.java

    Here <version> stands for the version of the hadoop-core jar installed under /usr/local/hadoop. In this case there is just one package and one file; in general, you will need to list all of the packages that have files you want to compile at the end of the javac call.

2. Create a jar file

Hadoop requires that you generate a jar file containing all of your code in order to run it. To create a jar file, make sure you're in the base directory where you just compiled your code (or where Eclipse compiled it for you) and type the following:

    jar cvf demos.jar demos/*.class

The first argument is the name of the jar file to create, and everything following it is what should go into the jar file. In general, you will need to list all of the packages that have files you want to include at the end of the call to jar.

3. Running your code on the cluster

Now that you have a jar file, you can run your code on the cluster. When running Hadoop programs, you use the jar command to specify the jar file that contains the code, and then you specify which class's main method to run (including the package structure):

    hadoop jar demos.jar demos.WordCount

You should see the usage statement printed out:

    WordCount <input_dir> <output_dir>

Let's run the full job with some very simple data:

    hadoop jar demos.jar demos.WordCount /user/dkauchak/lab/ output

This uses some data in my directory, with just a couple of files, as input and will output the results to your own output directory. Take a look in the output directory and you should see a single output file whose name starts with part. If you look at the contents of this file you should see:

    !        1
    .        1
    a        1
    also     2
    and      2
    another  1
    banana   1
    but      1
    file     3
    has      2
    is       2
    more     1
    some     2
    text     2
    the      1
    this     3
    with     1
    word     1
    words    2

2 WordCount variations

Now that we can compile and run our Hadoop programs, let's try out a few variants of our WordCount program.

NoOpReducer

As I mentioned in class, one of the best ways to debug your program (and to develop it incrementally) is to use a reducer that simply copies its input to the output. In the demos directory on the course web page I have included the NoOpReducer.java class. Download this file and put it into your Eclipse project. Now, change the line in the run method that sets the reducer to be:

    conf.setReducerClass(NoOpReducer.class);

Recompile your code and then run this variant on the cluster. A few thoughts on this:

- Don't forget to delete your output directory before you run it. Otherwise, you're going to get an exception.

- Both for this lab and when you're writing your own programs, you want to make the process of compiling, copying the files, generating the jar, etc. as simple as possible. My advice is to set up a few terminal windows and designate each window for a few commands. You can then simply use the up-arrow to rerun these commands for each variation. For example, my setup is:

  1. one terminal session for copying over the .java files
  2. one terminal session for compiling and creating the jar file
  3. one terminal session for deleting the old output directory, executing the hadoop command and peeking at the output
  4. one terminal session for doing more involved things with HDFS, etc.
It can take a minute or two to set up, but I can go from a code change on my laptop to running on the Hadoop cluster in about 10 seconds.

Assuming this runs correctly, you should see something like:

    !        1
    .        1
    a        1
    also     1
    also     1
    and      1
    and      1
    another  1
    banana   1
    but      1
    file     1
    file     1
    file     1
    has      1
    has      1
    is       1
    is       1
    more     1
    some     1
    some     1
    text     1
    text     1
    the      1
    this     1
    this     1
    this     1
    with     1
    word     1
    words    1
    words    1

In addition to a map phase and a reduce phase, Hadoop also allows for a combiner phase. A combiner implements the reducer interface and runs in between the map phase and the reduce phase. However, there's a catch! Let's run the WordCount example with a combiner and see if you can figure out what the catch is. Uncomment the line

    conf.setCombinerClass(WordCountReducer.class);

in WordCount.java, recompile, etc. and then rerun on the Hadoop cluster. What is the combiner doing?
(Really think about it for a minute! Scroll down when you're ready for the answer.)
The combiner phase is often referred to as a mini-reduce phase. It does run a full reduce (notice that above we just used our WordCountReducer); however, it only runs it locally, on the same machine where the map process ran. This means that it only reduces some of the data. Why might this be useful? Why is this useful for the word count example?

3 Inverted Index

Also in the demos directory online I have included another of the classic MapReduce examples, called LineIndexer. Download this file into the demos package in your Eclipse workspace and look through the code to see if you can figure out what it does.

The LineIndexer example includes one thing we haven't seen before: the use of the Reporter object. Here we're using it to get the split that we're working on and then using that to get the particular filename.

Once you think you've figured out how it works, set it up to run. To run this program, you'll need to change your call to hadoop to:

    hadoop jar demos.jar demos.LineIndexer

Look at the output. Did it do what you expected?

This program generates what is called an inverted index, that is, a mapping from words to the documents that they occur in. This is probably one of the most important structures that enables web search (and many index-based search techniques). It was one of the early motivating examples that led to the development of the MapReduce framework by Google.

A few things to think about:

- Could we use the LineIndexerReducer as a combiner for this example? Try it out and see if you're right. If you don't think you can, you should either get the wrong answer or, more likely, an error. If you do think you can, you should get the same output as before.

- In this example I have used a HashSet to filter out repeated entries. While this works, it does require populating and clearing the hash set each time reduce is called. If the list of documents that a word occurs in gets very large, this can be problematic (think web scale). Can you think of an alternative using the MapReduce framework? (Hint: it involves using a two-phase MapReduce.)

4 grep

grep is a great command-line utility that allows you to search for text (and even regular expressions) in files. If the text is found in a line in the file (or the regex matches), that entire line is printed out. grep is very fast; however, if you had a very large file (or a number of very large files) it could still take a long time.
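To make the behavior of grep concrete before looking at the MapReduce version, here is a minimal single-machine sketch in plain Java. It is not part of the course demos: the class name GrepSketch, the grep method, and the sample lines are all made up for illustration. It only shows the core idea of keeping every line in which a pattern matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical single-machine sketch of grep's behavior; not part of the course demos.
class GrepSketch {

    // Returns every line in which the regular expression matches somewhere.
    static List<String> grep(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            // find() succeeds if any part of the line matches the pattern
            if (pattern.matcher(line).find()) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample lines, loosely modeled on the word count data
        List<String> lines = Arrays.asList(
                "this is some text",
                "another file has a banana",
                "more words");
        for (String line : grep(lines, "banana|text")) {
            System.out.println(line);
        }
    }
}
```

Every line is tested independently of every other line, which is what makes the problem a natural fit for a map function.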
In the demos directory online, I have included a basic version of grep implemented on MapReduce. Download it and take a look at it. See if you can figure out how it works.

There's one new thing in this program, and that is passing information to the map function. How is this done? The configure function is called whenever a mapper/reducer is constructed.

Run the program on the cluster. Again, to do this you'll have to call hadoop jar demos.jar and then tell it to run the grep example. Notice that this example takes three command-line arguments (try searching for "text" or "banana"). Assuming all goes well, you should get an answer back. What are the numbers that get printed before the lines?

Here are a few things to try:

- Can you use the reducer as a combiner? If so, try it out :)

- Modify the code so that it prints out both the number and the filename. Can you get it to group the results by filename?

- Modify the code to take a regular expression to match, not just a word. java.util.regex should be useful. To pass a regular expression as a parameter on the command line you'll probably need to wrap it in quotes.
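Several of the questions in this lab ask whether a reducer can double as a combiner. The property that matters can be illustrated without Hadoop at all. The sketch below is a plain-Java simulation, under the assumption that each input string plays the role of the data seen by one map process; the class CombinerSketch and its methods are hypothetical names, not code from the demos directory or from the Hadoop API. It applies the word count reduce function once per split (the combiner step) and then again over the combined results, and compares that against reducing everything in one go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation; these names come from neither the demos nor Hadoop.
class CombinerSketch {

    // Word count reduce function: sum the values seen for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Simulates map plus shuffle. Each string stands in for one map process's input.
    // With useCombiner set, reduce is first applied to each split's local values.
    static Map<String, List<Integer>> shuffle(List<String> splits, boolean useCombiner) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String split : splits) {
            // Map step: every word contributes a 1, grouped by word within this split
            Map<String, List<Integer>> local = new TreeMap<>();
            for (String word : split.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                local.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
            for (Map.Entry<String, List<Integer>> e : local.entrySet()) {
                List<Integer> values = e.getValue();
                if (useCombiner) {
                    // Combiner step: a local "mini-reduce" before anything is sent on
                    values = Collections.singletonList(reduce(values));
                }
                shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(values);
            }
        }
        return shuffled;
    }

    // Reduce step: apply reduce to the full list of values for each key.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> shuffled) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up input; two strings simulate two map processes
        List<String> splits = Arrays.asList(
                "the file has a banana and the file has text",
                "the text");
        System.out.println(reduceAll(shuffle(splits, false)));
        System.out.println(reduceAll(shuffle(splits, true)));
    }
}
```

Both calls in main produce the same counts, because summing partial sums gives the same total as summing everything at once, while the combined run passes fewer values from the map side to the reduce side. A reduce function without that property (or one whose output type differs from its input type) would not be safe to reuse in this way.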
