1 Running Hadoop and Stratosphere jobs on TomPouce cluster 16 October 2013

2 TomPouce cluster TomPouce is a cluster of 20 calculation nodes (240 cores), located in the Inria Turing building (École Polytechnique) and used jointly by Inria teams. Jobs are run with the help of a scheduler: SGE (Sun Grid Engine).

3 TomPouce cluster SPECIFICATIONS: 20 nodes, each with two 6-core processors (240 cores total); 48 GB RAM per node; 400 GB of local disk space. Storage: Dell R510, /home, 19 TB, NFS; 2x Dell R710, /scratch, 37 TB, FHGFS (FraunhoferFS). Network: Dell 5548 switch; Mellanox InfiniScale IV QDR InfiniBand switch.

4 TomPouce cluster (diagram)

5 1. Copy your job from the local machine to the cluster front node
$ scp myjob.jar inria_username@<front-node>:~/
myjob.jar will be copied into the folder /home/leo/inria_username.
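If your job also needs a directory of input data on the front node, scp -r copies it recursively (a minimal sketch; the local folder dataset/ is a hypothetical example):
$ scp -r dataset/ inria_username@<front-node>:~/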

6 2. Connect via ssh to the front node
$ ssh inria_username@<front-node>
Welcome to Bright Cluster Manager 6.0
Based on Scientific Linux release 6
Cluster Manager ID: #…
Use the following commands to adjust your environment:
'module avail'            - show available modules
'module add <module>'     - adds a module to your environment for this session
'module initadd <module>' - configure module to be loaded at every login
IMPORTANT: to connect to the cluster, your ssh key should be stored in the Inria LDAP. If not, send an e-mail with your public ssh key to: helpmi-[email protected]
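If you do not have an ssh key pair yet, you can generate one and send the public part to the helpdesk (a minimal sketch; the key type and comment are just examples):
$ ssh-keygen -t rsa -C "inria_username"   # creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
$ cat ~/.ssh/id_rsa.pub                   # this is the public key to e-mail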

7 3. Log in as the clustervision superuser using your LDAP password
$ sudo su - clustervision
This account is used to execute Hadoop and Stratosphere jobs and to edit the configurations needed.
If you don't have enough permissions, ask for them at: helpmi-[email protected]
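You can confirm which account you are working under after switching (a minimal sketch):
$ whoami    # should print: clustervision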

8 4. Add the Hadoop/Stratosphere environment to your session
To add the Hadoop environment, type: $ module add hadoop/1.1.1
To add the Stratosphere environment, type: $ module add stratosphere/stratosphere
To add an environment automatically when you log in: $ module initadd hadoop/1.1.1
To check all the environments loaded: $ module list
Currently Loaded Modulefiles: 1) gcc/… 2) intel-cluster-checker/1.8 3) stratosphere/stratosphere 4) sge/… 5) openmpi/gcc/64/… 6) gromacs/openmpi/gcc/64/… 7) hadoop/1.1.1
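A quick way to check that the module really took effect is to ask the tools themselves (a minimal sketch; the exact output will vary):
$ module add hadoop/1.1.1
$ hadoop version    # should report Hadoop 1.1.1
$ which hadoop      # should point under /cm/shared/apps/hadoop/current/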

9 4. Add the Hadoop/Stratosphere environment to your session
Hadoop: /cm/shared/apps/hadoop/current/
Stratosphere: /cm/shared/apps/stratosphere/current/
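You can also inspect the installation directories directly (a minimal sketch; contents will vary):
$ ls /cm/shared/apps/hadoop/current/
$ ls /cm/shared/apps/stratosphere/current/bin/   # contains pact-client.sh, used in the Stratosphere scripts below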

10 5. Create an execution script (Hadoop)
#!/bin/bash
#$ -N hadoop_run
#$ -pe hadoop 12
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q
# Copy the input files into the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input
# Run the Hadoop task(s) here, specifying the jar, class and run parameters
hadoop --config /home/guests/clustervision/current/ jar myjob.jar org.myorg.job /input /output
# Copy the output files from the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ fs -get /output

12 SGE execution parameters: these should be written after #$ at the beginning of the script.
-N <job_name>: gives a name to the job to run.
-pe <environment> N: specifies the parallel environment; N is the number of cores (limited to 180).
-j y: merge errors and standard output into the same output file.

13 SGE execution parameters:
-o output.$JOB_ID: the standard output will be in a file named output.$JOB_ID, where $JOB_ID is the number SGE assigns automatically to our job.
-l name=value: used to request a resource. In this case:
h_rt=00:10:00 indicates that the job should be killed after 10 minutes
hadoop=true indicates that the job to run is a Hadoop job (it DOES NOT CHANGE for Stratosphere jobs)
excl=true indicates that the job runs exclusively on its nodes
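Putting these together, the directive block of a submission script looks like this (a minimal sketch; the job name, core count and time limit are arbitrary examples):
#!/bin/bash
#$ -N my_job
#$ -pe hadoop 24
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:30:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q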

14 5. Create an execution script (Hadoop) HADOOP COMMANDS
Copy input files into HDFS:
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input
Run Hadoop tasks:
hadoop --config /home/guests/clustervision/current/ jar /pathtojob/myjob.jar org.myorg.job /input /output
Copy output files from HDFS:
hadoop --config /home/guests/clustervision/current/ fs -get /output
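After the job finishes and fs -get has run, the results land in a local output/ directory; with the classic MapReduce output layout they can be inspected like this (a minimal sketch; part-00000 is the conventional name of the first reducer's output file and may differ for your job):
$ ls output/
$ cat output/part-00000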

16 5. Create an execution script (Stratosphere):
#!/bin/bash
#$ -N strato_run
#$ -pe stratosphere 24
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q
export PATH=$PATH:'/cm/shared/apps/hadoop/current/conf/'
export STRATOSPHERE_HOME='/cm/shared/apps/stratosphere/current'
MASTER=`cat /home/guests/clustervision/current/masters`
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /var/hadoop/dfs.name.dir
$STRATOSPHERE_HOME/bin/pact-client.sh run -j myjob.jar -a 2 hdfs://$MASTER:50040/var/hadoop/dfs.name.dir/inputfile hdfs://$MASTER:50040/var/hadoop/dfs.name.dir/outputfile
hadoop --config /home/guests/clustervision/current/ fs -get /var/hadoop/dfs.name.dir/output

17 5. Create an execution script (Stratosphere):
#!/bin/bash
#$ -N strato_run
#$ -pe stratosphere 24
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q
export PATH=$PATH:'/cm/shared/apps/hadoop/current/conf/'
export STRATOSPHERE_HOME='/cm/shared/apps/stratosphere/current'
MASTER=`cat /home/guests/clustervision/current/masters`
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input
$STRATOSPHERE_HOME/bin/pact-client.sh run -j myjob.jar -a 2 hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output
hadoop --config /home/guests/clustervision/current/ fs -get /output

19 5. Create an execution script (Stratosphere) STRATOSPHERE COMMANDS
Copy input files into HDFS:
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input
Run Stratosphere tasks:
$STRATOSPHERE_HOME/bin/pact-client.sh run -j /pathtojob/myjob.jar -a 2 hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output
Copy output files from HDFS:
hadoop --config /home/guests/clustervision/current/ fs -get /output
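Before fetching results it can help to confirm that the job actually wrote them; the standard HDFS listing command works here (a minimal sketch):
$ hadoop --config /home/guests/clustervision/current/ fs -ls /output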

20 6. Submission of a job To submit, execute: $ qsub script.qsub
After submission, you can see the state of execution with the command:
$ qstat
job-ID  prior  name        user          state  submit/start at      queue        slots  ja-task-id
…       …      strato_run  clustervisio  r      10/15/2013 …:17:59   hadoop.q@…   24
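To remove a job from the queue, SGE's standard qdel command takes the job-ID shown by qstat (not covered on the slide; 4242 is a made-up ID):
$ qdel 4242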

21 6. Submission of a job Or, if you want more detailed information: $ qstat -t

22 7. Logs
/home/guests/clustervision/output.$JOB_ID: output of the job in SGE.
/home/guests/clustervision/config.$JOB_ID/logs: logs of the Hadoop file system.
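While a job is running, its SGE output file can be followed live (a minimal sketch; 4242 stands for the real job number):
$ tail -f /home/guests/clustervision/output.4242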
