Programme SURF Research Boot Camp, 21 April 2016


Registration at h

First session
o PROGRAMMING: Introduction to UNIX. Jeroen Engelberts, SURFsara
o DATA MANAGEMENT: Data cleaning with OpenRefine. Mateusz Kuzak, Netherlands eScience Center
o COMPUTE: Introduction to compute infrastructures. Jan Bot, SURFsara
o MISCELLANEOUS: Good practices for research data sharing and transfer. Niek Bosch, SURFsara; Paul van Dijk, SURFnet
o BIG DATA part 1: Scalable data analysis with Apache Spark and Hadoop. Mathijs Kattenberg, Jeroen Schot & Machiel Jansen, SURFsara

12:30-14:30 h
o PROGRAMMING: Python crash course. Ad Thiers & Maurice Verheesen, AT Computing
o DATA MANAGEMENT part 1: PID and iRODS web tools to manage and share research data. Ton Smeele, Utrecht University; Christine Staiger, SURFsara
o COMPUTE: High performance computing in the cloud. Markus van Dijk & Ander Astudillo, SURFsara
o MISCELLANEOUS: Local and remote data visualisation. Paul Melis & Casper van Leeuwen, SURFsara
o BIG DATA part 2: see BIG DATA part 1

15:00-17:00 h
o PROGRAMMING: Scientific computing with Python. Ad Thiers & Maurice Verheesen, AT Computing
o DATA MANAGEMENT part 2: PID and iRODS command line tools to securely store and manage research data. Ton Smeele, Utrecht University; Christine Staiger, SURFsara
o COMPUTE: Cluster computing. Jeroen Engelberts, SURFsara
o MISCELLANEOUS: Version control with Git. Carlos Martinez Ortiz, Netherlands eScience Center
o BIG DATA part 3: see BIG DATA part 1

PROGRAMMING TRACK

# Introduction to UNIX
Time: h
Jeroen Engelberts, SURFsara

In this part of the Programming track, you will learn the history of UNIX and get acquainted with some basic commands you will need to start using data, compute, and network facilities.

At the end of the workshop, you are able to:
o work with a UNIX terminal/shell
o use some basic UNIX commands
o log in to a UNIX cluster
o follow the Introduction to Cluster Computing and Introduction to HPC Cloud workshops from the Compute track

Prerequisites: Some basic knowledge of working with a computer.

Software needed: Bring along your own laptop. Make sure that you have installed Firefox on your laptop. Please install the FireSSH plugin, which can be obtained free of charge here: US/firefox/addon/firessh/ You will receive a username and password to log in to a UNIX system.

# Python crash course
Time: 12:30-14:30 h
Ad Thiers & Maurice Verheesen, AT Computing

In this session you will learn the basics of the Python programming language. In the last couple of years, Python has received much attention, specifically in the realm of technical computing. Its benefits (superb integration with existing C, C++ and Fortran code, and a very simple but powerful syntax) obviously contribute to this popularity. This workshop will cover basic language constructs like loops, if-then-else statements and exceptions, and it will briefly deal with data types.
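As an illustration of the constructs named in the Python crash course description (loops, if-then-else statements, exceptions and data types), here is a minimal sketch. It is not taken from the workshop material; the function names `classify` and `safe_ratio` and the sample values are made up for this example.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the division is undefined."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Exceptions let the program react to an error instead of stopping.
        return None


def classify(values):
    """Label each value as 'negative', 'zero' or 'positive'."""
    labels = []
    for value in values:              # loop over every element of the list
        if value < 0:                 # if / elif / else picks one branch
            labels.append("negative")
        elif value == 0:
            labels.append("zero")
        else:
            labels.append("positive")
    return labels


measurements = [3, 0, -2.5]           # a list holding int and float values
print(classify(measurements))
print(safe_ratio(10, 4), safe_ratio(1, 0))
```

The same pattern (a small function per task, with errors handled where they can occur) tends to carry over to longer research scripts.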

At the end of the workshop, you are able to:
o read and understand existing Python code
o create a simple Python script
o determine whether Python should be part of your 'research toolbox'

Prerequisites: This workshop assumes some programming / scripting familiarity.

Software needed: Bring your own laptop. A basic Python installation is required. This can be downloaded from A more complete installation for scientific computing with Python is Anaconda available from Be sure to install the Python 3 versions.

# Scientific computing with Python
Time: 15:00-17:00 h
Ad Thiers & Maurice Verheesen, AT Computing

You will be introduced to the world of scientific Python. Plain Python, due to its versatility, is not well suited for number crunching. There is, however, a wealth of Python software available (modules and packages) allowing you to perform computationally intensive jobs. All these packages are built around NumPy and matplotlib. Their basic functionality (working with n-dimensional arrays and visualizing data) will be covered in this workshop to some level. You will also get an overview of the major modules in SciPy, an enormous toolkit for scientific calculations.

At the end of the track, you are able to:
o create a simple NumPy script and manipulate arrays
o create plots with matplotlib
o find your way in the SciPy / NumPy toolkit

Prerequisites: This workshop assumes a basic knowledge of the Python programming language.

Software needed: Bring your own laptop. An installation for scientific computing with Python, with NumPy, SciPy and matplotlib, is required. The Anaconda distribution from contains all prerequisites (and more). Be sure to install the Python 3 versions.
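To give an idea of what "working with n-dimensional arrays" looks like in practice, the sketch below creates a small two-dimensional NumPy array and manipulates it without explicit loops. It is only an illustration and not part of the workshop material; the array contents and variable names are made up.

```python
import numpy as np

# A 2-D array: 3 rows (samples) by 4 columns (measurements); values are made up.
data = np.arange(12, dtype=float).reshape(3, 4)

column_means = data.mean(axis=0)   # mean of each column
centered = data - column_means     # broadcasting subtracts the means from every row
row_sums = data.sum(axis=1)        # sum of each row

print(data.shape)
print(column_means)
print(row_sums)
```

Operations such as `mean` and `sum` run over whole arrays at once, which is the main reason NumPy is better suited to number crunching than loops in plain Python.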

DATA MANAGEMENT TRACK

# Data Cleaning with OpenRefine
Time: h
Mateusz Kuzak, Netherlands eScience Center

OpenRefine is a powerful tool for working with messy data, e.g. to:
o clean data
o transform data from one format into another
o extend it with web services and external data

With OpenRefine you will get a better picture of your dataset. You will learn how to use the faceting, clustering and filtering features to correct errors in a dataset. You will also learn how to find and remove whitespace errors and how to split columns. In addition, you will learn how to move forward and backward on the timeline of changes you applied to the dataset and how to script changes for future reuse. At the end, you will export a clean dataset to a new file. See also: for introduction videos on OpenRefine

At the end of the workshop, you are able to:
o import tabular data to OpenRefine
o find and correct errors in the dataset
o find and clean whitespace errors for whole columns
o script cleaning steps

Prerequisites: This workshop assumes only familiarity with tabular data formats, like comma or tab separated.

Software needed: Bring your own laptop. You will need the Firefox web browser installed and the OpenRefine browser plugin. Here is the guide to install OpenRefine: Instructions

# Data management part 1: PID and iRODS web tools to manage and share research data
Time: 12:30-14:30 h
Ton Smeele, Utrecht University; Christine Staiger, SURFsara

The workshop explains the services of EUDAT, the European common data infrastructure, including persistent identifiers. You will use web tools such as B2FIND to discover research data sets, B2SHARE to publish your own research data, and the Handle/EPIC websites to inspect persistent identifiers. In addition, you will securely store and manage your data using web tools to work with an iRODS data grid.

At the end of the workshop, you will be able to:
o understand the common EUDAT B2 suite data services
o know how to use web tools to find and share research data sets across Europe
o know how to use web tools to store and manage research data
o understand the concept of persistent identifiers for research data and know how to use them

Prerequisites: This is an introductory workshop open to all disciplines.

Software needed: No specific software has to be installed. Bring your own laptop.

# Data management part 2: PID and iRODS command line tools to securely store and manage research data
Time: 15:00-17:00 h
Ton Smeele, Utrecht University; Christine Staiger, SURFsara

During the workshop you will create persistent identifiers for your data. The workshop also introduces the key functions of a data grid and allows you to experiment with an existing iRODS grid:
o save files to the grid and retrieve them again
o use advanced search techniques to find your files based on their metadata context
o build a pipeline to automate (post)processing of data files

At the end of the workshop, you will be able to:
o understand how to create EPIC persistent identifiers using Python programs
o understand the benefits of using data grids for storing research data
o understand the architecture of data grids
o know the basic set of commands to interact with a data grid
o know how to automate pipelines using a data grid

Prerequisites: This workshop assumes some basic familiarity with the terminal command line (Linux or DOS).

It also assumes the participant is familiar with the general concept of persistent identifiers such as DOI and EPIC (these concepts are introduced in the workshop "Data Management Part 1: PID and iRODS web tools to manage and share research data"). While knowledge of the Python programming language is an advantage, it is not a prerequisite.

Software needed: Bring your own laptop. SSH/PuTTY tools are required in order to access the data grid server.
o Linux and Mac users don't have to install anything; an SSH client is installed.
o Windows users: download and install PuTTY or SSH US/firefox/addon/firessh/

COMPUTE TRACK

# Introduction to compute infrastructures
Time: h
Jan Bot, SURFsara

We will provide you with a basic understanding of the different compute infrastructures that are available and whether you should consider using one. This module is a prerequisite for the cluster and cloud compute hands-on sessions. We will explain in which situations these infrastructures are useful and provide you with enough background knowledge to decide which infrastructure is best suited for your research.

At the end of the workshop, you are able to:
o choose between the different computational infrastructures
o have a basic understanding of how cluster computing systems work
o have a basic understanding of how cloud computing works

Prerequisites: None

Software needed: None

# High performance computing in the cloud
Time: 12:30-14:30 h
Ander Astudillo & Markus van Dijk, SURFsara

Computing in the cloud allows you flexible and easy access to computing and data resources that you would otherwise have to host yourself. SURFsara runs the HPC Cloud, providing an Infrastructure as a Service (IaaS) model (as will be explained in the workshop "Introduction to High Performance Computing"). This workshop provides a general introduction to cloud computing, and teaches the characteristics of the HPC Cloud and how to use it, hands-on.

At the end of the workshop, you are able to:
o use the HPC Cloud
o understand and apply different scaling models for parallel computing
o build (clusters of) Virtual Machines

Prerequisites: This workshop assumes familiarity with the Unix command line and SSH (can be learned from Introduction to UNIX, in the first hour of the Programming track).

Software needed: Bring your own laptop with a browser (Chrome or Firefox will do fine) and an SSH client:
o Linux and Mac users don't have to install anything; an SSH client is installed.
o Windows users: download and install git for windows : for windows.github.io

# Cluster computing
Time: 15:00-17:00 h
Jeroen Engelberts, SURFsara

In this part of the Compute track, you will learn how the national cluster Lisa and the national supercomputer Cartesius are set up. This presentation will be followed by a hands-on session with some small and easy-to-follow examples on both systems.

At the end of the workshop, you are able to:
o log in to a UNIX cluster
o prepare, submit and analyze a batch job on the national cluster Lisa / supercomputer Cartesius

Prerequisites: Some basic knowledge of the UNIX operating system (can be learned from Introduction to UNIX, in the first hour of the Programming track).

Software needed: Bring your laptop. Make sure that you have installed Firefox on your laptop. Please install the FireSSH plugin, which can be obtained free of charge here: US/firefox/addon/firessh/

MISCELLANEOUS TRACK

# Good practices for research data sharing and transfer
Time: h
Niek Bosch, SURFsara; Paul van Dijk, SURFnet

Description: You will obtain some basic knowledge of tooling and protocols that will help you to share your data fast, securely and easily! Transferring data to colleagues worldwide can sometimes be a real hassle. Whether you are dealing with portable hard disks or transfers via the internet, slow transfer times often occur when using suboptimal protocols or solutions. By following some simple guides and tricks, these problems will soon belong to the past. We will also touch on topics like legal requirements, encryption of data and the prevention of research data disasters.

At the end of the track, you are able to:
o assess which tools meet legal guidelines
o choose the most suitable file systems and protocols to transfer data
o select suitable data sharing tools and services and know their pros and cons
o use basic encryption tools like PGP

Prerequisites: None

Software needed: No specific software has to be installed. Bring your own laptop.

# Local and remote data visualisation
Time: 12:30-14:30 h
Paul Melis & Casper van Leeuwen, SURFsara

Visualisation can play an important role in research, but also in communicating research and results to stakeholders. We will give a practical introduction to both scientific visualisation and information visualisation using open source tools (ParaView, Jupyter, matplotlib).

At the end of the workshop, you are able to:
o have an overview of different visualisation domains
o create basic scientific visualisations using ParaView
o perform basic data visualisation with matplotlib and Jupyter

Prerequisites: No specific prior knowledge is needed.

Software needed: Bring your own laptop, with a web browser (preferably Chrome). ParaView (version 5.0) needs to be installed in advance to work on the exercises. It is available for Windows, MacOS X and Linux. See:

# Version control with Git
Time: 15:00-17:00 h
Carlos Martinez Ortiz, Netherlands eScience Center

Version control is extremely useful for managing collaborative work. Nothing committed to version control (git) will ever be lost; it is always possible to go back in time to see exactly who wrote what on a particular day. When several people collaborate on the same project, git automatically notifies users in case of a conflict between changes made by two people. Lone researchers can also benefit immensely from keeping a record of what was changed, when, and why. Git is extremely useful for all researchers if they ever need to come back to the project later on (e.g., a year later, when memory has faded).

At the end of the workshop, you are able to:
o set up git for tracking changes in a project
o use git for working in parallel on the same set of files

o go back to previous versions of a document

Prerequisites: Familiarity with working in the command line (either in Windows, OS X or Linux) is recommended.

Software needed: Installation of a git client will be required: scm.com/downloads or for windows.github.io/ Creation of a GitHub account would also be advisable.

BIG DATA TRACK

# Scalable data analysis with Apache Spark and Hadoop
Time: :00 h
Mathijs Kattenberg, Jeroen Schot & Machiel Jansen, SURFsara

You are introduced to the Apache Hadoop and Spark frameworks for processing big data. These frameworks offer a novel way of creating data analysis applications that easily scale over hundreds to thousands of machines. This data-parallel approach has been pioneered in industry by tech companies such as Google and Facebook, and is very applicable to many scientific workloads in general. We introduce you to the key concepts and features of the Apache Hadoop and Spark stacks. In addition, you will work on hands-on Spark exercises in a Jupyter notebook environment. The presentations, exercises and demos will provide a basic understanding of Hadoop and Spark and teach you about fundamental concepts in big data processing.

At the end of the workshop, you are able to:
o understand Spark and Hadoop concepts and fundamentals
o understand requirements for scalable applications
o run and create basic Spark code in a notebook environment

Prerequisites: This workshop is for anyone who would like to get started with Apache Spark and Hadoop to build robust and scalable applications. You should be familiar with the basics of programming (preferably Python) and the Unix command line. Most scientific programmers and technically minded researchers will feel right at home.

Software needed: No specific software has to be installed. Bring your own laptop.
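The data-parallel approach mentioned in the Big Data track can be sketched in plain Python, without any Spark or Hadoop installation. The example below is only an illustration under stated assumptions: it does not use the Spark API, the function names `count_words` and `merge_counts` are hypothetical, and the input is made up. It shows the general idea that each partition of the data is processed independently (a map step) and the partial results are then combined (a reduce step); frameworks such as Spark distribute these steps over many machines.

```python
from collections import Counter
from functools import reduce


def count_words(partition):
    """Map step: count the words in one partition (a list of text lines)."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine the partial counts of two partitions."""
    return left + right


# Hypothetical input, already split into two partitions.
partitions = [
    ["big data big ideas"],
    ["data parallel processing", "big clusters"],
]

# Each partition is counted independently, so this step could run in parallel.
partial_counts = [count_words(p) for p in partitions]
total = reduce(merge_counts, partial_counts, Counter())
print(total.most_common(2))
```

Because `count_words` never looks outside its own partition, adding more partitions (or more machines) does not change the logic, only how the work is divided.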


Databricks. A Primer Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful

More information

Pragmatic Version Control

Pragmatic Version Control Extracted from: Pragmatic Version Control using Subversion, 2nd Edition This PDF file contains pages extracted from Pragmatic Version Control, one of the Pragmatic Starter Kit series of books for project

More information

Scyld Cloud Manager User Guide

Scyld Cloud Manager User Guide Scyld Cloud Manager User Guide Preface This guide describes how to use the Scyld Cloud Manager (SCM) web portal application. Contacting Penguin Computing 45800 Northport Loop West Fremont, CA 94538 1-888-PENGUIN

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information
