
cloud-kepler Documentation
Release 1.2
Scott Fleming, Andrea Zonca, Jack Flowers, Peter McCullough, El
July 31, 2014

Contents

1 System configuration
   1.1 Python and Virtualenv setup
   1.2 Hadoop setup
   1.3 Lein setup
   1.4 LEMUR setup
   1.5 References
2 Quickstart Guide
   2.1 Specifying the data to download
   2.2 Configuration file options
3 Retrieving and downloading data
   3.1 get_data - Get data from MAST or hard disk
   3.2 join_quarters - Stitch multiple quarters of data together
4 BLS pulse algorithm
   4.1 drive_bls_pulse - Driver interface to BLS pulse
   4.2 bls_pulse_python - Naive pure Python implementation
   4.3 bls_pulse_vec - Vectorized Python implementation
   4.4 bls_pulse_cython - Optimized Cython implementation
5 detrend - Detrend lightcurve data
6 clean_signal - Signal cleaning (removal of strong periodic signals)
7 postprocessing - Analyze output from BLS pulse
8 utils - Utility functions
Python Module Index


cloud-kepler is a cloud-enabled Kepler planet searching pipeline.


CHAPTER 1

System configuration

1.1 Python and Virtualenv setup

To set up Python and Virtualenv, run the following commands from a terminal:

   cd ~/temp
   curl -L -o virtualenv.py https://raw.github.com/pypa/virtualenv/master/virtualenv.py
   python virtualenv.py cloud-kepler --no-site-packages
   . cloud-kepler/bin/activate
   pip install numpy
   pip install simplejson
   pip install pyfits

Test that the basic Python code is working:

   cat {DIRECTORY_WITH_CLOUD_KEPLER}/test/test_q1.txt | python {DIRECTORY_WITH_CLOUD_KEPLER}/python/down

If it starts downloading and spewing base64-encoded numpy arrays, then you're good.

1.2 Hadoop setup

- Install Oracle VM VirtualBox 4.2.14 (VirtualBox-4.2.14-86644-win) from https://www.virtualbox.org/wiki/downloads
- Extract cloudera-quickstart-demo-vm-4.3.0-virtualbox.tar.gz from https://ccp.cloudera.com/display/support/cloudera+quickstart+v
- Enter the created folder and extract cloudera-quickstart-demo-vm-4.3.0-virtualbox.tar; you should end up with cloudera-quickstart-demo-vm.ovf and cloudera-quickstart-demo-vm.vmdk in whatever folder you extracted to.
- Open up Oracle VM VirtualBox Manager.
- Select the New icon; the Create Virtual Machine window opens. For operating system, select Linux and Ubuntu.
- For memory size, select 4096 MB.
- For Hard Drive, select "Use an existing virtual hard drive" and give the path to cloudera-quickstart-demo-vm.vmdk.
- Press Create. The virtual machine is now selectable in the main window of the VirtualBox Manager.
- Press the Settings button to open the settings window, then choose the System tab.

- Change the chipset to ICH9 and make sure Enable IO APIC is checked.
- Select the virtual machine and press Start. The boot begins; this part takes a little while. If it gets stuck on any one step for more than 20 minutes, you can assume something is wrong. Eventually the boot sequence will end and you will see a desktop in your virtual machine. Success!

1.2.1 WordCount Example

Note that this assumes a Cloudera VM distribution of Hadoop.

Inside your virtual machine, go to the Cloudera Hadoop Tutorial at http://www.cloudera.com/content/clouderacontent/cloudera-docs/hadooptutorial/cdh4/hadoop-tutorial/ht_topic_5_1.html

Copy the source code for WordCount and paste it into the gedit text editor. Save it as WordCount.java in cloudera's home folder. Per the instructions there, open a terminal, cd to the home directory, then run:

   mkdir wordcount_classes
   javac -cp /usr/lib/hadoop/*:/usr/lib/hadoop/client-0.20/* -d wordcount_classes WordCount.java

Right-click on the wordcount_classes folder you made (it will be in the home directory) and select Compress. Choose .jar as the file format and wordcount as the filename. Then:

   echo "Hello World Bye World" > file0
   echo "Hello Hadoop Goodbye Hadoop" > file1
   hadoop fs -mkdir /user/cloudera /user/cloudera/wordcount /user/cloudera/wordcount/input
   hadoop fs -put file* /user/cloudera/wordcount/input
   hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/wordcount/input output

According to the Cloudera Tutorial, this should be all you need to do, but I got an error message here, so everything is not quite right yet. When you first log onto the virtual machine, it should begin with a Firefox window open to some kind of Cloudera page. Go to this and click the Cloudera Manager link. Enter admin and admin as the username and password to access it. Now you can see the health of your setup's various components. mapreduce1 will probably be listed as in poor health; click on it. You should see that the jobtracker is the problem.
Return to the terminal:

   sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
   sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system

Then restart the jobtracker by clicking the Instances tab, clicking on jobtracker, going to the Processes tab, selecting the Actions menu in the corner, and selecting Restart. Then:

   hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/wordcount/input output

This time it should work:

   hadoop fs -cat output/part-00000

This will print the output from the Hadoop run. It should look like this:

   Bye 1
   Goodbye 1
   Hadoop 2
   Hello 2
   World 2

If it looks like that, then you are good. It is worth noting that Hadoop won't work unless the directory you set as your output both does not currently exist and is in your hadoop fs home directory.

1.3 Lein setup

Note that this assumes a Cloudera VM distribution of Hadoop. You can find Lein at https://github.com/technomancy/leiningen

Download the script from https://raw.github.com/technomancy/leiningen/stable/bin/lein and place it wherever you want:

   export HOME=/home
   cd
   cd ..
   cd etc/profile.d
   sudo vim lein.sh

On one line of the file, write export PATH=$PATH:{wherever your lein file is located} (in my case /home/cloudera/desktop). Save the file and exit. Exit and reenter the terminal to get back to your home directory:

   chmod 755 {location of lein}

Lein should now be functioning; call lein in a terminal to test.

1.4 LEMUR setup

Note that this assumes a Cloudera VM distribution of Hadoop.

Lemur can be downloaded from http://download.climate.com/lemur/releases/lemur-1.3.1.tgz. Follow that link and the file should appear in your download folder. Extract it, and then put it wherever you want it to be:

   export HOME=/home
   cd
   cd ..
   cd etc/profile.d
   sudo vim lemur.sh

You are now writing a file which will allow your system to recognize Lemur. On the first line of the file, write export LEMUR_HOME={wherever you saved your lemur file} (in my case /home/cloudera/desktop/lemur). On the second line, write export LEMUR_AWS_ACCESS_KEY={your aws access key}.

On the third line of the file, write export LEMUR_AWS_SECRET_KEY={your aws secret key}. On the fourth line, write export PATH=$PATH:$LEMUR_HOME/bin. Save the file and exit. Lemur should now work; call lemur in a terminal to test.

1.5 References

Koch, D.G., Borucki, W.J., Basri, G., et al. 2010, The Astrophysical Journal Letters, 713, L79, doi:10.1088/2041-8205/713/2/L79

Kovacs, G., Zucker, S., & Mazeh, T. 2002, Astronomy & Astrophysics, 391, 369, doi:10.1051/0004-6361:20020802

Still, M., & Barclay, T. 2012, Astrophysics Source Code Library, 8004

LEMUR launcher, Limote, M., et al. 2012, The Climate Corporation
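Before moving on: the expected WordCount output from Section 1.2.1 can be sanity-checked without a working cluster. The sketch below reimplements the same map/reduce logic in plain Python; the names map_phase and reduce_phase are illustrative, not part of Hadoop or cloud-kepler.

```python
from collections import Counter

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Reducer: sum the counts for each distinct word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

# The two input files from the tutorial.
file0 = "Hello World Bye World"
file1 = "Hello Hadoop Goodbye Hadoop"

counts = reduce_phase(map_phase([file0, file1]))
for word in sorted(counts):
    print(word, counts[word])
```

Run on the two tutorial files, this prints the same five-line listing shown above (Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2), which is a quick way to confirm what the cluster should produce.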

CHAPTER 2

Quickstart Guide

A normal run of cloud-kepler can be started by:

   more input.txt | python get_data.py mast | python join_quarters.py | python drive_bls_pulse.py -c con

This sequence downloads all of the data from MAST and runs it through the algorithm with the parameters in a configuration file.

2.1 Specifying the data to download

The input file (or lines typed directly to stdin) should include the KIC ID, quarter number, and cadence identifier on each line, such as:

   011013072 1 llc
   011013072 2 slc
   011600006 * llc

The special quarter identifier * will download all available quarters for the given KIC ID. slc indicates short-cadence data and llc indicates long-cadence data. The Python script get_data.py also accepts the keyword data followed by an absolute or relative filepath of a top-level data directory with the same structure as the Kepler archive on MAST; use this option instead of mast if your data is stored locally.

2.2 Configuration file options

There are several options that can be specified in a configuration file; the same options can be specified via command line options, but they will be overridden by the file if it is provided (with the -c flag). A standard configuration file looks like:

   [DEFAULT]
   segment = 2
   min_duration = 0.01
   max_duration = 0.5
   n_bins = 1000
   direction = 0
   mode = cython
   print_format = encode
   verbose = no
   profiling = off

Additional options will be added as needed, such as for detrending flags.
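Since the configuration file shown above is standard INI syntax, it can be inspected with Python's configparser module. This is only a sketch of how such a file could be read; how drive_bls_pulse.py actually parses it is not shown in this documentation.

```python
import configparser

# The sample configuration from Section 2.2, inlined for demonstration.
SAMPLE = """\
[DEFAULT]
segment = 2
min_duration = 0.01
max_duration = 0.5
n_bins = 1000
direction = 0
mode = cython
print_format = encode
verbose = no
profiling = off
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)  # with a real file: config.read("config.ini")

# Typed accessors convert the raw strings to usable values.
segment = config["DEFAULT"].getfloat("segment")
n_bins = config["DEFAULT"].getint("n_bins")
verbose = config["DEFAULT"].getboolean("verbose")

print(segment, n_bins, verbose)
```

Options placed in [DEFAULT] are visible to every other section of an INI file, which makes it a natural home for pipeline-wide settings like these.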

CHAPTER 3

Retrieving and downloading data

3.1 get_data - Get data from MAST or hard disk

3.2 join_quarters - Stitch multiple quarters of data together
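The module reference above is terse, but Sections 1.1 and 2.1 imply the data flow: get_data emits one record per KIC ID and quarter, and join_quarters stitches those quarters together per star. A toy sketch of that stitching step follows; the plain-list record format is an assumption for illustration (the real pipeline streams base64-encoded numpy arrays between stages).

```python
from collections import defaultdict

def join_quarters(records):
    """Group records by KIC ID and concatenate quarters in order.

    Each record is (kic_id, quarter, flux_values). Quarters may arrive
    in any order, so they are sorted before concatenation.
    """
    by_star = defaultdict(list)
    for kic_id, quarter, flux in records:
        by_star[kic_id].append((quarter, flux))
    stitched = {}
    for kic_id, parts in by_star.items():
        parts.sort(key=lambda p: p[0])  # time-order the quarters
        stitched[kic_id] = [f for _, flux in parts for f in flux]
    return stitched

# Two quarters of one star (out of order) plus one quarter of another,
# using the KIC IDs from the Section 2.1 example.
records = [
    ("011013072", 2, [1.0, 1.1]),
    ("011013072", 1, [0.9, 1.0]),
    ("011600006", 1, [1.2]),
]
print(join_quarters(records))
```

Grouping by key and then combining each group is exactly the reduce-side pattern the Hadoop setup in Chapter 1 exists to run at scale.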


CHAPTER 4

BLS pulse algorithm

4.1 drive_bls_pulse - Driver interface to BLS pulse

4.2 bls_pulse_python - Naive pure Python implementation

4.3 bls_pulse_vec - Vectorized Python implementation

4.4 bls_pulse_cython - Optimized Cython implementation
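The documentation does not reproduce the algorithm itself, so for orientation only: a box least squares style search (Kovacs, Zucker, & Mazeh 2002, cited in Section 1.5) slides a box of trial duration across normalized flux and scores how far the in-box mean drops below the out-of-box mean. The toy sketch below illustrates that idea; it is not the actual bls_pulse_python implementation.

```python
def box_search(flux, duration):
    """Return (best_start, depth) for the box position with the deepest dip."""
    n = len(flux)
    total = sum(flux)
    best_start, best_depth = 0, 0.0
    for start in range(n - duration + 1):
        inside = flux[start:start + duration]
        in_mean = sum(inside) / duration
        out_mean = (total - sum(inside)) / (n - duration)
        depth = out_mean - in_mean  # positive for a flux dip
        if depth > best_depth:
            best_start, best_depth = start, depth
    return best_start, best_depth

# Synthetic lightcurve: flat at 1.0 with a transit-like dip of depth 0.01
# spanning samples 40-44.
flux = [1.0] * 100
for i in range(40, 45):
    flux[i] = 0.99

start, depth = box_search(flux, duration=5)
print(start, round(depth, 4))
```

A real search repeats this over many trial durations (min_duration to max_duration in the Section 2.2 configuration) and on phase-binned data, which is where the vectorized and Cython variants earn their keep.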


CHAPTER 5

detrend - Detrend lightcurve data
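No further detail on detrend is given here, so as orientation only: detrending divides out the slow instrumental and stellar trend so that transit dips survive in a flat curve near 1.0. The running-median baseline below is an illustrative stand-in, not the module's actual method.

```python
def running_median(values, window):
    """Median of a centered window at each point (window should be odd)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        chunk = sorted(values[lo:hi])
        out.append(chunk[len(chunk) // 2])
    return out

def detrend(flux, window=11):
    """Divide flux by its running median, leaving a flat curve near 1.0."""
    baseline = running_median(flux, window)
    return [f / b for f, b in zip(flux, baseline)]

# A linear instrumental trend plus a small dip at sample 25; after
# detrending the trend is gone and the dip remains.
flux = [1.0 + 0.001 * i for i in range(50)]
flux[25] -= 0.01
flat = detrend(flux)
```

The median (rather than the mean) is the usual choice for a baseline because it is insensitive to the very dips the search is trying to preserve.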


CHAPTER 6

clean_signal - Signal cleaning (removal of strong periodic signals)


CHAPTER 7

postprocessing - Analyze output from BLS pulse


CHAPTER 8

utils - Utility functions


Python Module Index

p
   postprocessing