Parallel and Large Scale Learning with scikit-learn




Parallel and Large Scale Learning with scikit-learn Data Science London Meetup - Mar. 2013

About me Regular contributor to scikit-learn Interested in NLP, Computer Vision, Predictive Modeling & ML in general Interested in Cloud Tech and Scaling Stuff Starting my own ML consulting business: http://ogrisel.com

Outline The Problem and the Ecosystem Scaling Text Classification Scaling Forest Models Introduction to IPython.parallel & StarCluster Scaling Model Selection & Evaluation

Parts of the Ecosystem: Single Machine with Multiple Cores (e.g. multiprocessing, joblib) and Multiple Machines with Multiple Cores (e.g. IPython.parallel, StarCluster)

The Problem: Big CPU (Supercomputers - MPI): simulating stuff from models. Big Data (Google scale - MapReduce): counting stuff in logs / indexing the Web. Machine Learning? Often somewhere in the middle.

Cross Validation Input Data Labels to Predict

Cross Validation [diagram: the input data and labels are split into three folds A, B, C]

Cross Validation [diagram: two folds (e.g. A and B) form the subset of the data used to train the model; the remaining fold (C) is the held-out test set for evaluation]

Cross Validation [diagram: the splits rotate (train A+B / test C, train A+C / test B, train B+C / test A) so that each fold serves once as the held-out test set]
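The three-fold rotation above is what the cross_val_score helper computes; a minimal sketch on a toy dataset (using the modern sklearn.model_selection import path; the 2013-era talk used the sklearn.cross_validation module):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One score per round: fit on two folds, evaluate on the held-out fold.
scores = cross_val_score(SVC(), X, y, cv=3)
print(scores)
print(scores.mean())  # the cross-validated score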

Model Selection: param_1 in [1, 10, 100], param_2 in [1e3, 1e4, 1e5]: the hyperparameter hell. Find the combination of parameters that maximizes the cross-validated score.

Grid Search: all (param_1, param_2) combinations: (1, 1e3) (10, 1e3) (100, 1e3) / (1, 1e4) (10, 1e4) (100, 1e4) / (1, 1e5) (10, 1e5) (100, 1e5)

(1, 1e3) (10, 1e3) (100, 1e3) (1, 1e4) (10, 1e4) (100, 1e4) (1, 1e5) (10, 1e5) (100, 1e5)

Grid Search: Qualitative Results

Grid Search: Cross Validated Scores
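A minimal sketch of how such a grid could be searched with GridSearchCV; treating param_1 and param_2 as the C and gamma parameters of an SVC is purely illustrative, and X, y can be any training set such as the iris data loaded earlier:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'C': [1, 10, 100],         # plays the role of param_1 on the slides
    'gamma': [1e3, 1e4, 1e5],  # plays the role of param_2 on the slides
}

# 9 parameter combinations x 3 CV folds = 27 independent fits,
# dispatched to 4 worker processes by joblib under the hood.
grid = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=4)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)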

Parallel ML Use Cases Stateless Feature Extraction Model Assessment with Cross Validation Model Selection with Grid Search Bagging Models: Random Forests In-Loop Averaged Models

Embarrassingly Parallel ML Use Cases [same list, with the embarrassingly parallel items highlighted: Stateless Feature Extraction, Model Assessment with Cross Validation, Model Selection with Grid Search, Bagging Models: Random Forests]

Inter-Process Comm. Use Cases [same list, with the items needing inter-process communication highlighted: In-Loop Averaged Models]

Scaling Text Feature Extraction The Hashing Trick

(Count|Tfidf)Vectorizer Scalability Issues: builds an in-memory vocabulary mapping text tokens to integer feature indices; a big Python dict: slow to (un)pickle; large corpus: ~10^6 tokens; Vocabulary == Statefulness == Sync barrier; no easy way to run in parallel.

>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> vec = TfidfVectorizer()
>>> vec.fit(["the cat sat on the mat."])
>>> vec.vocabulary_
{u'cat': 0, u'mat': 1, u'on': 2, u'sat': 3, u'the': 4}

The Hashing Trick

Replace the Python dict by a hash function:

>>> from sklearn.utils.murmurhash import *
>>> murmurhash3_bytes_u32('cat', 0) % 10
9L
>>> murmurhash3_bytes_u32('sat', 0) % 10
0L

Does not need any memory storage. Hashing is stateless: can run in parallel!

>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> vec = HashingVectorizer()
>>> out = vec.transform(["The cat sat on the mat."])
>>> out.shape
(1, 1048576)
>>> out.nnz  # number of non-zero elements
5

Some Numbers

TfidfVectorizer

Loading 20 newsgroups dataset for all categories
11314 documents - 22.055MB (training set)
7532 documents - 13.801MB (testing set)

Extracting features from the training dataset using a sparse vectorizer
done in 12.881007s at 1.712MB/s
n_samples: 11314, n_features: 129792

Extracting features from the test dataset using the same vectorizer
done in 4.043470s at 3.413MB/s
n_samples: 7532, n_features: 129792

HashingVectorizer

Loading 20 newsgroups dataset for all categories
11314 documents - 22.055MB (training set)
7532 documents - 13.801MB (testing set)

Extracting features from the training dataset using a sparse vectorizer
done in 5.281561s at 4.176MB/s
n_samples: 11314, n_features: 65536

Extracting features from the test dataset using the same vectorizer
done in 3.413027s at 4.044MB/s
n_samples: 7532, n_features: 65536

HashingVectorizer on Amazon Reviews Music reviews: 216MB XML file 140MB raw text / 174,180 reviews: 53s Books reviews: 1.3GB XML file 900MB raw text / 975,194 reviews: ~6min https://gist.github.com/ogrisel/4313514

Parallel Text Classification

HowTo: Parallel Text Classification All Text Data All Labels to Predict

Partition the Text Data Text Data 1 Text Data 2 Text Data 3 Labels 1 Labels 2 Labels 3

Vectorizer in Parallel [diagram: each text partition (Text Data 1/2/3 with Labels 1/2/3) is fed to its own stateless vectorizer (vec), producing Vec Data 1/2/3]

Train Linear Models in Parallel [diagram: a linear classifier (clf_1, clf_2, clf_3) is then trained independently on each vectorized partition]

Collect Models and Average clf = ( clf_1 + clf_2 + clf_3 ) / 3

Averaging Linear Models

>>> from copy import deepcopy
>>> clf = deepcopy(clf_1)  # a plain copy keeps the fitted coef_ / intercept_ (sklearn.base.clone would reset them)
>>> clf.coef_ += clf_2.coef_
>>> clf.coef_ += clf_3.coef_
>>> clf.intercept_ += clf_2.intercept_
>>> clf.intercept_ += clf_3.intercept_
>>> clf.coef_ /= 3; clf.intercept_ /= 3
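Putting the partition / vectorize / train / average steps together, a minimal sketch (the names text_partitions and label_partitions are hypothetical placeholders for pre-split raw text chunks and their label arrays; it also assumes every partition contains examples of every class):

from copy import deepcopy

from joblib import Parallel, delayed
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

def train_partition(texts, labels):
    # Stateless hashing vectorizer: no vocabulary to synchronize between workers.
    X = HashingVectorizer().transform(texts)
    return SGDClassifier().fit(X, labels)

clfs = Parallel(n_jobs=3)(
    delayed(train_partition)(texts, labels)
    for texts, labels in zip(text_partitions, label_partitions))

# Average the per-partition linear models, as on the slide above.
clf = deepcopy(clfs[0])
clf.coef_ = sum(c.coef_ for c in clfs) / len(clfs)
clf.intercept_ = sum(c.intercept_ for c in clfs) / len(clfs)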


Training Forest Models in Parallel

Tricks: try ExtraTreesClassifier instead of RandomForestClassifier: faster to train, and sometimes better generalization too. Both kinds of forest models are naturally embarrassingly parallel.

HowTo: Parallel Forests All Data All Labels to Predict

Replicate (rather than partition) the Dataset [diagram: every worker receives a full copy of All Data and All Labels]

Train Forest Models in Parallel All Data All Data All Data All Labels All Labels All Labels clf_1 clf_2 clf_3 Seed each model with a different random_state integer!
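A minimal sketch of that training step with joblib (X, y stand for the full, replicated in-memory training set; the seed values are arbitrary):

from joblib import Parallel, delayed
from sklearn.ensemble import ExtraTreesClassifier

def train_forest(X, y, seed):
    # Same replicated data everywhere, but a different random_state per worker.
    return ExtraTreesClassifier(n_estimators=100, random_state=seed).fit(X, y)

clf_1, clf_2, clf_3 = Parallel(n_jobs=3)(
    delayed(train_forest)(X, y, seed) for seed in (0, 1, 2))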

Collect Models and Combine

Forest models naturally do the averaging at prediction time: clf = ( clf_1 + clf_2 + clf_3 )

>>> from copy import deepcopy
>>> clf = deepcopy(clf_1)  # keep the fitted trees (sklearn.base.clone would drop them)
>>> clf.estimators_ += clf_2.estimators_
>>> clf.estimators_ += clf_3.estimators_
>>> clf.n_estimators = len(clf.estimators_)  # keep the attribute consistent with the merged trees

What if my data does not fit in memory?

HowTo: Parallel Forests (for large datasets) All Data All Labels to Predict

Partition (rather than replicate) the Dataset [diagram: Data 1/2/3 and Labels 1/2/3 are split across the workers]

Train Forest Models in Parallel Data 1 Data 2 Data 3 Labels 1 Labels 2 Labels 3 clf_1 clf_2 clf_3

Collect Models and Sum

clf = ( clf_1 + clf_2 + clf_3 )

>>> from copy import deepcopy
>>> clf = deepcopy(clf_1)  # as before, a plain copy keeps the fitted trees
>>> clf.estimators_ += clf_2.estimators_
>>> clf.estimators_ += clf_3.estimators_
>>> clf.n_estimators = len(clf.estimators_)

Warning: models trained on the partitioned dataset are not exactly equivalent to models trained on the unpartitioned dataset. With very large datasets this does not matter much in practice: see Gilles Louppe & Pierre Geurts, http://www.cs.bris.ac.uk/~flach/

Implementing Parallelization with Python

Single Machine with Multiple Cores

multiprocessing

>>> from multiprocessing import Pool
>>> p = Pool(4)
>>> p.map(type, [1, 2., '3'])
[int, float, str]
>>> r = p.map_async(type, [1, 2., '3'])
>>> r.get()
[int, float, str]

multiprocessing Part of the standard lib Simple API Cross-Platform support (even Windows!) Some support for shared memory Support for synchronization (Lock)

multiprocessing: limitations No docstrings in the source code! Very tricky to use the shared memory values with NumPy Bad support for KeyboardInterrupt fork without exec on POSIX

joblib: transparent disk-caching of the output values and lazy re-evaluation (memoization); easy simple parallel computing; logging and tracing of the execution
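As an aside, the disk-caching part is exposed through joblib.Memory; a minimal sketch (the cache directory and the expensive_transform function are made-up examples):

from joblib import Memory

memory = Memory('/tmp/joblib_cache', verbose=0)  # first argument: cache directory on disk

@memory.cache
def expensive_transform(x):
    # Stand-in for a costly feature extraction or preprocessing step.
    return x ** 2

expensive_transform(21)  # computed and persisted to disk
expensive_transform(21)  # second call with the same argument is read back from the cache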

joblib.parallel

>>> from os.path import join
>>> from joblib import Parallel, delayed
>>> Parallel(2)(delayed(join)('/etc', s)
...             for s in 'abc')
['/etc/a', '/etc/b', '/etc/c']

Usage in scikit-learn: Cross Validation: cross_val_score(model, X, y, n_jobs=4, cv=3); Grid Search: GridSearchCV(model, param_grid, n_jobs=4, cv=3).fit(X, y); Random Forests: RandomForestClassifier(n_jobs=4).fit(X, y)

joblib.parallel: shared memory (dev)

>>> from joblib import Parallel, delayed
>>> import numpy as np
>>> Parallel(2, max_nbytes=1e6)(
...     delayed(type)(np.zeros(int(i)))
...     for i in [1e4, 1e6])
[<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]

Only 3 allocated datasets shared by all the concurrent workers performing the grid search. (1, 1e3) (10, 1e3) (100, 1e3) (1, 1e4) (10, 1e4) (100, 1e4) (1, 1e5) (10, 1e5) (100, 1e5)

Problems with multiprocessing & joblib: the current implementation uses fork without exec under Unix, which breaks some optimized runtimes: OpenBLAS, Grand Central Dispatch under OS X. Will be fixed in Python 3 at some point...

Multiple Machines with Multiple Cores

IPython.parallel Parallel Processing Library Interactive Exploratory Shell Multi Core & Distributed
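A minimal usage sketch, assuming a controller and engines are already running (for instance started locally with "ipcluster start -n 4", or via the StarCluster plugin shown on the next slides); in recent releases the package lives in ipyparallel rather than IPython.parallel:

from IPython.parallel import Client  # "from ipyparallel import Client" in recent versions

client = Client()                    # reads the default JSON connector file
print(client.ids)                    # ids of the available engines

view = client.load_balanced_view()   # dynamic load balancing across engines
result = view.map_async(lambda x: x ** 2, range(8))
print(result.get())                  # [0, 1, 4, 9, 16, 25, 36, 49]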

Working in the Cloud

Launch a cluster of machines with one command:

$ starcluster start mycluster -s 3 \
      -b 0.07 --force-spot-master
$ starcluster sshmaster mycluster

Supports Spot Instances provisioning. Ships blas, atlas, numpy, scipy. IPython plugin, Hadoop plugin and more.

[global]
DEFAULT_TEMPLATE=ip

[key mykey]
KEY_LOCATION=~/.ssh/mykey.rsa

[plugin ipcluster]
SETUP_CLASS = starcluster.plugins.ipcluster.IPCluster
ENABLE_NOTEBOOK = True

[plugin packages]
setup_class = pypackage.pypackagesetup
packages = msgpack-python, scikit-learn

[cluster ip]
KEYNAME = mykey
CLUSTER_USER = ipuser
NODE_IMAGE_ID = ami-999d49f0
NODE_INSTANCE_TYPE = c1.xlarge
DISABLE_QUEUE = True
SPOT_BID = 0.10
PLUGINS = packages, ipcluster

$ starcluster start -s 3 --force-spot-master demo_cluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: ip
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 3-node cluster...
>>> Launching master node (ami: ami-999d49f0, type: c1.xlarge)...
>>> Creating security group @sc-demo_cluster...
SpotInstanceRequest:sir-d10e3412
>>> Launching node001 (ami: ami-999d49f0, type: c1.xlarge)
SpotInstanceRequest:sir-3cad4812
>>> Launching node002 (ami: ami-999d49f0, type: c1.xlarge)
SpotInstanceRequest:sir-1a918014
>>> Waiting for cluster to come up... (updating every 5s)
>>> Waiting for open spot requests to become active... 3/3 100%
>>> Waiting for all nodes to be in a 'running' state... 3/3 100%
>>> Waiting for SSH to come up on all nodes... 3/3 100%
>>> Waiting for cluster to come up took 5.087 mins
>>> The master node is ec2-54-243-24-93.compute-1.amazonaws.com

>>> Configuring cluster...
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Configuring hostnames... 3/3 100%
>>> Creating cluster user: ipuser (uid: 1001, gid: 1001) 3/3 100%
>>> Configuring scratch space for user(s): ipuser 3/3 100%
>>> Configuring /etc/hosts on each node 3/3 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s): /home
>>> Mounting all NFS export path(s) on 2 worker node(s) 2/2 100%
>>> Setting up NFS took 0.151 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for ipuser
>>> Running plugin ippackages
>>> Installing Python packages on all nodes:
>>> $ pip install -U msgpack-python
>>> $ pip install -U scikit-learn
>>> Installing 2 python packages took 1.12 mins

>>> Running plugin ipcluster
>>> Writing IPython cluster config files
>>> Starting the IPython controller and 7 engines on master
>>> Waiting for JSON connector file...
/Users/ogrisel/.starcluster/ipcluster/SecurityGroup:@sc-demo_cluster-us-east-1.json 100% Time: 00:00:00 0.00 B/s
>>> Authorizing tcp ports [1000-65535] on 0.0.0.0/0 for: IPython controller
>>> Adding 16 engines on 2 nodes 2/2 100%
>>> Setting up IPython web notebook for user: ipuser
>>> Creating SSL certificate for user ipuser
>>> Authorizing tcp ports [8888-8888] on 0.0.0.0/0 for: notebook
>>> IPython notebook URL: https://ec2-54-243-24-93.compute-1.amazonaws.com:8888
>>> The notebook password is: zyhomhea8rtjscxj
*** WARNING - Please check your local firewall settings if you're having
*** WARNING - issues connecting to the IPython notebook
>>> IPCluster has been started on SecurityGroup:@sc-demo_cluster for user 'ipuser' with 23 engines on 3 nodes.

To connect to cluster from your local machine use:

from IPython.parallel import Client
client = Client('/Users/ogrisel/.starcluster/ipcluster/SecurityGroup:@sc-demo_cluster-us-east-1.json',
                sshkey='/Users/ogrisel/.ssh/mykey.rsa')

See the IPCluster plugin doc for usage details:
http://star.mit.edu/cluster/docs/latest/plugins/ipython.html

>>> IPCluster took 0.679 mins
>>> Configuring cluster took 3.454 mins
>>> Starting cluster took 8.596 mins

Demo!

Perspectives

2012 results by Stanford / Google

The YouTube Neuron

Thanks http://scikit-learn.org http://packages.python.org/joblib http://ipython.org http://star.mit.edu/cluster/ http://speakerdeck.com/ogrisel @ogrisel

If we had more time...

MapReduce? [diagram: streams of (key, value) pairs such as (k1, v1), (k2, v2), ... are dispatched to mappers, grouped by key, and aggregated by reducers]

Why MapReduce does not always work: writes a lot of stuff to disk for failover; inefficient for small to medium problems. [(k, v)] -> mapper -> [(k, v)] -> reducer -> [(k, v)]: data and model params as (k, v) pairs? Complex to leverage for iterative algorithms.

When MapReduce is useful for ML Data Preprocessing & Feature Extraction Parsing, Filtering, Cleaning Computing big JOINs & Aggregates Random Sampling Computing ensembles on partitions

The AllReduce Pattern Compute an aggregate (average) of active node data Do not clog a single node with incoming data transfer Traditionally implemented in MPI systems

AllReduce 0/3 Initial State Value: 1.0 Value: 2.0 Value: 0.5 Value: 1.1 Value: 3.2 Value: 0.9

AllReduce 1/3 Spanning Tree Value: 1.0 Value: 2.0 Value: 0.5 Value: 1.1 Value: 3.2 Value: 0.9

AllReduce 2/3 Upward Averages Value: 1.0 Value: 2.0 Value: 0.5 Value: 1.1 (1.1, 1) Value: 3.2 (3.1, 1) Value: 0.9 (0.9, 1)

AllReduce 2/3 Upward Averages Value: 1.0 Value: 2.0 (2.1, 3) Value: 0.5 (0.7, 2) Value: 1.1 (1.1, 1) Value: 3.2 (3.1, 1) Value: 0.9 (0.9, 1)

AllReduce 2/3 Upward Averages Value: 1.0 (1.38, 6) Value: 2.0 (2.1, 3) Value: 0.5 (0.7, 2) Value: 1.1 (1.1, 1) Value: 3.2 (3.1, 1) Value: 0.9 (0.9, 1)

AllReduce 3/3 Downward Updates Value: 1.38 Value: 2.0 (2.1, 3) Value: 0.5 (0.7, 2) Value: 1.1 (1.1, 1) Value: 3.2 (3.1, 1) Value: 0.9 (0.9, 1)

AllReduce 3/3 Downward Updates Value: 1.38 Value: 1.38 Value: 1.38 Value: 1.1 (1.1, 1) Value: 3.2 (3.1, 1) Value: 0.9 (0.9, 1)

AllReduce 3/3 Downward Updates Value: 1.38 Value: 1.38 Value: 1.38 Value: 1.38 Value: 1.38 Value: 1.38

AllReduce Final State Value: 1.38 Value: 1.38 Value: 1.38 Value: 1.38 Value: 1.38 Value: 1.38

AllReduce Implementations http://mpi4py.scipy.org IPC directly w/ IPython.parallel https://github.com/ipython/ipython/tree/master/docs/examples/parallel/interengine
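For reference, a minimal sketch of the averaging step with mpi4py (the script name allreduce_average.py and the per-node values are made up; run with something like mpiexec -n 6 python allreduce_average.py):

# allreduce_average.py: every node starts with a local value and ends with the global average.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Local value held by this node (derived from the rank here, purely for illustration).
local = np.array([float(comm.rank) + 1.0])
total = np.zeros(1)

# Sum the local values across all nodes; every node receives the result,
# without funnelling all the data through a single machine.
comm.Allreduce(local, total, op=MPI.SUM)

average = total[0] / comm.size
print("rank %d sees average %.2f" % (comm.rank, average))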

Killall IPython engines on StarCluster

[plugin ipcluster]
SETUP_CLASS = starcluster.plugins.ipcluster.IPCluster
ENABLE_NOTEBOOK = True
NOTEBOOK_DIRECTORY = notebooks

[plugin ipclusterrestart]
SETUP_CLASS = starcluster.plugins.ipcluster.IPClusterRestartEngines

$ starcluster runplugin ipclusterrestart demo_cluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Running plugin ipclusterrestart
>>> Restarting 23 engines on 3 nodes 3/3 100%