Setting Up the ARC Activity-Based Model in the Cloud: Lessons to Date. Ben Stabler, PB. Guy Rousseau, ARC. Matthew Martimo, Citilabs




November 16, 2011
Topic Area: Innovations in Computation

Introduction

The purpose of this paper is to describe our efforts to migrate the Atlanta Regional Commission (ARC) Activity-Based Model (ABM) modeling system to the cloud, and to share lessons learned. While the existing ABM runs well on ARC's modeling cluster, there are times when the cluster is busy and a run of the ABM is needed. Running the model in the cloud represents a potential solution to this problem.

This paper begins with a brief overview of cloud computing, followed by a short review of the ARC ABM. The paper then focuses on options for cloud-based modeling, setting up the ARC cloud-based ABM, problems encountered and solved, and resulting run times as compared with the existing ARC modeling cluster. This paper concludes with a discussion of our in-progress efforts to fine-tune the ARC cloud-based ABM so it can be run efficiently for multiple runs by multiple users at the same time.

Cloud Computing

Cloud computing is the on-demand use of remote computer resources. Examples include computing solutions such as Gmail, Survey Monkey, and Google Docs, where the computing solution is run entirely by remote machines. Other examples of cloud computing are the Amazon Elastic Compute Cloud (EC2) [1], a cloud computing service for software developers, and Citilabs' Cube Cloud Services, which uses Amazon EC2.

The key features of cloud computing are scalability, a fee structure that usually is on-demand or subscription-based, and support for multiple users/instances. Scalability is the ability of an implementation to scale up or down by adding or removing computing resources, such as processors, in order to solve problems of different sizes or to solve problems faster.
For the ARC ABM, the model scales well by the number of households, since most of the sub-models are solved independently by household. Since the cloud is remote, and is often not owned by the user, the user rents the resources on demand or via a subscription-based fee structure and therefore only pays for what they use. For the ARC ABM, this means paying for computing time on multiple computing cores and also for data transfer. A cloud solution is often set up to support multiple users/instances, so multiple users can run multiple model runs by interacting with a client user interface that coordinates with a cluster of computers (i.e., a cloud). For the ARC ABM, this means being able to run multiple simultaneous runs as opposed to one run at a time on the existing ARC modeling cluster.

ARC ABM Review

The ARC ABM is based on the CT-RAMP (Coordinated Travel-Regional Activity-Based Modeling Platform) family of ABMs developed, or being developed, in Columbus, Atlanta, the San Francisco Bay Area, San Diego, Phoenix, Chicago, Miami, and other regions. Various components of the modeling system have been presented at TRB, TRB Planning Applications, and the previous ITM conferences [2][3][4]. The model includes explicit intra-household interactions, a continuous temporal dimension (in hourly time periods), and integration of location, time-of-day, and mode choice models. The ARC region is split into over 2000 zones (with over 6000 transit subzones), has two time periods (AM peak and off peak) for network level-of-service matrices, and has a population size of about 1.7 million households in 2005 and 2.7 million in 2030.

As presented at the previous ITM conference in 2010 [5], the ARC ABM is implemented in Java and Cube. Java is responsible for the internal demand models such as workplace location, tour generation, tour mode choice, stop location choice, and trip mode choice. Cube is used for networks, assignments and skimming, overall model running, and ancillary models such as the external model and truck model. The model uses the Java Parallel Processing Framework [6] and Cube Cluster [7] to thread and distribute work across multiple machines. The base year model runs 3 feedback loops in about 16 hours on the ARC modeling cluster using three Windows 64-bit machines with 8 processors and 32 GB of RAM each. This exact same model was transferred to the cloud for testing.
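Because most sub-models are solved independently by household, the work can be split into batches of households and handed to parallel workers. The following is a minimal Python sketch of that idea only. The actual system distributes work with the Java Parallel Processing Framework across machines; here a local thread pool stands in for the cluster, and the names `simulate_household`, `make_batches`, `run_batch`, and `run_households` are hypothetical placeholders, not CT-RAMP functions.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_household(household_id):
    """Hypothetical stand-in for the per-household sub-models."""
    # Placeholder result; a real model would return tours and trips.
    return household_id, household_id % 5


def make_batches(household_ids, batch_size):
    """Split the household list into consecutive batches (jobs)."""
    return [household_ids[i:i + batch_size]
            for i in range(0, len(household_ids), batch_size)]


def run_batch(batch):
    """Solve every household in one batch; batches do not interact."""
    return [simulate_household(h) for h in batch]


def run_households(household_ids, batch_size=100, workers=4):
    """Distribute batches across workers and gather the results."""
    batches = make_batches(household_ids, batch_size)
    results = {}
    # A thread pool is used for simplicity (assumption); the real
    # system sends jobs to worker nodes on separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_result in pool.map(run_batch, batches):
            results.update(batch_result)
    return results
```

Since no batch depends on another, adding workers changes only how quickly the batches are consumed, which is the property that lets the model scale with the number of households.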
Setting Up the ARC Cloud-Based ABM

There are two basic approaches to a cloud-based modeling system. The first is a more open system that allows the user to configure and use custom remote machines. An example is renting a few instances (i.e., remote machines) from Amazon EC2 and then configuring them as needed. The Amazon EC2 system is designed for software developers, so it is not very easy to use, but it is very flexible.

The second approach is a more closed system that is configured ahead of time and includes a client user interface that sits between the user and the remote machines in order to handle setting up the instances, adding instances, starting the model runs, ensuring the runs complete, and getting results. An example is Cube Cloud Services, which wraps a web-based user interface around Amazon EC2 and allows the user to upload files, run models, and get results. The advantage of this approach is ease of use and true scalability, while the disadvantage is a lack of flexibility since the user interface dictates what is possible. We tested both approaches.

The first step to setting up the cloud is to select and configure the machine instances. There are a number of instance options available from Amazon, including a machine that is similar to those in ARC's modeling cluster:

High-Memory Quadruple Extra Large Instance (68 GB RAM, 8 virtual cores, 1690 GB HD, Windows 64-bit)

The user also needs to select either on-demand instance pricing or reserved instance pricing. On-demand pricing is simply pay-by-the-hour pricing. Reserved pricing consists of a one-time fee for one year or three years and then a lower per-hour use fee than on-demand pricing. For the ARC ABM base model run, the computing cost is approximately $120, which is 16 hours x $2.48/hr x 3 instances. If the $24,000 three-year reserved pricing fee is paid, then the same model run costs $46. The model would need to be run about 324 times over the three-year period (the $24,000 fee divided by the $74 saved per run) to justify the reserved pricing option.

The next step was to configure the instances and upload the model files. The cloud machines were configured through Amazon's AWS Management Console [8] and remote desktop to install software such as Java, which is required by the ARC ABM. Citilabs installed a keyless version of Cube on the machine instances for testing purposes. Finally, the ARC ABM specific software and input files were copied to the machines via remote desktop and the cluster was configured for running the model.

After setting up the cloud-based cluster, a small sample of households was run through the model to ensure the model setup worked. Not surprisingly, the first run failed. Review of the model run revealed that a low-level C DLL used by the model for reading Cube matrices had a small bug related to referencing objects in memory after the objects had been released. This issue only came up in the virtualized cloud-based computing environment, since memory is reclaimed much more efficiently there than in physical computing environments such as traditional desktop and server configurations. The code was fixed, the DLL re-compiled, and the model successfully run.

Approach One Results

Two model runs were completed with the first approach. The first is a three-machine run with six feedback loops (due to a tighter overall model convergence criterion than previously used) and the second is a six-machine run with six feedback loops. As shown in the table below, the three-instance run completed in a comparable amount of time to the ARC modeling cluster (17 hours for three iterations as compared to about 16 hours for the ARC run). The increase in computing instances for the second run resulted in about a 35 percent reduction in run times.
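The instance pricing arithmetic quoted in the setup discussion above can be reproduced in a few lines. This is a minimal sketch that uses only the figures stated in the text; the function names are hypothetical, and the $120 and $46 per-run costs are the rounded values from the paper rather than current Amazon prices.

```python
HOURS_PER_RUN = 16        # base-year run time quoted in the paper
INSTANCES = 3             # machines in the cloud cluster
ON_DEMAND_RATE = 2.48     # USD per instance-hour, as quoted
RESERVED_FEE = 24_000     # USD, three-year reserved pricing fee, as quoted
ON_DEMAND_RUN_COST = 120  # USD per run, rounded figure from the paper
RESERVED_RUN_COST = 46    # USD per run once the fee has been paid


def on_demand_cost(hours, rate, instances):
    """Computing cost of one run at pay-by-the-hour pricing."""
    return hours * rate * instances


def break_even_runs(fee, on_demand_run, reserved_run):
    """Number of runs at which the reserved fee has paid for itself."""
    return fee / (on_demand_run - reserved_run)


exact_cost = on_demand_cost(HOURS_PER_RUN, ON_DEMAND_RATE, INSTANCES)
runs_needed = break_even_runs(RESERVED_FEE, ON_DEMAND_RUN_COST,
                              RESERVED_RUN_COST)
print(f"On-demand cost per run: ${exact_cost:.2f}")
print(f"Break-even number of runs: {runs_needed:.0f}")
```

The unrounded on-demand cost is slightly under the $120 quoted, and the break-even count follows from dividing the fee by the $74 saved on each run.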

Table 1 Cloud ABM Run Times

Iteration               Run Times (hh:mm)        Reduction
                        3 machine   6 machine
1                       3:11        3:03         4%
2                       6:11        2:50         54%
3                       7:38        4:07         46%
4                       7:16        4:37         36%
5                       7:32        5:04         33%
6                       11:37       7:30         35%
Total Java Run Time     43:25       27:11        37%
Total Model Run Time    48:10       31:25        35%

There are a few details to note in the model run time comparisons. The Java run time is the run time for the CT-RAMP ABM model components, while the model run time minus the Java run time is the run time for everything else. The second item to note is that the first iteration of the six-machine run did not scale well, which was not expected and is currently being investigated. Overall, these results show that the cloud-based ABM produces reasonable run times and has reasonable scalability.

After completing the model run, the outputs were zipped up and posted to an FTP site for download. Amazon charges $0.10 per GB for data transferred into the cloud and about $0.11 - $0.15 per GB for data transferred out. The 15 GB of outputs cost about $2.25 to transfer out of the cloud.

Approach Two: Improved Scalability

The cloud-based ABM tested above used Amazon EC2 machines via remote desktop, which is very similar to how the model is run at ARC by agency staff and consultants. As a result, the cloud cluster can only be used for one run at a time, and it has no cluster management user interface, which would be useful, for example, for letting users know that the cluster is busy. The approach two configuration solves these problems and allows ARC's member agencies to run the model (and pay for it) as well. It is not yet known, though, what the pricing model will be.

Setting up the ARC ABM with Cube Cloud Services required some improvements to CT-RAMP to make the configuration of the modeling cluster for distributed computing more generic and scalable. This included scripting the starting and stopping of the remote machine instances that CT-RAMP uses to solve sub-models by household. Instead of being hard-wired to three or four machines, the setup was made more flexible so the user is able to specify at run time how many instances (machines) to use. Citilabs' product handled starting machine instances, starting the model run, copying output files, and shutting down instances.

Approach Two Results

A series of model runs were completed with approach two. As shown in the table and illustration below, the increase in computing instances resulted in significant non-linear reductions in the CT-RAMP run times. Doubling the number of cores from 32 to 64 reduced the run time by 37 percent. Doubling it again reduced the run time relative to the 32-core run by 55 percent. The 256- and 512-core runs show little improvement beyond the 128-core run. Based on this, it appears the computing power sweet spot is somewhere around 128 cores.

Table 2 Cloud ABM Run Times, Approach Two

Iteration   HH Sample Rate   Cores (Machines) and Run Times (hh:mm)
                             32 (4)   64 (8)   128 (16)   256 (32)   512 (64)
1           25%              2:03     1:29     1:06       1:02       1:10
2           50%              2:51     2:02     1:27       1:18       1:31
3           75%              4:06     2:42     1:59       1:41       1:47
4           100%             5:35     3:27     2:20       2:12       2:09
5           100%             6:05     3:16     2:22       2:09       2:10
Total Java Run Time          20:40    12:56    9:14       8:22       8:47
Run Time Reduction           -        37%      55%        60%        58%
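The run time reductions in Table 2 can be recomputed from the Total Java Run Time row. The sketch below does that and also reports speedup and parallel efficiency relative to the 32-core run. The efficiency measure is an added illustration, not a figure reported in the paper, and the function names are hypothetical.

```python
# Total Java run times (hh:mm) by number of cores, copied from Table 2.
TOTAL_JAVA_RUN_TIME = {
    32: "20:40",
    64: "12:56",
    128: "9:14",
    256: "8:22",
    512: "8:47",
}


def to_minutes(hhmm):
    """Convert an 'hh:mm' string to minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def scaling_summary(totals, base_cores=32):
    """Return (cores, reduction, speedup, efficiency) per configuration."""
    base = to_minutes(totals[base_cores])
    rows = []
    for cores, hhmm in sorted(totals.items()):
        minutes = to_minutes(hhmm)
        reduction = 1 - minutes / base             # share of run time saved
        speedup = base / minutes                   # relative to the base run
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, reduction, speedup, efficiency))
    return rows


for cores, reduction, speedup, efficiency in scaling_summary(TOTAL_JAVA_RUN_TIME):
    print(f"{cores:>4} cores: reduction {reduction:6.1%}, "
          f"speedup {speedup:4.2f}x, efficiency {efficiency:5.1%}")
```

Efficiency here is the speedup divided by the increase in cores, so a value well below 100 percent indicates that the added cores are not being fully used.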

Figure 1 ARC ABM Cloud Run Times

There are a couple of interesting findings from this test. The first is that there are diminishing returns on additional computing power. One of the likely bottlenecks in the process is the household data manager, which stores all the households in memory and handles a significant amount of I/O related to passing data to and from the worker machines. With so many worker machines, the household manager becomes a bottleneck. Some solutions to this problem are to increase the household job size in order to reduce communication, to move the household data manager to a dedicated machine, and to add additional household data managers, each responsible for a subset of households. A second finding is that larger core configurations can better handle additional work, since the 128-core setup runs relatively faster for larger sample rates than the 32-core setup.
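Two of the remedies listed above, larger household jobs and several household data managers, can be illustrated with a small sketch. It is a sketch under stated assumptions, not the CT-RAMP design: households are assigned to managers by ID modulo the number of managers, and the number of jobs each manager serves is used as a rough proxy for its communication load. The function names and the sharding rule are hypothetical.

```python
import math


def assign_to_managers(household_ids, n_managers):
    """Give each hypothetical data manager a disjoint subset of households."""
    shards = {m: [] for m in range(n_managers)}
    for hh in household_ids:
        # Assumed sharding rule: household ID modulo the manager count.
        shards[hh % n_managers].append(hh)
    return shards


def jobs_per_manager(shards, job_size):
    """Number of jobs (round trips to workers) each manager must serve."""
    return {m: math.ceil(len(hhs) / job_size) for m, hhs in shards.items()}


households = range(100_000)

# One manager handing out small jobs.
single = jobs_per_manager(assign_to_managers(households, 1), job_size=500)

# Four managers handing out larger jobs.
sharded = jobs_per_manager(assign_to_managers(households, 4), job_size=2000)

print("jobs served by a single manager:", single)
print("jobs served by each of four managers:", sharded)
```

Both changes reduce the number of exchanges any one manager handles, which is the quantity the text identifies as the likely bottleneck when many worker machines are attached.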

Conclusions and Next Steps

The purpose of this effort was to transfer the ARC ABM into the cloud to better understand run times and scalability. Overall, the effort was quite successful, as the configuration and running of the model went largely as expected. The run times were comparable to the ARC cluster runs and the model scaled well with the addition of computing cores, although diminishing returns were observed. Some work was required to make the configuration of the cluster more generic, and some additional work is planned to fine-tune the cloud-based model setup.

This in-progress effort has also assisted ARC in determining how best to provide future access to its ABM for planning partners and stakeholders. Currently, it appears unrealistic to expect all potential users of the model (municipalities, counties, GDOT, consultants) to invest in the computing power required to run the ABM in a short amount of time. Thus, cloud computing, depending on the cost, offers any eventual ABM user the alternative of accessing the ARC ABM on an as-needed, pay-as-you-go basis. Given today's need for quick answers to complex policy questions from decision-makers, the cloud-based ABM allows for virtually unlimited model runs within reasonable timelines and relieves the burden placed upon ARC's server infrastructure, in addition to providing a viable option for model runs when servers are down or busy.

References

[1] http://aws.amazon.com/documentation/ec2
[2] Parsons Brinckerhoff. (2009). Activity-Based Travel Model Specifications: Coordinated Travel Regional Activity-Based Modeling Platform (CT-RAMP) for the Atlanta Region and the San Francisco Bay Area.
[3] Vovsha, P., Freedman, J., Gupta, S., Sun, W., Livshits, V. (2010). Workplace Choice Model: Insights into Spatial Patterns of Commuting in Three Metropolitan Regions. Innovations in Travel Modeling, Tempe, Arizona.
[4] Vovsha, P., Freedman, J., Sun, W., Livshits, V. (2011). Activity-Based Models in Practice: CT-RAMP Experience. Transportation Research Board, Washington, DC.
[5] Stabler, B., Hicks, J., Rousseau, G., Nicholson, J., Simons, C., Freedman, J., Purvis, C., Ory, D. (2010). Computation Challenges of Implementing the Atlanta Regional Commission Activity-Based Modeling System. Innovations in Travel Modeling, Tempe, Arizona.
[6] Java Parallel Processing Framework, http://www.jppf.org
[7] http://www.citilabs.com/products/cube/cube cluster
[8] http://aws.amazon.com/console