A Framework for High Performance Image Analysis Pipelines over Cloud Resources
Raúl Ramos-Pollán, Ángel Cruz-Roa, Fabio González-Osorio
Bioingenium Research Group, Universidad Nacional de Colombia
1 Motivation
Bioingenium research group (www.bioingenium.unal.edu.co)
- Large-scale image processing pipelines, initially focused on medical imaging
- Support for machine learning processes
- Computing resources are seldom available, limiting collection sizes, applicability, and algorithm design
Strategy
- Build on experiences with Hadoop, Grid computing, etc.
- Profit from whatever resources are available
- Decouple algorithm design from deployment
- Adopt Big Data principles and technologies
- Streamline the software development process
- Unify a coherent algorithms repository
2 Image Processing Framework
Image Processing Pipelines
[Figure: pipeline flow — INPUT IMAGES → PATCH EXTRACTION → FEATURES EXTRACTION → FEATURES CLUSTERING → LATENT SEMANTICS → AUTOMATIC ANNOTATION]
Image Processing Framework
[Figure: framework overview — the experimenter defines a pipeline (stages 1-4) at a desktop client; a schedule generator writes the schedule to a NoSQL database; workers (Amazon instances, desktops) pull jobs and read/write the input and output data; workers can be added later.]
Pipeline definition example

#######################################
# FIRST STAGE: Patch Sampling
#######################################
stage.01.task: ROIsFeatureExtractionTask
stage.01.numberofpartitions: 10
stage.01.roialgorithm: RandomPatchExtractor
stage.01.fealgorithm: GrayHistogram
stage.01.randompatchextractor.blocksize: 18
stage.01.randompatchextractor.size: 256
stage.01.input.table: corel5k.imgs
stage.01.output.table: corel5k.sample

###########################################
# SECOND STAGE: CodeBook Construction
###########################################
stage.02.task: KMeans
stage.02.kmeans.kvalue: 10
stage.02.kmeans.maxnumberofiterations: 5
stage.02.kmeans.minchangepercentage: 0.1
stage.02.kmeans.numberofpartitions: 10
stage.02.input.table: corel5k.sample
stage.02.output.table: corel5k.codebook

####################################
# THIRD STAGE: Bag of Features Histograms
####################################
stage.03.task: BagOfFeaturesExtractionTask
stage.03.numberofpartitions: 10
stage.03.maximagesize: 256
stage.03.minimagesize: 20
stage.03.indexcodebook: true
stage.03.featureextractor: GrayHistogram
stage.03.patchextractor: RegularGridPatchExtractor
stage.03.spatiallayoutextractor: IdentityROIExtractor
stage.03.codebookid: STAGE-OUTPUT 2
stage.03.regulargridpatchextractor.blocksize: 18
stage.03.regulargridpatchextractor.stepsize: 9
stage.03.input.table: corel5k.imgs
stage.03.output.table: corel5k.histograms

This definition generates a schedule of jobs.
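A definition like the one above could be parsed into per-stage parameter maps along these lines. This is an illustrative sketch, not the framework's actual loader; the `stage.NN.param: value` key layout is taken from the slide, while the function name and grouping are assumptions.

```python
# Hypothetical sketch: group "stage.NN.key: value" lines of a pipeline
# definition into one parameter dictionary per stage.
from collections import defaultdict

def parse_pipeline(text):
    """Return {stage_id: {param: value}} from a properties-style definition."""
    stages = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment banners
        key, _, value = line.partition(":")
        parts = key.strip().split(".")
        if parts[0] != "stage":
            continue  # ignore anything that is not a stage parameter
        stage_id = parts[1]              # e.g. "02"
        param = ".".join(parts[2:])      # e.g. "kmeans.kvalue"
        stages[stage_id][param] = value.strip()
    return dict(stages)

example = """
# SECOND STAGE: CodeBook Construction
stage.02.task: KMeans
stage.02.kmeans.kvalue: 10
stage.02.input.table: corel5k.sample
stage.02.output.table: corel5k.codebook
"""
print(parse_pipeline(example)["02"]["kmeans.kvalue"])  # -> 10
```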
Framework architecture
[Figure: layered architecture — a CLIENT API (command-line tool, web console) used by the experimenter; a CORE with schedule generator, data management, and jobs wrapper; a TASKS API with task pattern and worker APIs (e.g. KMEANS and BAG OF FEATURES tasks; iterative and data-partition patterns); worker launchers (Amazon VM, JWS, command line); and a STORAGE API over schedules and input/output datasets, with DynamoDB, HBase, and file system implementations of the underlying NoSQL storage.]
Algorithm patterns
[Figure: two patterns — ONLINE ALGORITHMS make one pass: a step before processing all partitions, then for each partition (the same process, run in parallel) start partition / process each data item / finalize partition, then a step after processing all partitions. ITERATIVE ALGORITHMS wrap the same per-partition process in start-iteration / finalize-iteration steps that repeat over the input and output data partitions.]
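The per-partition pattern in the figure can be sketched as a small template-method base class. This is an assumption-laden illustration, not the framework's real Task Pattern API; the hook names are derived from the labels in the figure.

```python
# Illustrative sketch of the partition pattern: hooks run before/after
# all partitions, and each partition is processed independently with
# start / process-item / finalize steps (the framework would run the
# per-partition loop in parallel on workers).
class PartitionTask:
    def before_all_partitions(self): pass
    def start_partition(self, partition): pass
    def process_data_item(self, item): pass
    def finalize_partition(self, partition): pass
    def after_all_partitions(self): pass

    def run(self, partitions):
        self.before_all_partitions()
        for partition in partitions:      # one job per partition
            self.start_partition(partition)
            for item in partition:
                self.process_data_item(item)
            self.finalize_partition(partition)
        self.after_all_partitions()

class SumTask(PartitionTask):
    """Toy online task: accumulate a global sum across partitions."""
    def before_all_partitions(self):
        self.total = 0
    def process_data_item(self, item):
        self.total += item

task = SumTask()
task.run([[1, 2], [3, 4], [5]])
print(task.total)  # -> 15
```

An iterative algorithm (such as the KMeans task in the pipeline above) would repeat `run` over the partitions until a convergence test in `after_all_partitions` succeeds.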
3 Experimentation
Goals for Experiments on the Cloud
After using our framework in an opportunistic setting:
- Understand how to deploy our framework on Amazon cloud services
- Validate transparency for the experimenter
- Understand cost patterns
Cloud deployment
[Figure: deployment — the experimenter manages workers (launch, shutdown) through the Amazon API, from the local framework and a web browser; workers run in the Amazon cloud on a local network, alongside S3 and DynamoDB holding the pipelines, schedules, and datasets.]
Cloud usage

# prepare files with AWS credentials and desired pipeline
> load.imgs images /home/me/clef/images
306,530 files loaded into table images
> pipeline.load bof.pipeline
pipeline successfully loaded. pipeline number is 4
> pipeline.prepare 4
pipeline generated 56 jobs
> aws.launchvms ami-30495 15
launching 15 instances of ami-30495
> pipeline.info 4
...
> aws.termvms ami-30495 5
terminating 5 instances of ami-30495
> pipeline.prepare 4
Experiments setup
- Simple experiments before committing budget to Amazon
- 500 KB images
- Amazon EC2 t1.micro instances (613 MB RAM, 1 virtual core)
- fe-simple: simulated image feature extraction, 0-2 secs per image
- fe-complex: simulated image feature extraction, 5-8 secs per image
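The two simulated workloads can be sketched as follows. Only the 0-2 s and 5-8 s cost ranges come from the slide; the function names and structure are assumptions for illustration.

```python
# Sketch of the simulated feature-extraction jobs: each "extraction"
# just costs a random amount of time in the configured range.
import random
import time

def simulated_feature_extraction(image_id, min_secs, max_secs, sleep=False):
    """Pretend to extract features from one image, at a simulated cost."""
    cost = random.uniform(min_secs, max_secs)
    if sleep:                 # disabled here so the sketch runs instantly
        time.sleep(cost)
    return {"image": image_id, "simulated_cost_secs": cost}

def fe_simple(image_id):      # 0-2 secs per image
    return simulated_feature_extraction(image_id, 0.0, 2.0)

def fe_complex(image_id):     # 5-8 secs per image
    return simulated_feature_extraction(image_id, 5.0, 8.0)

job = fe_complex("img-0001")
print(5.0 <= job["simulated_cost_secs"] <= 8.0)  # -> True
```

Running such cost-only stand-ins lets the scheduling and scaling behavior be measured before spending budget on real feature extractors.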
Experiment results, 100 imgs
[Figure: results for FE SIMPLE and FE COMPLEX on 100 images]
Experiment results, 1000 imgs
[Figure: results for FE SIMPLE and FE COMPLEX on 1000 images]
Experiments scalability
[Figure: scalability for the 100-image and 1000-image runs]
Additional experiments
- Images from the ImageCLEF 2012 medical challenge: http://www.imageclef.org/2012/medical
- 306,530 medical images from journal papers in the biomedical sciences
- X-rays, MRI, PET, diagrams, ultrasound, microscopy, gammagraphy; mostly JPEG at medium resolution (avg width 512 pixels)
Additional experiments
- CEDD extraction on 3K ImageCLEF Medical images
- 3-stage pipeline on 30K ImageCLEF Medical images: 1) patch extraction, 2) K-means codebook, 3) BoF representation
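The three-stage pipeline above can be sketched end-to-end on toy data. This is a minimal illustration of the bag-of-features idea on 1-D "patch features", not the framework's implementation; all names and the toy data are assumptions.

```python
# Toy end-to-end sketch of the 3-stage pipeline: (1) sample patch
# features, (2) build a k-means codebook, (3) represent each image as
# a bag-of-features histogram over the codewords.
import random

def kmeans_1d(values, k, iters=10):
    """Tiny 1-D k-means; returns the codebook as a sorted centroid list."""
    centroids = random.sample(values, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            buckets[nearest].append(v)
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return sorted(centroids)

def bof_histogram(patches, codebook):
    """Count how many patches fall closest to each codeword."""
    hist = [0] * len(codebook)
    for v in patches:
        nearest = min(range(len(codebook)), key=lambda i: abs(v - codebook[i]))
        hist[nearest] += 1
    return hist

random.seed(0)
# Stage 1: patch features sampled from two toy images
imgs = {"a": [0.1, 0.2, 0.9], "b": [0.8, 0.9, 1.0]}
sample = [v for patches in imgs.values() for v in patches]
# Stage 2: codebook construction
codebook = kmeans_1d(sample, k=2)
# Stage 3: bag-of-features histograms
hists = {name: bof_histogram(p, codebook) for name, p in imgs.items()}
print(hists["a"])  # -> [2, 1]
```

In the real pipeline each stage reads its input table and writes its output table (e.g. `corel5k.imgs` → `corel5k.sample` → `corel5k.codebook` → `corel5k.histograms`), with the work split across partitions.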
Additional experiments
Submission to ImageCLEF 2012 medical:
- All 300K images processed and annotated
- 36 workers over 6 scattered servers
- One run (parameter configuration): 116 mins elapsed time, 40 hours compute time
Ad-hoc image retrieval task:
- 1st out of 53 in textual classification
- 3rd out of 36 in visual classification
4 Conclusions
- A tool supporting the full image processing lifecycle
- First scalability results on the Amazon cloud
- Decoupled algorithm design from deployment
- Adapted to our reality: grasp whatever computing resources are available
- Streamlined our internal software process
- Unified software repository
An agile experimentation lifecycle, with little a priori knowledge of the computing resources available.
Future work
- Extend support to additional machine learning processes
- Larger scalability experiments (number of workers)
- Optimize Amazon usage (DynamoDB)
- Better understand Amazon cost patterns
- Gain experience with NoSQL scalability (HBase)
- Better understand the limitations of each deployment model (opportunistic, cluster, cloud)
- Move to >1M image datasets