Tunebot in the Cloud. Arefin Huq 18 Mar 2010



Similar documents
Technology and Cost Considerations for Cloud Deployment: Amazon Elastic Compute Cloud (EC2) Case Study

Amazon Elastic Beanstalk

Amazon Elastic Compute Cloud Getting Started Guide. My experience

Hosting Requirements Smarter Balanced Assessment Consortium Contract 11 Test Delivery System. American Institutes for Research

Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity

Benchmarking Cassandra on Violin

BASICS OF SCALING: LOAD BALANCERS

OTM in the Cloud. Ryan Haney

Variations in Performance and Scalability when Migrating n-tier Applications to Different Clouds

When talking about hosting

Amazon Hosted ESRI GeoPortal Server. GeoCloud Project Report

Hosting Requirements Smarter Balanced Assessment Consortium Contract 11 Test Delivery System. American Institutes for Research

HP OO 10.X - SiteScope Monitoring Templates

Scalable Architecture on Amazon AWS Cloud

Cloud Computing Workload Benchmark Report

Scalable Application. Mikalai Alimenkou

Cloud Performance Benchmark Series

An Overview of Amazon Silk Amazon s new cloud-powered browser

EXECUTIVE SUMMARY CONTENTS. 1. Summary 2. Objectives 3. Methodology and Approach 4. Results 5. Next Steps 6. Glossary 7. Appendix. 1.

Performance test report

How AWS Pricing Works

A Middleware Strategy to Survive Compute Peak Loads in Cloud

Drupal in the Cloud. by Azhan Founder/Director S & A Solutions

Informatica Data Director Performance

AWS Account Setup and Services Overview

How AWS Pricing Works May 2015

Scalable Run-Time Correlation Engine for Monitoring in a Cloud Computing Environment

Amazon EC2 Product Details Page 1 of 5

membase.org: The Simple, Fast, Elastic NoSQL Database NorthScale Matt Ingenthron OSCON 2010

Cloud Compu)ng. [Stephan Bergemann, Björn Bi2ns] IP 2011, Virrat

TECHNOLOGY WHITE PAPER Jan 2016

VMUnify EC2 Gateway Guide

Cloud Based Tes,ng & Capacity Planning (CloudPerf)

FortiGate Amazon Machine Image (AMI) Selection Guide for Amazon EC2

!"#$%&'()*'+),-./)0' 9##+':,%-.;),0'

Cloud Performance Benchmark Series

Performance Benchmark for Cloud Databases

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

Introduction to Cloud Computing on Amazon Web Services (AWS) with focus on EC2 and S3. Horst Lueck

WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE

Architecting ColdFusion For Scalability And High Availability. Ryan Stewart Platform Evangelist

A Monitored Student Testing Application Using Cloud Computing

TECHNOLOGY WHITE PAPER Jun 2012

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Introduction to Cloud Computing

JAVA IN THE CLOUD PAAS PLATFORM IN COMPARISON

OUR TEAM. Enterprise Application Experts

AppDynamics Lite Performance Benchmark. For KonaKart E-commerce Server (Tomcat/JSP/Struts)

This document will list the ManageEngine Applications Manager best practices

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

PostgreSQL Performance Characteristics on Joyent and Amazon EC2

Développement logiciel pour le Cloud (TLC)

Liferay Portal s Document Library: Architectural Overview, Performance and Scalability

Building Success on Acquia Cloud:

Liferay Portal Performance. Benchmark Study of Liferay Portal Enterprise Edition

ArcGIS 10.3 Server on Amazon Web Services

Estimating the Cost of a GIS in the Amazon Cloud. An Esri White Paper August 2012

Resource Sizing: Spotfire for AWS

Public Cloud Offerings and Private Cloud Options. Week 2 Lecture 4. M. Ali Babar

Shadi Khalifa Database Systems Laboratory (DSL)

How To Choose Between A Relational Database Service From Aws.Com

EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications

Introduction to DevOps on AWS

Administrative Issues

Managing and Conducting Biomedical Research on the Cloud Prasad Patil

Legal Notices Introduction... 3

Cloud Computing. Adam Barker

Handling Flash Crowds From Your Garage

DataCenter optimization for Cloud Computing

Business white paper. HP Process Automation. Version 7.0. Server performance

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Introduction 1 Performance on Hosted Server 1. Benchmarks 2. System Requirements 7 Load Balancing 7

1.0 Hardware Requirements:

Capacity Planning Guide for Adobe LiveCycle Data Services 2.6

Design for Failure High Availability Architectures using AWS

The Cost of the Cloud. Steve Saporta CTO, SwipeToSpin Mar 20, 2015

How To Monitor A Server With Zabbix

ITEC 495 Capstone Project Ideas

McAfee Public Cloud Server Security Suite

Amazon CloudWatch to monitor cloud resource usage

f...-. I enterprise Amazon SimpIeDB Developer Guide Scale your application's database on the cloud using Amazon SimpIeDB Prabhakar Chaganti Rich Helms

Storage Options in the AWS Cloud: Use Cases

Clustering with Tomcat. Introduction. O'Reilly Network: Clustering with Tomcat. by Shyam Kumar Doddavula 07/17/2002

Cloud Computing For Bioinformatics

Session 3. the Cloud Stack, SaaS, PaaS, IaaS

Economic Cloud Computing What to keep in mind when using the Cloud...

CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content

Evidence based performance tuning of

bla bla OPEN-XCHANGE Open-Xchange Hardware Needs

Dualog Connection Suite Hardware and Software Requirements

Amazon Web Services Yu Xiao

A Comparison of Clouds: Amazon Web Services, Windows Azure, Google Cloud Platform, VMWare and Others (Fall 2012)

MySQL and Virtualization Guide

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

Web Application Hosting in the AWS Cloud Best Practices

Very Large Enterprise Network Deployment, 25,000+ Users

Amazon Web Services. Elastic Compute Cloud (EC2) and more...

Alfresco Enterprise on AWS: Reference Architecture

Transcription:

Tunebot in the Cloud Arefin Huq 18 Mar 2010

What is Tunebot?

What is Tunebot? http://tunebot.cs.northwestern.edu Automated online music search engine for query-by-humming (QBH).

What is Tunebot? http://tunebot.cs.northwestern.edu Automated online music search engine for query-by-humming (QBH). Users sing or hum tunes to search. Queries are matched against other sung examples that have been contributed.

What is Tunebot? http://tunebot.cs.northwestern.edu Automated online music search engine for query-by-humming (QBH). Users sing or hum tunes to search. Queries are matched against other sung examples that have been contributed. Current DB: nearly 10K examples of over 3000 songs

What is Tunebot? A project of the Interactive Audio Lab led by Prof. Bryan Pardo (EECS) http://music.cs.northwestern.edu Single-machine locally-hosted installation Web/Flash/iPhone client PHP/MySQL server-side front-end on Apache Java/MySQL back-end as a Tomcat servlet

Tunebot Traffic (Source: Google Analytics)

Architecture

Goals Improve query response time to < 1 sec Typical query takes 5 seconds to complete Computation is linear in DB size

Goals Improve query response time to < 1 sec Typical query takes 5 seconds to complete Computation is linear in DB size Handle larger database DB expected to grow to critical mass of 10K songs

Goals Improve query response time to < 1 sec Typical query takes 5 seconds to complete Computation is linear in DB size Handle larger database DB expected to grow to critical mass of 10K songs Adapt to growing and varying load Handle traffic spikes

Project Status Task Port front-end to Linux Port back-end to Linux Write load-testing framework Deploy back-end in Cloud (single instance) Deploy front-end in Cloud (single instance) Decouple database from front-end Deploy load-balancing front-end with replicated DBs Parallelize search for single user Dynamic provisioning

Project Status Task Port front-end to Linux Port back-end to Linux Write load-testing framework Deploy back-end in Cloud (single instance) Deploy front-end in Cloud (single instance) Decouple database from front-end Deploy load-balancing front-end with replicated DBs Parallelize search for single user Dynamic provisioning Done? Yes Yes No Yes Yes No No Yes No

Linux Port

Linux Port equivalent software installation and configuration file system dependencies permissions groups, sticky bits, umask Adobe Flash Media Server

Amazon Cloud

Amazon Cloud http://ec2-174-129-48-175.compute-1.amazonaws.com/beta/ Ubuntu 9.10 64-bit AMI EC2 m1.large instance 2 virtual cores x 2 EC2 compute units 7.5 GB memory high IO Elastic Block Store

Parallelization

Algorithm Users contribute songs to the database by singing a portion of the melody and labeling it. This is converted to an internal representation (NIS) that is similar to MIDI or musical notation. This representation is the database key. When a user searches for a song their singing is also converted to NIS format.

Parallelization Search requires a linear scan of the database and a score of how well the query matches each key. Overall algorithm is O(n). Cost of computing each score is proportional to st where s and t are the lengths of the query and key respectively. Solution is to compute scores in parallel.

That s not the problem. Except

Except That s not the problem. Most of the time is spent converting the input audio file to NIS format. (all times in seconds) Query Length Query response time (HttpServlet:doPost) Audio conversion time 0.19 0.147 0.024 18.11 4.99 4.58 48.51 13.2 12.4

Speeding up Audio Conversion Replace hand-coded conversion with optimized third-party code. Eliminate lots of unnecessary data copying. Improve parameter choices? Use System.arrayCopy() instead of for-loops. Convert audio while streaming? Parallelize the audio conversion?

Parallelization But O(n) will be an issue eventually.

Parallelization But O(n) will be an issue eventually. (Intel 2.4GHz Core 2 Quad 64-bit Java) Query Length 1 thread 4 threads Speedup 0.19 s 40 ms 23 ms 1.7x 18.11 s 331 ms 195 ms 1.7x 48.51 s 749 ms 425 ms 1.8x

Database and File System

Database and File System Front-end and back-end talk directly to the database and file system. Back-end assumes all files (currently 30+ GB) are available. Requires architectural redesign to change. Amazon Cloud does not have a notion of shared file system. Instance Store EBS Simple Storage Service

Database and File System Database usage pattern is simple mostly SELECT with few predicates or joins. Amazon Simple DB Amazon Relational Database Service

Dynamic Provisioning Amazon CloudWatch Amazon Elastic Load Balancing Amazon Auto Scaling All depends on decoupling database and file system and managing concurrency. Need to remove hardcoded URLs in PHP and Flash.