Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu



Similar documents
What is cloud computing?

Cloud Computing Training

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Cloud Computing. Summary

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Introduction to Cloud Computing

Cloud Computing Now and the Future Development of the IaaS

Introduction to Cloud Computing

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Cloud Computing and Amazon Web Services

BIG DATA TRENDS AND TECHNOLOGIES

Cloud Computing. Technologies and Types

Open source Google-style large scale data analysis with Hadoop

Cloud Computing Summary and Preparation for Examination

ATI Cloud Computing.

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Big Data Explained. An introduction to Big Data Science.

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010

Cloud Computing. What is it? Presented by Prof. Dr.Prabhas CHONGSTITVATANA Asst. Prof. Dr.Chaiyachet SAIVICHIT. Source : Montana State Library Archive

Large-scale Data Processing on the Cloud

Cloud Computing and Amazon Web Services. CJUG March, 2009 Tom Malaher

Introduction to Cloud : Cloud and Cloud Storage. Lecture 2. Dr. Dalit Naor IBM Haifa Research Storage Systems. Dalit Naor, IBM Haifa Research

Ø Teaching Evaluations. q Open March 3 through 16. Ø Final Exam. q Thursday, March 19, 4-7PM. Ø 2 flavors: q Public Cloud, available to public

CPS 216: Advanced Database Systems (Data-intensive Computing Systems) Shivnath Babu

Cloud Courses Description

How To Understand Cloud Computing

Application Development. A Paradigm Shift

Cloud Courses Description

Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers

What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos

Open source large scale distributed data management with Google s MapReduce and Bigtable

Cloud Computing. Adam Barker

Viswanath Nandigam Sriram Krishnan Chaitan Baru

Research Paper Available online at: A COMPARATIVE STUDY OF CLOUD COMPUTING SERVICE PROVIDERS

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

How To Understand Cloud Computing

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Large-Scale Data Processing

Cloud Computing: Computing as a Service. Prof. Daivashala Deshmukh Maharashtra Institute of Technology, Aurangabad

Cloud Computing. Lecture 24 Cloud Platform Comparison

Security and Privacy in Cloud Computing

High Performance Computing Cloud Computing. Dr. Rami YARED

Hadoop and Map-Reduce. Swati Gore

Emerging Technology for the Next Decade

Hadoop IST 734 SS CHUNG

Introduction to Engineering Using Robotics Experiments Lecture 18 Cloud Computing

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

CSE-E5430 Scalable Cloud Computing Lecture 2

THE HADOOP DISTRIBUTED FILE SYSTEM

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15

ArcGIS for Server: In the Cloud

Big Data and Apache Hadoop s MapReduce

24/11/14. During this course. Internet is everywhere. Frequency barrier hit. Management costs increase. Advanced Distributed Systems Cloud Computing

Data Centers and Cloud Computing

Open Cirrus: Towards an Open Source Cloud Stack

6.S897 Large-Scale Systems

How To Understand Cloud Computing

Cloud Computing An Elephant In The Dark

Cloud computing - Architecting in the cloud

How To Compare Cloud Computing To Cloud Platforms And Cloud Computing

Using Cloud Services for Test Environments A case study of the use of Amazon EC2

Private Clouds with Open Source

Building Out Your Cloud-Ready Solutions. Clark D. Richey, Jr., Principal Technologist, DoD

Apache Hadoop. Alexandru Costan

Introduction to Big Data & Basic Data Analysis. Freddy Wetjen, National Library of Norway.

Contents. 1. Introduction

Business applications:

Large scale processing using Hadoop. Ján Vaňo

Cloud computing. Examples

Unit 10b: Introduction to Cloud Computing

APP DEVELOPMENT ON THE CLOUD MADE EASY WITH PAAS

Market Basket Analysis Algorithm on Map/Reduce in AWS EC2

Computing. M< MORGAN KAUfMANM. Distributed and Cloud. Processing to the. From Parallel. Internet of Things. Geoffrey C. Fox. Dongarra.

MapReduce with Apache Hadoop Analysing Big Data

BIG DATA What it is and how to use?

MapReduce, Hadoop and Amazon AWS

Elastic Cloud Computing in the Open Cirrus Testbed implemented via Eucalyptus

Building Blocks of the Private Cloud

Cloud Computing and Big Data What Technical Writers Need to Know

Hadoop implementation of MapReduce computational model. Ján Vaňo

Cloud Computing. Chapter 1 Introducing Cloud Computing

CLOUD COMPUTING USING HADOOP TECHNOLOGY

Data Centers and Cloud Computing. Data Centers

Systems Engineering II. Pramod Bhatotia TU Dresden dresden.de

A Comparison of Clouds: Amazon Web Services, Windows Azure, Google Cloud Platform, VMWare and Others (Fall 2012)

Transcription:

Lecture 1 Introduction to Cloud Computing Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Outline What is cloud computing? How it evolves? What are the underlying key technologies? 2 / 44

Question #1 What is Cloud Computing? 3 / 44

What is Cloud Computing? 5 people might have 8 answers! Widely distributed, network based, storage, computation, utility computing, HaaS, PaaS, SaaS. 5 / 44

A Customer-Oriented Definition Anytime, Anywhere, With any device, Accessing any services How many of them you still store on your local computer? Email Calendars and contacts Photo/video sharing Document sharing, or Anything? 6 / 44

A Business-Oriented Definition Key Characteristic Universal Access Scalable Services Infrastructure managing the scaling, not applications Elasticity: Expenses only incurred when they are needed New Application Service Models XaaS = X as a Service Pay-as-you-go 7 / 44

Public Cloud #1: Amazon Amazon EC2 Elastic Cloud Computing virtual servers for rent, called Amazon Machine Images (AMIs) based on Xen priced on per hour from $0.085 to $1 Amazon S3 Simple Storage Service up to $0.18 per GB storage from $0.10 per GB transfer via o REST o SOAP o BitTorrent 8 / 44

Public Cloud #2: Google A web application development framework and hosting solution rolled into one That uses the infrastructure available at Google o so their servers + storage: BigTable Charges o 500MB of storage o up to 5 million page views a month o 10 applications per developer account o pay for an extension Python/JAVA and GAE SDK 9 / 44

Public cloud #3: Microsoft Azure Services Released in Feb, 2010 A cloud service operating system that supports the service development/ hosting/management environment o.net Services o SQL Services o Live Services Pricing 10 / 44

Cloud Services Taxonomy Increasing Virtualization Flexibility of Offering SaaS Software as a Service PaaS Platform as a Service IaaS Infrastructure as a Service 11 / 44

Everything as a Service 12 / 44

How it evolves? 1. Web-scale problems 2. Large data centers 3. Different models of computing 13 / 44

Web-scale Problem Characteristics: Definitely data-intensive May also be processing intensive Examples: Crawling, indexing, searching, mining the Web Post-genomics life sciences research Other scientific data (physics, astronomers, etc.) Sensor networks Web 2.0 applications 14 / 44

How much data? Wayback Machine has 2 PB + 20 TB/month (2006) Google processes 20 PB a day (2008) all words ever spoken by human beings ~ 5 EB(1K PB) NOAA has ~1 PB climate data (2007) CERN s LHC will generate 15 PB a year (2008) 640K ought to be enough for anybody. 15 / 44

Data Inspiration (Banko and Brill, ACL 2001) (Brants et al., EMNLP 2007) / 44 16

What to do with more data? Answering factoid questions Pattern matching on the Web Works amazingly well Who shot Abraham Lincoln? X shot Abraham Lincoln Learning relations Start with seed instances Search for patterns on the Web Using patterns to find more instances Birthday-of(Mozart, 1756) Birthday-of(Einstein, 1879) (Brill et al., TREC 2001; Lin, ACM TOIS 2007) (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; ) Wolfgang Amadeus Mozart (1756-1791) Einstein was born in 1879 PERSON (DATE PERSON was born in DATE 17 / 44

Large Data Centers Web-scale problems? Throw more machines at it! Clear trend: centralization of computing resources in large data centers Necessary ingredients: commodity, network, juice, and space Analogy to the power station Important Issues: Redundancy Efficiency Utilization Management April 17, 11 18 / 44

A Sample Cloud Topology Introduction to Cloud Computing Core Switch Top of the Rack Switch Rack Servers

Scale of Industry Datacenters Microsoft [NYTimes, 2008] 150,000 machines Growth rate of 10,000 per month Largest datacenter: 48,000 machines 80,000 total running Bing Yahoo! [Hadoop Summit, 2009] 25,000 machines Split into clusters of 4000 AWS EC2 (Oct 2009) 40,000 machines 8 cores/machine Google (Rumored) several hundreds of thousands of machines 20 / 44

Different Models of Computing from 2005, EC2, hadoop Map-reduce etc 2000s, Voluteer computing, Utility Computing, Globus, Bonic, glite 1990s, WWW, CORBA, DCOM, RMI 1960 1980s 21 / 44

Grid Computing Vs Cloud Computing Reference:http://www.slideshare.net/DSPIP/cloud-computing-introduction-2978287 4/17/11 22 / 44

What( s new) in Today s Clouds? On-demand access: Pay-as-you-go, no upfront commitment. Anyone can access it Data-intensive Nature: What was MBs has now become TBs. Daily logs, forensics, Web data, etc. New Cloud Programming Paradigms: MapReduce/Hadoop, Pig Latin, DryadLinq, Swift, and many others. High in accessibility and ease of programmability Combination of one or more of these gives rise to novel and unsolved distributed computing problems in cloud computing. 23 / 44

Outline What is cloud computing? Who is in this game? How it evolves? What are the underlying key technologies? 24 / 44

Service Model and Key Technologies Infrastructure as a Service (IaaS) Why buy machines when you can rent cycles? Key Technology: Virtualization Platform as a Service (PaaS) Give me nice API and take care of the implementation Key Technology: New cloud programming paradigm, i.e. MapReduce, PIG, HIVE etc Software as a Service (SaaS) Just run it for me! Key Technology: Everybody has their own secret sauces, but Ajax is de-facto front-end. 25 / 44

Virtualization at a Glance Run multiple virtual computers on one physical box Lots of way to do it Xen VMWare Parallels Amazon AMI Microsoft Hype V 26 / 44

Virtualization Benefit App App App App App App Operating System OS OS OS Hardware Hypervisor Traditional Stack Hardware 5 to 15 % utilization only Virtualized Stack High utilization and standardization 27 / 44

New Cloud Programming Paradigms A typical cloud-era task: Given 100 computers, how do you compute the frequency of words in 1T text files? To utilize the underlying computing power, you need a new paradigm for storing and processing large scale of data 28 / 44

New Cloud Programming Paradigms Google Hadoop (Open Source) Dist. File System GFS HDFS No-SQL DB BigTable HBase Programming Framework High-level Language MapReduce Sawzall Hadoop MapReduce PIG (Yahoo) / Hive(FaceBook) Microsoft Dryad (Generalized MR) DryadLING 29 / 44

A Brief History in this Area 2003, First MapReduce Lib developed in Google 2003, 2004, and 2006, Google published papers on GFS/MapReduce/BigTable. 2005- Now, Hadoop project (open source version of GFS/BigTable/MapReduce), initiated by Doug Cutting, sponrsed by Yahoo 2008/2009, Yahoo/Facebook contributed PIG/Hive on top of hadoop. 30 / 44

Cloud Computing Usages Google (MapReduce) Indexing: a chain of 24 MapReduce jobs 450K nodes, ~200K jobs processing 50PB/month (in 2006) Yahoo! (Hadoop + Pig) WebMap: a chain of 100 MapReduce jobs 2500 nodes, 280 TB of data, Facebook (Hadoop + Hive) 2250 nodes, adding 80-90TB/day (in 2010) 25K jobs/day Taobao (Hadoop + TFS + Hbase) 1300 nodes, 9.3PB (2010) 1800 hadoop jobs per day Baidu Their own implementation of hadoop in C++ 4000 nodes (2010) Guess what is the main-stream configuration for each node? 31 / 44

What We Cover in this Course? SaaS Software as a Service PaaS Platform as a Service IaaS Infrastructure as a Service #2: The application development on top of PaaS platform #1: The technology drives PaaS 32 / 44

Skills you would learn Only if you put enough efforts on it. Skill 1 st: Know how to process Terabytes of data. A basic skill for anyone who stays in IT industry in this digital-era. Skill 2 nd: Know how to put up the platform if you are given the chance & resources. Critical for anyone who want to become an excellent engineering in a big corporation Skill 3 rd: Know how to quickly implement your ideas on top of cloud service, and make it big. Critical for anyone who want to run your own startups and dreaming to be a billionaire. 33 / 44

Thank you & Welcome to Cloud Era! Introduction to Cloud Computing