Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu




Lecture 4: Introduction to Hadoop & GAE

Outline
- Introduction to Hadoop
  - The Hadoop ecosystem
  - Related projects
  - How to start
- Introduction to GAE
  - What is GAE
  - Overview of runtime environment
  - Scalable services
  - Advantages and limitations
  - Billing and free quotas
  - Demo, and how to start

Hadoop Stack & Google's Equivalents

Google       Hadoop              Role
MapReduce    Hadoop MapReduce    Programming framework
GFS          HDFS                Distributed file system
BigTable     HBase               Distributed column database
Sawzall      Pig / Hive          High-level language
Chubby       ZooKeeper           Distributed consensus engine

Pig
- Data-flow oriented language: Pig Latin
- Datatypes include sets, associative arrays, tuples
- High-level language for routing data; allows easy integration of Java for complex tasks
- Developed at Yahoo!

Hive
- SQL-based data warehousing application
- Feature set is similar to Pig's
- Language is more strictly SQL-like
  - Supports SELECT, JOIN, GROUP BY, etc.
- Features for analyzing very large data sets
  - Partition columns
  - Sampling
  - Buckets
- Developed at Facebook
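To make partition columns, buckets and sampling a little more concrete, here is a minimal sketch in plain Python, not Hive itself. It assumes rows are dictionaries, that a partition is chosen by the value of one column, and that a bucket is chosen by hashing another column modulo the bucket count. The function names and the use of CRC32 as the hash are assumptions made for illustration; Hive uses its own hash function.

```python
import zlib
from collections import defaultdict


def bucket_of(key, num_buckets):
    # CRC32 is a stand-in hash (an assumption); it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def layout(rows, partition_col, bucket_col, num_buckets):
    # Group rows by (partition value, bucket number).
    tables = defaultdict(list)
    for row in rows:
        slot = (row[partition_col], bucket_of(row[bucket_col], num_buckets))
        tables[slot].append(row)
    return dict(tables)


def sample_bucket(tables, bucket):
    # Sampling by bucket: read one bucket from every partition, skip the rest.
    return [row for (_, b), group in tables.items() if b == bucket for row in group]
```

The point of the layout is that a query restricted to one partition value, or a sample drawn from one bucket, only has to touch a fraction of the stored rows.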

HBase
- Row/column store
- Billions of rows × millions of columns
- Column-oriented: nulls are free
- Untyped: stores bytes
- Constrained access model
  - (key, value) lookup
  - Limited transactions (single row only)
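The properties on this slide can be pictured with a toy in-memory table. This is only a sketch under stated assumptions, not the HBase client API: the class name TinyTable, its methods, and the "family:qualifier" column strings are all made up for illustration.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy in-memory stand-in for an HBase-style table (not the HBase API)."""

    def __init__(self):
        self._rows = {}          # row key -> {column name -> bytes}
        self._sorted_keys = []   # row keys kept sorted for range scans

    def put(self, row_key, column, value):
        # Untyped store: every value is just bytes.
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        if row_key not in self._rows:
            self._rows[row_key] = {}
            insort(self._sorted_keys, row_key)
        self._rows[row_key][column] = value

    def get(self, row_key, column):
        # A missing cell occupies no storage; the lookup returns None.
        return self._rows.get(row_key, {}).get(column)

    def scan(self, start_key, stop_key):
        # Yield rows with start_key <= key < stop_key, in key order.
        i = bisect_left(self._sorted_keys, start_key)
        while i < len(self._sorted_keys) and self._sorted_keys[i] < stop_key:
            key = self._sorted_keys[i]
            yield key, dict(self._rows[key])
            i += 1
```

Each row holds only the columns that were actually written, which is why nulls cost nothing, and every access goes through the row key.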

HBase Data Model
[Figure: data schema and disk storage]

HBase Design & Features
- Design: similar to GFS
  - Name node -> Master server
  - Data node -> Region server, organized in columns and cells
- Features
  - Fault tolerant, with automatic load balancing
  - Fast access to cells, and fast scans over ranges of rows
  - More flexible schema than a traditional database
  - Less transaction support and weak consistency guarantees

HBase as a MapReduce Input
- Each row is an input record to MapReduce
- MapReduce jobs can sort/search/index/query data in bulk
* If you are interested in knowing more about HBase, you may take a look at Cloudera's training video on HBase.

ZooKeeper
- Distributed consensus engine
- Provides well-defined concurrent access semantics:
  - Leader election
  - Service discovery
  - Distributed locking / mutual exclusion
  - Message board / mailboxes
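Leader election is the easiest of these to sketch. The usual ZooKeeper recipe has each participant create a sequence-numbered entry tied to its session, and the holder of the lowest number is the leader. The class below only imitates that idea in memory; ElectionSim and its methods are hypothetical names, not the ZooKeeper client API.

```python
import itertools


class ElectionSim:
    """In-memory imitation of election via sequential, session-bound entries."""

    def __init__(self):
        self._counter = itertools.count()
        self._members = {}  # sequence number -> participant name

    def join(self, name):
        # Each participant gets the next sequence number.
        seq = next(self._counter)
        self._members[seq] = name
        return seq

    def leave(self, seq):
        # Models a session ending: the participant's entry disappears.
        self._members.pop(seq, None)

    def leader(self):
        # The participant with the lowest surviving sequence number leads.
        if not self._members:
            return None
        return self._members[min(self._members)]
```

When the current leader's entry disappears, the next lowest number takes over without any further negotiation.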

Pipes, Streaming
- Multi-language connector libraries for MapReduce
- Pipes: write native-code MapReduce in C++
- Streaming: write MapReduce passes in arbitrary scripting languages
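With Streaming, a map or reduce pass is just a program that reads lines and writes tab-separated key/value lines, and the reducer sees its input sorted by key. Below is a minimal word-count sketch of that contract, written as plain functions over lists of lines so it can run without a cluster. The function names are assumptions, and sorted() stands in for the framework's shuffle-and-sort step.

```python
from itertools import groupby


def map_lines(lines):
    # Mapper: emit "word<TAB>1" for every word in every input line.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    # Reducer: input arrives sorted, so equal keys are adjacent.
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    # sorted() plays the role of the shuffle-and-sort between map and reduce.
    return list(reduce_lines(sorted(map_lines(lines))))
```

In a real Streaming job the same two loops would read from standard input and print to standard output, one script for the mapper and one for the reducer.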

Hadoop related projects
- Avro: a data serialization system
- Chukwa: Hadoop log aggregation
- Scribe: more general log aggregation
- Mahout: machine learning library
- Cassandra: column store database on a P2P backend

Hadoop Status
- Still under active development
  - Current stable release: 0.20.2 (Hadoop official website)
  - There are some other well-maintained distributions
    - Cloudera's CDH2
    - Yahoo's distribution: Hadoop 0.20.10
- Supported platforms
  - Linux as the production platform; Win32 as a development platform
- Get yourself started (also Lab 1's task)
  - Download a stable Hadoop release
  - Set up a single-node Hadoop installation
  - Try out the HDFS operations
  - Read the WordCount example code, and run your first MR job on Hadoop

Introduction to Google App Engine (GAE)
- SaaS: Software as a Service
- PaaS: Platform as a Service
  - #2: Application development on top of a PaaS platform
- IaaS: Infrastructure as a Service
  - #1: The technology that drives PaaS

What is Google App Engine
- A PaaS platform for hosting web applications in Google-managed data centers
- Released in April '08 with Python support; Java support added in May '09
[Figure: Google App Engine + Java Language + Google Web Toolkit = Google App Engine for Java]

A Traditional Scalable Website

A GAE Scalable Website

GAE Advantages
- Easy to use, scale and manage
- Run your application on Google's infrastructure
- Forget the worries of managing your servers
- Think about developing more features for your web application, and let Google manage the rest
- No server restarts, no network issues

GAE Architecture

GAE Java Runtime Environment
- Java 6 VM
- Servlet 2.5 container
- HTTP session support (needs to be enabled explicitly)
- JDO/JPA for the Datastore API
- JSR 107 for the Memcache API
- javax.mail for the Mail API
- java.net.URLConnection for the URLFetch API
http://code.google.com/appengine/docs/java/runtime.html

Java Standards on GAE
http://code.google.com/appengine/docs/java/runtime.html

Datastore API
- Stores and manipulates data
- Based on Bigtable
- Not a relational database
- GQL (Google Query Language)
- Accessed through JDO/JPA
http://code.google.com/appengine/docs/java/datastore/

Memcache
- Faster than the Datastore: data is stored in memory rather than on disk
- Arbitrary key-value pair mapping
- Implements the JCache interface
- 1 MB limit per entry
- Free quota: 8.6M/day, 800 requests/sec
http://code.google.com/appengine/docs/java/memcache/
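A common way to use an in-memory cache in front of a slower store is to read through it: look in the cache first, and on a miss load from the store and remember the result. The sketch below shows that pattern in plain Python. TinyCache, read_through and the loader callback are hypothetical names, not the GAE Memcache API, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
MAX_ENTRY_BYTES = 1024 * 1024  # per-entry limit from the slide (assumed binary MB)


class TinyCache:
    """Dictionary-backed stand-in for an in-memory key-value cache."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        # Refuse entries over the size limit instead of storing them.
        if len(value) > MAX_ENTRY_BYTES:
            return False
        self._data[key] = value
        return True


def read_through(cache, key, load_from_store):
    # Serve from memory when possible; fall back to the slower store once.
    value = cache.get(key)
    if value is None:
        value = load_from_store(key)
        cache.put(key, value)
    return value
```

A cache like this is only a copy: the store remains the source of truth, and an entry that is too large or has been dropped is simply loaded again.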

Users & Authentication
- @gmail.com address
- Apps for Domain
- Admin privileges
http://code.google.com/appengine/docs/java/users/

URLFetch
- Loads external URLs
- Asynchronous support
- HTTP/HTTPS
- Max 10-second response time
- Max 1 MB of data
http://code.google.com/appengine/docs/java/urlfetch/
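The two limits above are enforced by the platform, but their effect can be imitated on the client side. The following sketch uses the Python standard library, not the GAE URLFetch API. fetch_limited and its opener parameter are hypothetical names, and reading 1 MB as 1024 * 1024 bytes is an assumption.

```python
import urllib.request

TIMEOUT_SECONDS = 10           # response deadline from the slide
MAX_BODY_BYTES = 1024 * 1024   # size limit from the slide (assumed binary MB)


def fetch_limited(url, opener=urllib.request.urlopen):
    # The opener is injectable so the function can be tried without a network.
    with opener(url, timeout=TIMEOUT_SECONDS) as response:
        # Read one byte past the limit to detect an oversized body.
        body = response.read(MAX_BODY_BYTES + 1)
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("response body exceeds the size limit")
    return body
```

An application written against limits like these has to treat slow or large external resources as failures and plan a fallback.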

Even More
- Datastore: database storage and operations
- Memcache API: high-performance in-memory key-value cache
- User Accounts: using Google accounts for authentication
- URLFetch: invoking external URLs
- Mail: sending mail from your application
- XMPP: sending/receiving XMPP-compatible instant messages
- Task Queues: for invoking background processes
- Images: for image manipulation
- Cron Jobs: scheduled tasks at defined times
http://code.google.com/appengine/docs/java/apis.html

Who is using GAE?
http://code.google.com/appengine/casestudies.html

GAE Demo
- Demo site: http://shen-ma.appspot.com/
- Source available at: https://code.google.com/p/shenma-wish/

How Do You Start
- The best way to learn is by practice!
- Follow GAE's Getting Started: Java guide, and have your first application online in 2 hours (also Lab 1's task)
- Everybody is recommended to use Eclipse as the development IDE; GAE offers a very nice plugin
- Other GAE examples are available on our course website

Intro done, ready to get your hands dirty!