Big Data and Cloud Computing



Chunming Rong
  Chair (IEEE CloudCom)
  Head (CIPSI)
  Professor (UiS)
  chunming.rong@uis.no

Tomasz Wiktor Wlodarczyk
  Big Data Chair (IEEE CloudCom)
  Administrative Head (CIPSI)
  Associate Professor (UiS)
  chunming.rong@uis.no

Index
  Characterization of Cloud Computing
  Characterization of Big Data
  Comparison with HPC
  Example Applications
  Activities and Events to Follow

What is different in cloud?
  Resource Virtualization
  Data-driven
  Shared

Virtualized Computing Resource
[Figure: international wireless standards by network range, linked through Ethernet LAN and a backbone network (fibre, radio link, satellite) to an operator workstation, ERP, a wireless sensor network and a floater.
  WAN: IEEE 802.20 (Mobile BWA, 3G)
  BAN: IEEE 802.16 (BWA), ETSI HiperAccess
  MAN: IEEE 802.16a (WMAN, WiMax), ETSI HiperMAN
  LAN: IEEE 802.11 (WLAN), ETSI HiperLAN
  PAN: IEEE 802.15 (Bluetooth), ETSI HiperPAN]

What Changes?
[Figure: each layer of each stack is marked as "you control", "shared control" or "vendor control".]
  On Premise: Application, Server, Storage, Network
  IaaS: Application, Virtual Machine, Server, Storage, Network
  PaaS: Application, Virtual Machine, Server, Storage, Network
  SaaS: Application, Virtual Machine, Server, Storage, Network
Source: Mather, Kumaraswamy and Latif, Cloud Security and Privacy, O'Reilly 2009
Center for IP-based Service Innovation

Openness
Shareability and Freedom
  Open software
  Open services
  Open data

2020
  1 billion new Internet users
  20+ million apps
  30+ billion devices
  1+ trillion sensors
  50+ million petabytes of data
Everyone, everything interconnected

Requirements by Today's Users
  Accessibility: access from anywhere and from multiple devices
  Shareability: make sharing as easy as creating and saving
  Freedom: users don't want their data held hostage
  Simplicity: easy to learn, easy to use
  Security: trust that data will not be lost or seen by unwanted parties

Oceans of Data, Skinny Pipes
1 Terabyte: easy to store, hard to move

Disks               MB/s      Time
Seagate Barracuda   115       2.3 hours
Seagate Cheetah     125       2.2 hours

Networks            MB/s      Time
Home Internet       < 0.625   > 18.5 days
Gigabit Ethernet    < 125     > 2.2 hours
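The times in the table follow from dividing the data size by the sustained rate. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000,000 MB); the function and variable names are illustrative and not from the slides. With this convention the 115 MB/s disk comes out at roughly 2.4 hours rather than the slide's 2.3, presumably a difference in unit convention or rounding.

```python
# Time to move 1 TB at a sustained rate.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
TERABYTE_MB = 1_000_000


def transfer_seconds(rate_mb_per_s: float, size_mb: float = TERABYTE_MB) -> float:
    """Seconds needed to move size_mb megabytes at rate_mb_per_s."""
    return size_mb / rate_mb_per_s


def describe(seconds: float) -> str:
    """Render a duration in hours, or in days once it exceeds two days."""
    hours = seconds / 3600
    if hours >= 48:
        return f"{hours / 24:.1f} days"
    return f"{hours:.1f} hours"


# Rates taken from the table; network rates are upper bounds.
rates_mb_per_s = {
    "Seagate Barracuda": 115,
    "Seagate Cheetah": 125,
    "Home Internet": 0.625,
    "Gigabit Ethernet": 125,
}

for name, rate in rates_mb_per_s.items():
    print(f"{name:18s} {describe(transfer_seconds(rate))}")
```

The same function can be reused for other sizes by passing size_mb explicitly.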

Scalable Computing Resource

Data-Intensive Computing Challenge
For computation that accesses 1 TB in 5 minutes
  Data distributed over 1000+ disks
  Assuming uniform data partitioning
  Compute using 1000+ processors
  Connected by 10 gigabit Ethernet (or equivalent)
System Requirements
  Lots of disks
  Lots of processors
  Virtualized architecture in different locations
  Huge load on network
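To see why so many disks are needed, the target can be turned into throughput figures. A rough sketch, assuming decimal units and perfectly uniform partitioning as the slide does; the helper name is illustrative.

```python
def required_throughput(size_mb: float, seconds: float, n_disks: int):
    """Return (aggregate MB/s, MB/s per disk) needed to scan size_mb in seconds."""
    aggregate = size_mb / seconds
    return aggregate, aggregate / n_disks


# 1 TB (taken as 1,000,000 MB) in 5 minutes, spread evenly over 1000 disks.
aggregate, per_disk = required_throughput(1_000_000, 5 * 60, 1000)
aggregate_gbit = aggregate * 8 / 1000  # MB/s -> Gbit/s

print(f"aggregate: {aggregate:.0f} MB/s ({aggregate_gbit:.1f} Gbit/s)")
print(f"per disk:  {per_disk:.2f} MB/s")
```

Under these assumptions each disk only has to deliver a few MB/s, but the aggregate exceeds what a single 10 gigabit link can carry, which is one way to read the "huge load on network" requirement.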

Desiderata for Data Intensive Systems
  Focus on Data
    Petabytes, not peta-flops
  Problem-Centric Programming
    Platform-independent expression of data parallelism
  Interactive Access
    From simple queries to massive computations
  Robust Fault Tolerance
    Component failures are handled as routine events
Contrast to existing High Performance Computing (HPC) systems

Simplistic Comparison

Database Management System
  Data stored according to schema
  Declarative query language
  Many sophisticated optimizations
  Support small & large queries
  Limited scaling

Map/Reduce
  Data stored as unstructured files
  User-defined map & reduce functions
  Runtime system fairly straightforward
  Batch processing of data only
  Designed to operate on massive scale
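The "user-defined map & reduce functions" and the "fairly straightforward" runtime can be illustrated with a single-process word count. This is a toy sketch in plain Python, not Hadoop's API; run_mapreduce and the sample lines are made up for illustration.

```python
from collections import defaultdict


def map_fn(_key, line):
    """User-defined map: emit (word, 1) for every word in an unstructured line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-defined reduce: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Tiny runtime: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


lines = ["big data and cloud", "cloud data management"]
word_counts = run_mapreduce(enumerate(lines), map_fn, reduce_fn)
print(word_counts)
```

In a DBMS the same result would be a declarative GROUP BY query over a table with a schema; here the user supplies the two functions and the runtime only shuffles and groups.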

Cloud Data Management
  Database Management Systems
    Relational Database Management Systems (RDBMS)
    Object-Oriented Database Management Systems (OODBMS)
    Non-Relational, Distributed DB Mgmt Systems (NRDBMS)
    Not only Structured Query Language (NoSQL)
  Online Transaction Processing (OLTP)
  Real-time Data Warehousing
  Online Analytical Processing (OLAP)
  Operational Data Stores (ODS)
  Enterprise Data Warehouse (EDW)

Big-Data
Aggregated Data From the Following Sources:
  Traditional
  Sensory
  Social
Aggregators
Predominantly: NRDBMS
  Column Family Stores: Cassandra (Facebook), BigTable (Google), HBase (Apache)
  Key-Value Stores: App Engine DataStore (Google), DynamoDB & SimpleDB (AWS)
  Document Databases: CouchDB (Apache), MongoDB
  Graph Databases: Neo4J (Neo Technology)

ERAC project (supported by NFR)
Efficient and Robust Architecture for the Big Data Cloud
2012-2016

Big-Data Processing
  Serial Processing
    Hadoop
      Hadoop Distributed File System (HDFS)
      Hive Data Warehouse
      Pig Querying Language
  Parallel Processing
    HadoopDB (Yale)
  Other Analytics Processing
    Google MapReduce
    Splunk for Security Information / Event Management [SIEM]

Hadoop
Open-source Java MapReduce for reliable, scalable, distributed computing.
Moving Computation is Cheaper than Moving Data
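"Moving computation is cheaper than moving data" means the scheduler tries to run each task on a node that already stores the input block. A minimal greedy sketch of that idea; it is not Hadoop's scheduler, and the block names, node names and assign_tasks function are hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Greedily place one task per block, preferring nodes that hold a replica.

    block_locations: block id -> list of nodes storing a replica of the block
    free_slots:      node -> number of free task slots
    Returns block id -> (node, is_local).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    plan = {}
    for block, replicas in block_locations.items():
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free task slots left")
            # Fall back to a remote node: the block must cross the network.
            node, is_local = candidates[0], False
        slots[node] -= 1
        plan[block] = (node, is_local)
    return plan


blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
slots = {"n1": 1, "n2": 1, "n3": 1}
plan = assign_tasks(blocks, slots)
for block, (node, is_local) in plan.items():
    print(block, node, "local" if is_local else "remote")
```

The greedy order matters: here b1 takes the only slot on n1, so b2 has to run remotely even though placing b1 on n2 would have kept both local. A real scheduler has to weigh such trade-offs.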

HadoopDB
Open Source Parallel Database
  A hybrid of DBMS and MapReduce technologies that targets analytical workloads
  Designed to run on a shared-nothing cluster of commodity machines, or in the cloud
  An attempt to fill the gap in the market for a free and open source parallel DBMS
  Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems
  As scalable as Hadoop, while achieving superior performance on structured data analysis workloads
  Database layer with a cluster of multiple single-node DBMS servers
  Hadoop/MapReduce as a communication layer that coordinates the multiple nodes, each running e.g. PostgreSQL or MySQL
  Hive as the translation layer
  A shared-nothing parallel database that business analysts can interact with using a SQL-like language
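The layering described above (single-node databases underneath, a coordinating layer on top) can be sketched in a few lines. This is an illustration of the idea only, not HadoopDB code: in-memory SQLite databases stand in for the PostgreSQL or MySQL nodes, rows are partitioned round-robin for simplicity, and the table, column and function names are invented.

```python
import sqlite3
from collections import Counter


def make_node(rows):
    """Create one single-node database holding its share of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def partition(rows, n_nodes):
    """Spread rows over n_nodes (round-robin, an assumption of this sketch)."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts


def total_by_region(nodes):
    """Push the aggregate down to every node, then merge the partial sums."""
    totals = Counter()
    for conn in nodes:
        query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
        for region, subtotal in conn.execute(query):
            totals[region] += subtotal
    return dict(totals)


rows = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]
nodes = [make_node(part) for part in partition(rows, 3)]
totals = total_by_region(nodes)
print(totals)
```

Each node does the SQL work on its own partition, which is where the performance on structured data comes from; the coordinating loop plays the role the slide assigns to Hadoop/MapReduce.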

CIPSI Infrastructure: Data cluster
  39 nodes
  1 TB RAM

Time Series Data
  To store, index and serve metrics collected from large scale systems
  To make this data easily accessible and graphable
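"Store, index and serve" can be made concrete with a very small in-memory store: points are kept sorted by timestamp per metric, so a time-range query is two binary searches. This is a sketch only, with an invented class name and metric name; it is not the API of any particular time series database.

```python
import bisect
from collections import defaultdict


class TinyTimeSeriesStore:
    """Keeps (timestamp, value) points per metric, sorted by timestamp."""

    def __init__(self):
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, metric, timestamp, value):
        """Store one data point, keeping the per-metric index sorted."""
        i = bisect.bisect_right(self._times[metric], timestamp)
        self._times[metric].insert(i, timestamp)
        self._values[metric].insert(i, value)

    def query(self, metric, start, end):
        """Serve all points with start <= timestamp <= end."""
        times = self._times.get(metric, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, end)
        return list(zip(times[lo:hi], self._values[metric][lo:hi]))


store = TinyTimeSeriesStore()
store.put("cpu.load", 30, 0.9)  # points may arrive out of order
store.put("cpu.load", 10, 0.5)
store.put("cpu.load", 20, 0.7)
print(store.query("cpu.load", 10, 20))
```

The list returned by query is already in time order, which is the shape a graphing front end needs.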

Fine-grained Real-time Monitoring
  Get real-time state information about infrastructure and services
  Understand outages or how complex systems interact together
  Measure SLAs (availability, latency, etc.)
  Tune applications and databases for maximum performance
  Do capacity planning
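Measuring an SLA from monitoring data comes down to simple statistics over collected samples. A minimal sketch, assuming each health check is recorded as a success flag plus a latency in milliseconds; the record layout, the sample values and the nearest-rank percentile definition are choices made for this example.

```python
import math


def availability(samples):
    """Fraction of checks that succeeded."""
    succeeded = sum(1 for sample in samples if sample["ok"])
    return succeeded / len(samples)


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical health-check samples.
checks = [
    {"ok": True, "latency_ms": 12},
    {"ok": True, "latency_ms": 15},
    {"ok": False, "latency_ms": 900},
    {"ok": True, "latency_ms": 11},
]

latencies = [check["latency_ms"] for check in checks]
print(f"availability: {availability(checks):.2%}")
print(f"median latency: {percentile(latencies, 50)} ms")
print(f"p95 latency: {percentile(latencies, 95)} ms")
```

Percentiles are usually more informative than the mean here, because a single slow or failed request dominates an average.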

OpenTSDB

Smart Grid @ Clouds

Self learning Energy Efficient buildings and open Spaces
http://www.seeds-fp7.eu
EeB-ICT-2011.6.4: ICT for energy-efficient buildings and spaces of public use
UiS Demo Space

Analysis of IoT Data

Aging in Place
Safer@Home

e-health @ Clouds
[Figure: Integrated Service Platform with anywhere access. Recoverable labels:
  Home side: Home Automation, Care Sensors, Social interaction / Gaming, Smart Device, Home Gateway
  Network: Access Gateway, Firewall, IP-based Networks, Internet, Security
  Platform: Notification Server, Data Analytic, Workflow Engine, SLA, Billing, Provisioning, Monitoring, Planning, Subscriber and Service Management
  Services: Service Portal, Internet Services, Health & Care Services, Stakeholders]

Secure Architecture
  Data Collector
  Configurable
  Secure
  Lightweight
  Data Receiver
  Deidentified Data Store
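The slide does not say how records are de-identified before they reach the data store. One common approach, shown here purely as an assumption, is for the receiver to drop direct identifiers and replace the subject ID with a keyed pseudonym (HMAC-SHA256). The field names, the key and the function names below are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret key held only by the data receiver.
SECRET_KEY = b"replace-with-a-real-secret"
# Assumption: fields treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonym(subject_id: str, key: bytes = SECRET_KEY) -> str:
    """Stable keyed pseudonym: same subject gives the same token."""
    digest = hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and swap the subject ID for its pseudonym."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "subject_id"
    }
    cleaned["pseudonym"] = pseudonym(record["subject_id"])
    return cleaned


reading = {
    "subject_id": "p-001",
    "name": "Example Person",
    "address": "Example Street 1",
    "sensor": "motion",
    "value": 1,
}
stored = deidentify(reading)
print(stored)
```

Because the pseudonym is stable, readings from the same person can still be linked for analysis without the store holding the person's identity. Keyed pseudonyms alone do not guarantee anonymity if the remaining fields are identifying.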

Lighting control

Science @ Cloud

Norway and North America Program
  Education is currently one of the biggest challenges in Big Data and Data-intensive Science
  2 MNOK (2012-2016)
  Staff and student exchanges
  Joint curriculum development, teaching and student supervision
  Further funding collaboration

[Figure: conference logos for 2009, 2010, 2011 and 2012]

CloudCom 2013
Bristol, UK, Dec. 2-5, 2013
http://2013.cloudcom.org

EU-China Workshop on HPC Cloud & Big-Data
Stavanger, 20-21 June, 2013
http://euchina2013.cloudcom.org