Cosmos. Big Data and Big Challenges. Pat Helland July 2011

Outline
- Introduction
- Cosmos Overview
- The Structured Streams Project
- Some Other Exciting Projects
- Conclusion

What Is COSMOS?
- Petabyte store and computation system
  -- About 62 physical petabytes stored (~275 logical petabytes stored)
  -- Tens of thousands of computers across many datacenters
- Massively parallel processing based on Dryad
  -- Similar to MapReduce but can represent arbitrary DAGs of computation
  -- Automatic placement of computation with its data
- SCOPE (Structured Computation Optimized for Parallel Execution)
  -- SQL-like language with set-oriented record and column manipulation
  -- Automatically compiled and optimized for execution over Dryad
- Management of hundreds of Virtual Clusters for computation allocation
  -- Buy your machines and give them to COSMOS
  -- You are guaranteed that many compute resources, and may use more when they are not otherwise in use
- Ubiquitous access to OSD's data
  -- Combining knowledge from different datasets is today's secret sauce

Cosmos and OSD Computation
- OSD applications fall into two broad categories:
  -- Back-end: massive batch processing creates new datasets
  -- Front-end: online request processing serves up and captures information
- Cosmos provides storage and computation for back-end batch data analysis
- It does not support the storage and computation needs of the front-end
[Diagram: the Internet feeds crawling and other data into OSD computing/storage; the back-end runs batch data analysis over large read-only data, producing results for on-line datasets that the front-end web-serving tier uses alongside user & system data.]

COSMOS: The Service
- Data drives search and advertising
  -- Web pages: links, text, titles, etc.
  -- Search logs: what people searched for, what they clicked, etc.
  -- IE logs: what sites people visit, the browsing order, etc.
  -- Advertising logs: what ads people click on, what was shown, etc.
- We generate about 2 PB every day
  -- SearchUX is hundreds of TB; Toolbar is many tens of TB
  -- Search is hundreds of TB; web snapshots are many tens of TB
  -- MSN, Hotmail, IE, web, etc.
- COSMOS is the backbone for Bing analysis and relevance
  -- Click-stream information is imported from many sources and cooked
  -- Queries analyzing user context, click commands, and success are processed
- COSMOS is a service
  -- We run the code ourselves (on many tens of thousands of servers)
  -- Users simply feed in data, submit jobs, and extract the results

Outline
- Introduction
- Cosmos Overview
- The Structured Streams Project
- Some Other Exciting Projects
- Conclusion

Cosmos Architecture from 100,000 Feet
- SCOPE Layer:
  -- SCOPE code is submitted to the SCOPE compiler
  -- The optimizer makes decisions about execution plan and parallelism
  -- An algebra (describing the job) is built to run on the SCOPE runtime
- Execution Layer:
  -- Jobs queue up per Virtual Cluster
  -- When a job starts, it gets a Job Manager to deploy work in parallel close to its data
  -- Many Processing Nodes (PNs) host execution vertices running SCOPE code
- Store Layer:
  -- Many Extent Nodes (ENs) store and compress replicated extents on disk
  -- Extents are combined to make unstructured streams
  -- CSM (COSMOS Store Manager) handles names, streams, & replication
[Diagram: a job such as "Data = SELECT * FROM S WHERE Col-1 > 10" flows through the compiler and optimizer into runtime vertices; the Job Manager drives the Processing Nodes, and streams Foo, Bar, and Zot are composed of extents stored on the Extent Nodes.]

Extent Nodes: The Store Layer
- Extent Nodes implement a file system holding extents
  -- Each extent is up to 2 GB
  -- Compression and fault detection are important parts of the EN
- CSM: COSMOS Store Manager
  -- Instructs replication of each extent across 3 different ENs
  -- Manages the composition of streams out of extents
  -- Manages the namespace of streams
[Diagram: streams Foo, Bar, and Zot are composed of extents spread across the Extent Nodes.]

The Execution Engine
- Takes the plan for the parallel execution of a job and finds computers to perform the work
- Responsible for placing the computation close to the data it reads
- Ensures all the inputs for the computation are available before firing it up
- Responsible for failures and restarts
- Dryad is similar to Map-Reduce
[Diagram: in the Execution Layer, the Job Manager dispatches runtime vertices to Processing Nodes (PNs), which read streams Foo, Bar, and Zot.]

The SCOPE Language (Structured Computation Optimized for Parallel Execution)
- Heavily influenced by SQL and relational algebra
- Changed to deal with input and output streams
  -- Input arrives as sets of records
  -- Computation occurs over sets of records
  -- Output is written as sets of records
- SCOPE is a high-level declarative language for data manipulation
- It translates very naturally into parallel computation (a small example follows below)
[Diagram: input streams A and B flow through the operators of a Scope job and out to output streams.]
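To make the set-oriented style concrete, here is a minimal script in the spirit of the published SCOPE examples; the stream names, LogExtractor, and the 1000 threshold are illustrative assumptions, not taken from this deck:

    // Extract records from an unstructured stream (illustrative names)
    e = EXTRACT query
        FROM "search.log"
        USING LogExtractor;

    // Set-oriented aggregation, exactly as in SQL
    counts = SELECT query, COUNT(*) AS count
             FROM e
             GROUP BY query;

    // Keep the popular queries and write the result back to a stream
    frequent = SELECT query, count
               FROM counts
               WHERE count > 1000;

    OUTPUT frequent TO "qcount.result";

Each statement is a set-at-a-time operation, which is what lets the compiler break the job into a parallel DAG of vertices over Dryad.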

The SCOPE Compiler and Optimizer
- The SCOPE compiler and optimizer take SCOPE programs and create:
  -- The algebra describing the computation
  -- The breakdown of the work into processing units
  -- The description of the inputs and outputs for the processing units
- Many decisions about compiling and optimizing are driven by data size and by minimizing data movement
[Diagram: a job such as "Data = SELECT * FROM S WHERE Col-1 > 10" passes through the compiler and optimizer into parallel runtime units.]

Virtual Cluster: A Management Tool
- The Virtual Cluster (VC) allocates resources across groups within OSD
  -- The cost model is captured in a prioritized queue of work within the VC
- Each Virtual Cluster has a guaranteed capacity
  -- We will bump other users of the VC's capacity if necessary
  -- The VC can use other idle capacity
[Diagram: five VCs, each with a work queue and a guaranteed capacity: VC-A 100 hi-pri PNs, VC-B 500, VC-C 20, VC-D 1000, VC-E 350.]

Outline
- Introduction
- Cosmos Overview
- The Structured Streams Project
- Some Other Exciting Projects
- Conclusion

Introducing Structured Streams
- Cosmos currently supports streams
  -- An unstructured byte stream of data
  -- Created by append-only writing to the end of the stream
- Structured streams are streams with metadata
  -- Metadata defines column structure and affinity/clustering information
- Structured streams simplify extractors and outputters
  -- A structured stream may be imported into Scope without an extractor
- Structured streams offer performance improvements
  -- Column features allow for processing optimizations
  -- Affinity management can dramatically improve performance
  -- Key-oriented features offer (sometimes very significant) access-performance improvements
[Diagram: today's streams (unstructured) are a sequence of bytes; new structured streams add metadata and record-oriented access.]

Today's Use of Extractors and Outputters
- Extractors are programs that read input data and supply metadata
- Outputters take Scope data and create a bytestream for storage
  -- This discards metadata known to the system

    source = EXTRACT col1, col2 FROM A
    Data = SELECT * FROM source
    <process Data>
    OUTPUT Data TO D

[Diagram: unstructured stream A passes through an Extractor (which adds metadata) into Scope processing with metadata, structure, and relational ops; an Outputter then writes unstructured stream D.]
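Spelled out as a complete job, the fragment above might look like the following sketch; MyExtractor, MyOutputter, and the filter predicate are hypothetical placeholders for whatever the "<process Data>" step really does:

    // Extractor parses the raw bytes of A and supplies column metadata
    source = EXTRACT col1, col2
             FROM "A"
             USING MyExtractor;

    // <process Data>: any relational processing over the extracted records
    Data = SELECT *
           FROM source
           WHERE col1 > 10;

    // Outputter flattens the records back to a bytestream;
    // the column metadata is discarded at this point
    OUTPUT Data TO "D"
    USING MyOutputter;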

Metadata, Streams, Extractors, & Outputters
- Scope has metadata for the data it is processing
  -- Extractors provide metadata info as they suck up unstructured streams
  -- Processing the Scope queries ensures metadata is preserved
  -- The new results may have different metadata than the old; Scope knows the new metadata
- Scope writes structured streams
  -- The internal information used by Scope is written out as metadata
- Scope reads structured streams
  -- Reading a structured stream allows later jobs to see the metadata (sketched below)
- The representation of a structured stream on disk is only visible to Scope!
[Diagram: unstructured stream A enters through an Extractor; structured streams B and C carry their metadata and are read and written directly by Scope; unstructured stream D leaves through an Outputter. Note: there is no Cosmos notion of metadata for D; only the Outputter knows.]
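A minimal sketch of the same flow using structured streams; the SSTREAM keyword follows later public SCOPE documentation and is an assumption about the exact syntax of this 2011 system:

    // Write: Scope persists its column metadata together with the data
    OUTPUT Data TO SSTREAM "B.ss";

    // A later job reads the structured stream directly: no extractor
    // is needed, because the metadata travels inside the stream
    Data2 = SSTREAM "B.ss";
    result = SELECT col1, col2
             FROM Data2;
    OUTPUT result TO SSTREAM "C.ss";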

Streams, Metadata, and Increased Performance
- By adding metadata (describing the stream) into the stream, we can provide performance improvements:
  -- Cluster-key access: random reads of records identified by key
  -- Partitioning and affinity: data to be processed together (sometimes across multiple streams) can be placed together for faster processing
- Metadata for a structured stream is kept inside the stream
  -- The stream is a self-contained unit
  -- The structured stream is still an unstructured stream (plus some stuff)

Cluster-Key Lookup
- Cluster-key indices make a huge performance improvement
  -- Today: if you want a few records, you must process the whole stream
  -- Structured streams: directly access the records by cluster-key index
- How it works:
  -- Cluster-key lookup is implemented by keeping indexing information in the metadata inside the stream
  -- The records must be stored in cluster-key order to use cluster-key lookup
  -- Cosmos manages the index, which is generated at structured-stream creation (an illustrative lookup follows below)
[Diagram: a lookup for key D consults the index entries in the metadata (A, E, J, N, Q, W) to jump directly to the portion of the stream holding records A through Z.]
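An illustrative point lookup; assuming the stream is clustered and sorted by URL, the predicate below can be answered from the index in the stream's metadata instead of scanning every extent (the stream and column names are hypothetical):

    pages = SSTREAM "web-table.ss";      // stored in cluster-key (URL) order

    hit = SELECT URL, title
          FROM pages
          WHERE URL == "www.imdb.com";   // served by the cluster-key index

    OUTPUT hit TO "lookup.result";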

Implementing Partitioning and Affinity
- Joins across streams can be very expensive
  -- Network traffic is a major expense when joining large datasets together
  -- Placing related data together can dramatically reduce processing cost
- We affinitize data when we believe it is likely to be processed together
  -- Affinitization places the data close together
- If we want affinity, we create a partition as we create a structured stream (see the sketch below)
  -- A partition is a subset of the stream intended to be affinitized together
[Diagram: affinitized data from multiple streams is stored close together by Scope.]
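A sketch of creating two structured streams partitioned the same way so that a later join on URL is affinitized; the CLUSTERED BY / SORTED BY clauses follow later public SCOPE syntax and are assumptions here:

    // Both streams are partitioned and sorted on the join key (URL),
    // so Cosmos can place matching partitions close together
    OUTPUT webPages TO SSTREAM "web-table.ss"
    CLUSTERED BY URL
    SORTED BY URL;

    OUTPUT clicks TO SSTREAM "click-stream.ss"
    CLUSTERED BY URL
    SORTED BY URL;

With this layout, the joins in the case studies that follow read only the targeted partitions and avoid a network shuffle.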

Case Study 1: Aggregation

    SELECT GetDomain(URL) AS Domain,
           SUM(MyNewScoreFunction(A, B, ...)) AS TotalScore
    FROM Web-Table
    GROUP BY Domain;
    SELECT TOP 100 Domain ORDER BY TotalScore;

- Over unstructured datasets, the grouping is expensive to super expensive: the records must be shuffled across the network to bring each domain together.
- Over structured datasets (sstream, partitioned by URL and sorted by URL), records for a domain are already stored together, so the plan is much more efficient without shuffling data across the network.

Case Study 2: Selection

    SELECT URL, feature1, feature2
    FROM Web-Table
    WHERE URL == "www.imdb.com";

- Over unstructured datasets, this requires massive data reads: the whole stream is scanned.
- Over structured datasets (sstream, partitioned by URL and sorted by URL), the partition-range metadata lets the system judiciously choose the partition and push the predicate close to the data:

    Partition range metadata:
      P100: www.imc.com .. www.imovie.com
      P101: www.imz.com .. www.inode.com

  Only partition P100, whose range covers www.imdb.com, needs to be read.

Case Study 3: Join Multiple Datasets

    SELECT URL, COUNT(*) AS TotalClicks
    FROM Web-Table AS W, Click-Stream AS C
    WHERE GetDomain(W.URL) == "www.shopping.com"
      AND W.URL == C.URL
      AND W.Outlinks > 10
    GROUP BY URL;
    SELECT TOP 100 URL ORDER BY TotalClicks;

- Over unstructured datasets: massive data reads, plus an expensive (to super-expensive) network shuffle to bring matching URLs together.
- Over structured datasets (sstream, partitioned by URL and sorted by URL): only the targeted partitions are read, and because the two streams are affinitized on URL the join is much more efficient without shuffling data across the network.

Outline
- Introduction
- Cosmos Overview
- The Structured Streams Project
- Some Other Exciting Projects
- Conclusion

Reliable Pub-Sub Event Processing
- Cosmos will add high-performance pub-sub event processing
  -- Publications receive append-only events
  -- Subscriptions define the binding of publications to event-processing app code
- Publications and subscriptions are designed to handle many tens of thousands of events per second
  -- Massively partitioned publications
  -- Cosmos-managed pools of event processors with automatic load balancing
- Events may be appended to publications by other event processors or by external applications feeding work into Cosmos
[Diagram: a publication feeds a subscription, which drives event-processor app code.]

High-Performance Event Processing
- Event processors (user application code) may:
  -- Read and update records within tables
  -- Append to publications
- Each event will be consumed in a transaction atomic with its table and publication changes
  -- Transactions may touch any record(s) in the tables
  -- These may be running on thousands of computers
[Diagram: Event-X from Pub-A arrives via a subscription at an event processor, which reads Rec-B1 and updates Rec-B2 in Table-B, reads Rec-C1 and updates Rec-C2 in Table-C, and appends Event-X and Event-Y to Pub-D, all inside one atomic transaction boundary.]

Combining Random & Sequential Processing
- Random processing:
  -- Event-processor applications may be randomly reading and updating very large tables with extremely high throughput
  -- Applications external to Cosmos may access tables for random reads & updates
  -- Transactions control atomic updates by event processors
  -- Changes are accumulated as deltas, visible to other event processors as soon as the transaction commits
- Sequential processing:
  -- Massively parallel jobs may read consistent snapshots of the same tables being updated by event processors
- There are very interesting optimization tradeoffs in the storage, placement, and representation of data for sequential versus random access
  -- The use of SSDs offers very interesting opportunities
  -- Of course, there's not much SSD compared to the size of the data we manage

Outline
- Introduction
- Cosmos Overview
- The Structured Streams Project
- Some Other Exciting Projects
- Conclusion