Cosmos. Big Data and Big Challenges. Ed Harris - Microsoft Online Services Division HTPS 2011

Similar documents

Cosmos. Big Data and Big Challenges. Pat Helland July 2011

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Big Data With Hadoop

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Data Management in the Cloud

Counterfactual analysis: a Big Data case-study using Cosmos/SCOPE. Ed Snelson

The Sierra Clustered Database Engine, the technology at the heart of

Chapter 7. Using Hadoop Cluster and MapReduce

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

An Oracle White Paper November Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

How To Handle Big Data With A Data Scientist

CitusDB Architecture for Real-Time Big Data

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

Hadoop IST 734 SS CHUNG

The Google File System

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

NoSQL for SQL Professionals William McKnight

Distributed File Systems

Cloud Computing at Google. Architecture

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Hadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010

CSE-E5430 Scalable Cloud Computing Lecture 2

CDH AND BUSINESS CONTINUITY:

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Apache Hadoop. Alexandru Costan

Google File System. Web and scalability

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Planning the Installation and Installing SQL Server

BIG DATA What it is and how to use?

Testing Big data is one of the biggest

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Data Mining in the Swamp

Scala Storage Scale-Out Clustered Storage White Paper

Parallel Processing of cluster by Map Reduce

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

Lecture 10: HBase! Claudia Hauff (Web Information Systems)!

SQL Server 2012 Performance White Paper

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Real-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011

Manifest for Big Data Pig, Hive & Jaql

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

Hadoop Architecture. Part 1

Data-Intensive Computing with Map-Reduce and Hadoop

ISSN: (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

Energy Efficient MapReduce

Trafodion Operational SQL-on-Hadoop

Assignment # 1 (Cloud Computing Security)

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

In Memory Accelerator for MongoDB

NoSQL and Hadoop Technologies On Oracle Cloud

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

Big Data on Microsoft Platform

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Agenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part:

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Using distributed technologies to analyze Big Data

A programming model in Cloud: MapReduce

<Insert Picture Here> Oracle and/or Hadoop And what you need to know

Integrating VoltDB with Hadoop

Internals of Hadoop Application Framework and Distributed File System

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP

Data processing goes big

Apache Flink Next-gen data analysis. Kostas

Advanced In-Database Analytics

Bringing Big Data Modelling into the Hands of Domain Experts

Building your Big Data Architecture on Amazon Web Services

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Big Data Analysis and HADOOP

ASTERIX: An Open Source System for Big Data Management and Analysis (Demo) :: Presenter :: Yassmeen Abu Hasson

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

Big data management with IBM General Parallel File System

A very short Intro to Hadoop

Hadoop & Spark Using Amazon EMR

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Deploying Hadoop with Manager

NetApp Big Content Solutions: Agile Infrastructure for Big Data

How To Protect Your Data From Being Damaged On Vsphere Vdp Vdpa Vdpo Vdprod (Vmware) Vsphera Vdpower Vdpl (Vmos) Vdper (Vmom

Big Data Introduction

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island

Transcription:

Cosmos Big Data and Big Challenges Ed Harris - Microsoft Online Services Division HTPS 2011 1

Outline Introduction Cosmos Overview The Structured s Project Conclusion 2

What Is COSMOS? Petabyte Store and Computation System Several hundred petabytes of data Tens of thousands of computers across many datacenters Massively parallel processing based on Dryad Similar to MapReduce but can represent arbitrary DAGs of computation Automatic computation placement with data (Structured Computation Optimized for Parallel Execution) SQL-like language with set-oriented record and column manipulation Automatically compiled and optimized for execution over Dryad Management of hundreds of Virtual Clusters for computation allocation Buy your machines and give them to COSMOS Guaranteed at least that many compute resources concurrently May receive more resources when other VCs are under-used Ubiquitous access to OSD s data Combining knowledge from different datasets is today s secret sauce 3

Cosmos and OSD Computation OSD Applications fall into two broad categories: Back-end: Massive batch processing creates new datasets Front-end: Online request processing serves up and captures information Cosmos provides storage and computation for Back-End Batch data analysis It does not support storage and computation needs for the Front-End Internet Cosmos! Crawling Other Data OSD Computing/Storage Back-End Front-End On- Batch Line Web- Data Analysis Serving Large Data for Read-Only Results On-Line Datasets Work User & System Data

COSMOS: The Service Data drives search and advertising Web pages: Links, text, titles, etc Search logs: What people searched for, what they clicked, etc Browse logs: What sites people visit, the browsing order, etc Advertising logs: What ads do people click on, what was shown, etc We ingest or generate a couple of PiB every day Bing, MSN, Hotmail, Client telemetry Web crawl snapshots Structured data feeds Long tail of other data sets of interest COSMOS is the backbone for Bing analysis and relevance Click-stream information is imported from many sources and cooked Queries analyzing user context, click commands, and success are processed COSMOS is a service We run the code ourselves (on many tens of thousands of servers) Users simply feed in data, submit jobs, and extract the results 5

Outline Introduction Cosmos Overview The Structured s Project Conclusion 6

PN PN PN Cosmos Architecture from 100,000 Feet Layer: -- Code is submitted to the Compiler -- The optimizer make decisions about execution plan and parallelism -- Algebra (describing the job) is built to run on the Runtime Execution Layer: -- Jobs queues up per Virtual Cluster -- When a job starts, it gets a Job Mgr to deploy work in parallel close to its data -- Many Processing Nodes (PNs) host execution vertices running code Layer Execution Layer Job Mgr Data = SELECT * FROM S WHERE Col-1 > 10 Compiler Run/T Optimizer Run/T Run/T Store Layer: -- Many Extent Nodes store and compress replicated extents on disk -- Extents are combined to make unstructured streams -- CSM (COSMOS Store Manager) handles names, streams, & replication Store Layer Foo Bar Zot Extent Extent Extent Extent EN EN EN EN 7

Extent Nodes: The Store Layer Implement a file system holding extents Each extent is up to 2GB Compression and fault detection are important parts of the EN CSM: COSMOS Store Manager Instructs replication across 3 different ENs per extent Manages composition of streams out of extents Manages the namespace of streams Store Layer Foo Bar Zot Extent Extent Extent Extent EN EN EN EN 8

PN PN PN Execution Engine: The Execution Engine Takes the plan for the parallel execution of a job and finds computers to perform the work Responsible for the placement of the computation close to the data it reads Ensures all the inputs for the computation are available before firing it up Responsible for failures and restarts Dryad is similar to Map-Reduce Execution Layer Run/T Job Mgr Run/T Run/T Foo Bar Zot 9

The Language (Structured Computation Optimized for Parallel Execution) Heavily influenced by SQL and relational algebra Changed to deal with input and output streams -A Input Output -1 Scope Job -2 -B -3 Input Arrives as Sets of Records Computation Occurs as Sets of Records Output Written as Sets of Records is a high level declarative language for data manipulation It translates very naturally into parallel computation 10

The Compiler and Optimizer The Compiler and Optimizer take programs and create: The algebra describing the computation The breakdown of the work into processing units The description of the inputs and outputs from the processing units Many decisions about compiling and optimizing are driven by data size and minimizing data movement Layer Data = SELECT * FROM S WHERE Col-1 > 10 Compiler Optimizer Run/T Run/T Run/T 11

Virtual Cluster: a management tool The Virtual Cluster Allocates resources across groups within OSD Cost model captured in a queue of work (with priority) within the VC Each Virtual Cluster has a guaranteed capacity We will bump other users of the VC s capacity if necessary The VC can use other idle capacity 100 Hi-Pri PNs Work Queue VC-A 500 Hi-Pri PNs Work Queue VC-B 20 Hi-Pri PNs Work Queue VC-C 1000 Hi-Pri PNs Work Queue VC-D Work Queue 350 Hi-Pri PNs VC-E 12

Outline Introduction Cosmos Overview The Structured s Project Conclusion 13

Introducing Structured s Cosmos originally supported streams An unstructured byte stream of data Created by append-only writing to the end of the stream Structured streams are streams with metadata Metadata defines column structure and affinity/clustering information Structured streams simplify extractors and outputters A structured stream may be imported into scope without an extractor Structured streams offer performance improvements Column features allow for processing optimizations Affinity management can dramatically improve performance Key-oriented features offer (sometimes very significant) access performance improvements A Sequence of Bytes Today s s (unstructured streams) New Structured s Record-Oriented Access A Sequence of Bytes Metadata

Today s Use of Extractors and Outputters Extractors Programs to input data and supply metadata Outputters Take Scope data and create a bytestream for storage Discards metadata known to the system Unstructured A Extractor Metadata Scope Scope Processing with Metadata, Structure, and Relational Ops Outputter source = EXTRACT col1, col2 FROM A Data = SELECT * FROM source <process Data> D Unstructured OUTPUT Data to D

Metadata, s, Extractors, & Outputters Scope has metadata for the data it is processing Extractors provide metadata info as they suck up unstructured streams Processing the Scope queries ensures metadata is preserved The new results may have different metadata than the old Scope knows the new metadata Scope writes structured streams The internal information used by Scope is written out as metadata Scope reads structured streams Reading a structured stream allows later jobs to see the metadata The Representation of a Structured on Disk Is Only Visible to Scope! Unstructured A Extractor Metadata C Metadata Structured Scope Structured B Metadata Scope Processing with Metadata, Structure, and Relational Ops Outputter D Unstructured Note: No Cosmos Notion of Metadata for D -- 16 Only the Outputter Knows

s, Metadata, and Increased Performance By adding metadata (describing the stream) into the stream, we can provide performance improvements: Cluster-Key access: random reads of records identified by key Partitioning and affinity: data to be processed together (sometimes across multiple streams), can be placed together for faster processing Metadata for a structured stream is kept inside the stream The stream is a self-contained unit The structured stream is still an unstructured stream (plus some stuff) Cluster-Key Access A Metadata Partitioning and Affinity

Cluster-Key Lookup Cluster-Key Indices make a huge performance improvement Today: If you want a few records, you must process the whole stream Structured s: Directly access the records by cluster-key index How it works: Cluster-Key lookup is implemented by having indexing information contained in the metadata inside the stream The records must be stored in cluster-key order to use cluster-key lookup Cosmos managed index generated at structured stream creation Lookup D Foo Metadata (including index) A E J N Q W A B C D E F G H I J K L M N O P Q W X Y Z

Implementing Partitioning and Affinity Joins across streams can be very expensive Network traffic is a major expense when joining large datasets together Placing related data together can dramatically reduce processing cost We affinitize data when we believe it is likely to be processed together Affinitization places the data close together If we want affinity, we create a partition as we create a structured stream A partition is a subset of the stream intended to be affinitized together Scope Affinitized Data Is Stored Close Together

Case Study 1: Aggregation SELECT GetDomain(URL) AS Domain, SUM((MyNewScoreFunction(A, B, )) AS TotalScore FROM Web-Table GROUP BY Domain; SELECT TOP 100 Domain ORDER BY TotalScore; Super Expensive Expensive Structured Datasets (Sstream) (partitioned by URL, sorted by URL) Unstructured Datasets Much more efficient w/o shuffling data across network

Case Study 2: Selection SELECT URL, feature1, feature2 FROM Web-Table WHERE URL == www.imdb.com; Partition Range Metadata Partition Metadata P100 www.imc.com www.imovie.com P101 www.imz.com www.inode.com Judiciously choose partition Push predicate close to data Massive data reads Unstructured Datasets Structured Datasets (Sstream) (partitioned by URL, sorted by URL)

Case Study 3: Join Multiple Datasets SELECT URL, COUNT(*) AS TotalClicks FROM Web-Table AS W, Click- AS C WHERE GetDomain(W.URL) == www.shopping.com AND W.URL == C.URL AND W.Outlinks > 10 GROUP BY URL; SELECT TOP 100 URL ORDER BY TotalClicks; Targeted partitions Affinitized location Super Expensive Expensive Massive data reads Structured Datasets (Sstream) (partitioned by URL, sorted by URL) Unstructured Datasets Much more efficient w/o shuffling data across network

Outline Introduction Cosmos Overview The Structured s Project Conclusion 23

Takeaways Cosmos manages OSD s (Online Services Division) data and computation Hundreds of petabytes of data Many thousands of very large batch jobs per day Tens of thousands of servers in multiple datacenters All the data in the same addressable store is essential to the business The money lies in combining data in surprising ways Cosmos is evolving: Structured s combines large scale parallelism with DB techniques to dramatically improve the efficiencies of computation In the works: Integrated random and sequential processing over the same data Reliable workflow via pub-sub Transactionally correct changes over many petabytes of data Cosmos needs fresh blood! Send your students our way! Bing sponsors 60+ PhD and undergraduate internships per year edharris@microsoft.com Microsoft Confidential 24