Cloud Computing. Up until now


1 Cloud Computing
Lecture 19: Cloud Programming
Up until now
Introduction, Definition of Cloud Computing.
Pre-Cloud Large Scale Computing: Grid Computing, Content Distribution Networks, Cycle-Sharing, Distributed Scheduling.
Cloud: MapReduce, Storage, Execution, Monitoring, Programming (Azure, Zookeeper, Pig Latin).

2 Outline
Cloud Programming Models: Pig, DryadLINQ, Percolator.
Pig
Some large-scale data operations take too many steps to model directly in MapReduce, and creating complex workflows by hand becomes cumbersome.
Pig is a high-level programming language from the Hadoop project for processing massive amounts of records.
It provides common operations such as join, group, filter, and sort.

3 Pig
Pig provides an execution engine atop Hadoop:
  Removes the need for users to tune Hadoop for their needs.
  Insulates users from changes in Hadoop interfaces.
Pig programs are written in Pig Latin and converted into MapReduce jobs.
In Pig Latin, variables are lists of tuples.
Pig Latin Example
    -- max_temp.pig: Calculate the yearly maximum temperature
    records = LOAD 'input/ncdc/micro-tab/sample.txt'
        AS (year:chararray, temperature:int, quality:int);
    filtered_records = FILTER records BY temperature != 9999 AND
        (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
    grouped_records = GROUP filtered_records BY year;
    max_temp = FOREACH grouped_records GENERATE group,
        MAX(filtered_records.temperature);
    DUMP max_temp;
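As a rough illustration of what the Pig Latin script computes, the following plain-Python sketch applies the same filter, group, and max steps to a few in-memory records. The sample rows are made up for illustration, and this is only a single-process analogy: Pig would compile these steps into MapReduce jobs.

```python
from collections import defaultdict

# Hypothetical sample rows: (year, temperature, quality).
records = [
    ("1950", 0, 1),
    ("1950", 22, 1),
    ("1950", -11, 1),
    ("1949", 111, 1),
    ("1949", 78, 1),
    ("1949", 9999, 1),   # missing reading, filtered out
    ("1949", 200, 2),    # quality code not in the accepted set, filtered out
]

GOOD_QUALITY = {0, 1, 4, 5, 9}


def max_temp_by_year(rows):
    # FILTER: drop missing readings and rows with other quality codes.
    filtered = [r for r in rows if r[1] != 9999 and r[2] in GOOD_QUALITY]
    # GROUP ... BY year: collect the temperatures of each year.
    grouped = defaultdict(list)
    for year, temperature, _quality in filtered:
        grouped[year].append(temperature)
    # FOREACH ... GENERATE group, MAX(...): one (year, max) tuple per group.
    return sorted((year, max(temps)) for year, temps in grouped.items())


max_temp = max_temp_by_year(records)
print(max_temp)
```

Each intermediate value is a list of tuples, which mirrors the way Pig Latin variables are described above.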

4 PigPen: Eclipse plugin for Graphical Pig Programming
Pig Latin

5 Pig Latin: Key and Manipulation Operators
Pig Latin: Map Related Operators

6 Pig Latin: Reduce Related Operators
DryadLINQ
High-performance programming model from Microsoft.
Query based.
The application runs locally; its queries run on large-scale infrastructure.
The approach is related to Hadoop's Pig.

7 Components
Windows HPC Server 2008: cluster management, scheduling.
Dryad: distributed execution engine, fault recovery, distribution, scalability based on partitioned data sets.
LINQ: .NET extensions for querying data sources; exploits parallelism, uniform data model.
DryadLINQ Architecture
Figure layers, top to bottom: .NET applications (Machine Learning, Image Processing, Graph Analysis, Data Mining); DryadLINQ; Dryad; HPC Job Scheduler; Windows HPC Server 2008 nodes.

8 Dryad
Provides a flexible execution layer:
  Graph-based execution.
  The graph (code for the nodes, arcs, data serialization) is described in a high-level language.
Provides transparent distribution:
  Distributes the code and routes the data.
  Creates processes close to the data.
  Masks cluster and network faults.
Example of a Dryad program (DAG)
Figure labels: Inputs, Computation, Channels (file, fifo, pipe), Outputs.
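To make the graph-based execution model concrete, here is a minimal single-process sketch that runs the vertices of a small DAG in dependency order and passes each vertex the outputs of its predecessors. The graph, the vertex names, and the vertex functions are hypothetical, and nothing here is Dryad's API; the sketch assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each vertex lists the vertices it reads from.
dag = {
    "read_a": [],
    "read_b": [],
    "parse_a": ["read_a"],
    "parse_b": ["read_b"],
    "merge": ["parse_a", "parse_b"],
}

# Hypothetical vertex code: each function receives its inputs' outputs.
vertex_code = {
    "read_a": lambda inputs: ["3", "1"],
    "read_b": lambda inputs: ["2"],
    "parse_a": lambda inputs: [int(x) for x in inputs[0]],
    "parse_b": lambda inputs: [int(x) for x in inputs[0]],
    "merge": lambda inputs: sorted(inputs[0] + inputs[1]),
}


def run_dag(graph, code):
    outputs = {}
    # static_order() yields every vertex after all of its predecessors.
    for vertex in TopologicalSorter(graph).static_order():
        inputs = [outputs[p] for p in graph[vertex]]  # stand-in for channels
        outputs[vertex] = code[vertex](inputs)
    return outputs


result = run_dag(dag, vertex_code)
print(result["merge"])
```

A distributed engine would additionally place each vertex near its data and rerun vertices after faults; the sketch only shows the ordering constraint that the graph imposes.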

9 Pipes in 2D
LINQ: Language Integrated Query
Declarative extension to C# and VB.NET to iterate over collections:
  In memory.
  Through data providers.
Similar to SQL.
Very popular:
  Easy to use.
  Reduces the amount of code.
  Supported by many tools.

10 Example
Before (SQL):
    SELECT [t0].[contactid], [t0].[firstname], [t0].[lastname], [t0].[dateofbirth], [t0].[phone], [t0].[ ], [t0].[state]
    FROM [Contact] AS [t0]
    WHERE [t0].[dateofbirth])
    ORDER BY [t0].[dateofbirth] DESC
Now (LINQ):
    var q = from c in db.contact
            where c.dateofbirth.AddYears(35) > DateTime.Now
            orderby c.dateofbirth descending
            select c;
But mainly it simplifies queries against sets of objects, XML, and DataSets:
    string[] names = {"John", "Peter", "Joe", "Patrick", "Donald", "Eric"};
    IEnumerable<string> nameswithfivecharacters = from name in names
                                                  where name.Length < 5
                                                  select name;
DryadLINQ: LINQ using Clusters
LINQ declarative programming using clusters.
Automatic parallelization: exploits multi-node and multi-core parallelism.
Integrated with Visual Studio and .NET.
Dynamic type-checking and automatic serialization.

11 Development/Execution Cycle
DryadLINQ programs run on the user's PC:
  Programmed and executed locally.
  When there are calls to a PartitionedTable, the query is built (code generation, execution plan, optimization) and the job is submitted to the HPC Server.
  The HPC Server allocates resources for the job and schedules the Dryad Job Manager (DJM).
  The DJM schedules the tasks in the nodes of the DAG.
  When the query completes, the local execution continues.
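The cycle above relies on queries being built first and executed only when their results are needed. The sketch below illustrates that deferred-execution idea with a hypothetical `LazyQuery` class; the class and its methods are assumptions made for illustration and are not the DryadLINQ or PartitionedTable API. Where DryadLINQ would generate a plan and submit a job, the sketch simply runs the recorded steps in-process.

```python
class LazyQuery:
    """Records operators; runs them only when results are requested."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def where(self, predicate):
        return LazyQuery(self.source, self.steps + (("where", predicate),))

    def select(self, projection):
        return LazyQuery(self.source, self.steps + (("select", projection),))

    def __iter__(self):
        # Stand-in for "build the plan and submit the job": the recorded
        # steps are applied to the source only at this point.
        rows = iter(self.source)
        for kind, fn in self.steps:
            rows = filter(fn, rows) if kind == "where" else map(fn, rows)
        return rows


logs = ["#comment", "GET /a", "GET /b"]
query = (LazyQuery(logs)
         .where(lambda line: not line.startswith("#"))
         .select(str.upper))
logs.append("GET /c")   # still picked up: nothing has executed yet
result = list(query)    # execution happens here
print(result)
```

Building the query is cheap and local; only enumerating it triggers work, which is the point at which a system like the one described could hand the whole plan to a cluster.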

12 DryadLINQ: LINQ + Dryad
Automatic plan generation.
Distributed execution by Dryad.
Figure labels: LINQ, Query Plan, Dryad.
A Simple LINQ Query
    var logentries = from line in logs
                     where !line.StartsWith("#")
                     select new LogEntry(line);
Figure labels: logs, where, select.

13 The Same Query in DryadLINQ
PartitionedTable<T>
Fundamental data structure for DryadLINQ.
Scalable partitioned container for .NET objects.
Derives from IQueryable<T> and IEnumerable<T>.
DryadLINQ operators consume and produce PartitionedTable<T>.
DryadLINQ generates the code that serializes the application's objects.
The storage can be partitioned files, partitioned SQL database tables, or the cluster's file system.

14 PartitionedFiles: Container for PartitionedTable<T>
\\HPCMETAHN01\XC\output\520a0fcf\Part
: table piece number
: table piece size
HPCMETAHN01: table piece node

15 A Typical Query
Choose non-commented lines from the log.
Choose the log entries created by user jvert.
Group the accesses from user jvert by page and count the occurrences.
Order accessed pages by frequency.
Parallel Execution of a Dryad DAG
Figure labels: logs, logentries, user, accesses, htmaccesses, output.
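The four steps of the typical query can be sketched in plain Python as follows. The log format (`<user> <page>` per line, with `#` marking comments) and the sample lines are assumptions made for illustration; the original query is written in LINQ and runs as a Dryad DAG rather than in a single process.

```python
from collections import Counter

# Hypothetical log format: "<user> <page>"; "#" starts a comment line.
logs = [
    "# access log",
    "jvert /index.htm",
    "alice /index.htm",
    "jvert /docs.htm",
    "jvert /index.htm",
    "# end",
]


def pages_by_frequency(lines, user):
    # 1. Choose non-commented lines.
    entries = [line.split() for line in lines if not line.startswith("#")]
    # 2. Choose the entries created by the given user.
    accesses = [page for who, page in entries if who == user]
    # 3. Group by page and count the occurrences.
    counts = Counter(accesses)
    # 4. Order pages by frequency, most accessed first.
    return counts.most_common()


output = pages_by_frequency(logs, "jvert")
print(output)
```

Steps 1 and 2 work on each line independently, so they parallelize per partition; the grouping in step 3 is where data from different partitions has to be brought together.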

16 Query Execution Plan
The query is separated from the execution context:
  The necessary code is referenced by the query (data structures and auxiliary algorithms).
  References to local variables are eliminated by partial evaluation.
  The serialization code and the code for the nodes are generated automatically.
Example of a DryadLINQ Execution Plan

17 XML Representation: from DryadLINQ to Dryad
List of files to be placed in the cluster.
Node definition.
DryadLINQ Generated Code

18 MapReduce using DryadLINQ
Dryad vs. MapReduce
Dryad                      MapReduce
Execution layer            Executable
Work = DAG                 Map+sort+reduce
Policies (plugins)         No policies
Program = graph            Program = map+reduce
Complex (+ funcs.)         Simple
New (< 2 years)            Mature (> 4 years)
Growing                    Widespread
Proprietary (Microsoft)    Hadoop
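The "map+sort+reduce" shape named in the comparison can be sketched as three plain-Python steps. Word counting is used here only as a familiar stand-in workload, and the function names are hypothetical; the point is that the fixed three-stage pipeline is one particular, very simple graph.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    # Emit one (word, 1) pair per word.
    return [(word, 1) for line in lines for word in line.split()]


def reduce_phase(sorted_pairs):
    # Sum the values of each run of equal keys.
    return [
        (key, sum(value for _key, value in group))
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    ]


def map_sort_reduce(lines):
    pairs = map_phase(lines)
    pairs.sort(key=itemgetter(0))  # the sort step brings equal keys together
    return reduce_phase(pairs)


counts = map_sort_reduce(["pig dryad pig", "dryad pig"])
print(counts)
```

`groupby` only merges adjacent equal keys, which is why the sort step sits between map and reduce.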

19 Percolator: Incrementally Indexing the Web

20 Duplicate Elimination with MapReduce
The indexing system is a chain of many MapReduces.
Index Refresh with MapReduce
Should we index the new document?
o The new doc could be a dup of any previously crawled document.
o This requires that we map over the entire repository.

21 Indexing System Goals
What do we want from an ideal indexing system?
Large repository of documents: upper bound on index size.
Higher-quality index: e.g. more links.
Small delay between crawl and index: "freshness".
MapReduce indexing system: days from crawl to index.
Incremental Indexing
Maintain a random-access repository in Bigtable.
Indexes allow avoiding a global scan.
Incrementally mutate state as URLs are crawled.
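A small sketch may help show why an index removes the need for a global scan when a new page arrives. The `Repository` class below is a hypothetical in-memory stand-in, not Bigtable, and the choice of CRC32 as the checksum and the example URLs are assumptions made for illustration.

```python
import zlib


class Repository:
    """In-memory stand-in for a random-access document repository."""

    def __init__(self):
        self.docs = {}         # url -> contents
        self.by_checksum = {}  # checksum -> first url seen with that content

    def add(self, url, contents):
        """Store a crawled page; return the URL it duplicates, or None."""
        checksum = zlib.crc32(contents.encode("utf-8"))
        self.docs[url] = contents
        # One index lookup replaces a scan over every stored document.
        original = self.by_checksum.setdefault(checksum, url)
        return None if original == url else original


repo = Repository()
first = repo.add("nytimes.com", "All the news")
second = repo.add("nyt.com", "All the news")
third = repo.add("example.org", "Something else")
print(first, second, third)
```

Each crawl mutates only the rows it touches, so the cost of handling a new page no longer grows with the size of the repository. The sketch is single-threaded and therefore sidesteps the question of what happens when two duplicates are processed at the same time.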

22 Incremental Indexing on Bigtable
URL            Checksum      PageRank   IsCanonical?
nyt.com        0xabcdef01    6          no
nytimes.com    0xabcdef01    9          yes
Checksum       Canonical
0xabcdef01     nytimes.com
What happens if we process both URLs simultaneously?
Percolator: Incremental Infrastructure
Adds distributed transactions to Bigtable:
    Transaction t;
    string contents = t.get(row, "raw", "doc");
    Hash h = Hash32(contents);
    ...
    // Potential conflict with concurrent execution
    t.set(h, "canonical", "dup_table", row);
    ...
    t.commit();
Simple API: Get(), Set(), Commit(), Iterate

23 Bigtable Recap
Bigtable is a sorted (row, column, timestamp) store:
  Data is partitioned into row ranges called tablets.
  Tablets are spread across many machines.
Implementing Distributed Transactions
Provide snapshot isolation semantics.
Multi-version protocol (mapped to Bigtable).
Two-phase commit, coordinated by the client.
Locks are stored in special Bigtable columns.
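To illustrate snapshot isolation over a multi-version store, the sketch below keeps timestamped versions per key, lets each transaction read as of its start timestamp, and rejects a commit when another transaction has committed the same key in the meantime. It is a single-process toy under stated assumptions: the class names are hypothetical, a local counter stands in for a timestamp source, and it omits the two-phase commit and the lock columns, so it should not be read as Percolator's actual protocol.

```python
import itertools


class ConflictError(Exception):
    pass


class Store:
    """Single-process stand-in for a multi-version (key, timestamp) table."""

    def __init__(self):
        self.versions = {}                # key -> list of (commit_ts, value)
        self.clock = itertools.count(1)   # stand-in for a timestamp source

    def begin(self):
        return Transaction(self, next(self.clock))


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}

    def get(self, key):
        # Read the newest version committed before this transaction began.
        visible = [(ts, v) for ts, v in self.store.versions.get(key, [])
                   if ts < self.start_ts]
        return max(visible, key=lambda tv: tv[0])[1] if visible else None

    def set(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        # Write-write conflict: a key we wrote was committed after we began.
        for key in self.writes:
            committed = self.store.versions.get(key, [])
            if any(ts > self.start_ts for ts, _ in committed):
                raise ConflictError(key)
        commit_ts = next(self.store.clock)
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))


store = Store()
t1 = store.begin()
t2 = store.begin()
t1.set("0xabcdef01", "nyt.com")
t2.set("0xabcdef01", "nytimes.com")
t2.commit()
try:
    t1.commit()
    t1_committed = True
except ConflictError:
    t1_committed = False
canonical = store.begin().get("0xabcdef01")
print(t1_committed, canonical)
```

With two concurrent writers of the same checksum row, exactly one commit succeeds and the other must retry, so the two URLs cannot both end up recorded as canonical.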

24 Transaction Commit
Notifications: tracking work
Users register "observers" on a column:
  Executed when any row in that column is written.
  Each observer runs in a new transaction.
  Run at most once per write.
Applications are structured as a series of observers.
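The observer pattern described above can be sketched with a small in-memory table in which a write to an observed column queues a notification, and each notification is consumed once. The `Table` class, the column names, and the two observers are hypothetical, and the sketch leaves out transactions entirely; it only shows how a chain of observers forms a pipeline.

```python
from collections import defaultdict


class Table:
    """In-memory stand-in for a table with per-column observers."""

    def __init__(self):
        self.cells = {}                     # (row, column) -> value
        self.observers = defaultdict(list)  # column -> callbacks
        self.pending = []                   # writes awaiting their observers

    def register(self, column, observer):
        self.observers[column].append(observer)

    def write(self, row, column, value):
        self.cells[(row, column)] = value
        if self.observers[column]:
            self.pending.append((row, column))  # notification: work to do

    def run_pending(self):
        # Each notification is consumed once; observers may write further
        # columns, which queues the next stage of the pipeline.
        while self.pending:
            row, column = self.pending.pop(0)
            for observer in self.observers[column]:
                observer(self, row)


def parse(table, row):
    table.write(row, "links", table.cells[(row, "raw")].count("href"))


def index(table, row):
    table.write(row, "indexed", True)


table = Table()
table.register("raw", parse)
table.register("links", index)
table.write("nytimes.com", "raw", "<a href=x></a><a href=y></a>")
table.run_pending()
print(table.cells)
```

Writing the raw document triggers `parse`, whose write to `links` triggers `index`, so the application is expressed as a series of small steps rather than one batch job.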

25 Additional Bigtable Columns for Percolator
Implementing Notifications
If the notify column is set, the observer must be run.
Implemented using a randomized distributed scan:
  Finds pending work and runs observers in a thread pool.
  The scan is efficient: it only scans the notify column bits.

26 Bus Clumping
Randomized scanners tend to clump:
  Reduces effective parallelism.
  Overloads Bigtable servers.
Solution:
  Try to obtain a lightweight scanner lock per row.
  If unsuccessful, jump to a random point in the table.
Running Percolator
Each machine runs:
  Worker binary linked with observer code.
  Bigtable tablet server.
  GFS chunkserver.
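The anti-clumping rule can be sketched as a scanner that claims a row only if no other scanner holds it and otherwise jumps to a random position. This is a sequential simulation under stated assumptions: the shared `locked` set stands in for per-row scanner locks, locks are never released, and the function name and `budget` parameter are hypothetical.

```python
import random


def scan(rows, locked, start, rng, budget):
    """Scan rows from `start`, claiming unlocked rows; return rows claimed.

    `locked` is a set shared by all scanners (stand-in for per-row locks).
    `budget` bounds the number of rows the scanner looks at.
    """
    claimed = []
    position = start
    for _ in range(budget):
        row = rows[position % len(rows)]
        if row in locked:
            # Another scanner is here: jump away instead of trailing it.
            position = rng.randrange(len(rows))
            continue
        locked.add(row)
        claimed.append(row)
        position += 1
    return claimed


rows = [f"r{i}" for i in range(8)]
locked = set()
rng = random.Random(0)
a = scan(rows, locked, 0, rng, 3)   # first scanner takes the first rows
b = scan(rows, locked, 0, rng, 6)   # second scanner starts at the same point
print(a, b)
```

Because the second scanner finds its starting row taken, it relocates rather than following the first one row by row, which is the behaviour that keeps scanners spread across the table.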

27 Very Different Access Patterns
Percolator: small, random disk I/O; many RPCs per phase, per document.
MapReduce: streaming I/O; many documents per RPC, per phase.
The infrastructure is much better suited to the MR model. Even though MR does "extra" work, it does so very efficiently.
MR v. Percolator: Performance
Figure axis: fraction of repository refreshed per hour.

28 MR v. Percolator: Experience
Conversion of an MR-based pipeline to Percolator.
Pros:
  Freshness: indexing delay dropped from days to minutes.
  Scalability:
  o More throughput: just buy more CPUs.
  o Bigger repository: only limited by disk space.
  Utilization: immune to stragglers.
Cons:
  Need to reason about concurrency.
  More expensive per document processed (~2x).
Running 1000 threads on Linux
Percolator uses a thread-per-request model.
Pros:
  Application developers write "straight line" code.
  Meaningful stack traces: easy debugging / profiling.
  Easy scalability on many-core machines.
Cons:
  Kernel scalability: kernel locks were held while doing work on each of 1000 threads during process exit.
  Good detector for thread-local storage in libraries.
  Lock contention on, e.g., caches.

29 Conclusion
Percolator is now building the "Caffeine" web search index:
  50% fresher results.
  3x larger repository.
Existence proof for distributed transactions at web scale.
Next Time...
Cloud Monitoring