Paolo Costa

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Paolo Costa costa@imperial.ac.uk"

Transcription

1 joint work with Ant Rowstron, Austin Donnelly, and Greg O Shea (MSR Cambridge) Hussam Abu-Libdeh, Simon Schubert (Interns) Paolo Costa

2 Paolo Costa CamCube - Rethinking the Data Center Cluster 2/54

3 Data Centers Large group of networked servers Typically built from commodity hardware Scale-out vs. scale-up Many low-cost PCs rather than few expensive high-end servers Economies of scale Custom software stack Microsoft: Autopilot, Dryad Google: GFS, BigTable, Reduce Yahoo: Hadoop, HDFS Amazon: Dynamo, Astrolabe Facebook: Cassandra, Scribe, HBase Focus of this talk Paolo Costa CamCube - Rethinking the Data Center Cluster 3/54

4 Mismatch between abstraction & reality How programmers see the network Key-based Overlay networks Explicit structure Symmetry of role Paolo Costa CamCube - Rethinking the Data Center Cluster 4/54

5 Paolo Costa CamCube - Rethinking the Data Center Cluster 5/54

6 ~12 racks / pod 10 Gbps 10 Gbps ~ servers / rack Paolo Costa CamCube - Rethinking the Data Center Cluster 6/54

7 ~12 racks / pod 10 Gbps 10 Gbps ~ servers / rack Paolo Costa CamCube - Rethinking the Data Center Cluster 7/54

8 ~12 racks / pod 10 Gbps 10 Gbps Data center services are distributed but they have no control over the network Must infer network properties (e.g., locality) Must use inefficient overlay networks No bandwidth guarantee ~ servers / rack Paolo Costa CamCube - Rethinking the Data Center Cluster 8/54

9 Top-down (applications) Custom routing? Overlays Priority traffic? Use multiple TCP flows Locality? Use traceroute and ping TCP Incast? Add random jitter Bottom-up (network) More bandwidth? Use richer topologies (e.g., fat-trees) Fate-sharing? Use prediction models to estimate flow length / traffic patterns TCP Incast? Use ECN in routers Paolo Costa CamCube - Rethinking the Data Center Cluster 9/24

10 The network is still a black box Good for a multi-owned, heterogeneous Internet Limiting for a single-owned, homogenous data center DCs are not mini-internets single owner / administration domain we know (and define) the topology low hardware/software diversity no malicious or selfish node Is TCP/IP still appropriate? Paolo Costa CamCube - Rethinking the Data Center Cluster 10/54

11 A data center from the perspective of a distributed systems builder What happens if you integrate network & compute? Distributed Systems CamCube HPC Networking Paolo Costa CamCube - Rethinking the Data Center Cluster 11/54

12 Service composition Architecture TCP/IP Link and key-based API Topology Paolo Costa CamCube - Rethinking the Data Center Cluster 12/54

13 y (1,2,1) (1,2,2) (1,2,2) (1,2,1) No distinction between overlay and underlying network Nodes have (x,y,z)coordinate defines key-space Simple one hop API to send/receive packets all other functionality implemented as services Key coordinates are re-mapped in case of failures enable a KBR-like API z x Paolo Costa CamCube - Rethinking the Data Center Cluster 13/54

14 y (1,2,2) (1,2,1) No distinction between overlay and underlying network Nodes have (x,y,z)coordinate defines key-space Simple one hop API to send/receive packets all other functionality implemented as services Key coordinates are re-mapped in case of failures enable a KBR-like API z x Paolo Costa CamCube - Rethinking the Data Center Cluster 14/54

15 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 15/54

16 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 16/54

17 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 17/54

18 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 18/54

19 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 19/54

20 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 20/54

21 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 21/54

22 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 22/54

23 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 23/54

24 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 24/54

25 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 25/54

26 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 26/54

27 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 27/54

28 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 28/54

29 (1,1,0) (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 29/54

30 (1,1,0) (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 30/54

31 (1,1,0) (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 31/54

32 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 32/54

33 27-server (3x3x3) prototype quad-core Intel Xeon Ghz 12GB RAM 6 Intel PRO/1000 PT 1 Gbps ports Runtime & services implemented in C# Multicast, key-value store, Reduce, graph processing engine, Experiment all servers saturating 6 links Gbps (9K pkts) using < 22% CPU Paolo Costa CamCube - Rethinking the Data Center Cluster 33/54

34 Reduce originally developed by Google open-source version (Hadoop) used by several companies Yahoo, Facebook, Twitter, Amazon, many applications data mining, search, indexing Simple abstraction map: apply a function that generates (key, value) pairs reduce: combine intermediate results Paolo Costa CamCube - Rethinking the Data Center Cluster 34/54

35 Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 35/54

36 Results stored on a single node Can handle all reduce functions Bottleneck (network and CPU) Reduce Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 36/54

37 CPU & network load distributed N 2 flows: needs full bisection bw results are split across R servers not always applicable (e.g., max) Reduce Reduce Reduce Reduce Reduce Reduce Reduce Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 37/54

38 Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 38/54

39 Use hierarchical aggregation! Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 39/54

40 Paolo Costa CamCube - Rethinking the Data Center Cluster 40/54

41 Intermediate results can be aggregated at each hop Aggregate Aggregate Aggregate Paolo Costa CamCube - Rethinking the Data Center Cluster 41/54

42 Paolo Costa CamCube - Rethinking the Data Center Cluster 42/54

43 In-network aggregation Paolo Costa CamCube - Rethinking the Data Center Cluster 43/54

44 Traffic reduced at each hop No need for full bisection bw Final results stored on a single node In-network aggregation Paolo Costa CamCube - Rethinking the Data Center Cluster 44/54

45 Camdoop builds 1 tree failure resilient (up to) 6 Gbps throughput / server Paolo Costa CamCube - Rethinking the Data Center Cluster 45/54

46 Camdoop builds 1 tree 6 disjoint trees failure resilient (up to) 6 Gbps throughput / server Paolo Costa CamCube - Rethinking the Data Center Cluster 46/54

47 27-server testbed 27 GB input size 1GB / server Output size/ intermediate size (S) S=1/N 0 (full aggregation) S=1 (no aggregation) Two configurations: All-to-one All-to-all Paolo Costa CamCube - Rethinking the Data Center Cluster 47/54

48 Time (s) logscale Worse switch 10 Camdoop (no agg) Camdoop Switch 1 Gbps (bound) Full aggregation Output size / intermediate size (S) No aggregation Better Paolo Costa CamCube - Rethinking the Data Center Cluster 48/54

49 Time (s) logscale Impact of aggregation Performance independent of S Impact of higher bandwidth Worse Better 10 Facebook reported aggregation ratio switch Camdoop (no agg) Camdoop Switch 1 Gbps (bound) Full aggregation Output size / intermediate size (S) No aggregation Paolo Costa CamCube - Rethinking the Data Center Cluster 49/54

50 Time (s) logscale Worse switch Camdoop (no agg) Camdoop Switch 1 Gbps (bound) Output size / intermediate size (S) Full aggregation No aggregation Better Paolo Costa CamCube - Rethinking the Data Center Cluster 50/54

51 Time (s) logscale Impact of higher bandwidth Worse 1000 Impact of aggregation 100 switch Camdoop (no agg) Camdoop Switch 1 Gbps (bound) Output size / intermediate size (S) Full aggregation No aggregation Better Paolo Costa CamCube - Rethinking the Data Center Cluster 51/54

52 Time (ms) Worse switch Camdoop (no agg) Camdoop Better 0 20 KB 200 KB Input data size / server Paolo Costa CamCube - Rethinking the Data Center Cluster 52/54

53 Time (ms) In-network aggregation beneficial also for low-latency services Worse switch Camdoop (no agg) Camdoop Better 0 20 KB 200 KB Input data size / server Paolo Costa CamCube - Rethinking the Data Center Cluster 53/54

54 Building large-scale services is hard mismatch between physical and logical topology the network is a black box CamCube designed from the perspective of developers explicit topology and key-based abstraction Benefits simplified development more efficient applications and better fault-tolerance H. Abu-Libdeh, P. Costa, A. Rowstron, G. O'Shea, and A. Donnelly. Symbiotic Routing for Future Data Centres, In ACM SIGCOMM, P. Costa, A. Donnelly, A. Rowstron, and G. O'Shea. Camdoop: Exploiting Innetwork Aggregation for Big Data Applications, In USENIX NSDI, 2012.

55 Paolo Costa

Ant Rowstron. Joint work with Paolo Costa, Austin Donnelly and Greg O Shea Microsoft Research Cambridge. Hussam Abu-Libdeh, Simon Schubert Interns

Ant Rowstron. Joint work with Paolo Costa, Austin Donnelly and Greg O Shea Microsoft Research Cambridge. Hussam Abu-Libdeh, Simon Schubert Interns Ant Rowstron Joint work with Paolo Costa, Austin Donnelly and Greg O Shea Microsoft Research Cambridge Hussam Abu-Libdeh, Simon Schubert Interns Thinking of large-scale data centers Microsoft, Google,

More information

Symbiotic Routing in Future Data Centers

Symbiotic Routing in Future Data Centers Symbiotic Routing in Future Data Centers Hussam Abu-Libdeh Microsoft Research Cambridge, UK. hussam@cs.cornell.edu Greg O Shea Microsoft Research Cambridge, UK. gregos@microsoft.com Paolo Costa Microsoft

More information

Camdoop: Exploiting In-network Aggregation for Big Data Applications

Camdoop: Exploiting In-network Aggregation for Big Data Applications : Exploiting In-network Aggregation for Big Data Applications Paolo Costa Austin Donnelly Antony Rowstron Greg O Shea Microsoft Research Cambridge Imperial College London Abstract Large companies like

More information

Camdoop: Exploiting In-network Aggregation for Big Data Applications

Camdoop: Exploiting In-network Aggregation for Big Data Applications : Exploiting In-network Aggregation for Big Data Applications Paolo Costa Austin Donnelly Antony Rowstron Greg O Shea Microsoft Research Cambridge Imperial College London Abstract Large companies like

More information

Towards Rack-scale Computing Challenges and Opportunities

Towards Rack-scale Computing Challenges and Opportunities Towards Rack-scale Computing Challenges and Opportunities Paolo Costa paolo.costa@microsoft.com joint work with Raja Appuswamy, Hitesh Ballani, Sergey Legtchenko, Dushyanth Narayanan, Ant Rowstron Hardware

More information

Operating Systems. Cloud Computing and Data Centers

Operating Systems. Cloud Computing and Data Centers Operating ystems Fall 2014 Cloud Computing and Data Centers Myungjin Lee myungjin.lee@ed.ac.uk 2 Google data center locations 3 A closer look 4 Inside data center 5 A datacenter has 50-250 containers A

More information

Computing at the Edge: Improving the Performance of Big Data and Mobile Apps with In-Network Computation

Computing at the Edge: Improving the Performance of Big Data and Mobile Apps with In-Network Computation Computing at the Edge: Improving the Performance of Big Data and Mobile Apps with In-Network Computation Peter Pietzuch (joint work with Andreas Pamboris*, Lukas Rupprecht, Luo Mai, Abdul Alim, Paolo Costa*,

More information

Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

More information

Load Balancing Mechanisms in Data Center Networks

Load Balancing Mechanisms in Data Center Networks Load Balancing Mechanisms in Data Center Networks Santosh Mahapatra Xin Yuan Department of Computer Science, Florida State University, Tallahassee, FL 33 {mahapatr,xyuan}@cs.fsu.edu Abstract We consider

More information

Advanced Computer Networks. Datacenter Network Fabric

Advanced Computer Networks. Datacenter Network Fabric Advanced Computer Networks 263 3501 00 Datacenter Network Fabric Patrick Stuedi Spring Semester 2014 Oriana Riva, Department of Computer Science ETH Zürich 1 Outline Last week Today Supercomputer networking

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks

Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks Praveenkumar Kondikoppa, Chui-Hui Chiu, Cheng Cui, Lin Xue and Seung-Jong Park Department of Computer Science,

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Big Data in the Enterprise: Network Design Considerations

Big Data in the Enterprise: Network Design Considerations White Paper Big Data in the Enterprise: Network Design Considerations What You Will Learn This document examines the role of big data in the enterprise as it relates to network design considerations. It

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Exploiting In-network Processing for Big Data Management

Exploiting In-network Processing for Big Data Management Exploiting In-network Processing for Big Data Management Lukas Rupprecht supervised by Dr. Peter Pietzuch Department of Computing Imperial College London Expected Graduation Date 216 l.rupprecht12@imperial.ac.uk

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.

More information

Large-Scale Data Processing

Large-Scale Data Processing Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

More information

From GWS to MapReduce: Google s Cloud Technology in the Early Days

From GWS to MapReduce: Google s Cloud Technology in the Early Days Large-Scale Distributed Systems From GWS to MapReduce: Google s Cloud Technology in the Early Days Part II: MapReduce in a Datacenter COMP6511A Spring 2014 HKUST Lin Gu lingu@ieee.org MapReduce/Hadoop

More information

Load Balancing in Data Center Networks

Load Balancing in Data Center Networks Load Balancing in Data Center Networks Henry Xu Computer Science City University of Hong Kong HKUST, March 2, 2015 Background Aggregator Aggregator Aggregator Worker Worker Worker Worker Low latency for

More information

Introduction to Hadoop and MapReduce

Introduction to Hadoop and MapReduce Introduction to Hadoop and MapReduce THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large quantities of data

More information

Computer Networks COSC 6377

Computer Networks COSC 6377 Computer Networks COSC 6377 Lecture 25 Fall 2011 November 30, 2011 1 Announcements Grades will be sent to each student for verificagon P2 deadline extended 2 Large- scale computagon Search Engine Tasks

More information

Network Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics. Qin Yin Fall Semester 2013

Network Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics. Qin Yin Fall Semester 2013 Network Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics Qin Yin Fall Semester 2013 1 Walmart s Data Center 2 Amadeus Data Center 3 Google s Data Center 4 Data Center

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

Comparative analysis of Google File System and Hadoop Distributed File System

Comparative analysis of Google File System and Hadoop Distributed File System Comparative analysis of Google File System and Hadoop Distributed File System R.Vijayakumari, R.Kirankumar, K.Gangadhara Rao Dept. of Computer Science, Krishna University, Machilipatnam, India, vijayakumari28@gmail.com

More information

DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER

DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER ANDREAS-LAZAROS GEORGIADIS, SOTIRIOS XYDIS, DIMITRIOS SOUDRIS MICROPROCESSOR AND MICROSYSTEMS LABORATORY ELECTRICAL AND

More information

Optimizing Data Center Networks for Cloud Computing

Optimizing Data Center Networks for Cloud Computing PRAMAK 1 Optimizing Data Center Networks for Cloud Computing Data Center networks have evolved over time as the nature of computing changed. They evolved to handle the computing models based on main-frames,

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

Shared Parallel File System

Shared Parallel File System Shared Parallel File System Fangbin Liu fliu@science.uva.nl System and Network Engineering University of Amsterdam Shared Parallel File System Introduction of the project The PVFS2 parallel file system

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Nobody ever got fired for using Hadoop on a cluster

Nobody ever got fired for using Hadoop on a cluster Nobody ever got fired for using Hadoop on a cluster Antony Rowstron Dushyanth Narayanan Austin Donnelly Greg O Shea Andrew Douglas Microsoft Research, Cambridge Abstract The norm for data analytics is

More information

MMPTCP: A Novel Transport Protocol for Data Centre Networks

MMPTCP: A Novel Transport Protocol for Data Centre Networks MMPTCP: A Novel Transport Protocol for Data Centre Networks Morteza Kheirkhah FoSS, Department of Informatics, University of Sussex Modern Data Centre Networks FatTree It provides full bisection bandwidth

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

MapReduce and Hadoop Distributed File System V I J A Y R A O

MapReduce and Hadoop Distributed File System V I J A Y R A O MapReduce and Hadoop Distributed File System 1 V I J A Y R A O The Context: Big-data Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009) Google collects 270PB data in a month (2007), 20000PB

More information

Introduction to Big Data Training

Introduction to Big Data Training Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

More information

Intro to Map/Reduce a.k.a. Hadoop

Intro to Map/Reduce a.k.a. Hadoop Intro to Map/Reduce a.k.a. Hadoop Based on: Mining of Massive Datasets by Ra jaraman and Ullman, Cambridge University Press, 2011 Data Mining for the masses by North, Global Text Project, 2012 Slides by

More information

Portland: how to use the topology feature of the datacenter network to scale routing and forwarding

Portland: how to use the topology feature of the datacenter network to scale routing and forwarding LECTURE 15: DATACENTER NETWORK: TOPOLOGY AND ROUTING Xiaowei Yang 1 OVERVIEW Portland: how to use the topology feature of the datacenter network to scale routing and forwarding ElasticTree: topology control

More information

Network performance in virtual infrastructures

Network performance in virtual infrastructures Network performance in virtual infrastructures A closer look at Amazon EC2 Alexandru-Dorin GIURGIU University of Amsterdam System and Network Engineering Master 03 February 2010 Coordinators: Paola Grosso

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

A Policy-aware Switching Layer for Data Centers

A Policy-aware Switching Layer for Data Centers A Policy-aware Switching Layer for Data Centers Dilip Joseph Arsalan Tavakoli Ion Stoica University of California at Berkeley 1 Problem: Middleboxes are hard to deploy Place on network path Overload path

More information

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda 1 Outline Build a cost-efficient Swift cluster with expected performance Background & Problem Solution Experiments

More information

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers

More information

Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya and Amin Vahdat

Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya and Amin Vahdat Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya and Amin Vahdat 1 PortLand In A Nutshell PortLand is a single logical

More information

Data Center Networks

Data Center Networks Data Center Networks (Lecture #3) 1/04/2010 Professor H. T. Kung Harvard School of Engineering and Applied Sciences Copyright 2010 by H. T. Kung Main References Three Approaches VL2: A Scalable and Flexible

More information

Cloud Computing Where ISR Data Will Go for Exploitation

Cloud Computing Where ISR Data Will Go for Exploitation Cloud Computing Where ISR Data Will Go for Exploitation 22 September 2009 Albert Reuther, Jeremy Kepner, Peter Michaleas, William Smith This work is sponsored by the Department of the Air Force under Air

More information

MapReduce with Apache Hadoop Analysing Big Data

MapReduce with Apache Hadoop Analysing Big Data MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues

More information

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of

More information

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique Mahesh Maurya a, Sunita Mahajan b * a Research Scholar, JJT University, MPSTME, Mumbai, India,maheshkmaurya@yahoo.co.in

More information

Walmart s Data Center. Amadeus Data Center. Google s Data Center. Data Center Evolution 1.0. Data Center Evolution 2.0

Walmart s Data Center. Amadeus Data Center. Google s Data Center. Data Center Evolution 1.0. Data Center Evolution 2.0 Walmart s Data Center Network Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics Qin Yin Fall emester 2013 1 2 Amadeus Data Center Google s Data Center 3 4 Data Center

More information

Data Center Network Topologies: FatTree

Data Center Network Topologies: FatTree Data Center Network Topologies: FatTree Hakim Weatherspoon Assistant Professor, Dept of Computer Science CS 5413: High Performance Systems and Networking September 22, 2014 Slides used and adapted judiciously

More information

What Is Datacenter (Warehouse) Computing. Distributed and Parallel Technology. Datacenter Computing Application Programming

What Is Datacenter (Warehouse) Computing. Distributed and Parallel Technology. Datacenter Computing Application Programming Distributed and Parallel Technology Datacenter and Warehouse Computing Hans-Wolfgang Loidl School of Mathematical and Computer Sciences Heriot-Watt University, Edinburgh 0 Based on earlier versions by

More information

Advanced Computer Networks. Scheduling

Advanced Computer Networks. Scheduling Oriana Riva, Department of Computer Science ETH Zürich Advanced Computer Networks 263-3501-00 Scheduling Patrick Stuedi, Qin Yin and Timothy Roscoe Spring Semester 2015 Outline Last time Load balancing

More information

VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing

VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing Journal of Information & Computational Science 9: 5 (2012) 1273 1280 Available at http://www.joics.com VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing Yuan

More information

Cisco s Massively Scalable Data Center

Cisco s Massively Scalable Data Center Cisco s Massively Scalable Data Center Network Fabric for Warehouse Scale Computer At-A-Glance Datacenter is the Computer MSDC is the Network Cisco s Massively Scalable Data Center (MSDC) is a framework

More information

Comparing the Network Performance of Windows File Sharing Environments

Comparing the Network Performance of Windows File Sharing Environments Technical Report Comparing the Network Performance of Windows File Sharing Environments Dan Chilton, Srinivas Addanki, NetApp September 2010 TR-3869 EXECUTIVE SUMMARY This technical report presents the

More information

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Jeffrey D. Ullman slides. MapReduce for data intensive computing Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very

More information

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering

More information

Question: 3 When using Application Intelligence, Server Time may be defined as.

Question: 3 When using Application Intelligence, Server Time may be defined as. 1 Network General - 1T6-521 Application Performance Analysis and Troubleshooting Question: 1 One component in an application turn is. A. Server response time B. Network process time C. Application response

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

GraySort and MinuteSort at Yahoo on Hadoop 0.23

GraySort and MinuteSort at Yahoo on Hadoop 0.23 GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters

More information

A Comparative Study of Data Center Network Architectures

A Comparative Study of Data Center Network Architectures A Comparative Study of Data Center Network Architectures Kashif Bilal Fargo, ND 58108, USA Kashif.Bilal@ndsu.edu Limin Zhang Fargo, ND 58108, USA limin.zhang@ndsu.edu Nasro Min-Allah COMSATS Institute

More information

Lecture 7: Data Center Networks"

Lecture 7: Data Center Networks Lecture 7: Data Center Networks" CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Nick Feamster Lecture 7 Overview" Project discussion Data Centers overview Fat Tree paper discussion CSE

More information

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform Fong-Hao Liu, Ya-Ruei Liou, Hsiang-Fu Lo, Ko-Chin Chang, and Wei-Tsong Lee Abstract Virtualization platform solutions

More information

Large-Scale Distributed Systems. Datacenter Networks. COMP6511A Spring 2014 HKUST. Lin Gu lingu@ieee.org

Large-Scale Distributed Systems. Datacenter Networks. COMP6511A Spring 2014 HKUST. Lin Gu lingu@ieee.org Large-Scale Distributed Systems Datacenter Networks COMP6511A Spring 2014 HKUST Lin Gu lingu@ieee.org Datacenter Networking Major Components of a Datacenter Computing hardware (equipment racks) Power supply

More information

B4: Experience with a Globally-Deployed Software Defined WAN TO APPEAR IN SIGCOMM 13

B4: Experience with a Globally-Deployed Software Defined WAN TO APPEAR IN SIGCOMM 13 B4: Experience with a Globally-Deployed Software Defined WAN TO APPEAR IN SIGCOMM 13 Google s Software Defined WAN Traditional WAN Routing Treat all bits the same 30% ~ 40% average utilization Cost of

More information

Transparent and Flexible Network Management for Big Data Processing in the Cloud

Transparent and Flexible Network Management for Big Data Processing in the Cloud Transparent and Flexible Network Management for Big Data Processing in the Cloud Anupam Das, Cristian Lumezanu, Yueping Zhang, Vishal Singh, Guofei Jiang, Curtis Yu UIUC NEC Labs UC Riverside Abstract

More information

Longer is Better? Exploiting Path Diversity in Data Centre Networks

Longer is Better? Exploiting Path Diversity in Data Centre Networks Longer is Better? Exploiting Path Diversity in Data Centre Networks Fung Po (Posco) Tso, Gregg Hamilton, Rene Weber, Colin S. Perkins and Dimitrios P. Pezaros University of Glasgow Cloud Data Centres Are

More information

Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion

More information

MapReduce and Hadoop Distributed File System

MapReduce and Hadoop Distributed File System MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) bina@buffalo.edu http://www.cse.buffalo.edu/faculty/bina Partially

More information

SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation

SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation SMB Advanced Networking for Fault Tolerance and Performance Jose Barreto Principal Program Managers Microsoft Corporation Agenda SMB Remote File Storage for Server Apps SMB Direct (SMB over RDMA) SMB Multichannel

More information

Data Center Network Architectures

Data Center Network Architectures Servers Servers Servers Data Center Network Architectures Juha Salo Aalto University School of Science and Technology juha.salo@aalto.fi Abstract Data centers have become increasingly essential part of

More information

SMB Direct for SQL Server and Private Cloud

SMB Direct for SQL Server and Private Cloud SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server

More information

Software Defined Network Support for Real Distributed Systems

Software Defined Network Support for Real Distributed Systems Software Defined Network Support for Real Distributed Systems Chen Liang Project for Reading and Research Spring 2012, Fall 2012 Abstract Software defined network is an emerging technique that allow users

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Data Center Networking with Multipath TCP

Data Center Networking with Multipath TCP Data Center Networking with Multipath TCP Costin Raiciu, Christopher Pluntke, Sebastien Barre, Adam Greenhalgh, Damon Wischik, Mark Handley Hotnets 2010 報 告 者 : 莊 延 安 Outline Introduction Analysis Conclusion

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University. http://cs246.stanford.edu

CS246: Mining Massive Datasets Jure Leskovec, Stanford University. http://cs246.stanford.edu CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2 CPU Memory Machine Learning, Statistics Classical Data Mining Disk 3 20+ billion web pages x 20KB = 400+ TB

More information

Yahoo! Cloud Serving Benchmark

Yahoo! Cloud Serving Benchmark Yahoo! Cloud Serving Benchmark Overview and results March 31, 2010 Brian F. Cooper cooperb@yahoo-inc.com Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears System setup and

More information

Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage

Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage Technical white paper Table of contents Executive summary... 2 Introduction... 2 Test methodology... 3

More information

The Performance Characteristics of MapReduce Applications on Scalable Clusters

The Performance Characteristics of MapReduce Applications on Scalable Clusters The Performance Characteristics of MapReduce Applications on Scalable Clusters Kenneth Wottrich Denison University Granville, OH 43023 wottri_k1@denison.edu ABSTRACT Many cluster owners and operators have

More information

PortLand:! A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

PortLand:! A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric PortLand:! A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya,

More information

A Review on the Role of Big Data in Business

A Review on the Role of Big Data in Business Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 3, Issue.

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance

More information

102 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 19, NO. 1, FEBRUARY 2011

102 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 19, NO. 1, FEBRUARY 2011 102 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 19, NO. 1, FEBRUARY 2011 Scalable and Cost-Effective Interconnection of Data-Center Servers Using Dual Server Ports Dan Li, Member, IEEE, Chuanxiong Guo, Haitao

More information

Event Processing over a Distributed JSON Store - Design and Performance -

Event Processing over a Distributed JSON Store - Design and Performance - Processing over a Distributed JSON Store - Design and Performance - IBM Research Miki Enoki, Jerome Simeon Hiroshi Horii, Martin Hirzel Business Processing -driven architecture to coordinate human activities

More information

MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM

MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM Julia Myint 1 and Thinn Thu Naing 2 1 University of Computer Studies, Yangon, Myanmar juliamyint@gmail.com 2 University of Computer

More information

Multipath TCP in Data Centres (work in progress)

Multipath TCP in Data Centres (work in progress) Multipath TCP in Data Centres (work in progress) Costin Raiciu Joint work with Christopher Pluntke, Adam Greenhalgh, Sebastien Barre, Mark Handley, Damon Wischik Data Centre Trends Cloud services are driving

More information

Data Center Network Topologies: VL2 (Virtual Layer 2)

Data Center Network Topologies: VL2 (Virtual Layer 2) Data Center Network Topologies: VL2 (Virtual Layer 2) Hakim Weatherspoon Assistant Professor, Dept of Computer cience C 5413: High Performance ystems and Networking eptember 26, 2014 lides used and adapted

More information

A Performance Analysis of Distributed Indexing using Terrier

A Performance Analysis of Distributed Indexing using Terrier A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search

More information

RAMCloud: Scalable Datacenter Storage Entirely in DRAM. John Ousterhout Stanford University

RAMCloud: Scalable Datacenter Storage Entirely in DRAM. John Ousterhout Stanford University RAMCloud: Scalable Datacenter Storage Entirely in DRAM John Ousterhout Stanford University Introduction New research project at Stanford Create large-scale storage systems entirely in DRAM Interesting

More information

Hitachi s HSP Hyper-Converged Appliance makes Big Data Analytics fit the Enterprise

Hitachi s HSP Hyper-Converged Appliance makes Big Data Analytics fit the Enterprise Hitachi s HSP Hyper-Converged Appliance makes Big Data Analytics fit the Enterprise Prepared by: George Crump, Lead Analyst Prepared: January 2016 Hitachi s HSP Hyper-Converged Appliance makes Big Data

More information

Compute Summit. Santa Clara. Friday, January 18, 13

Compute Summit. Santa Clara. Friday, January 18, 13 Compute Summit January 16-17, 2013 Santa Clara Compute Summit Disaggregated Rack Jason Taylor, PhD Director, Capacity Engineering & Analysis Agenda 1 Facebook Scale & Infrastructure 2 Disaggregated Rack

More information

Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers

Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers Anwitaman Datta Joint work with Frédérique Oggier (SPMS) School of Computer Engineering Nanyang Technological University

More information

Data Mining with Hadoop at TACC

Data Mining with Hadoop at TACC Data Mining with Hadoop at TACC Weijia Xu Data Mining & Statistics Data Mining & Statistics Group Main activities Research and Development Developing new data mining and analysis solutions for practical

More information