Paolo Costa
|
|
|
- Shon McBride
- 10 years ago
- Views:
Transcription
1 joint work with Ant Rowstron, Austin Donnelly, and Greg O Shea (MSR Cambridge) Hussam Abu-Libdeh, Simon Schubert (Interns) Paolo Costa [email protected]
2 Paolo Costa CamCube - Rethinking the Data Center Cluster 2/54
3 Data Centers Large group of networked servers Typically built from commodity hardware Scale-out vs. scale-up Many low-cost PCs rather than few expensive high-end servers Economies of scale Custom software stack Microsoft: Autopilot, Dryad Google: GFS, BigTable, Reduce Yahoo: Hadoop, HDFS Amazon: Dynamo, Astrolabe Facebook: Cassandra, Scribe, HBase Focus of this talk Paolo Costa CamCube - Rethinking the Data Center Cluster 3/54
4 Mismatch between abstraction & reality How programmers see the network Key-based Overlay networks Explicit structure Symmetry of role Paolo Costa CamCube - Rethinking the Data Center Cluster 4/54
5 Paolo Costa CamCube - Rethinking the Data Center Cluster 5/54
6 ~12 racks / pod 10 Gbps 10 Gbps ~ servers / rack Paolo Costa CamCube - Rethinking the Data Center Cluster 6/54
7 ~12 racks / pod 10 Gbps 10 Gbps ~ servers / rack Paolo Costa CamCube - Rethinking the Data Center Cluster 7/54
8 ~12 racks / pod 10 Gbps 10 Gbps Data center services are distributed but they have no control over the network Must infer network properties (e.g., locality) Must use inefficient overlay networks No bandwidth guarantee ~ servers / rack Paolo Costa CamCube - Rethinking the Data Center Cluster 8/54
9 Top-down (applications) Custom routing? Overlays Priority traffic? Use multiple TCP flows Locality? Use traceroute and ping TCP Incast? Add random jitter Bottom-up (network) More bandwidth? Use richer topologies (e.g., fat-trees) Fate-sharing? Use prediction models to estimate flow length / traffic patterns TCP Incast? Use ECN in routers Paolo Costa CamCube - Rethinking the Data Center Cluster 9/24
10 The network is still a black box Good for a multi-owned, heterogeneous Internet Limiting for a single-owned, homogenous data center DCs are not mini-internets single owner / administration domain we know (and define) the topology low hardware/software diversity no malicious or selfish node Is TCP/IP still appropriate? Paolo Costa CamCube - Rethinking the Data Center Cluster 10/54
11 A data center from the perspective of a distributed systems builder What happens if you integrate network & compute? Distributed Systems CamCube HPC Networking Paolo Costa CamCube - Rethinking the Data Center Cluster 11/54
12 Service composition Architecture TCP/IP Link and key-based API Topology Paolo Costa CamCube - Rethinking the Data Center Cluster 12/54
13 y (1,2,1) (1,2,2) (1,2,2) (1,2,1) No distinction between overlay and underlying network Nodes have (x,y,z)coordinate defines key-space Simple one hop API to send/receive packets all other functionality implemented as services Key coordinates are re-mapped in case of failures enable a KBR-like API z x Paolo Costa CamCube - Rethinking the Data Center Cluster 13/54
14 y (1,2,2) (1,2,1) No distinction between overlay and underlying network Nodes have (x,y,z)coordinate defines key-space Simple one hop API to send/receive packets all other functionality implemented as services Key coordinates are re-mapped in case of failures enable a KBR-like API z x Paolo Costa CamCube - Rethinking the Data Center Cluster 14/54
15 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 15/54
16 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 16/54
17 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 17/54
18 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 18/54
19 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 19/54
20 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 20/54
21 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 21/54
22 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 22/54
23 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 23/54
24 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 24/54
25 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 25/54
26 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 26/54
27 Fetch mail costa:1234 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 27/54
28 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 28/54
29 (1,1,0) (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 29/54
30 (1,1,0) (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 30/54
31 (1,1,0) (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 31/54
32 (0,0,0) (1,0,0) (2,0,0) Packets move from service to service and server to server Services can intercept and process packets along the way Keys are re-mapped in case of failures Server resources (RAM, Disks, CPUs) are accessible to all services Paolo Costa CamCube - Rethinking the Data Center Cluster 32/54
33 27-server (3x3x3) prototype quad-core Intel Xeon Ghz 12GB RAM 6 Intel PRO/1000 PT 1 Gbps ports Runtime & services implemented in C# Multicast, key-value store, Reduce, graph processing engine, Experiment all servers saturating 6 links Gbps (9K pkts) using < 22% CPU Paolo Costa CamCube - Rethinking the Data Center Cluster 33/54
34 Reduce originally developed by Google open-source version (Hadoop) used by several companies Yahoo, Facebook, Twitter, Amazon, many applications data mining, search, indexing Simple abstraction map: apply a function that generates (key, value) pairs reduce: combine intermediate results Paolo Costa CamCube - Rethinking the Data Center Cluster 34/54
35 Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 35/54
36 Results stored on a single node Can handle all reduce functions Bottleneck (network and CPU) Reduce Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 36/54
37 CPU & network load distributed N 2 flows: needs full bisection bw results are split across R servers not always applicable (e.g., max) Reduce Reduce Reduce Reduce Reduce Reduce Reduce Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 37/54
38 Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 38/54
39 Use hierarchical aggregation! Reduce Paolo Costa CamCube - Rethinking the Data Center Cluster 39/54
40 Paolo Costa CamCube - Rethinking the Data Center Cluster 40/54
41 Intermediate results can be aggregated at each hop Aggregate Aggregate Aggregate Paolo Costa CamCube - Rethinking the Data Center Cluster 41/54
42 Paolo Costa CamCube - Rethinking the Data Center Cluster 42/54
43 In-network aggregation Paolo Costa CamCube - Rethinking the Data Center Cluster 43/54
44 Traffic reduced at each hop No need for full bisection bw Final results stored on a single node In-network aggregation Paolo Costa CamCube - Rethinking the Data Center Cluster 44/54
45 Camdoop builds 1 tree failure resilient (up to) 6 Gbps throughput / server Paolo Costa CamCube - Rethinking the Data Center Cluster 45/54
46 Camdoop builds 1 tree 6 disjoint trees failure resilient (up to) 6 Gbps throughput / server Paolo Costa CamCube - Rethinking the Data Center Cluster 46/54
47 27-server testbed 27 GB input size 1GB / server Output size/ intermediate size (S) S=1/N 0 (full aggregation) S=1 (no aggregation) Two configurations: All-to-one All-to-all Paolo Costa CamCube - Rethinking the Data Center Cluster 47/54
48 Time (s) logscale Worse switch 10 Camdoop (no agg) Camdoop Switch 1 Gbps (bound) Full aggregation Output size / intermediate size (S) No aggregation Better Paolo Costa CamCube - Rethinking the Data Center Cluster 48/54
49 Time (s) logscale Impact of aggregation Performance independent of S Impact of higher bandwidth Worse Better 10 Facebook reported aggregation ratio switch Camdoop (no agg) Camdoop Switch 1 Gbps (bound) Full aggregation Output size / intermediate size (S) No aggregation Paolo Costa CamCube - Rethinking the Data Center Cluster 49/54
50 Time (s) logscale Worse switch Camdoop (no agg) Camdoop Switch 1 Gbps (bound) Output size / intermediate size (S) Full aggregation No aggregation Better Paolo Costa CamCube - Rethinking the Data Center Cluster 50/54
51 Time (s) logscale Impact of higher bandwidth Worse 1000 Impact of aggregation 100 switch Camdoop (no agg) Camdoop Switch 1 Gbps (bound) Output size / intermediate size (S) Full aggregation No aggregation Better Paolo Costa CamCube - Rethinking the Data Center Cluster 51/54
52 Time (ms) Worse switch Camdoop (no agg) Camdoop Better 0 20 KB 200 KB Input data size / server Paolo Costa CamCube - Rethinking the Data Center Cluster 52/54
53 Time (ms) In-network aggregation beneficial also for low-latency services Worse switch Camdoop (no agg) Camdoop Better 0 20 KB 200 KB Input data size / server Paolo Costa CamCube - Rethinking the Data Center Cluster 53/54
54 Building large-scale services is hard mismatch between physical and logical topology the network is a black box CamCube designed from the perspective of developers explicit topology and key-based abstraction Benefits simplified development more efficient applications and better fault-tolerance H. Abu-Libdeh, P. Costa, A. Rowstron, G. O'Shea, and A. Donnelly. Symbiotic Routing for Future Data Centres, In ACM SIGCOMM, P. Costa, A. Donnelly, A. Rowstron, and G. O'Shea. Camdoop: Exploiting Innetwork Aggregation for Big Data Applications, In USENIX NSDI, 2012.
55 Paolo Costa
Camdoop: Exploiting In-network Aggregation for Big Data Applications
: Exploiting In-network Aggregation for Big Data Applications Paolo Costa Austin Donnelly Antony Rowstron Greg O Shea Microsoft Research Cambridge Imperial College London Abstract Large companies like
Towards Rack-scale Computing Challenges and Opportunities
Towards Rack-scale Computing Challenges and Opportunities Paolo Costa [email protected] joint work with Raja Appuswamy, Hitesh Ballani, Sergey Legtchenko, Dushyanth Narayanan, Ant Rowstron Hardware
Operating Systems. Cloud Computing and Data Centers
Operating ystems Fall 2014 Cloud Computing and Data Centers Myungjin Lee [email protected] 2 Google data center locations 3 A closer look 4 Inside data center 5 A datacenter has 50-250 containers A
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
Advanced Computer Networks. Datacenter Network Fabric
Advanced Computer Networks 263 3501 00 Datacenter Network Fabric Patrick Stuedi Spring Semester 2014 Oriana Riva, Department of Computer Science ETH Zürich 1 Outline Last week Today Supercomputer networking
Load Balancing Mechanisms in Data Center Networks
Load Balancing Mechanisms in Data Center Networks Santosh Mahapatra Xin Yuan Department of Computer Science, Florida State University, Tallahassee, FL 33 {mahapatr,xyuan}@cs.fsu.edu Abstract We consider
Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks
Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks Praveenkumar Kondikoppa, Chui-Hui Chiu, Cheng Cui, Lin Xue and Seung-Jong Park Department of Computer Science,
Hadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
Big Data in the Enterprise: Network Design Considerations
White Paper Big Data in the Enterprise: Network Design Considerations What You Will Learn This document examines the role of big data in the enterprise as it relates to network design considerations. It
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
Hadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
Load Balancing in Data Center Networks
Load Balancing in Data Center Networks Henry Xu Computer Science City University of Hong Kong HKUST, March 2, 2015 Background Aggregator Aggregator Aggregator Worker Worker Worker Worker Low latency for
Comparative analysis of Google File System and Hadoop Distributed File System
Comparative analysis of Google File System and Hadoop Distributed File System R.Vijayakumari, R.Kirankumar, K.Gangadhara Rao Dept. of Computer Science, Krishna University, Machilipatnam, India, [email protected]
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.
From GWS to MapReduce: Google s Cloud Technology in the Early Days
Large-Scale Distributed Systems From GWS to MapReduce: Google s Cloud Technology in the Early Days Part II: MapReduce in a Datacenter COMP6511A Spring 2014 HKUST Lin Gu [email protected] MapReduce/Hadoop
Computer Networks COSC 6377
Computer Networks COSC 6377 Lecture 25 Fall 2011 November 30, 2011 1 Announcements Grades will be sent to each student for verificagon P2 deadline extended 2 Large- scale computagon Search Engine Tasks
Large-Scale Data Processing
Large-Scale Data Processing Eiko Yoneki [email protected] http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
Network Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics. Qin Yin Fall Semester 2013
Network Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics Qin Yin Fall Semester 2013 1 Walmart s Data Center 2 Amadeus Data Center 3 Google s Data Center 4 Data Center
Networking Virtualization Using FPGAs
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
Optimizing Data Center Networks for Cloud Computing
PRAMAK 1 Optimizing Data Center Networks for Cloud Computing Data Center networks have evolved over time as the nature of computing changed. They evolved to handle the computing models based on main-frames,
DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER
DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER ANDREAS-LAZAROS GEORGIADIS, SOTIRIOS XYDIS, DIMITRIOS SOUDRIS MICROPROCESSOR AND MICROSYSTEMS LABORATORY ELECTRICAL AND
Shared Parallel File System
Shared Parallel File System Fangbin Liu [email protected] System and Network Engineering University of Amsterdam Shared Parallel File System Introduction of the project The PVFS2 parallel file system
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
CSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University [email protected] 14.9-2015 1/36 Google MapReduce A scalable batch processing
MMPTCP: A Novel Transport Protocol for Data Centre Networks
MMPTCP: A Novel Transport Protocol for Data Centre Networks Morteza Kheirkhah FoSS, Department of Informatics, University of Sussex Modern Data Centre Networks FatTree It provides full bisection bandwidth
Intro to Map/Reduce a.k.a. Hadoop
Intro to Map/Reduce a.k.a. Hadoop Based on: Mining of Massive Datasets by Ra jaraman and Ullman, Cambridge University Press, 2011 Data Mining for the masses by North, Global Text Project, 2012 Slides by
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
Network performance in virtual infrastructures
Network performance in virtual infrastructures A closer look at Amazon EC2 Alexandru-Dorin GIURGIU University of Amsterdam System and Network Engineering Master 03 February 2010 Coordinators: Paola Grosso
MapReduce and Hadoop Distributed File System V I J A Y R A O
MapReduce and Hadoop Distributed File System 1 V I J A Y R A O The Context: Big-data Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009) Google collects 270PB data in a month (2007), 20000PB
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper
What Is Datacenter (Warehouse) Computing. Distributed and Parallel Technology. Datacenter Computing Application Programming
Distributed and Parallel Technology Datacenter and Warehouse Computing Hans-Wolfgang Loidl School of Mathematical and Computer Sciences Heriot-Watt University, Edinburgh 0 Based on earlier versions by
Introduction to Big Data Training
Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB
How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda
How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda 1 Outline Build a cost-efficient Swift cluster with expected performance Background & Problem Solution Experiments
How To Build A Policy Aware Switching Layer For Data Center Data Center Servers
A Policy-aware Switching Layer for Data Centers Dilip Joseph Arsalan Tavakoli Ion Stoica University of California at Berkeley 1 Problem: Middleboxes are hard to deploy Place on network path Overload path
Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya and Amin Vahdat
Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya and Amin Vahdat 1 PortLand In A Nutshell PortLand is a single logical
Data Center Networks
Data Center Networks (Lecture #3) 1/04/2010 Professor H. T. Kung Harvard School of Engineering and Applied Sciences Copyright 2010 by H. T. Kung Main References Three Approaches VL2: A Scalable and Flexible
DATA MINING WITH HADOOP AND HIVE Introduction to Architecture
DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of
Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] [email protected]
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] [email protected] Hadoop, Why? Need to process huge datasets on large clusters of computers
Walmart s Data Center. Amadeus Data Center. Google s Data Center. Data Center Evolution 1.0. Data Center Evolution 2.0
Walmart s Data Center Network Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics Qin Yin Fall emester 2013 1 2 Amadeus Data Center Google s Data Center 3 4 Data Center
Cloud Computing Where ISR Data Will Go for Exploitation
Cloud Computing Where ISR Data Will Go for Exploitation 22 September 2009 Albert Reuther, Jeremy Kepner, Peter Michaleas, William Smith This work is sponsored by the Department of the Air Force under Air
Data Center Network Topologies: FatTree
Data Center Network Topologies: FatTree Hakim Weatherspoon Assistant Professor, Dept of Computer Science CS 5413: High Performance Systems and Networking September 22, 2014 Slides used and adapted judiciously
MapReduce with Apache Hadoop Analysing Big Data
MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside [email protected] About Journey Dynamics Founded in 2006 to develop software technology to address the issues
Advanced Computer Networks. Scheduling
Oriana Riva, Department of Computer Science ETH Zürich Advanced Computer Networks 263-3501-00 Scheduling Patrick Stuedi, Qin Yin and Timothy Roscoe Spring Semester 2015 Outline Last time Load balancing
Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique
Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique Mahesh Maurya a, Sunita Mahajan b * a Research Scholar, JJT University, MPSTME, Mumbai, India,[email protected]
VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing
Journal of Information & Computational Science 9: 5 (2012) 1273 1280 Available at http://www.joics.com VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing Yuan
Can High-Performance Interconnects Benefit Memcached and Hadoop?
Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,
Portland: how to use the topology feature of the datacenter network to scale routing and forwarding
LECTURE 15: DATACENTER NETWORK: TOPOLOGY AND ROUTING Xiaowei Yang 1 OVERVIEW Portland: how to use the topology feature of the datacenter network to scale routing and forwarding ElasticTree: topology control
Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck
Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering
Cisco s Massively Scalable Data Center
Cisco s Massively Scalable Data Center Network Fabric for Warehouse Scale Computer At-A-Glance Datacenter is the Computer MSDC is the Network Cisco s Massively Scalable Data Center (MSDC) is a framework
Comparing the Network Performance of Windows File Sharing Environments
Technical Report Comparing the Network Performance of Windows File Sharing Environments Dan Chilton, Srinivas Addanki, NetApp September 2010 TR-3869 EXECUTIVE SUMMARY This technical report presents the
Jeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
Lecture 7: Data Center Networks"
Lecture 7: Data Center Networks" CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Nick Feamster Lecture 7 Overview" Project discussion Data Centers overview Fat Tree paper discussion CSE
Question: 3 When using Application Intelligence, Server Time may be defined as.
1 Network General - 1T6-521 Application Performance Analysis and Troubleshooting Question: 1 One component in an application turn is. A. Server response time B. Network process time C. Application response
GraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform
The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform Fong-Hao Liu, Ya-Ruei Liou, Hsiang-Fu Lo, Ko-Chin Chang, and Wei-Tsong Lee Abstract Virtualization platform solutions
Large-Scale Distributed Systems. Datacenter Networks. COMP6511A Spring 2014 HKUST. Lin Gu [email protected]
Large-Scale Distributed Systems Datacenter Networks COMP6511A Spring 2014 HKUST Lin Gu [email protected] Datacenter Networking Major Components of a Datacenter Computing hardware (equipment racks) Power supply
SMB Direct for SQL Server and Private Cloud
SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server
SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation
SMB Advanced Networking for Fault Tolerance and Performance Jose Barreto Principal Program Managers Microsoft Corporation Agenda SMB Remote File Storage for Server Apps SMB Direct (SMB over RDMA) SMB Multichannel
Application Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
B4: Experience with a Globally-Deployed Software Defined WAN TO APPEAR IN SIGCOMM 13
B4: Experience with a Globally-Deployed Software Defined WAN TO APPEAR IN SIGCOMM 13 Google s Software Defined WAN Traditional WAN Routing Treat all bits the same 30% ~ 40% average utilization Cost of
Transparent and Flexible Network Management for Big Data Processing in the Cloud
Transparent and Flexible Network Management for Big Data Processing in the Cloud Anupam Das, Cristian Lumezanu, Yueping Zhang, Vishal Singh, Guofei Jiang, Curtis Yu UIUC NEC Labs UC Riverside Abstract
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
A Comparative Study of Data Center Network Architectures
A Comparative Study of Data Center Network Architectures Kashif Bilal Fargo, ND 58108, USA [email protected] Limin Zhang Fargo, ND 58108, USA [email protected] Nasro Min-Allah COMSATS Institute
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
Longer is Better? Exploiting Path Diversity in Data Centre Networks
Longer is Better? Exploiting Path Diversity in Data Centre Networks Fung Po (Posco) Tso, Gregg Hamilton, Rene Weber, Colin S. Perkins and Dimitrios P. Pezaros University of Glasgow Cloud Data Centres Are
Data Center Networking with Multipath TCP
Data Center Networking with Multipath TCP Costin Raiciu, Christopher Pluntke, Sebastien Barre, Adam Greenhalgh, Damon Wischik, Mark Handley Hotnets 2010 報 告 者 : 莊 延 安 Outline Introduction Analysis Conclusion
Data-Intensive Computing with Map-Reduce and Hadoop
Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan [email protected] Abstract Every day, we create 2.5 quintillion
Software Defined Network Support for Real Distributed Systems
Software Defined Network Support for Real Distributed Systems Chen Liang Project for Reading and Research Spring 2012, Fall 2012 Abstract Software defined network is an emerging technique that allow users
Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage
Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage Technical white paper Table of contents Executive summary... 2 Introduction... 2 Test methodology... 3
MapReduce and Hadoop Distributed File System
MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) [email protected] http://www.cse.buffalo.edu/faculty/bina Partially
The Performance Characteristics of MapReduce Applications on Scalable Clusters
The Performance Characteristics of MapReduce Applications on Scalable Clusters Kenneth Wottrich Denison University Granville, OH 43023 [email protected] ABSTRACT Many cluster owners and operators have
CS246: Mining Massive Datasets Jure Leskovec, Stanford University. http://cs246.stanford.edu
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2 CPU Memory Machine Learning, Statistics Classical Data Mining Disk 3 20+ billion web pages x 20KB = 400+ TB
Multipath TCP in Data Centres (work in progress)
Multipath TCP in Data Centres (work in progress) Costin Raiciu Joint work with Christopher Pluntke, Adam Greenhalgh, Sebastien Barre, Mark Handley, Damon Wischik Data Centre Trends Cloud services are driving
Yahoo! Cloud Serving Benchmark
Yahoo! Cloud Serving Benchmark Overview and results March 31, 2010 Brian F. Cooper [email protected] Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears System setup and
PortLand:! A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric
PortLand:! A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya,
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela
Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance
A Performance Analysis of Distributed Indexing using Terrier
A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search
MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM
MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM Julia Myint 1 and Thinn Thu Naing 2 1 University of Computer Studies, Yangon, Myanmar [email protected] 2 University of Computer
OpenFlow and Onix. OpenFlow: Enabling Innovation in Campus Networks. The Problem. We also want. How to run experiments in campus networks?
OpenFlow and Onix Bowei Xu [email protected] [1] McKeown et al., "OpenFlow: Enabling Innovation in Campus Networks," ACM SIGCOMM CCR, 38(2):69-74, Apr. 2008. [2] Koponen et al., "Onix: a Distributed Control
Data Center Network Topologies: VL2 (Virtual Layer 2)
Data Center Network Topologies: VL2 (Virtual Layer 2) Hakim Weatherspoon Assistant Professor, Dept of Computer cience C 5413: High Performance ystems and Networking eptember 26, 2014 lides used and adapted
Data Center Network Architectures
Servers Servers Servers Data Center Network Architectures Juha Salo Aalto University School of Science and Technology [email protected] Abstract Data centers have become increasingly essential part of
Towards Predictable Datacenter Networks
Towards Predictable Datacenter Networks Hitesh Ballani, Paolo Costa, Thomas Karagiannis and Ant Rowstron Microsoft Research, Cambridge This talk is about Guaranteeing network performance for tenants in
Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers
Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers Anwitaman Datta Joint work with Frédérique Oggier (SPMS) School of Computer Engineering Nanyang Technological University
Core and Pod Data Center Design
Overview The Core and Pod data center design used by most hyperscale data centers is a dramatically more modern approach than traditional data center network design, and is starting to be understood by
Windows Server 2008 R2 Hyper-V Live Migration
Windows Server 2008 R2 Hyper-V Live Migration Table of Contents Overview of Windows Server 2008 R2 Hyper-V Features... 3 Dynamic VM storage... 3 Enhanced Processor Support... 3 Enhanced Networking Support...
Self-Compressive Approach for Distributed System Monitoring
Self-Compressive Approach for Distributed System Monitoring Akshada T Bhondave Dr D.Y Patil COE Computer Department, Pune University, India Santoshkumar Biradar Assistant Prof. Dr D.Y Patil COE, Computer
Data Mining with Hadoop at TACC
Data Mining with Hadoop at TACC Weijia Xu Data Mining & Statistics Data Mining & Statistics Group Main activities Research and Development Developing new data mining and analysis solutions for practical
Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! [email protected]
Big Data Processing, 2014/15 Lecture 5: GFS & HDFS!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind
