Web Email DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing)



Similar documents
1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

RAID Storage, Network File Systems, and DropBox

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Solbox Cloud Storage Acceleration

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Chapter 11 Distributed File Systems. Distributed File Systems

Middleware and Distributed Systems. System Models. Dr. Martin v. Löwis. Freitag, 14. Oktober 11

Distributed Systems LEEC (2005/06 2º Sem.)

Content Delivery Networks. Shaxun Chen April 21, 2009

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed Systems. 23. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2015

Distributed File Systems

Principles and characteristics of distributed systems and environments

Lectures 9 Advanced Operating Systems Fundamental Security. Computer Systems Administration TE2003

DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

In Memory Accelerator for MongoDB

Distribution transparency. Degree of transparency. Openness of distributed systems

Distributed File Systems

Data Consistency on Private Cloud Storage System

SCALABILITY AND AVAILABILITY

Distributed File Systems

Distributed Systems Lecture 1 1

Chapter 17: Distributed Systems

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1)

MCSE Core exams (Networking) One Client OS Exam. Core Exams (6 Exams Required)

Distributed Systems. 25. Content Delivery Networks (CDN) 2014 Paul Krzyzanowski. Rutgers University. Fall 2014

System Models for Distributed and Cloud Computing

Distributed Systems. REK s adaptation of Prof. Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 1

How To Understand The Power Of A Content Delivery Network (Cdn)

CHAPTER 7 SUMMARY AND CONCLUSION

Outline. Definition. Name spaces Name resolution Example: The Domain Name System Example: X.500, LDAP. Names, Identifiers and Addresses

P2P Storage Systems. Prof. Chun-Hsin Wu Dept. Computer Science & Info. Eng. National University of Kaohsiung

Diagram 1: Islands of storage across a digital broadcast workflow

I N T E R S Y S T E M S W H I T E P A P E R F O R F I N A N C I A L SERVICES EXECUTIVES. Deploying an elastic Data Fabric with caché

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Virtual machine interface. Operating system. Physical machine interface

Overlay Networks. Slides adopted from Prof. Böszörményi, Distributed Systems, Summer 2004.

The old Internet. Software in the Network: Outline. Traditional Design. 1) Basic Caching. The Arrival of Software (in the network)

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

Lecture 36: Chapter 6

ICP. Cache Hierarchies. Squid. Squid Cache ICP Use. Squid. Squid

G Porcupine. Robert Grimm New York University

Design and Evolution of the Apache Hadoop File System(HDFS)

LARGE SCALE INTERNET SERVICES

Big Data Management and NoSQL Databases

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

How To Understand The Concept Of A Distributed System

Cloud Based Application Architectures using Smart Computing

Load Balancing in Distributed Data Base and Distributed Computing System

Network File System (NFS)

Operating Systems 4 th Class

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Planning Domain Controller Capacity

Planning and Maintaining a Microsoft Windows Server Network Infrastructure

Chapter 18: Database System Architectures. Centralized Systems

Data Center Content Delivery Network

Multiple Public IPs (virtual service IPs) are supported either to cover multiple network segments or to increase network performance.

Big data management with IBM General Parallel File System

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. Caching, Content Distribution and Load Balancing

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Copyright 1

Hadoop Architecture. Part 1

Distributed Data Stores

Network File System (NFS) Pradipta De

The Cloud Trade Off IBM Haifa Research Storage Systems

Naming vs. Locating Entities

Measuring CDN Performance. Hooman Beheshti, VP Technology

Seminar Presentation for ECE 658 Instructed by: Prof.Anura Jayasumana Distributed File Systems

Apache Hadoop. Alexandru Costan

Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Web Application Hosting Cloud Architecture

Distributed System: Definition

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

Finding a needle in Haystack: Facebook s photo storage IBM Haifa Research Storage Systems

Deploying Riverbed wide-area data services in a LeftHand iscsi SAN Remote Disaster Recovery Solution

Client/server and peer-to-peer models: basic concepts

InterWorx Clustering Guide. by InterWorx LLC

HDFS Architecture Guide

A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems*

POWER ALL GLOBAL FILE SYSTEM (PGFS)

A Scheme for Implementing Load Balancing of Web Server

Reference Model for Cloud Applications CONSIDERATIONS FOR SW VENDORS BUILDING A SAAS SOLUTION

TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

Transactions and ACID in MongoDB

Transcription:

1 1 Distributed Systems What are distributed systems? How would you characterize them? Components of the system are located at networked computers Cooperate to provide some service No shared memory Communication (sending messages): delays, unreliable No common clock (makes coordination much more difficult) Independent node failure modes Hardware/software heterogeneity (e.g., little/big endian) Concurrency (an issue that was already present in an OS). Means that we can scale the capacity of the system by adding more resources. Examples of distributed systems? Web Email DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing)

Operating Systems Summer 2008 2 Distributed File Systems (NFS,...) Sensor Networks Akamai CDN Why do we care about them? What is their purpose? What do we gain from them? Resource sharing (e.g., shared file server) Overcome geographic separation (e.g., web) Increase reliability (build reliable systems from unreliable components) Aggregate resources for high capacity (cycles: SETI@home, bandwidth: BitTorrent, disks: Freenet, Amazon s S3) Challenges. System design how to partition responsibility among different nodes. How to coordinate them? Design distributed protocols. Failures communication, hw, sw. Ideally, distributed system should appear as local system that never fails. In practice, impossible to achieve. Security adversary may compromise machines, manipulate messages, subvert protocols

Operating Systems Summer 2008 3 Performance sometimes with additional constrains: wide-area network latencies, sensor networks have energy constrains, message loss, failures. Scalability E.g., a cluster of file servers. How should throughput vs. number of machines curve look like? In practice you hit bottlenecks, then how should it look like? Need to avoid phenomena like thrashing. Topic: System Design Example: distributed file system. Many clients, access file server over a network, which then stores files on local disk. Operations? Read file, Write file, Create, Move, etc. Problem: server becomes overloaded. Solutions? 1. Caching at the client. Read files recently written. Problem? With concurrency, cache entries may become stale. Solutions? a. Locking + Invalidations from server. Problem: what if messages take too long to arrive at the client? Will server wait? What if client crashed? Possible solution: Leases. b. Client checks last modified time before reading cached copy. Tradeoff? More simple and robust, less efficient. What about writes? Write-through or write-back policy? 2. Partition state onto 2 servers. How? E.g. different users can be assigned different servers. But how to ensure good load balancing?

Operating Systems Summer 2008 4 Statically? And what if suddenly some files become much more popular? Topic: Naming And how does client refer to remote files? Need some name that is translated into a file object. How to choose these names? Example: A trick similar to web is to embedd server name in the file name, so a name from standpoint of client is: /dist-fs/x.uni-sb.de/a.txt Problem? What if location changes? Alternative: use a directory server /dist-fs/myserver/a.txt Directory server translates name to address of file server What if directory server becomes overloaded? Caching also helps, but can also use a hierarchical namespace. Do you know examples of hierarchical namespaces? Topic: Fault Tolerance Back to single server design, how to survive faults?

Operating Systems Summer 2008 5 Solution: replicate data. Problem: replica consistency. Don t want to delete file and let it reappear. Problem: how to ensure failure independence? Problem: availability in presence of a partition vs. consistency? Topic: security Most of the problems were mentioned in last lectures, but now have to be solved in a distributed setting. E.g., authentication. When I give my credit card number to a company s website, how do I ensure it s really that website I m talking to? PKIs can help, but need to bootstrap trust. Also many new problems like DDoS / bot nets. Topic: p2p What if I want to store my files remotely (e.g., for backup) but cannot afford server infrastructure? Peer-to-peer computing can provide a cooperative, volunteer-based solution. These systems can have huge number of nodes. Are self-organizing. Usually no distinction between clients and servers (nodes have to do a bit of both).

Operating Systems Summer 2008 6 Basic problem: how to I partition data? Then how do I locate it? Real-world problems: e.g., free-riding, need incentives. Sybil attacks, need stronger identities or high barrier to entering the system.