Overview of coherence and consistency

Coherence: motivation

Imagine we're building a web service that has a two-tier architecture:
- a web FE (front end) that receives requests from clients and generates HTML
- a database that the web FE interrogates in order to populate the HTML with data

Advantages of this design:
- the FE can be stateless, which means it is easy to recover from a failure just by restarting it
- the database provides you with strong durability and consistency guarantees

[Figure: web FE connected over the network to the database]

Disadvantages:
- the database is across the network, so it may become a performance bottleneck
  o the latency of a lookup is affected by the network path and by the complexity of the database query, especially if it involves hitting the disk
  o the bandwidth of the service is limited by the network bandwidth and by the bandwidth of the database

Idea: introduce a cache on the same machine as the web FE.
- the cache stores data in memory, not on disk, so it is fast
- the cache is resident on the same machine as the FE, so it has low latency and high bandwidth
- all reads and writes go through the cache; assuming there is locality in the workload, the cache capacity can be less than the database capacity and things are still great

Issues that are created:
- writes eventually need to propagate from the cache to the database for durability (the two options are contrasted in the code sketch below)
  o write-through: slow but safe
  o write-back: fast but unsafe
- need a high hit rate for the cache to be effective
  o e.g., the cache needs to warm up after a restart
- consistency problems? no: with a single cache, reads still see the most recent write
- need an eviction policy

[Figure: web FE with a cache ($) connected over the network to the database]
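
To make the write-through/write-back tradeoff concrete, here is a minimal sketch of a cache in front of a backing store (the Cache class and the dict standing in for the database are illustrative assumptions, not part of the original notes):

```python
class Cache:
    def __init__(self, db, write_through=True):
        self.db = db                 # a dict standing in for the database
        self.data = {}               # in-memory cached entries
        self.dirty = set()           # written but not yet persisted (write-back)
        self.write_through = write_through

    def read(self, key):
        if key not in self.data:     # miss: fetch from the database
            self.data[key] = self.db.get(key)
        return self.data[key]

    def write(self, key, value):
        self.data[key] = value
        if self.write_through:
            self.db[key] = value     # slow but safe: durable immediately
        else:
            self.dirty.add(key)      # fast but unsafe: lost if we crash now

    def flush(self):                 # write-back path: persist dirty entries
        for key in self.dirty:
            self.db[key] = self.data[key]
        self.dirty.clear()
```

With write_through=True every write is durable immediately but pays the database's latency on each write; with write_through=False writes are fast, but any dirty entry not yet flushed is lost in a crash.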

Eventually, the single FE will become a bottleneck of the system, so we want to scale up by adding more FEs. We also get the fringe benefit of some fault tolerance: clients can contact any FE to get service.

Issues that are created:
- entries in a cache can become stale if you're not careful, causing a read to see old data
  o write-through: the most recent write is always in the database
  o write-back: the most recent write could be in somebody else's cache
- multiple operations can show up at the database concurrently
  o need to make writes atomic: prevent two writes to the same record from resulting in a garbled record
- if we propagate writes across caches, operations can show up in different caches in different orders
  o FE 1 sees update A then update B
  o FE 2 sees update B then update A

[Figure: several web FEs, each with a cache ($), connected over the network to the database]

Sidebar: keeping the truth durably in one place (a single database) creates a single point of failure. Idea: replicate the database for fault tolerance and higher capacity!

Issues that are created:
- a write operation needs to propagate to multiple replicas, so writes are expensive
- need to preserve the order of updates at the replicas
  o otherwise they will diverge, leading to confusion; basically the same issue as in the caching case (see the sketch below)
- if you require all the replicas to be available before a write is satisfied, the overall reliability of the system goes down!
  o the more components you have, the more likely it is that some component is in a failed state
  o but if you don't require it, then some replicas will miss updates
- consensus protocols solve this; a topic for a later day

[Figure: several web FEs connected over the network to multiple database replicas]
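
A toy demonstration of why update order matters, whether across caches or across replicas (the helper and the data values are made up for illustration): two copies that apply the same writes in different orders end up in different final states.

```python
def apply_all(replica, updates):
    for key, value in updates:   # blindly apply updates in the order received
        replica[key] = value
    return replica

update_a = ("x", "set by FE 1")
update_b = ("x", "set by FE 2")

replica1 = apply_all({}, [update_a, update_b])   # sees A then B
replica2 = apply_all({}, [update_b, update_a])   # sees B then A
print(replica1)   # {'x': 'set by FE 2'}
print(replica2)   # {'x': 'set by FE 1'}  -- the two copies have diverged
```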

Coherence

The classic problem is maintaining the coherence of caches:
- multiprocessor/multicore systems with multiple memory caches
- multiple client caches for a shared network filesystem
- web browsers caching web pages
- DNS caches caching DNS entries

Basic issue: a write needs to propagate out to the other caches.

A memory system is coherent iff:
- a read returns the most recent write (no matter which processor reads)
  o thus: every write must eventually be accessible via a read (unless overwritten)
- writes to a given location are seen in the same order by all processors

Strawman approach: write-through with broadcast invalidation
- on a read miss, fetch the data from memory
- on a read hit, serve the data from the cache
- on a write:
  o (a) write the data through the cache to memory
  o (b) broadcast an invalidation to all other caches
- Does it work?
  o not if (a) and (b) are separate steps. Why? Nothing prevents some caches from reading old values and others from reading new values in between (a) and (b), depending on whether they have the old value cached or not
  o must: lock the bus before starting (a), receive an ACK from all caches during (b), and release the bus lock after (b) is finished (see the sketch below)
- Is it efficient?
  o no! A locked bus and a broadcast on EVERY write is hugely expensive.
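
A sketch of the strawman protocol, assuming a global lock that models locking the bus so that the write-through (a) and the broadcast invalidation (b) form one atomic step (real hardware would also collect ACKs from each cache during (b)):

```python
import threading

bus_lock = threading.Lock()   # models locking the shared bus
memory = {}                   # main memory
caches = []                   # every processor's cache (a dict per processor)

def read(cache, addr):
    if addr not in cache:               # read miss: fetch from memory
        cache[addr] = memory.get(addr)
    return cache[addr]                  # read hit: serve from the cache

def write(cache, addr, value):
    with bus_lock:                      # lock the bus before starting (a)
        cache[addr] = value
        memory[addr] = value            # (a) write through to memory
        for other in caches:            # (b) invalidate all other copies
            if other is not cache:      #     (hardware would collect ACKs here)
                other.pop(addr, None)
    # the bus lock is released only after both (a) and (b) finish

c1, c2 = {}, {}
caches.extend([c1, c2])
write(c1, "x", 1)
print(read(c2, "x"))  # 1: c2 has no stale copy, so it fetches from memory
```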

Writeback with exclusive ownership

Idea: allow at most one cache to contain dirty data.
- designate that cache to be in an exclusive state; only it can read/write the data
- can have read sharing when there is no dirty data: all sharing caches can read simultaneously

State machines:

Messages arriving from the CPU:
- Invalid, CPU read: send "read miss" to the exclusive owner; go to Shared
- Invalid, CPU write: send "write miss" to all copies; go to Exclusive
- Shared, CPU read: stay in Shared
- Shared, CPU write: send "write miss" to all copies; go to Exclusive
- Exclusive, CPU read or write: stay in Exclusive

Messages arriving from the bus:
- Shared, write miss: go to Invalid
- Exclusive, write miss: supply the data; go to Invalid
- Exclusive, read miss: supply the data; go to Shared

Does this work?
- great news: repeated writes are absorbed by the exclusive cache
- bad news: with concurrent write sharing, the cached item will bounce back and forth between the caches
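
The same state machines can be written down as transition tables; this is a sketch, with the actions taken from the diagram above and the target states filled in under the usual interpretation of this protocol:

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

# (state, event) -> (next state, bus action), for events from this
# cache's own CPU
cpu_transitions = {
    (INVALID,   "read"):  (SHARED,    'send "read miss" to exclusive owner'),
    (INVALID,   "write"): (EXCLUSIVE, 'send "write miss" to all copies'),
    (SHARED,    "read"):  (SHARED,    None),
    (SHARED,    "write"): (EXCLUSIVE, 'send "write miss" to all copies'),
    (EXCLUSIVE, "read"):  (EXCLUSIVE, None),
    (EXCLUSIVE, "write"): (EXCLUSIVE, None),  # repeated writes absorbed here
}

# (state, event) -> (next state, action), for messages snooped on the bus
bus_transitions = {
    (SHARED,    "write miss"): (INVALID, None),
    (EXCLUSIVE, "write miss"): (INVALID, "supply data, then invalidate"),
    (EXCLUSIVE, "read miss"):  (SHARED,  "supply data, then downgrade"),
}
```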

This works, but you need to be careful about ordering and about making transitions atomic:
- e.g., two caches may experience CPU-driven write misses at the same time
- the "send write miss to all copies" step needs to be atomic and acknowledged, and it must include the propagation of the data from the other cache or from memory

It also requires keeping track of all of the copies of the data (whether shared or exclusive) so that you can propagate invalidations or downgrades. How do you do this?
- with locks, data structures, and acknowledged messages in IVY
- with a shared bus and snooping in multiprocessor caches

Observation: implementing cache coherence means keeping track of the state of caches.

TTL-based coherence

Associate a time-to-live (TTL) with each data item in a cache:
- invalidate the item when its TTL expires
- DNS works this way, as does NFS to some degree (see the sketch below)

Advantage: the server doesn't need to keep track of any state!
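
A minimal TTL cache sketch in the spirit of DNS (the TTLCache name and the fetch_fn parameter are assumptions for illustration); note that the server is never told who is caching what:

```python
import time

class TTLCache:
    def __init__(self, fetch_fn, ttl=60.0):
        self.fetch_fn = fetch_fn    # how to ask the server for a value
        self.ttl = ttl              # seconds an entry stays valid
        self.entries = {}           # key -> (value, expiry time)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and time.monotonic() < entry[1]:
            return entry[0]         # fresh: serve without contacting the server
        value = self.fetch_fn(key)  # expired or missing: re-fetch
        self.entries[key] = (value, time.monotonic() + self.ttl)
        return value
```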

Consistency: motivation

Consider this: [code example lost in transcription; a reconstruction appears below] Will this code work?
- for it to work, we require that when P2 reads Mytask->data, it sees the value that P1 wrote in there
- what guarantees this will happen? Nothing! It depends on the memory consistency model
- note that this is not a coherence issue; it is about the order in which reads/writes to different locations in memory become visible
  o on some commercial machines this will work, and on others it won't

Another one to consider, a mutual exclusion algorithm: [also lost in transcription; see the reconstruction below] Does it work?
- again, only if you assume something about the ordering of the writes and reads to the two variables
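
The two code figures did not survive transcription, so the following is a hedged reconstruction of the standard versions of these examples (the names mytask_data, flag, and wants are assumptions). In CPython the GIL happens to serialize things so both appear to work; in C, C++, or Java without atomics/volatile, neither is guaranteed to.

```python
import threading

# Example 1 (reconstruction): P1 fills in the task's data and then sets a
# ready flag; P2 spins on the flag and then reads the data.
mytask_data = 0
flag = False

def p1():
    global mytask_data, flag
    mytask_data = 42      # write the data...
    flag = True           # ...then announce that it is ready

def p2():
    while not flag:       # spin until the flag is seen
        pass
    print(mytask_data)    # prints 42 only if P1's two writes become
                          # visible in program order

t1, t2 = threading.Thread(target=p1), threading.Thread(target=p2)
t2.start(); t1.start(); t1.join(); t2.join()

# Example 2 (reconstruction): Dekker-style mutual exclusion on two flags.
wants = [False, False]

def try_enter(me):
    other = 1 - me
    wants[me] = True              # declare intent first...
    if not wants[other]:          # ...then check the other side: mutual
        pass                      # exclusion holds only if this read is
                                  # ordered after our own write
    wants[me] = False
```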

So, we have to reason about the memory consistency model. There is a rich spectrum.

Linearizability (or strict consistency)

Every processor sees updates in the same order that they actually happened according to real time. In the histories below, time runs left to right, W(x)1 means "write 1 to x", R(x)1 means "read x, which returns 1", and x is initially 0:

P1: W(x)1 ------------
P2: ---- R(x)0 - R(x)1            [is linearizable: R(x)0 overlaps the write]

P1: W(x)1 ------------
P2: -------- R(x)2 R(x)2          [is linearizable]
P3: --- W(x)2 ---

P1: W(x)1 ---
P2: --------- R(x)1 R(x)1         [is not linearizable: the reads follow W(x)2 in real time]
P3: ----- W(x)2 ---

P1: W(x)1 ---
P2: ------ R(x)0 R(x)1            [is not linearizable: R(x)0 follows the write in real time]

Simple mental model: it's as though all processes are timesliced on a single processor with a single memory
- reads/writes are ordered in memory according to when they happened in real time

Subtlety: what if reads and writes aren't instantaneous, and a read and a write overlap?
- linearizability says it's as though each operation happened instantaneously, sometime between its begin and its end

This is an extremely simple model to think about, but extremely expensive to implement: you basically need to coordinate on every read and write, so that concurrent reads/writes resolve according to real time.
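
Because linearizability is defined in terms of real time, it can be checked by brute force on tiny histories. A sketch (the tuple encoding and the timestamps are assumptions for illustration): search for a total order that respects both real time and the sequential semantics of a register.

```python
from itertools import permutations

# Each operation is (start, end, kind, value); x is assumed to start at 0.
def is_linearizable(ops):
    for order in permutations(ops):
        # real-time constraint: if b finished before a started, a cannot
        # be ordered ahead of b
        if not all(not (b[1] < a[0])
                   for i, a in enumerate(order)
                   for b in order[i + 1:]):
            continue
        # register semantics: every read returns the latest preceding write
        current, valid = 0, True
        for start, end, kind, value in order:
            if kind == "W":
                current = value
            elif value != current:
                valid = False
                break
        if valid:
            return True
    return False

# W(x)1 spans [0,4]; R(x)0 overlaps it, R(x)1 follows it: linearizable.
print(is_linearizable([(0, 4, "W", 1), (1, 2, "R", 0), (5, 6, "R", 1)]))  # True
# R(x)0 strictly after W(x)1 completes cannot be linearized.
print(is_linearizable([(0, 1, "W", 1), (2, 3, "R", 0)]))                  # False
```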

Sequential consistency

Two conditions:
1. the result is as if the operations of all the processors were executed in some sequential order, and
2. the operations of each individual processor appear in this sequence in the order specified by its program.

Mental model: sliding beads on a string; a processor's operations may slide in time relative to other processors' operations, as long as each processor's own program order is preserved.

P1: W(x)1 ------
P2: - R(x)0 R(x)1                 [is sequentially consistent: slide the beads]

P1: ------- W(x)1
P2: R(x)1 -------                 [is sequentially consistent: neither processor can disprove it]

P1: W(x)1 ----------
P2: --- R(x)1 R(x)2
P3: --- R(x)1 R(x)2               [is sequentially consistent]
P4: -- W(x)2

P1: W(x)2 ----------
P2: ----- R(x)1 R(x)2
P3: ----- R(x)1 R(x)2             [is sequentially consistent, but not linearizable]
P4: --- W(x)1

(The above two are examples of a data race: two memory operations on different processors with no intervening synchronization operation.)

P1: W(x)1 ----------
P2: --- R(x)1 R(x)2
P3: --- R(x)2 R(x)1               [is not sequentially consistent: P2 and P3 order the writes differently]
P4: -- W(x)2
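
Sequential consistency is the same brute-force search with the real-time constraint replaced by per-processor program order. The sketch below (with an assumed (proc, seq, kind, value) encoding) accepts the fourth history above, which linearizability would reject.

```python
from itertools import permutations

# Each operation is (proc, seq, kind, value): a processor's ops must appear
# in increasing seq order, but there is no real-time constraint.
def is_sequentially_consistent(ops):
    for order in permutations(ops):
        # program order: each processor's own operations keep their order
        if not all(a[1] < b[1]
                   for i, a in enumerate(order)
                   for b in order[i + 1:] if a[0] == b[0]):
            continue
        current, valid = 0, True
        for proc, seq, kind, value in order:
            if kind == "W":
                current = value
            elif value != current:
                valid = False
                break
        if valid:
            return True
    return False

# P1 writes x=2, P4 writes x=1, and P2/P3 both read 1 then 2: sequentially
# consistent (legal order: W(x)1, the R(x)1s, W(x)2, the R(x)2s).
history = [("P1", 0, "W", 2), ("P4", 0, "W", 1),
           ("P2", 0, "R", 1), ("P2", 1, "R", 2),
           ("P3", 0, "R", 1), ("P3", 1, "R", 2)]
print(is_sequentially_consistent(history))  # True
```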

Implementing sequential consistency is possible, but it has complications:
- Program order requirement
  o a processor must ensure that its previous memory operation is complete before proceeding with its next memory operation in program order; ensuring completion means getting an acknowledgement back from the cache or memory
  o in a cache-based system, a write must generate invalidate or update messages for all cached copies, and the write can be considered complete only when the generated invalidates and updates are acknowledged by the target caches
- Write atomicity requirement
  o writes to the same location must be serialized (i.e., writes to the same location are made visible in the same order to all processors)
  o the value of a write cannot be returned by a read until all invalidates or updates generated by the write are acknowledged (i.e., until the write is visible to all processors)

There are some processor-level and compiler-level optimizations that sequential consistency makes impossible. So, right off the bat, we see there is a tradeoff between the ease of understanding a memory model and the efficiency of the memory system.

Causal consistency

A read returns a causally consistent recent version of the data:
- if I received a message A from a node (directly, or indirectly through another node), I will see all updates that node made prior to A

Can something be causally consistent but sequentially inconsistent?
- yes: concurrent writes can appear in different orders at different processors!
- this means that causally unrelated writes don't need to be coordinated
- fast, fast, fast!

P1: W(x)1 ----------
P2: ------ R(y)2 R(x)0            [is causally consistent: W(x)1 and W(y)2 are concurrent]
P3: --- W(y)2 ----

P1: W(x)1 ----------
P2: --------- R(y)2 R(x)0         [is not causally consistent: W(y)2 causally depends on W(x)1]
P3: -- R(x)1 W(y)2
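
A common way to implement the "see all updates causally prior to A" rule is to tag updates with vector clocks and delay delivery until their dependencies are met. A small sketch of the standard delivery condition (the function name and the dict encoding are assumptions):

```python
# An update from `sender`, stamped with vector clock `update_vc`, can be
# applied at a replica whose clock is `local_vc` iff it is the sender's
# next update and everything the sender had seen has been seen locally too.
def can_apply(update_vc, sender, local_vc):
    return (update_vc[sender] == local_vc[sender] + 1 and
            all(update_vc[p] <= local_vc[p]
                for p in update_vc if p != sender))

local = {"A": 1, "B": 0}                        # we have seen A's first update
print(can_apply({"A": 1, "B": 1}, "B", local))  # True: no unseen dependencies
print(can_apply({"A": 2, "B": 1}, "B", local))  # False: depends on A's 2nd update
```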

Release consistency
- a process must acquire a block of data before being able to modify it
- modifications only need to be written back before a release
- acquires and releases are sequentially consistent

Eventual consistency
- a read returns a recent version of the data
  o but not necessarily the most recent
  o and not necessarily consistent with another CPU's read
- if no updates happen for long enough, all CPUs eventually agree on the state of memory (see the sketch below)
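
A last-writer-wins register is one simple (assumed, not from the notes) way to get the "eventually agree" property: replicas may apply updates in any order, and timestamps decide the survivor, so all replicas converge once every update has reached every replica.

```python
# Each write carries a timestamp; a replica keeps, per key, only the write
# with the largest timestamp, so apply order doesn't matter.
def lww_apply(replica, key, value, ts):
    old = replica.get(key)
    if old is None or ts > old[1]:
        replica[key] = (value, ts)

r1, r2 = {}, {}
updates = [("x", "a", 1), ("x", "b", 2)]
for u in updates:
    lww_apply(r1, *u)            # r1 applies a then b
for u in reversed(updates):
    lww_apply(r2, *u)            # r2 applies b then a
print(r1 == r2)                  # True: both converge to {'x': ('b', 2)}
```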