Virtual Memory Simulation Theorem

Virtual Memory Simulation Theorem
Mark A. Hillebrand, Wolfgang J. Paul {mah,wjp}@cs.uni-sb.de
Saarland University, Saarbruecken, Germany
This work was partially supported by

Overview
Theorem: the parallel user programs of a main frame see a sequentially consistent virtual shared memory.
(Correctness of main frame hardware & page fault handlers)

Context
A (practical) approach for the complete formal verification of real computer systems:
1. Specify (precisely)
2. Construct (completely)
3. Mathematical correctness proof
4. Check correctness proof by computer
5. Automate approach (partially)

Example: Processor design
1.-3. [MP00] Computer Architecture: Complexity and Correctness, Springer
4. [BJKLP03] Functional verification of the VAMP, CHARME 03
5. PhD thesis project of S. Tverdyshev (Khabarovsk State Technical University)

Why Memory Management?
Layers of computer systems (all using local computation and communication):
User Program
Operating System
Hardware
! In memory management, hardware and software are coupled extremely tightly.

DLX Configuration
A processor configuration of the DLX is a pair c = (R, M):
R : {register names} -> {0,1}^32, where register names include PC, GPR(r), ...
M : {memory addresses} -> {0,1}^8, where memory addresses = {0,1}^32
Standard definition is an abstraction: real hardware usually has no 2^32 bytes of main memory.

DLX_V Configuration
A virtual processor configuration of the DLX_V is a triple c = (R, M, r):
R : {register names} -> {0,1}^32, where register names include PC, GPR(r), ...
M : {virtual memory addresses} -> {0,1}^8, where virtual memory addresses are a subset of {0,1}^32
r : {virtual memory addresses} -> rights, where the rights R (read) and W (write) are identical for each page (4K)
! DLX_V is the basis for user programs.

DLX_S Configuration
A real specification machine configuration is a triple c_S = (R_S, PM, SM):
R_S \ R: mode (system mode (0) or user mode (1)), pto (page table origin), ptl (page table length)
PM: physical memory
SM: swap memory
! DLX_S is the hardware specification.
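The three configurations above can be written down as plain data structures. A minimal Python sketch, assuming byte-addressable memories modelled as dicts and rights stored per page; the field names mirror the slides, everything else is illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

PAGE_SIZE = 4096  # 4K pages, as on the slides

@dataclass
class DLX:
    """Standard DLX: registers R and byte-addressable memory M."""
    R: Dict[str, int] = field(default_factory=dict)   # register name -> 32-bit value
    M: Dict[int, int] = field(default_factory=dict)   # 32-bit address -> byte

@dataclass
class DLXV(DLX):
    """Virtual machine DLX_V: adds per-page rights r, a subset of {'R','W'}."""
    r: Dict[int, FrozenSet[str]] = field(default_factory=dict)  # page index -> rights

@dataclass
class DLXS:
    """Specification machine DLX_S: extra registers mode/pto/ptl,
    physical memory PM and swap memory SM."""
    RS: Dict[str, int] = field(default_factory=lambda: {"mode": 0, "pto": 0, "ptl": 0})
    PM: Dict[int, int] = field(default_factory=dict)
    SM: Dict[int, int] = field(default_factory=dict)
```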

Page-Table Lookup
Let c = (R_S, PM, SM). Virtual address va = (px, bx); px: page index (20 bits), bx: byte index (12 bits).
Page table entry, a 4-byte read from physical memory:
PT_c(px) = PM_4((pto, 0^12) + 4 * px)
Entry layout: ppx[19:0] in bits [31:12]; low-order bits hold v, r, w.
ppx_c(va): physical page index
v_c(va): valid bit (page in PM)

Address Translation
Let c = (R_S, PM, SM). Virtual address va = (px, bx); px: page index, bx: byte index.
pma_c(va) = (ppx_c(va), bx)
pma_c: physical memory address
To access swap memory, we also define:
sma_c: swap memory address
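The lookup and translation fit in a few lines of code. A minimal Python sketch, assuming a page table at physical address pto*2^12, little-endian 4-byte entries with ppx in bits [31:12], and an assumed position for the valid bit (the exact low-bit layout of v, r, w is not fixed by the slides):

```python
PAGE_BITS = 12          # 4K pages: 12-bit byte index bx
V_BIT = 1 << 11         # assumed position of the valid bit v

def pte(PM, pto, px):
    """Read the 4-byte page-table entry PT_c(px) = PM_4((pto,0^12) + 4*px)."""
    base = (pto << PAGE_BITS) + 4 * px
    return int.from_bytes(bytes(PM[base + i] for i in range(4)), "little")

def pma(PM, pto, va):
    """Translate va = (px, bx) to the physical memory address
    pma_c(va) = (ppx_c(va), bx); raise on a page fault."""
    px, bx = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
    entry = pte(PM, pto, px)
    if not entry & V_BIT:
        raise RuntimeError("page fault: v_c(va) = 0")
    ppx = entry >> PAGE_BITS          # ppx[19:0] sits in bits [31:12]
    return (ppx << PAGE_BITS) | bx
```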

Instruction Execution
DLX_V uses virtual addresses:
Fetch: va = DPC (delayed PC)
Effective address of load/store: va = ea = GPR(RS1) + imm (register + immediate)
Hardware DLX_S for mode = 1 (user):
If v_c(va), use the translated address pma_c(va) instead of va. Otherwise, exception.
(Hardware supplies parameters for the page fault handler.)

Hardware Implementation
CPU -- IMMU (fetch) / DMMU (load, store) -- ICache / DCache -- PM
Naive plan: build 2 hardware boxes MMU (memory management units for fetch and load/store) between CPU and caches; show each translates; done? No!
Real plan: build the MMU boxes & a few extra gates; identify software conditions; show the MMU translates if the software conditions are met; show the software meets the conditions; almost done.
We do not care about translation alone (purely technical); we care about a simulation theorem.

Simulating DLX_V by DLX_S
Let c = (R_S, PM, SM) and c_V = (R_V, M_V, r_V). Define a projection: c_V = Pi(c)
Identical register contents: R_V(r) = R_S(r)
Rights from the page table: R in r_V(va) iff r_c(va) = 1, W in r_V(va) iff w_c(va) = 1

Simulating DLX_V by DLX_S
VM(va) = PM(pma_c(va)) if v_c(va), SM(sma_c(va)) otherwise
(Figure: page table entry PT(px) with ppx and valid bit v in {0, 1}; va = (px, bx); page(px) resides in PM if v = 1, in SM if v = 0.)
PM is a cache for virtual memory, PT is the cache directory.
Handlers (almost!) work accordingly (select victim page, write back to SM, swap in from SM).
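The projection of the virtual memory can be written out directly. A minimal Python sketch, where PT, PM, SM are plain dicts and the swap address sma is assumed to lay pages out linearly in SM (that layout is an illustrative assumption):

```python
def VM(c, va):
    """Projected virtual memory: VM(va) = PM(pma_c(va)) if v_c(va),
    SM(sma_c(va)) otherwise.  c is a dict-like DLX_S configuration."""
    px, bx = va >> 12, va & 0xFFF
    entry = c["PT"][px]                            # page-table entry for page px
    if entry["v"]:
        return c["PM"][(entry["ppx"] << 12) | bx]  # page resident in PM
    return c["SM"][(px << 12) | bx]                # assumed sma_c: page px at px*4K in swap
```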

Software Conditions
1. OS code and data structures (PT, sbase, free space) maintained in a system area Sys of PM
2. OS does not destroy its code & data
3. User program (UP) cannot modify Sys (impossibility of hacking)
4. Writes to the code section are separated from fetches from the code section by sync or (syncing) rfe
Standard sync empties a pipelined or out-of-order machine before the next instruction is issued.
! Swap in code, then user mode fetch = self-modification of code by OS & UP

Guaranteeing Software Conditions
1. Operating system construction
2. Operating system construction
3. No pages of Sys allocated via PT to UP
4. UP alone not self-modifying; handlers end with rfe

Hardware I
CPU-memory system protocol: CPU fetches via ICache, loads/stores via DCache.
(Figure: cache hit vs. cache miss timing with signals clk, mr, mbusy, addr, dout.)

Hardware II
Inserting 2 MMUs between CPU and caches: IMMU for fetch, DMMU for load/store.
Must obey the memory protocol at both sides.

Primitive MMU
A primitive MMU controlled by a finite state machine.
(Figure: datapath with registers dr and ar, an adder computing (pto, 0^12) + 4 * px, a comparison against ptl for the length exception lexcp, and extraction of (r, w, v) from the pte with exception pteexcp; control states idle -> readpte -> comppa -> read, handshaking via p.req, p.busy, c.mr, c.busy.)
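The control flow suggested by the figure (idle, read the pte, compute the physical address, perform the access) can be sketched as a small state machine. A hedged Python sketch: the state names and exception checks are reconstructed from the figure, and the bit layout of the pte is the same illustrative assumption as before:

```python
def mmu_translate(mem_read, pto, ptl, va):
    """Primitive MMU as a small state machine: readpte -> comppa -> read.
    mem_read(addr) models one cache/memory access returning a 32-bit word."""
    state = "readpte"
    while True:
        if state == "readpte":
            px = va >> 12
            if px >= ptl:                  # length exception lexcp
                raise RuntimeError("lexcp")
            pte = mem_read((pto << 12) + 4 * px)
            if not pte & (1 << 11):        # pte exception (valid bit v = 0)
                raise RuntimeError("pteexcp")
            state = "comppa"
        elif state == "comppa":
            pa = ((pte >> 12) << 12) | (va & 0xFFF)   # pma_c(va)
            state = "read"
        elif state == "read":
            return mem_read(pa)            # deliver translated access to the cache
```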

MMU Correctness
Local translation lemma: let T and V denote the start and end cycles of a read request without exception. Let t in {T, ..., V}.
Hypothesis: the following 4 groups of inputs do not change in cycles t (i.e. X^t = X^T):
G0: va = p.addr^t, p.rd^t, p.wr^t, p.req^t
G1: pto^t, ptl^t, mode^t
G2: PT^t
G3: PM^t(pma^t(va))
Claim: p.din^V = PM^T(pma^T(va))
Proof: plain hardware correctness

Guaranteeing Hypotheses G0-G3
G0: MMU keeps p.busy active during translation
G1: Extra gates: normal sync before issue is enough. If rfe or an update to {mode, pto, ptl} is in the issue stage, stop translation of the fetch of the next instruction.
G2: User program cannot modify Sys. Previous system code terminated (by sync)
G3: Fetch: correct by sync. Load: assumes a non-pipelined, in-order memory unit; extra arguments otherwise

Global Hardware Correctness
Define scheduling functions I(k, T) = i: instruction I_i is in stage k during cycle T iff (...)
Similar to tag computation. Key concept for hardware verification.
Hypothesis: I(fetch, T) = i, translation for the fetch
Claim: IR.din^V = PM_S^i(pma_S^i(DPC_S^i))
Formal proof: part of the PhD thesis project of S. Tverdyshev (Khabarovsk State Technical University)

Virtual Memory Theorem (Simulation)
Consider a computation of DLX_S. Phases, by mode:
Initialisation (mode 0): c^0 ... c^alpha
User Program (mode 1), Handler (mode 0), User Program (mode 1), ...
Initialisation: Pi(c_S^alpha) = c_V^0
Simulation Step Theorem:
! 2 page faults per instruction possible (fetch & load/store)

Virtual Memory Theorem II
Assume Pi(c_S^i) = c_V^j. Define:
Same cycle or first cycle after handler:
s_1(i) = i if mode^i, min{j > i : mode^j} otherwise
Cycle after page-fault-free user mode step:
s_2(i) = i + 1 if pff^i, s_1(s_1(i)) + 1 otherwise
Claim: Pi(c_S^{s_2(i)}) = c_V^{j+1}

Liveness
We must maintain in Sys: MRSI (most recently swapped-in page)
Page fault handler must not evict page MRSI
Formal proof: PhD thesis project of T. In der Rieden (Saarbrücken)

Translation Look-aside Buffers
1-level lookup: formally caches for PT-reads
Consistency of 4 caches: ICache, DCache, ITLB, DTLB
Simply invalidate TLBs at mode switch; sufficient by sync conditions

Translation Look-aside Buffers
Multi-level lookup: TLB is a simplified cache
Normal cache entry: (c_ad, v = 1, tag, PM(tag, c_ad))
TLB entry: (c_ad, v = 1, tag, pma^t(tag, c_ad)), t: time of last sync / rfe
Invalidate at mode switch to 1
No writeback or load of lines
Only cache reads and writes of values by MMU
Formal verification trivial from verified cache
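A TLB in this sense is just a cache of translations with wholesale invalidation. A minimal Python sketch; the direct-mapped organization and the method names are illustrative assumptions, the invalidate-at-mode-switch discipline is the one from the slides:

```python
class TLB:
    """Tiny direct-mapped TLB sketch: caches page translations px -> ppx.
    Invalidated wholesale at every mode switch / sync / rfe, as on the slides."""
    def __init__(self, size=16):
        self.size = size
        self.entries = {}              # index -> (tag, ppx)

    def invalidate(self):              # called at mode switch to user mode
        self.entries.clear()

    def lookup(self, px):
        """Return the cached ppx for page px, or None on a TLB miss."""
        idx, tag = px % self.size, px // self.size
        hit = self.entries.get(idx)
        return hit[1] if hit and hit[0] == tag else None

    def fill(self, px, ppx):           # insert a translation computed by the MMU
        self.entries[px % self.size] = (px // self.size, ppx)
```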

Multiuser with Sharing
Easy implementation and proof of protection properties using the right bits r(va) and w(va).

Main Frames I
Proc | Proc | Proc -- Shared Memory
PM: sequentially consistent shared memory (cache coherence protocol)
New software condition: before changing a page table entry, all processors sync
Sync hardware: some AND trees, interfaced with the CPUs
Considered alone, almost completely meaningless.

Main Frames II
Theorem: user programs see sequentially consistent virtual shared memory
Proof: phases OS - UPs - OS - UPs ...
Global serial schedule: in each phase from sequential consistency of the physical shared memory; straightforward composition across phases; remaining arguments not changed!

Summary
Mathematical treatment of memory management
Intricate combination of hardware and software considered
Formalization under way

Future Work
Formal verification of compilers, operating systems, applications, communication systems in an industrial context...
... with a little help from