Software Execution Protection in the Cloud



Similar documents
Cloud computing in a nutshell

Outline. Clouds of Clouds lessons learned from n years of research Miguel Correia

How to Secure Infrastructure Clouds with Trusted Computing Technologies

Technical Brief Distributed Trusted Computing

Bypassing Local Windows Authentication to Defeat Full Disk Encryption. Ian Haken

Property Based TPM Virtualization

Background. TPMs in the real world. Components on TPM chip TPM 101. TCG: Trusted Computing Group. TCG: changes to PC or cell phone

Lecture Embedded System Security Dynamic Root of Trust and Trusted Execution

Patterns for Secure Boot and Secure Storage in Computer Systems

Microsoft s Advantages and Goals for Hyper-V for Server 2016

Virtual Infrastructure Security

Trusted VM Snapshots in Untrusted Cloud Infrastructures

Lecture Overview. INF3510 Information Security Spring Lecture 4 Computer Security. Meaningless transport defences when endpoints are insecure

Opal SSDs Integrated with TPMs

Introduction to Cloud Computing

Private Virtual Infrastructure: A Model for Trustworthy Utility Cloud Computing UMBC Computer Science Technical Report Number TR-CS-10-04

Apache Hadoop. Alexandru Costan

CSE543 Computer and Network Security Module: Cloud Computing

What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Private Virtual Infrastructure: A Model for Trustworthy Utility Cloud Computing UMBC Computer Science Technical Report Number TR-CS-10-04

Intel s Virtualization Extensions (VT-x) So you want to build a hypervisor?

Hi and welcome to the Microsoft Virtual Academy and

Intro to Virtualization

Index. BIOS rootkit, 119 Broad network access, 107

Google File System. Web and scalability

Acronym Term Description

Handling Hyper-V. In this series of articles, learn how to manage Hyper-V, from ensuring high availability to upgrading to Windows Server 2012 R2

Virtualization System Security

A review of BackupAssist within a Hyper-V Environment. By Brien Posey

Virtualization Backup/Replication Solution Comparison:

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Secure Data Management in Trusted Computing

MS-6422A - Implement and Manage Microsoft Windows Server Hyper-V

TPM Key Backup and Recovery. For Trusted Platforms

Control your corner of the cloud.

VIRTUALIZATION 101. Brainstorm Conference 2013 PRESENTER INTRODUCTIONS

Data Protection: From PKI to Virtualization & Cloud

Getting the Most Out of Virtualization of Your Progress OpenEdge Environment. Libor Laubacher Principal Technical Support Engineer 8.10.

A Virtualized Linux Integrity Subsystem for Trusted Cloud Computing

Windows Azure and private cloud

Protecting Data with Short- Lived Encryption Keys and Hardware Root of Trust. Dan Griffin DefCon 2013

vtpm: Virtualizing the Trusted Platform Module

Definitions. Hardware Full virtualization Para virtualization Hosted hypervisor Type I hypervisor. Native (bare metal) hypervisor Type II hypervisor

IOS110. Virtualization 5/27/2014 1

UNCLASSIFIED Version 1.0 May 2012

Parallels Cloud Storage

HADOOP MOCK TEST HADOOP MOCK TEST I

Use of Hadoop File System for Nuclear Physics Analyses in STAR

IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures

Security and Privacy in Public Clouds. David Lie Department of Electrical and Computer Engineering University of Toronto

Abstract. 1 Introduction

GraySort and MinuteSort at Yahoo on Hadoop 0.23

M6422A Implementing and Managing Windows Server 2008 Hyper-V

Office 365 Migration Performance & Server Requirements

Distributed File Systems

Hadoop Architecture. Part 1

Whitepaper Enhancing BitLocker Deployment and Management with SimplySecure. Addressing the Concerns of the IT Professional Rob Weber February 2015

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

Virtual Switching Without a Hypervisor for a More Secure Cloud

BlobSeer: Towards efficient data storage management on large-scale, distributed systems

CS 155 Spring TCG: Trusted Computing Architecture

Leveraging Virtualization in Data Centers

Red Hat enterprise virtualization 3.0 feature comparison

Server Virtualization with Windows Server Hyper-V and System Center

Microsoft Windows Server 2008: MS-6422 Implementing and Managing Hyper V Virtualization 6422

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE

Hadoop Basics with InfoSphere BigInsights

Assignment # 1 (Cloud Computing Security)

A Cost-Evaluation of MapReduce Applications in the Cloud

Comparing Free Virtualization Products

CA ARCserve Replication and High Availability Deployment Options for Hyper-V

DATA SECURITY MODEL FOR CLOUD COMPUTING

Best Practices for Monitoring Databases on VMware. Dean Richards Senior DBA, Confio Software

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

9/26/2011. What is Virtualization? What are the different types of virtualization.

Energy Efficient MapReduce

Protect SQL Server 2012 AlwaysOn Availability Group with Hitachi Application Protector

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

Best Practices for Virtualised SharePoint

Deployment Options for Microsoft Hyper-V Server

TPM. (Trusted Platform Module) Installation Guide V2.1

Oracle VM Server Recovery Guide. Version 8.2

Setup Guide for HDP Developer: Storm. Revision 1 Hortonworks University

Before we can talk about virtualization security, we need to delineate the differences between the

Top 5 Reasons to choose Microsoft Windows Server 2008 R2 SP1 Hyper-V over VMware vsphere 5

Securing your Virtual Datacenter. Part 1: Preventing, Mitigating Privilege Escalation

Software Token Security & Provisioning: Innovation Galore!

A Secure Strategy using Weighted Active Monitoring Load Balancing Algorithm for Maintaining Privacy in Multi-Cloud Environments

Compromise-as-a-Service

Masters Project Proposal

Protecting the Filesystem Integrity of a Fedora 15 Virtual Machine from Offline Attacks using IMA/EVM Linux Security Summit 8 September 2011

BitLocker Drive Encryption Hardware Enhanced Data Protection. Shon Eizenhoefer, Program Manager Microsoft Corporation

Desktop Virtualization. The back-end

Simplifying Server Workload Migrations

Server Virtualization A Game-Changer For SMB Customers

Virtual Machine Synchronization for High Availability Clusters

Hard Disk Drive vs. Kingston SSDNow V+ 200 Series 240GB: Comparative Test

Cloud Server. Parallels. Key Features and Benefits. White Paper.

How To Run A Modern Business With Microsoft Arknow

Transcription:

Software Execution Protection in the Cloud Miguel Correia 1st European Workshop on Dependable Cloud Computing Sibiu, Romania, May 8 th 2012 Motivation clouds fail 2 1

Motivation accidental arbitrary faults Recent studies: Google dt datacenters: t more than 8% DIMMs affected by errors yearly (even with ECC) Consumer PCs (Microsoft): CPU and core chipset faults are frequent Known cases: Sun server caches corrupted by cosmic rays (~2000) IOMMU in AMD chipsets corrupted data due to a bug activated by certain sw/hw combinations These faults can corrupt software execution 3 Motivation malicious faults Recent studies: Mlii Malicious insiders id with access to servers management VM (dom 0 in Xen) can easily obtain passwords, private RSA keys, files, thus tamper with user VMs Recent cases: Google engineer read Gmail/Gtalk communications and contacted teen users CyberLynk (cloud storage) ex employee deleted a season of a kids TV series These attacks can corrupt software execution 4 2

Outline Protecting MapReduce executions from accidental arbitrary faults Protecting user virtual machines from malicious faults Conclusions 5 PROTECTING MAPREDUCE EXECUTIONS FROM ACCIDENTAL ARBITRARY FAULTS 6 3

What is MapReduce? Programming model + execution environment Introduced dby Google in 2004 Used for processing large data sets using clusters of servers Currently several implementations are available and it s used by many organizations Hadoop MapReduce, an open source MapReduce Probably the most used, the one we used Includes the Hadoop Distributed File System (HDFS) a file system for GB files 7 WordCount example GB files! 8 4

Data flow and locality servers servers 9 Job submission and execution 10 5

Fault Tolerance The original Hadoop MR tolerates some faults Job btracker detects dt tand recovers crashed hd map/reduce tasks Detects corrupted files (a hash is stored with each block) But execution can be corrupted and a task can return wrong output e.g., due to memory corruption or chipset errors 11 Accidental arbitrary fault tolerance (or Byzantine fault tolerance BFT) Basicapproachisto approach is replicatetask task executionand vote Given f = the maximum number of faulty replicas Standard approaches: Replication + consensus all tasks executed 3f+1 times Replication + client voting all tasks executed 2f+1 times Computation multiplied by at least 3! Our approach: All tasks executed f+1 times + 1 task execution/fault Tolerates more than f faults (meaning of f is slightly different) 12 6

System model Tasks (map/reduce) and nodes can be correct or faulty Client is correct (not part of MR) Job Tracker is correct (like in Hadoop MR) Tasks: for every task, at most f of its faulty replicas can produce the same output Otherwise the number of faulty replicas is unbounded Next: basic idea 13 Original MR Map perspective 14 7

Basic BFT MR Map perspective 15 Original MR Reduce perspective 16 8

Basic BFT MR Reduce perspective 17 Improvements over basic version Basic version: map/reduce tasks executed 2f+1 times Our BFT MR can be thought ht of as this basic version plus the following modifications: Deferred execution Tentative reduce execution Digest outputs Tight storage replication 18 9

Deferred execution Faults are uncommon Job Tracker creates only f+1 replicas Creates more if the results aren't equal (until f+1 are equal) 19 Tentative reduce execution Start reduce tasks when the first input becomes available (instead of waiting for f+1 equal inputs) Restart reduce tasks if inputs disagree 20 10

Digest outputs Fetch data only from one map replica Fetch a digest (hash) from the remaining ones 21 Tight storage replication HDFS replicates data blocks for fault tolerance Turn off HDFS replication for Reduce output data 22 11

Experimental evaluation GridMix benchmark Standard d benchmark kfor Hd Hadoop MapReduce 6 different types of applications Executed in the Grid'5000 infrastructure (France) No faults injected f=1 twice as much computation as the original MR 10 and 20 nodes of the same type Variable number of input splits (hundreds of MB to GB) Ran 5 times each case; results are average 23 Job Execution Time 10 nodes WebdataScan app. with variable number of 64 MB input splits BFT version took around the double of time 24 12

BFT/original exec. times 10 nodes Ratio of duration of BFT and original versions BFT version takes from 1.5 to 2.5 more time 25 BFT/original exec. times 20 nodes With more nodes, BFT takes only slightly longer than original But CPU time used is still the double 26 13

PROTECTING USER VIRTUAL MACHINES FROM MALICIOUS FAULTS 27 Malicious insider example attack Consider an IaaS cloud that runs user VMs Servers contain ti hypervisor, mgmt VM, and run user VMs Malicious insider can attack VMs by abusing server functionality, e.g., memory snapshot Attack insider logs in Xen dom 0 and runs: $ xm dump-core 2 -L lucidomu.dump Dumping core of domain: 2... $ rsakeyfind lucidomu.dump found private key at 1b061de8 28 14

Protecting user VMs basic idea To prove to the cloud user that its data is in a server with a safe software configuration e.g., in which the management VM has no snapshot function Do this using the Trusted Platform Module (TPM), a security chip from the Trusted Computing Group now shipping with common PC hardware 29 Measurements TPM has Platform Configuration Registers (PCR) A PCR stores (typically) a measurement of a software block, i.e., its cryptographic hash During system boot, BIOS stores hash(boot loader) in PCR 0, boot loader stores hash(hypervisor) in PCR 1, A vector of PCR values gives a trusted measurement of the software configuration It is signed by the TPM s Endorsement Key (EK) EK certificate signed by TPM s vendor (means it s a real TPM) 30 15

Remote attestation Computer gives to a challenger a measurement of the software configuration (vectorofpcrof values) 2 Request TPM vector of PCRs signed with EK Computer being attested (with TPM) 1 Request attestation 3 PCR vector (signed with EK) Challenger 4 Verify signature and if PCR values match a trusted configuration 31 Approach overview Servers run a Trusted Virtualization Environment (TVE), formed by hypervisor + managementvm that the user trusts TVE does not provide dangerous operations to administrators: memory snapshot, volume mount TVE provides only trusted versions of certain operations VMs enter and leave a TVE encrypted Users do remote attestation of TVEs/operations to be sure that their VMs are either in a TVE or encrypted The environment is a TVE if its measurements (PCR values) fall in a set of TVE configurations 32 16

Limitations of TVEs TVEs apparently solve the problem User VMs run only on TVEs so they are secure but A TVE is too big (is it possible to trust ~400 KLOCs?) TPM is too slow to run attestations with several VMs per server (e.g., can do only ~2 per second!) 33 Trusted Cloud Operation Modules TCOMs are modules that implement trusted versions ofcertain operations: launch, migrate, They re specific modules thus much smaller than a TVE TCOMs are executed in an Isolated Execution Environment (IEE) An IEE is created in runtime based on a DRTM (TrustVisor) TCOM measurements areextended extended into a μtpm μtpm is software, but much faster than the TPM Smaller, faster: solves the 2 limitations of TVEs 34 17

Trusted VM operations Operations that have to be trusted (so need TCOMs): VM launching, VM migration, VM backup, VM termination These operations involve four entities Server agent trusted because it is in the TVE TCOM of the operation in a server trusted User agent trusted, not part of the cloud Cloud management agent not trusted, what the malicious insider may control 35 VM launching Ensures that the VM is started in a TVE by a TCOM Cloud Server X User VM 7 Launch user VM VM launch TCOM 4 Get VM from storage or user User agent Server agent 3 Use TrustVisor to create IEE with VM launch TCOM 2 Launch VM Cloud management agent 36 18

VM migration, backup, terminate The process is similar for all except: VM migration The TCOM of the destination server has also to be attested Source TCOM encrypts VM with session key; destination TCOM decrypts it VM backup TCOM encrypts the VM TCOM gives the key to the user agent VM termination TCOM scrubs the memory and disk to delete all VM data 37 Open problems Gap between checking a measurement (just a hash) and trusting a complex software module (TVE) How can we know that there isn t really undesirable functionality, vulnerabilities, or malware inside? Putting this solution in production Short time to market and many players: cloud provider, software producers, assurance labs, cloud user 38 19

CONCLUSIONS 39 Conclusions Software executed in the cloud can be corrupted by: arbitrary accidental faults Memory corruptions Other hardware faults malicious faults / attacks Malicious insiders And many others 40 20

Conclusions Tolerance to accidental faults in MapReduce Execute each task more than once and compare results Do it efficiently to multiply computation by only 2, with f=1 (with no faults) Protection from malicious insiders Use trusted computing / the TPM to create root of trust Cloud providers may implement something of the kind soon (TCG, Intel, IBM are pushing) 41 Questions? 21