Secure Cloud Storage and Computing Using Reconfigurable Hardware
Victor Costan, Brandon Cho, Srini Devadas

Motivation
    Computing is more cost-efficient in public clouds, but what about security?

Cloud Applications and Security Models
    Individual user backs up public data
        Upload a file to Amazon S3 and anyone can download it
        User is only concerned with integrity and reliability of storage
    User backs up private data (e.g., photographs)
        User can encrypt data prior to storing for privacy
        User is concerned with integrity and reliability of storage
    User wants to back up and share photographs on Flickr
        User needs to trust the integrity of the application (e.g., WordPress) that is used to share photographs
    User wants to run a private application on private data (e.g., access a private database)
        User has to trust the cloud provider to maintain privacy and integrity

What Public Clouds Cannot Do (Yet)
    Guarantee integrity and privacy of computation
    Integrity and privacy can be guaranteed if cloud servers have trusted modules (e.g., TPMs, TEMs)
        Performance loss is a significant concern
    Encrypted computation can be performed in theory using fully homomorphic encryption techniques (Gentry, 2009)
        These schemes are not yet practical
    For these reasons, private clouds are used in database applications and other applications where privacy is crucial
    Can we secure public clouds?

Trusted Computing Bases
    1. Trust the cloud provider's entire server
        The status quo: Amazon S3 and EBS
        Cheap, but no security guarantees
    2. Trust a TPM (Trusted Platform Module) attached to the server
        Very good security boundary: one well-studied chip
        Low performance, low throughput
    3. The best of both worlds
        Don't trust weak components: server OS, system buses, RAM
        Do trust: TPM-like chip, plus high-performance chip (FPGA / ASIC)
        Security boundary is still good
        Good performance and throughput

System Design

Design: System Architecture
    [Figure: block diagram with components Client, Internet, Network Card, CPU, RAM, Disk, System Bus, FPGA / ASIC (Trusted), Secure NVRAM Chip]

Attack Vectors for Trusted Storage Application
    Hard disk tampering
        Try to inject invalid data (easy)
        Replay attacks (harder)
    Bugs from other applications running on the server
    OS compromise
    Physical tampering
        Active system bus tapping (e.g., Xbox)
        RAM glitching (e.g., PlayStation 3)
        Hard disk modification or roll-back to a previous state

Integrity Verification
    [Figure: Client/TCB writes to and reads from an Untrusted Disk through an INTEGRITY VERIFICATION block]
    Check if a value from the untrusted disk is the most recent value stored at the address by the client

MAC-based Integrity Verification?
    [Figure: Client/TCB, Keyed MAC, Untrusted Disk; address 0x45 holds 124, MAC(0x45, 124) and 120, MAC(0x45, 120); labels VERIFY and IGNORE]
    A Message Authentication Code (MAC) is often used to authenticate a network message
    Store MAC(address, value) on writes, and check the MAC on reads
    Does NOT work
        Replay attacks
    Need to securely remember the untrusted disk state
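
The replay problem on the MAC slide can be illustrated with a minimal sketch. It assumes HMAC-SHA1 as the keyed MAC and models the untrusted disk as a Python dict; the key, the byte encoding of (address, value), and the helper names are hypothetical and not taken from the slides.

```python
import hashlib
import hmac

KEY = b"client-secret-key"  # hypothetical key, known only to the client/TCB


def mac(address: int, value: int) -> bytes:
    """Keyed MAC over (address, value), in the spirit of MAC(0x45, 124)."""
    msg = address.to_bytes(8, "big") + value.to_bytes(8, "big")
    return hmac.new(KEY, msg, hashlib.sha1).digest()


def verify(address: int, value: int, tag: bytes) -> bool:
    """Check a (value, tag) pair read back from the untrusted disk."""
    return hmac.compare_digest(mac(address, value), tag)


# The untrusted disk maps address -> (value, tag); the attacker controls it.
disk = {}

# The client writes 120 and later overwrites it with 124.
disk[0x45] = (120, mac(0x45, 120))
stale = disk[0x45]                      # attacker keeps a copy of the old entry
disk[0x45] = (124, mac(0x45, 124))

# Injecting an invalid value is caught: the attacker cannot forge a tag.
forged_accepted = verify(0x45, 999, disk[0x45][1])

# Replaying the stale entry is NOT caught: its tag is still valid.
disk[0x45] = stale
value, tag = disk[0x45]
replay_accepted = verify(0x45, value, tag)
```

In this sketch the forged value is rejected while the replayed pair passes verification, because the MAC binds the value to the address but not to the time of the write. This is why the verifier has to remember something about the current disk state.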

Design: Trusted Storage on Untrusted Disks
    160-bit hash in trusted memory authenticates 1TB disk
    Disk divided into 1MB blocks (b_1, b_2, b_3, b_4 in the figure)
    Leaves hash their blocks: h_1 = h(b_1), h_2 = h(b_2), h_3 = h(b_3), h_4 = h(b_4)
    Nodes hash their children: h_5 = h(h_1 h_2), h_6 = h(h_3 h_4)
    Root hash: h_7 = h(h_5 h_6)
    Root hash matches iff all blocks match
    20 levels

Design: Hash Tree Cache
    Server stores entire hash tree in RAM
    FPGA has a cache that stores a subset of nodes
    Server tells FPGA what nodes to store
        Cache management commands
    [Figure: hash tree with nodes numbered 1 to 7]

    Node  Hash  Verified
    1     fabe  Y
    2     e6fc  Y
    4     53a8  Y
    5     b2ce  Y
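
The hash tree from the Trusted Storage slide can be sketched in a few lines. This is a minimal illustration, assuming SHA-1 as h (it yields the 160-bit root mentioned on the slide), plain byte concatenation of the two children, and a power-of-two number of blocks; the function names and the tiny 4-block "disk" are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """160-bit hash; SHA-1 is assumed here."""
    return hashlib.sha1(data).digest()


def build_tree(blocks):
    """Return the tree as a list of levels, leaves first; levels[-1][0] is the root.

    Assumes len(blocks) is a power of two.
    """
    level = [h(b) for b in blocks]              # leaves hash their blocks
    levels = [level]
    while len(level) > 1:                       # nodes hash their children
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def sibling_path(levels, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])           # the other child of the same parent
        index //= 2
    return path


def verify_block(block, index, path, root):
    """Recompute the root from one block and its sibling path."""
    node = h(block)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


blocks = [b"block-1", b"block-2", b"block-3", b"block-4"]
levels = build_tree(blocks)
root = levels[-1][0]                            # the only value kept in trusted memory
path = sibling_path(levels, 2)
ok = verify_block(blocks[2], 2, path, root)
tampered_ok = verify_block(b"stale or forged block", 2, path, root)
```

Only `root` needs trusted storage; the blocks and all other hashes can live on the untrusted server. Because every write changes the root, an old block replayed against the current root no longer verifies, which is the property the plain MAC scheme lacks.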

Design: Hash Tree Cache - Efficiency
    Checking leaf 33 requires 10 node loads for a cold cache on this example
    Remember the root is always loaded in the cache
    [Figure: hash tree showing nodes 1, 2, 3, 4, 5, 8, 9, 16, 17, 32, 33]
    5/25/10

Design: Hash Tree Cache - Efficiency
    Checking leaf 38 requires only 4 node loads, because 9 is already in the cache and verified
    Server can predict client requests and manage cache for high performance
    [Figure: hash tree showing nodes 1, 2, 3, 4, 5, 8, 9, 16, 17, 18, 19, 32, 33, 38, 39]
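
The load counts on the two efficiency slides can be reproduced with a small sketch. It assumes heap-style numbering (root is node 1, children of node i are 2i and 2i+1), which matches the node numbers in the figures, and it only counts loads rather than checking hashes; the function name is hypothetical.

```python
def nodes_to_load(leaf: int, verified: set) -> list:
    """Nodes that must be loaded into the cache to check `leaf`.

    Walk up from the leaf. At each step both children of the parent are
    needed to recompute the parent's hash; the walk stops at the first
    ancestor that is already cached and verified.
    """
    if 1 not in verified:
        raise ValueError("the root (node 1) must always be in the cache")
    loads = []
    node = leaf
    while node not in verified:
        for n in (node & ~1, node | 1):     # the node and its sibling
            if n not in verified:
                loads.append(n)
        node //= 2                          # move to the parent
    return loads


cache = {1}                                 # cold cache: only the root
first = nodes_to_load(33, cache)
cache.update(first)                         # these nodes are now verified
second = nodes_to_load(38, cache)
```

With a cold cache, `first` contains the 10 nodes 32, 33, 16, 17, 8, 9, 4, 5, 2, 3. Afterwards, `second` contains only 38, 39, 18, 19, because the walk stops at node 9, which is already verified.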

Design: Maintaining FPGA State
    FPGA
        32nm, no NVRAM
        Physically Unclonable Function (PUF) or battery-backed encryption key
        E-fuses: hash of public key for the certificate of the trusted memory chip
    Trusted Memory
        Low performance
        Smart card-family chip
        Encryption engine, manufacturer certificate
        NVRAM holding FPGA's root hash

Implementation Decisions

Design: System Architecture Revisited
    [Figure: block diagram with components Client, Internet, Network Card, CPU, RAM, Disk, System Bus, FPGA / ASIC (Trusted), Secure NVRAM Chip]

Implementation: Storage
    Prototype uses desktop-class 7,200 RPM HDD with 1TB
    Normal servers would use 10,000 RPM disks
    Hash tree block size: 1MB

    Model           Throughput  Latency   GB / $
    7,200 RPM HDD   70 MB/s     12 ms     10
    10,000 RPM HDD  100 MB/s    8 ms      1.5
    15,000 RPM HDD  130 MB/s    6 ms      1
    SSD             250 MB/s    0.065 ms  0.4

Implementation: SHA-1 Hash Engine
    High-throughput 4-stage pipelined SHA-1 implementation
    6 SHA-1 engines, 4 simultaneous hashes / engine
    Hash tree logic (with cache) uses 70% of the silicon, SHA-1 uses 30%

    FPGA Model     Throughput  Latency  FPGA Cost
    Virtex-5 FPGA  20.4 GB/s   600 ns   $50
    Virtex-6 FPGA  21.6 GB/s   550 ns   $75

Implementation: Hash Tree Cache
    188 bits per cache entry, 43690 entries / MB
    1 TB disk, 1MB nodes: path length is 20 nodes
    Prototype: 1MB cache on FPGA, avg. 3 node loads / block
    Production: 8MB cache (like Core i7), avg. 1 load / block

    Cache size, strategy  Hit rate  Loads / op  Verifies / write
    32kB, LRU             50%       10          20
    512kB, LRU            75%       5           20
    1MB, LRU              85%       3           20
    8MB, LRU              95%       1           20
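
The numbers on the hash tree cache slide can be sanity-checked with a little arithmetic. The sketch below assumes binary units (1 TB = 2^40 bytes, 1 MB = 2^20 bytes). Two readings in it are inferences and are not stated on the slides: that the 188-bit entries are padded to 192 bits, and that loads per operation equal path length times miss rate.

```python
DISK_BYTES = 2**40          # 1 TB, binary units assumed
BLOCK_BYTES = 2**20         # 1 MB hash tree block size
CACHE_BITS_PER_MB = 8 * 2**20

# One leaf per block; a binary tree over 2**20 leaves has 20 levels below the root.
leaves = DISK_BYTES // BLOCK_BYTES
path_length = leaves.bit_length() - 1

# Entries per MB of cache for the stated 188-bit entry, and for a 192-bit
# entry (assumption: entries are padded to a multiple of 64 bits).
entries_188 = CACHE_BITS_PER_MB // 188
entries_192 = CACHE_BITS_PER_MB // 192

# Inferred model: each of the path_length lookups misses with probability
# (1 - hit rate), so loads per operation = path_length * miss rate.
hit_rates_pct = [50, 75, 85, 95]
loads_per_op = [path_length * (100 - pct) // 100 for pct in hit_rates_pct]
```

Under these assumptions `path_length` is 20, matching both the path length and the "Verifies / write" column. `entries_192` is 43690, matching the slide, whereas `entries_188` is 44620. `loads_per_op` comes out as 10, 5, 3, 1, which agrees with the "Loads / op" column.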

Implementation: FPGA-CPU Bus
    Prototype uses Gigabit Ethernet at 80% capacity
    Production servers should use 16-lane PCI-Express

    Model            Throughput
    PCI Express x16  4 GB/s
    SATA II          384 MB/s
    PCI Express x1   250 MB/s
    Ethernet         100 MB/s
    USB 2.0          60 MB/s

Implementation: Trusted Memory Chip
    Irrelevant for performance, used for booting the FPGA
    Smart card technology
        Prototype: JavaCard 2.2.1, 32kB EEPROM, 2kB RAM, 100ms / op
        Estimated requirements: 4kB ROM, 4kB EEPROM
        Production: any $1 secure chip with a processor and NVRAM
    Secure NVRAM to Server Bus
        Prototype: USB
        Production: USB, LPC
        Irrelevant for system performance, only used at boot

Implementation: Prototype System
    [Figure: prototype hardware diagram. Component labels: Virtex 5 XC5VLX110T; JCOP21 36k; MacBookPro6,2 (Core i7 620, 4GB RAM, Gigabit Ethernet); Core i7 920; 1TB 7,200 RPM disk; PC1066 2GB; Generic Gigabit Ethernet. Link labels: Ethernet, USB 1.0, SATA II, HyperTransport, PCI-E x1, Cat 5 cable]

Implementation: Performance Overview
    [Figure: prototype diagram annotated with throughput and latency. Labels: 5.2GB/s, 96us; 100MB/s, 200us; 100MB/s, 200us; 384MB/s; 250MB/s; 100MB/s, 8ms; 100MB/s, 200us]

Implementation: Overhead Analysis for the Prototype
    Client-server bandwidth overhead: 0.002%
        Operation: 1 HMAC (20 bytes) per 1MB = 0.002%
        Handshake: extra secret exchange piggybacks on SSL: 5%
    Latency overhead (1 client): 4%
        Without security: 8.2ms / request
        With security: 8.5ms / request
        Latency overhead = the latency of a very fast Internet hop
    No throughput overhead (N clients)
        With or without security: 100MB/s
        Need 40 HDDs to saturate PCI-E x16, 52 HDDs to saturate FPGA

Ongoing Work

Other Applications
    FPGA can be used to load user-specified circuits and perform arbitrary computation with security guarantees
    Applications: encrypted image search, financial calculations
    Potential applications in highly regulated industries, e.g., medical record keeping and processing, secure financial services

Acknowledgement: Work was funded by Quanta Corporation