What Role Does Hardware Design Have to Play in Cybersecurity? Srini Devadas Massachuse0s Ins2tute of Technology
Agenda Cyber threats today Defensive strategies Role of hardware design Research direc=ons and open problems
AKacks on Individuals Ransomware Worm enters system through downloaded file. Payload encrypts user s hard drive and deletes the original files user cannot decipher his/her own files Pay $500 in Bitcoin to get your files back! https://nakedsecurity.sophos.com/2012/11/02/ anonymous-ransomware/ MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
AKacks on Services Target in 2013 40 million: The number of credit and debit cards thieves stole. 70 million: The number of records stolen that included the name, address, email address and phone number of Target shoppers. 46: % drop in profits in the 4th quarter of 2013, compared to 2012. 200 million: Estimated cost for reissuing 21.8 million cards. 53.7 million: The income that hackers likely generated from the sale of 2 million cards stolen and sold at $26 per card. 0: Number of customer cards with AVAILABLE hardware security technology that would have been able to stop the bad guys from stealing. MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
AKacks on Infrastructure The Stuxnet Cyberphysical AKack A 500 Kbyte computer worm that infected the sotware of at least 14 industrial sites in Iran including a nuclear facility Goal was to cause fast- spinning centrifuges to tear themselves apart Stuxnet was tracked down by Kaspersky Labs but not before it did some damage MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
http://spectrum.ieee.org/telecom/security/the-real-story-of-stuxnet AKacks on Infrastructure The Stuxnet Cyberphysical AKack MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
Cybersecurity and Threats Cybersecurity is a property of computer systems similar to performance and energy AKackers take a holis=c view by akacking any component or interface of system Diverse threat models dictate different desirable security proper=es Viruses and worms: Bug- free programs Denial- of- Service akacks: Redundant resources Cyberphysical akacks: Tamper- resistant hardware MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
Current Approaches are Limited Computer systems are so complex that it is impossible to design them without vulnerabili=es. Therefore, the best we can do is to: Focus on exis=ng compu=ng systems and their akacks to discover flaws Design mechanisms into these systems to protect against these akacks Manage risk and administer systems well Unfortunately, new flaws are always discovered We need to do beker than this Patch & Pray, Perimeter Protec=on mindset MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
Security property cannot be ar=culated well when isolated to a component or layer à need a systems- wide, architectural viewpoint New theore=cal and prac=cal founda=ons of secure compu=ng that integrate security in the design process à security by default à Holis=c View of Cybersecurity Remove program error as a source of vulnerability Need researchers from diverse disciplines, e.g., systems and applica=on sotware designers, architects and digital system designers to tackle the problem MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
Three Defenses Preven=on: Increasing the difficulty of akacks Resilience: Allowing a system to remain func=onal despite akacks Detec=on and Recovery: Allowing systems to more quickly detect and recover from akacks to fully func=onal state. MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
The Obvious Role of Hardware in Cybersecurity Can implement security func=onality in hardware, e.g., Encryp=on Message authen=ca=on Network packet inspec=on Etc. to improve performance and lower energy
Preven=on Protec=on Against Physical AKack
Tradi=onal Device Authen=ca=on Each IC needs to be unique Embed a unique secret key SK in on- chip non- vola=le memory Use cryptography to authen=cate an IC Cryptographic opera=ons can address other problems such as protec=ng IP or secure communica=on IC with a secret key Sends a random number Sign the number with a secret key à Only the IC s key can generate a valid signature IC s Public Key
BUT How to generate and store secret keys on ICs in a secure and inexpensive way? Adversaries may physically extract secret keys from non- vola=le memory Trusted party must embed and test secret keys in a secure loca=on What if cryptography is NOT available? Extremely resource (power) constrained systems such as passive RFIDs Commodity ICs such as FPGAs Invasive probing Non-invasive measurement
Physical Unclonable Func=ons (PUFs) Extract secrets from a complex physical system Because of random process varia=ons, no two Integrated Circuits even with the same layouts are iden=cal Varia=on is inherent in fabrica=on process Hard to remove or predict Delay- Based Silicon PUF concept (2002) Generate keys from unique delay features of chips Challenge" n-bits" Response" m-bits" Combinatorial Circuit"
Why PUFs? (Challenge) PUF n Response PUF can enable secure, low- cost authen=ca=on w/o crypto Use PUF as a func=on: challenge response Only an authen=c IC can produce a correct response for a challenge Inexpensive: no special fabrica=on technique PUF can generate a unique secret key / ID Physically secure: vola=le secrets, no need for trusted programming Can integrate key genera=on into a secure processor
An Arbiter- Based Silicon PUF n-bit! Challenge! Rising Edge! 10 0 1 1 0 0 1 1 0 0 1 " 1 0 0 1 D Q G 01 1 if top! path is! faster,! else 0! Response! Compare two paths with an iden=cal delay in design Random process varia=on determines which path is faster An arbiter outputs 1- bit digital response Mul=ple bits can be obtained by either duplicate the circuit or use different challenges Each challenge selects a unique pair of delay paths
Arbiter Experiments PUF Response: Average Code Distances 128 (2x64) bit, RFID MUX PUF Rev.Ax1 M3 vs. Rev.Ax8 M3 @ +25 C Millions 25 20 64 stage Intra-chip @ Rev.Ax1 Inter-chip @ Rev.Ax1 Intra-chip @ Rev.Ax8 Inter-chip @ Rev.Ax8 15 512 stage 10 5 0 0 16 32 48 64 80 96 112 128 Code distance [Bits] 18
Arbiter is not a PUF (clonable!) Introduced in 2003 paper, shown in same paper to be suscep=ble to a machine learning model- building akack 100% Rev.A PUF Model/Data Correlation Levels 90% Model output match level to 16,384 bits of real Rev.A PUF data (teaching set included) 80% 70% 60% 50% 40% 30% 20% 10% 0% 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 19 Need to add nonlinearity to circuit Number of challenges (= single response bits) taught CFMin CFAvg CFMax
XOR Arbiter PUF Can process and combine outputs of mul=ple PUFs Simplest version: XOR opera=on n-bit! Challenge! PUF Circuit PUF Circuit PUF Circuit PUF Circuit XOR of k PUFs each with n stages
XOR Arbiter PUF Security Machine learning complexity appears to grow as O(n k+1 ) for k- way XOR over n- stage PUFs Size of circuit grows as O(nk) N = 64, k = 6 is on the edge of being broken Can go up to k = 8 with reasonable noise levels
8- way XOR experiments PUF Response: Average Code Distances 128 (2x64) bit, RFID MUX PUF Rev.B vs. (synthesized) Rev.Bx2XOR @ +25 C Millions 30 25 Intra-chip @ Rev.B Inter-chip @ Rev.B Intra-chip @ Rev.Bx2 Inter-chip @ Rev.Bx2 20 15 4-way XOR 8-way XOR 10 5 0 0 16 32 48 64 80 96 112 128 Code distance [Bits] 22
Current Limita=ons of PUFs PUF- based authen=ca=on not cryptographically secure, i.e., not reducible to established hard problems Combined machine learning and side channel akacks have broken many candidates New candidates con=nually being proposed Key genera=on needs helper data Many more bits than key bits Proofs of no leakage from helper data make some untested assump=ons
Resilience Under AKack Encrypted Computa=on
Trusted Compu=ng Base The trusted compu=ng base (TCB) is the set of sotware and hardware components that need to be trusted by a user In TPM- based systems, the TPM akests the veracity of several millions of lines of (buggy) OS code The TPM does not provide real security In the cloud, the TCB is > 20M lines of code from tens of sotware vendors No wonder we have so many security breaks!
Applica=on: Secure Cloud Compu=ng I want to delegate processing of my data, without giving away access to it. Separa=ng processing from access via encryp=on: I will encrypt my data before sending it to the cloud They will apply their processing on the encrypted data, send me processed (s=ll encrypted) result I will decrypt the result and get my answer Computation Under Encryption
An Analogy: Alice s Jewelry Store Courtesy: C. Gentry Alice s workers need to assemble raw materials into jewelry But Alice is worried about thet How can the workers process the raw materials without having access to them?
An Analogy: Alice s Jewelry Store Courtesy: C. Gentry Alice puts materials in locked glove box For which only she has the key Workers assemble jewelry in the box Alice unlocks box to get results
The Analogy Encrypt: puvng things inside the box Alice does this using her key c i ß Enc(m i ) Decrypt: Taking things out of the box Only Alice can do it, requires the key m* ß Dec(c*) Process: Assembling the jewelry Anyone can do it, compu=ng on ciphertext c* ß Process(c 1,,c n ) m* = Dec(c*) the ring, made from raw material m i
Encrypted Computa=on Encrypted computa=on can thus be achieved using Fully Homomorphic Encryp=on (FHE) without trus2ng anything on the server side Server does not need to store a secret key Unfortunately, FHE overheads are about 10 8 to 10 9 for straight- line code and overheads grow if there is complex control in the program Only usable for simple computa=ons What About Hardware Approaches?
Tamper Resistant Hardware Tamper resistant hardware The secure processor is trusted, shares secret key with client. Private informa=on stored in the hardware is not accessible through external means. examples: XOM, Aegis, TPM, TPM + TXT, etc.
Tamper Resistant Hardware Limita=ons Just trus=ng the tamper resistance of the chip is not enough! I/O channels of the secure processor can be monitored by sotware and leak informa=on Examples: address channel, I/O =ming channel Main Memory
Leakage through Address Channel for i = 1 to N if (x == 0) sum += A[i] else sum += A[0] Address sequence: 0x00, 0x01, 0x02 Address sequence: 0x00, 0x00, 0x00 The value of x is leaked through the access pakern Sensi=ve data exposed by observing the addresses!
A Typical Computer TCB Software! Running on hardware! (Ring 3)! TRUSTED! Privelege! App! App! Secure App! TRUSTED! OS Kernel (Ring 0)! Hypervisor (Ring 0, VMX root)! BIOS (SMM)! DRAM! Main! board! DRAM Ctrl.! CPU!! Thread! L3 $! Chipset! L1 $! L2 $! Disk! Network!
Intel SGX to reduce TCB (Ring 3) " App" TRUSTED! Enclave" SGX protects a small codebase! Doesn t trust OS! Protected app = Enclave! Privelege! OS Kernel (Ring 0) " Hypervisor (VMX root) " BIOS (SMM)" Provides a trusted environment:! - app integrity! - protects data! TCB is the Intel CPU no off-chip interfaces to secure! DRAM" DRAM Ctrl." TRUSTED! CPU" " L1 $" Thread" L2 $" L3 $" Main" board" Chipset" Disk" Network"
Cache Timing AKack (Modern x86 CPU: 4-way set associative L1, L2$, 64B lines)! Simultaneous multithreading (hypertheading!)! A spy, sharing physical core with victim:! - malloc enough data to blow the $! - For each $ line, read to populate all 4 ways! - Use RDTSC to time reads, log results:! - Fast: no evictions occurred.! - Slow: eviction! Victim loaded.! - Slower: writeback! victim stored!! Physical" Memory" L1 or L2 $" Victim owns at most one line in set" S sets" 4 ways"
Memory Access PaKern Leakage Assume a malicious OS, own scheduling!!mount the same attack as before.! We can do even better!!!orchestrate page mappings!!!cache partitioning!!!kernel data separate from attack!!very low noise!! Kernel data structures" Cache timing attack" on enclave, as before" RAM" Allocation" for" snooping" Kernel" data" structures" Enclave data"
Ascend Security Goal Protect against all sotware- based and some hardware- based akacks when running untrusted sotware An adversary cannot learn a user s private informa=on by observing the pin traffic of Ascend. Main Memory
Ascend Processor: Eliminate leakage over chip pins
Two Interac=ve Protocol Data transfer only happens twice Time=0 Input data fed into Ascend (stored in ORAM) Time=T Output data returned to the user.
Oblivious RAM Time = 0 Public Input (from the server) Enc(Private input) (from the user) Ascend L1$ CPU ORAM Interface A E S Periodic Access Main Memory (4GB ORAM) L2$ AES Time = T Enc( final result )
Oblivious RAM Oblivious RAM (ORAM) [1] ORAM allows a client to conceal its access pakern to the remote storage by con=nuously shuffling and re- encryp=ng data as they are accessed. Any two access sequences of same length are computa=onally indis=nguishable. ORAM does not protect =ming channel, i.e., when accesses are made can s=ll leak informa=on. [1] O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 1996.
Naïve Oblivious RAM Naïve ORAM Each access touches all the N data blocks in main memory Blocks are read, re- encrypted using probabilis=c encryp=on and wriken back Dummy blocks are filled to obfuscate memory footprint. O(N) overhead unacceptable
Path ORAM 0 1 2 3 Path ORAM is organized as a binary tree. L levels Each node contains Z data blocks (cache lines) off-chip on-chip ORAM Interface Trusted Coprocessor Binary Tree Unoccupied nodes are filled with dummy blocks Dummy and real blocks are indis=nguishable ater encryp=on
Path Oblivious RAM Path ORAM* Each access only touches O(log(N)) data blocks. The most prac=cal ORAM scheme for hardware Compute- bound applica2ons: < 2X overhead Memory- bound applica2ons: 5-10X overhead Neither server nor applica2on soqware is trusted. * Path ORAM: An Extremely Simple Oblivious RAM Protocol, Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren, Xiangyao Yu, Srinivas Devadas, CCS, 2013.
Current Limita=ons of Ascend Batch computa=on model The channel between the user and Ascend is only used at beginning and end of computa=on. No interac=on during computa=on User input/output, network, disk, etc. Oblivious RAM Network Disk
Detec=on and Recovery Integrity of Computa=on
AKacks on Integrity Some=mes one is only concerned with obtaining correct results, not privacy leakage Integrity of storage (malicious errors) implies reliability of storage (random errors) Solu=on: Cryptographic hash func=ons Reliability and integrity of computa=on is a harder problem Errors can have catastrophic effects Many possible akacks on computa=on 48
Redundant Computa=ons Redundancy in the form of retries or parallel computa=ons is key to recovery Challenge is to keep overheads manageable à hardware can help Key idea: Hardware computes confidence informa=on for each computa=on Confidence low on data from an external source, high on data from trusted sources 49
Informa=on Flow Tracking Tracking Confidence Architect a processor to track the flow of informa=on through the code This can be done in sotware albeit with greater overhead Worked well for buffer overflow akacks Tracking calculus becomes more complicated under more sophis=cated akacks Abort computa=on or retry when confidence falls below threshold 50
Some Open Problems
Public- Model Physical Unclonable Func=ons Concept proposed by Koushanfar, Potkonjak Simula=ng the public model takes much longer than evalua=ng the system Give me response to challenge Correct response within time T
Hardware Trojans Suppose the manufacturer of the chip is not trusted How can we protect against a malicious manufacturer?
Securing Interac=on Under Untrusted SoTware Securing interac=ons with memory is easy Encryp=on, integrity verifica=on, ORAM What about keyboards, displays, network, and other I/O devices? How to authen=cate a keypress? How to authen=cate what is being shown on a display? 54
Summary SoTware dominates cybersecurity conversa=ons Several cybersecurity challenges are best handled through appropriate hardware design, since sotware IS the problem! 55