Hash Function JH and the NIST SHA-3 Hash Competition
  Hongjun Wu
  Nanyang Technological University
  Presented at ACNS 2012
Introduction to Hash Functions
Hash Function Design Basics
Hash Function JH
  Design
  Security
  Performance
Conclusion
Hash Function
  Compresses an arbitrary message into a fixed-length output (checksum)
  In use since the 1950s
  Mostly used to accelerate table lookup or data comparison
Cryptographic Hash Function
  Each output of a cryptographic hash function represents only one input message
  Invented for digital signatures: the short message digest is signed instead of the message
  (figure from Wikipedia)
Cryptographic Hash Function
  How can we ensure that each output represents only one input message?
  The message space is much larger than the output space
    => it is impossible for each output to represent only one message
  Solution: make it computationally infeasible to find two messages with the same output
    => computationally, each output represents only one input message
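The counting argument above can be made concrete with a toy experiment. The sketch below is only an illustration, not part of the talk: it truncates SHA-256 to 16 bits (a hypothetical, deliberately tiny output space) and hashes counter strings until two different messages share an output. The helper names `truncated_digest` and `find_collision` are made up for this example.

```python
import hashlib

def truncated_digest(message: bytes, n_bytes: int = 2) -> bytes:
    """First n_bytes of SHA-256: a deliberately tiny output space (toy only)."""
    return hashlib.sha256(message).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash counter strings until two different messages share a truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        digest = truncated_digest(message, n_bytes)
        if digest in seen:
            return seen[digest], message
        seen[digest] = message
        counter += 1

first, second = find_collision()
print(first, second, truncated_digest(first).hex())
```

With a 16-bit output there are only 2^16 possible digests, so a collision must appear within 2^16 + 1 messages. A full-length digest makes the same search computationally infeasible, which is the property the slide asks for.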
Cryptographic Hash Function
  A strong cryptographic hash function has the following three properties:
    Preimage resistance
      Given an output, it is difficult to find an input
    Second-preimage resistance
      Given an input, it is difficult to find another input with the same output
    Collision resistance
      It is difficult to find two inputs with the same output
Cryptographic Hash Function
  Applications
    Digital signature (collision resistance)
    Data integrity (collision or preimage resistance)
      Example: checksum for downloaded software
    Random number generator
      Compression: entropy amplification
      One-way: protects the seed
    Security token
      One-way
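The download-checksum example can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the hash and a hex checksum published by the software vendor. The byte strings and the helper names `sha256_hex` and `matches_published_checksum` are hypothetical.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()

def matches_published_checksum(data: bytes, published_hex: str) -> bool:
    """Compare the digest of the downloaded data with the published checksum."""
    return hmac.compare_digest(sha256_hex(data), published_hex.strip().lower())

download = b"example installer contents"  # hypothetical downloaded file
published = sha256_hex(download)          # what the vendor would publish
tampered = download + b"!"                # any modification changes the digest

print(matches_published_checksum(download, published))  # True
print(matches_published_checksum(tampered, published))  # False
```

The check is only meaningful if the published checksum is obtained over a channel the attacker cannot modify.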
Cryptographic Hash Function
  MD4 (1990)
    128-bit message digest
  MD5 (1991)
    128-bit message digest
  MD5 broken by Wang Xiaoyun et al. in 2005
Cryptographic Hash Function
  Hash function standard of NIST: SHA (Secure Hash Algorithm)
    SHA-0 (1993)
      160-bit message digest size
      Insecure; withdrawn shortly after publication and replaced by SHA-1
    SHA-1 (1995)
      160-bit message digest size
      Insecure (attack complexity 2^69, Wang Xiaoyun et al., 2005), but so far not broken on a computer
    SHA-2 (2001)
      SHA-256, SHA-224
      SHA-512, SHA-384
Cryptographic Hash Function
  NIST SHA-3 competition (2008-2012)
    Motivated by the fear that the attacks against MD5 and SHA-1 may be extended to break SHA-2
    64 submissions
    51 candidates in round 1
    14 candidates in round 2
    Now 5 finalists in round 3 (final round):
      Blake, Groestl, JH, Keccak, Skein
Cryptographic Hash Function
  Hash functions and the recent cyber attack Flame
    Detected by Iran CERT in May 2012
    Advanced espionage malware
    An MD5 collision is exploited in Flame (Wikipedia):
      The Microsoft Terminal Server Licensing Service certificate still uses MD5
      A counterfeit certificate was produced and used to sign some components of the malware, making them appear to have originated from Microsoft
Hash Function Design Basics
  A typical cryptographic hash function involves three components:
    Operation mode
    Compression function structure
    Confusion-diffusion operations
Hash Function Design Basics
  Operation mode: iterated structure
    Divide a message into many message blocks:
      m = m_1 || m_2 || m_3 || ...
    Hash each message block iteratively:
      H_0 = IV                  (IV is a fixed constant)
      H_i = f(H_{i-1}, m_i)     (f is called the compression function)
    (the size of H_i must be at least as large as the size of the message digest)
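The iteration H_i = f(H_{i-1}, m_i) can be sketched in a few lines. This is a toy illustration, not any real hash function: the block and state sizes are arbitrary, and `toy_compress` is a hypothetical stand-in for f built from truncated SHA-256. Padding is deliberately left out, so the message length must be a multiple of the block size.

```python
import hashlib

BLOCK_SIZE = 8          # bytes per message block (toy value)
STATE_SIZE = 8          # bytes in each chaining value H_i (toy value)
IV = bytes(STATE_SIZE)  # fixed constant H_0

def toy_compress(h_prev: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for the compression function f."""
    return hashlib.sha256(h_prev + block).digest()[:STATE_SIZE]

def iterated_hash(message: bytes) -> bytes:
    """Compute H_i = f(H_{i-1}, m_i) over all blocks and return the last H_i."""
    if len(message) % BLOCK_SIZE != 0:
        raise ValueError("no padding in this sketch: length must be a multiple of BLOCK_SIZE")
    h = IV
    for start in range(0, len(message), BLOCK_SIZE):
        h = toy_compress(h, message[start:start + BLOCK_SIZE])
    return h

print(iterated_hash(b"ABCDEFGH" + b"IJKLMNOP").hex())
```

The state carried between blocks is only H_{i-1}, so a message of any length can be hashed with constant memory.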
Hash Function Design Basics
  Operation mode: Merkle-Damgard structure (iterated)
    Strengthens the iterated structure with padding:
      append a single 1 bit to the end of the message
      append some zeros
      append the message length (in bits)
      After padding, the overall length should be a multiple of the block size
    Finalization stage: process the output from the last message block to generate the message digest
    The most widely used overall structure for hash functions
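The padding steps listed above can be sketched as follows. The 64-byte block size and the 8-byte big-endian length field are assumptions borrowed from the SHA-1/SHA-256 convention; the slide does not fix them, and MD5, for example, encodes the length in little-endian order. `md_pad` is a name chosen for this example.

```python
BLOCK_SIZE = 64    # bytes; assumption, as in MD5, SHA-1 and SHA-256
LENGTH_FIELD = 8   # bytes used to encode the message length; assumption

def md_pad(message: bytes) -> bytes:
    """Append a 1 bit, zero bits, and the message length in bits."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"  # the single 1 bit, followed by seven 0 bits
    # Number of zero bytes so that the total length is a multiple of BLOCK_SIZE.
    zeros = (-(len(padded) + LENGTH_FIELD)) % BLOCK_SIZE
    padded += bytes(zeros)
    padded += bit_length.to_bytes(LENGTH_FIELD, "big")
    return padded

padded = md_pad(b"abc")
print(len(padded), padded[:4], padded[-LENGTH_FIELD:])
```

A message of 55 bytes still fits in one block after padding, while a message of 56 bytes needs a second block because the 1 bit and the length field no longer fit.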
Hash Function Design Basics
  Merkle-Damgard structure (figure from Wikipedia)
Hash Function Design Basics
  Compression function structure
    Two popular structures:
      Davies-Meyer (MD5, SHA-1, SHA-2, ...)
      Matyas-Meyer-Oseas
Hash Function Design Basics
  (figures: Davies-Meyer and Matyas-Meyer-Oseas constructions)
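For reference, the two constructions are usually written as below, in their standard textbook form and in the notation of the iterated structure (H_i is the chaining value, m_i the message block). The symbols E_k(x), a block cipher with key k and input x, and g, a function converting a chaining value into a key, are notation assumed here rather than taken from the slides.

```latex
\begin{align*}
\text{Davies-Meyer:} \quad & H_i = E_{m_i}(H_{i-1}) \oplus H_{i-1} \\
\text{Matyas-Meyer-Oseas:} \quad & H_i = E_{g(H_{i-1})}(m_i) \oplus m_i
\end{align*}
```

In Davies-Meyer the message block is the cipher key and the chaining value is encrypted; in Matyas-Meyer-Oseas the roles are swapped.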
Hash Function Design Basics
  Confusion-diffusion operations
    confusion: Sbox, addition, AND, OR, ...
    diffusion: MDS code, rotation, permutation
Hash Function Design Basics
  What can we learn from the attacks on MD5 & SHA-1?
    MD5, SHA-1:
      Compression function: Davies-Meyer structure
      Confusion-diffusion: addition-rotation-xor (ARX)
Hash Function Design Basics
  Why are MDx and SHA-1 weak?
    Main reason: large differential probability
      MD5: 2^-43 for steps 17-64 (2004)
      SHA-1: 2^-83 for steps 17-80 (2005)
  Why large differential probability?
    Main reason: weak differential propagation due to local collisions
  Why local collisions?
    Davies-Meyer structure
    => the structure of the compression function is important for security!
Hash Function Design Basics
  Why did it take around 10 years to apply differential attacks to break MD5 and SHA-1?
    Main reason: it is extremely difficult to find the optimal differential path in MD5 and SHA-1
    Reason 1: Davies-Meyer structure
      Difficult to analyze the interaction between message schedule & step functions (such as local collisions)
      => the structure of the compression function is important for simplifying security evaluation
    Reason 2: ARX operations
      Difficult to analyze the differential propagation in ARX (such as carry bits)
      => confusion-diffusion methods are important for simplifying security evaluation
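The role of carry bits can be seen in a small exhaustive experiment. The sketch below is an illustration only, using a toy 4-bit word size: it counts how often an XOR difference on one operand of a modular addition appears unchanged in the sum. The name `add_differential_probability` is made up for this example.

```python
WORD_BITS = 4                   # toy word size; MD5 and SHA-1 use 32-bit words
MASK = (1 << WORD_BITS) - 1

def add_differential_probability(dx: int) -> float:
    """Fraction of (x, y) pairs for which the XOR difference dx on x
    passes through addition modulo 2^WORD_BITS unchanged."""
    hits = 0
    total = 0
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            out_diff = ((x + y) & MASK) ^ (((x ^ dx) + y) & MASK)
            hits += out_diff == dx
            total += 1
    return hits / total

p_lsb = add_differential_probability(0b0001)  # difference in the lowest bit
p_msb = add_differential_probability(0b1000)  # difference in the highest bit
print(p_lsb, p_msb)  # 0.5 1.0
```

A difference in the top bit always survives because its carry falls out of the word, while a difference in the lowest bit survives only when no carry is triggered. Tracking such data-dependent carries across many steps is part of what makes ARX differential paths hard to optimize.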
Hash Function Design Basics
  What can we learn from the attacks on MD5 & SHA-1?
    Try to find the best compression function structure & confusion-diffusion methods
      To simplify security evaluation
        difficult to analyze => usually bad for security
        It is better to design a cipher that can be analyzed by the designer
      To achieve efficient differential propagation
Design of JH
  New compression function structure
  Confusion & diffusion:
    Combining the best of AES and Serpent
Design of JH: compression function structure
  (figure: JH compression function)
  M^(i): m bits
  H^(i): 2m bits
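The figure for this slide did not survive, so the sketch below is an assumption based on the published JH specification rather than on the slide text: the m-bit message block is XORed into the first half of the 2m-bit chaining value, a fixed permutation is applied, and the same block is XORed into the second half. The sizes are toy values and `toy_permutation` is a hypothetical stand-in, not JH's actual permutation.

```python
M_BYTES = 8                 # toy message block size m (JH itself uses m = 512 bits)
STATE_BYTES = 2 * M_BYTES   # the chaining value H has 2m bits

def toy_permutation(state: bytes) -> bytes:
    """Hypothetical stand-in for JH's fixed permutation (a bijection on states)."""
    rotated = state[1:] + state[:1]
    # Multiplying by an odd constant modulo 256 is invertible, so this is a bijection.
    return bytes((b * 5 + 1 + i) % 256 for i, b in enumerate(rotated))

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def compress(h_prev: bytes, block: bytes) -> bytes:
    """XOR the block into the first half, permute, XOR the block into the second half."""
    assert len(h_prev) == STATE_BYTES and len(block) == M_BYTES
    zeros = bytes(M_BYTES)
    state = xor_bytes(h_prev, block + zeros)
    state = toy_permutation(state)
    return xor_bytes(state, zeros + block)

h1 = compress(bytes(STATE_BYTES), b"ABCDEFGH")
print(h1.hex())
```

The structure uses a single keyless permutation and no separate message schedule, which is the feature the talk credits for making the analysis simple.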
Each finalist uses a different compression function structure: diversification
  Blake: Davies-Meyer
    Difficult to analyze: need to consider the interaction between differential paths in two functions
  Groestl: new (based on two parallel permutations)
    Not that difficult to analyze, but need to consider the interaction between those two permutations
  JH: new (based on a single permutation)
    Easy to analyze
  Keccak: sponge
    Easy to analyze
  Skein: MMO
    Difficult to analyze: need to consider the interaction between differential paths in two functions
Design of JH: Diffusion & Confusion
  The generalized AES design method:
    SPN + MDS code (applied to a multi-dimensional array)
    => a simple and flexible approach to designing a large permutation (block cipher) from small components by increasing the dimension
  Examples:
    AES (2D, 128 bits) => 3D (512 bits) => 4D (2048 bits)
    JH (8D, 1024 bits, bit-slice)
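The sizes in the examples follow from simple arithmetic once the array shape is fixed. The sketch below assumes 4 cells of 8 bits per dimension for the AES-like arrays and 2 cells of 4 bits per dimension for the JH-like arrays; these parameters are not stated on the slide but they reproduce its numbers. `state_bits` is a name chosen for this example.

```python
def state_bits(side: int, dimensions: int, cell_bits: int) -> int:
    """Bits in a d-dimensional array with `side` cells along each dimension."""
    return (side ** dimensions) * cell_bits

# AES-like arrays: assumed 4 cells per dimension, 8-bit cells.
aes_like = {d: state_bits(side=4, dimensions=d, cell_bits=8) for d in (2, 3, 4)}

# JH-like arrays: assumed 2 cells per dimension, 4-bit cells.
jh_like = {d: state_bits(side=2, dimensions=d, cell_bits=4) for d in (6, 8)}

print(aes_like)  # {2: 128, 3: 512, 4: 2048}
print(jh_like)   # {6: 256, 8: 1024}
```

Under the same assumptions, a 6-dimensional JH-like array has 256 bits, a quarter of the 8-dimensional state.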
Design of JH: Diffusion & Confusion
  Combining the best of AES and Serpent:
    AES: SPN + MDS code
    Serpent: bit-slice
    JH:
      Fast software implementation
      Security analysis is easy
      Table lookup is avoided to prevent cache timing attacks
Comparison of diffusion & confusion:
  Sbox + MDS            Groestl, JH     Easy to analyze
  Sbox + permutation    Keccak          Difficult to analyze
  ARX                   Blake, Skein    Difficult to analyze
Design of JH
  JH has the lowest security evaluation cost among the five finalists
    Compression function structure: easy to analyze
    Confusion & diffusion: easy to analyze
    I was able to finish the security analysis against differential attacks before the submission in 2008
Security of JH
  The generalized AES design: SPN + MDS (applied to a multi-dimensional array)
  Advantages:
    Analyze small functions to find the best attack
    Verify the attack on small functions
Security of JH: Large Security Margin
  The truncated differential attack is the most powerful attack against JH
  JH has a large security margin against the truncated differential attack, and the margin can be easily verified:
    Assuming that message modification can remove 16 rounds, the complexity of the truncated collision attack is more than 2^512
    Assuming that message modification can remove 24 rounds, the complexity of the truncated collision attack is more than 2^400
Security comparison
  None of the finalists is broken
  Groestl & JH
    The best differential trail can be found
  Blake, Keccak, Skein
    Currently the best differential trail cannot be found
Performance: Fast software
  Bit-slice; suitable for the 128-bit SIMD instruction sets available on many platforms:
    compute 128 Sboxes in parallel
    compute 128 MDS codes in parallel
  Less than 20 cycles/byte on common Intel & AMD processors
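The idea of computing 128 Sboxes in parallel can be sketched with Python integers standing in for 128-bit SIMD registers. The 4-bit Sbox below is a hypothetical toy circuit of AND and XOR steps, not the Sbox used in JH. The point of the sketch is only that the same circuit, applied to four 128-bit bit-planes, evaluates 128 Sbox instances at once without any table lookup.

```python
import random

LANES = 128  # one lane per bit of a 128-bit register

def toy_sbox(nibble: int) -> int:
    """Hypothetical 4-bit Sbox built from AND/XOR steps (NOT the JH Sbox)."""
    x0, x1, x2, x3 = [(nibble >> i) & 1 for i in range(4)]
    x0 ^= x1 & x2
    x1 ^= x2 & x3
    x2 ^= x3 & x0
    x3 ^= x0 & x1
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)

def toy_sbox_bitsliced(p0: int, p1: int, p2: int, p3: int):
    """Same circuit on bit-planes: bit j of p_i holds bit i of the j-th nibble."""
    p0 ^= p1 & p2
    p1 ^= p2 & p3
    p2 ^= p3 & p0
    p3 ^= p0 & p1
    return p0, p1, p2, p3

def to_planes(nibbles):
    """Pack a list of nibbles into four bit-plane integers."""
    planes = [0, 0, 0, 0]
    for lane, nibble in enumerate(nibbles):
        for bit in range(4):
            planes[bit] |= ((nibble >> bit) & 1) << lane
    return planes

def from_planes(planes, lanes=LANES):
    """Unpack four bit-plane integers back into a list of nibbles."""
    return [sum(((planes[bit] >> lane) & 1) << bit for bit in range(4))
            for lane in range(lanes)]

rng = random.Random(2012)
nibbles = [rng.randrange(16) for _ in range(LANES)]
one_at_a_time = [toy_sbox(n) for n in nibbles]
all_at_once = from_planes(toy_sbox_bitsliced(*to_planes(nibbles)))
print(one_at_a_time == all_at_once)  # True
```

Each line of the bit-sliced function corresponds to one or two register-wide instructions, so the cost per Sbox is a small fraction of an instruction.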
Software Implementation
  (figure: Bernstein, Lange, the 3rd SHA-3 Conference)
Efficient Implementation: Hardware
  (figures: Gaj, the 3rd SHA-3 Conference)
Efficient Implementation
  Flexible design
    If we need a light-weight hash function:
      Just use the 6-dimensional array in JH
      4 times smaller than JH
      Achieves about 128-bit security for collision, preimage and second-preimage; no resistance against length extension
    The other finalists do not have such flexibility
Conclusion
  JH is a finalist of the SHA-3 competition
    Low security evaluation cost
    Large security margin
    Efficient & flexible
Conclusion
  SHA-3 hash function competition (2008-2012)
    Will finish soon (maybe this August)
    The decision would be affected by the following factors:
      Software performance
      Hardware performance
      Security
      Completeness of security evaluation
      Novelty
      ...
      Whether NIST likes it or not
Conclusion
  An open problem remains
    How to design a hash function that is extremely efficient in software and easy to analyze
    None of the 64 submissions solves this problem
Thank you!
  Q & A