Processing encrypted signals: a new frontier for multimedia security
  Mauro Barni, University of Siena
Summary
  Motivations
  Secure watermarking
  Further (even more interesting) examples
  The dawn of a new discipline?
  An information-theoretic paradox
  Three (or four) paradigms
  DSC (again)
  Homomorphic signal processing
  Is interaction the solution?
  Conclusions
Motivations
  The advantages of tools that can process encrypted data are evident.
  I will further support this need through a few examples, starting from watermarking.
Watermarking encrypted data
  A known problem with plain fingerprinting is that the buyer's rights are not considered, which undermines the validity of the scheme.
  [Figure: the seller distributes watermarked copies A + w_i of the document A to the buyers B_1, B_2, ..., B_n.]
  A buyer whose watermark is found in an unauthorized copy cannot be inculpated, since he/she can claim that the unauthorized copy was created and distributed by the seller.
Watermarking encrypted data is a solution
  The buyer B encrypts the watermark $w_B$ carrying his ID with his public key $K_B$, obtaining $E_{K_B}[w_B]$.
  The seller then adds (mixes) the encrypted watermark to the document encrypted with the public key of the buyer: combining $E_{K_B}[A]$ and $E_{K_B}[w_B]$ in the encrypted domain gives $E_{K_B}[A + w_B]$.
  The seller sends the result to the buyer, who can decrypt it with his private key $K'_B$: $A_w = A + w_B = E_{K'_B}[E_{K_B}[A + w_B]]$.
Zero knowledge watermarking
  A prover wants to prove that a watermark is present in a document without revealing the watermark itself.
  Assume a simple correlation-based detector is used: $\rho = \sum_{i=1}^{n} x_i w_i \gtrless T$.
  Two steps are needed:
    i) calculate $E[\rho]$ by knowing only $E[w]$ (and $E[x]$);
    ii) compare $\rho$ with the threshold $T$ by knowing only $E[\rho]$.
  Several solutions based on homomorphic encryption and zero knowledge protocols have been proposed.
Multiparty Computation (MC)
  In MC two participants compute the output of $f(x_1, x_2)$. Each party knows one of the inputs and does not want to reveal it to the other.
  In our case $f(\rho, T) = 1$ if $\rho \ge T$, and $0$ otherwise ($T$ could also be made public).
  This is a particular instance of the Millionaire's problem solved by Yao in 1982:
  A. C. Yao, Protocols for secure computations, in Proceedings of the Twenty-third IEEE Symposium on Foundations of Computer Science, pages 160-164, Chicago, Illinois, November 1982.
Medical diagnosis in a trusted world
  Leakage of sensitive information is possible.
  Privacy relies on the ethical behaviour of the personnel involved.
Medical diagnosis in an untrusted world
  Leakage of sensitive information is prevented, thus ensuring a higher level of privacy.
Coding / transcoding encrypted signals
  If coding / transcoding is necessary, the encryption key must be shared with the network node, undermining the security of the system.
  If the node could code / transcode the multimedia content without first decrypting it, the security of the system would increase significantly.
... and many others
  Searching an encrypted database
  Encrypted data mining
  Exploiting the knowledge of someone you don't trust
  Secure (privacy-preserving) Artificial Intelligence tools
The dawn of a new discipline?
  There is enough social and industrial demand to justify the birth of a new discipline: s.p.e.d., signal processing in the encrypted domain.
  From scattered studies to a thorough understanding of limits and trade-offs.
  Formidable research challenges:
    theoretical feasibility of s.p.e.d.;
    computational feasibility.
An information-theoretic paradox
  Given a source with alphabet $\mathcal{X}$, a cryptosystem of length $n$ and rate $R$ is a triple $(\mathcal{K}, E, D)$ composed of
    a key alphabet $\mathcal{K}$ from which keys are randomly selected;
    an encoding function $E: \mathcal{X}^n \times \mathcal{K} \to \{1, 2, \dots, 2^{nR}\}$;
    a decoding function $D: \{1, 2, \dots, 2^{nR}\} \times \mathcal{K} \to \mathcal{X}^n$.
  The effectiveness of the cryptosystem is measured according to the following criteria:
    secrecy against eavesdropping;
    rate of the encrypted signal, $R$;
    length of the secret key (size of $\mathcal{K}$).
An information-theoretic paradox
  In 1949, Shannon gave a very elegant and precise definition of the security of a cryptosystem.
  Let X be the plain message and B be the output of the encoder function.
  The cryptosystem is perfectly secure if $I(X; B) = 0$.
  A somewhat weaker notion of security, due to Wyner, requires that the above holds asymptotically as n tends to infinity.
  Is any s.p.e.d. operation possible at all in a perfectly secure cryptosystem?
The way out
  Luckily it is possible to get around the apparent information-theoretic paradox in several ways.
  Summarizing the approaches proposed so far, we can identify three s.p.e.d. paradigms:
    Distributed processing (DSC again)
      For a limited range of applications
    Partial / selective encryption, homomorphic encryption
      At the price of reduced security
    Interactive computation (ZKP and multiparty computation)
      At the price of increased complexity
The DSC paradigm (the lossless case)
  [Figure: a Bernoulli source X (p = 0.1) is encrypted with a Bernoulli key K (p = 0.5) to give B; B is losslessly compressed, and the decoder recovers X with the help of K.]
  It is a typical example of source coding with side information at the decoder (D. Slepian, J. K. Wolf, Noiseless coding of correlated information sources, IEEE Trans. Information Theory, vol. 19, pp. 471-480, July 1973).
  The conditional entropy $H(B \mid K)$ is exactly equal to $H(X)$, hence B can be coded at the same rate as a coder operating on the plain sequence.
  It can be shown that nothing is lost from a security point of view (M. Johnson et al., On compressing encrypted data, IEEE Trans. Signal Processing, vol. 52, no. 10, Oct. 2004).
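The two claims on this slide can be checked numerically for a single bit. The sketch below assumes that encryption is the bitwise XOR B = X xor K (the slide does not spell out the operation), and the function names are illustrative only. It computes H(X), H(B|K) and I(X;B) exactly from the joint distribution.

```python
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def xor_pad_statistics(p_x=0.1, p_k=0.5):
    """Return (H(X), H(B|K), I(X;B)) for B = X xor K with independent X and K.

    Assumption: encryption is a one-bit XOR with the key bit.
    """
    px = {0: 1 - p_x, 1: p_x}
    pk = {0: 1 - p_k, 1: p_k}
    joint_bk, joint_xb, pb = {}, {}, {0: 0.0, 1: 0.0}
    for x, k in product((0, 1), repeat=2):
        b = x ^ k
        p = px[x] * pk[k]
        joint_bk[(b, k)] = joint_bk.get((b, k), 0.0) + p
        joint_xb[(x, b)] = joint_xb.get((x, b), 0.0) + p
        pb[b] += p
    h_x = entropy(px)
    h_b_given_k = entropy(joint_bk) - entropy(pk)        # H(B|K) = H(B,K) - H(K)
    mutual_info = h_x + entropy(pb) - entropy(joint_xb)  # I(X;B) = H(X) + H(B) - H(X,B)
    return h_x, h_b_given_k, mutual_info


if __name__ == "__main__":
    print(xor_pad_statistics())
```

With a uniform key the mutual information vanishes, while a biased key (for instance p_k = 0.3) leaks information about X, which is why the key is drawn with p = 0.5.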
The DSC paradigm (the lossy case)
  [Figure: a Gaussian source X (σ_1) is encrypted with a Gaussian key K (σ_2) to give Y; Y is compressed with a Wyner-Ziv coder, and the decoder reconstructs X with the help of K.]
  It is a typical example of lossy source coding with side information at the decoder (A. Wyner, J. Ziv, The rate-distortion function for source coding with side information at the decoder, IEEE Trans. Information Theory, vol. 22, pp. 1-10, Jan. 1976).
  Even in this case it can be shown that nothing is lost from a security and a compression point of view (M. Johnson et al., On compressing encrypted data, IEEE Trans. Signal Processing, vol. 52, no. 10, Oct. 2004).
Secure transcoding of encrypted data
  A more classical approach is based on progressive encryption of scalable video.
  The bitstream is split into segments (layers). The basic layer allows a low-quality reconstruction of the video; adding new layers improves the quality of the video.
  [Figure: bitstream layout with header, basic layer and enhancement layers (L1 to L4).]
  Layers are encrypted sequentially through cipher block chaining: block n is encrypted by relying on the data contained in block n-1.
  Transcoding of the encrypted data is possible by simply truncating the bitstream.
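The truncation idea can be illustrated with a toy chained scheme. This is only a sketch under loud assumptions: the keystream is derived from SHA-256 of the key and the previous encrypted layer, which stands in for a real block cipher in chaining mode and is not meant to be secure. All names (`encrypt_layers`, `HEADER`, and so on) are hypothetical.

```python
import hashlib

HEADER = b"header"  # hypothetical initial chaining value


def keystream(key: bytes, prev: bytes, length: int) -> bytes:
    """Toy keystream that depends on the key and on the previous encrypted layer."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + prev + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_layers(key: bytes, layers):
    """Encrypt each layer using the previous encrypted layer as chaining input."""
    prev, encrypted = HEADER, []
    for layer in layers:
        cipher = xor_bytes(layer, keystream(key, prev, len(layer)))
        encrypted.append(cipher)
        prev = cipher
    return encrypted


def decrypt_layers(key: bytes, encrypted):
    """Decrypt however many leading layers are present."""
    prev, layers = HEADER, []
    for cipher in encrypted:
        layers.append(xor_bytes(cipher, keystream(key, prev, len(cipher))))
        prev = cipher
    return layers


if __name__ == "__main__":
    key = b"demo key"
    stream = encrypt_layers(key, [b"basic layer", b"enhancement 1", b"enhancement 2"])
    truncated = stream[:2]  # the network node drops a layer without knowing the key
    print(decrypt_layers(key, truncated))
```

Because each layer depends only on the layers before it, a node can drop trailing layers without the key and the receiver can still decrypt what remains.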
The homomorphic paradigm
  Perfect security is not reachable in practice
    It requires a key length equal to the message length
  Computational security
    Breaking the cryptosystem is possible, but computationally unfeasible
  Many modern cryptosystems are structured enough to allow some operations to be performed directly in the encrypted domain.
The homomorphic paradigm
  Suppose we have a cryptosystem for which elementary operations in the plain domain are mapped into simple operations on the encrypted data, for instance
    $E[x + y] = E[x] + E[y]$ or $E[x + y] = E[x] \, E[y]$
    $E[a x] = a \, E[x]$ or $E[a x] = E[x]^a$
  Then certain operations can be performed in the encrypted domain, e.g.:
    $E_{PK}[\rho] = E_{PK}\left[\sum_{i=1}^{n} x_i w_i\right] = \prod_{i=1}^{n} E_{PK}[x_i w_i] = \prod_{i=1}^{n} E_{PK}[w_i]^{x_i}$
  Aren't we losing security?
    It can be shown that preserving {+, -, /, *} is not possible without losing security: the more operations are preserved, the less security we have.
    Luckily some popular (and secure) homomorphic cryptosystems exist: RSA, Paillier.
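The encrypted correlation above can be sketched with a toy Paillier cryptosystem, whose additive property is E[x + y] = E[x] E[y] and E[a x] = E[x]^a. The primes 1009 and 1013, the generator g = n + 1, the signed decoding of the result and all function names are assumptions made for this illustration. The parameters are far too small for any real use.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier keys with g = n + 1 (tiny demo primes, assumed for illustration)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # inverse of lambda modulo n
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Probabilistic encryption: c = (n + 1)^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Decrypt and map the result to a signed integer in (-n/2, n/2]."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    m = (u - 1) // n * mu % n
    return m - n if m > n // 2 else m


def encrypted_correlation(n, enc_w, x):
    """Compute E[sum_i x_i w_i] as prod_i E[w_i]^(x_i), without decrypting w."""
    n2 = n * n
    acc = 1  # 1 is a (non-randomized) encryption of 0
    for c, xi in zip(enc_w, x):
        acc = acc * pow(c, xi, n2) % n2
    return acc


if __name__ == "__main__":
    rng = random.Random(1)
    n, priv = keygen()
    w = [1, -1, 1, 1]   # watermark, known only in encrypted form to the detector
    x = [3, 5, 2, 7]    # signal samples, assumed non-negative integers
    enc_w = [encrypt(n, wi, rng) for wi in w]
    print(decrypt(n, priv, encrypted_correlation(n, enc_w, x)))
```

Only the holder of the private key can read the correlation value. The party computing the product sees the samples x and the encrypted watermark, never w itself.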
Probabilistic encryption
  In addition to the homomorphic property, randomness of the encryption scheme is needed for secure componentwise encryption.
  Assume we want to encrypt a sequence of bits componentwise: without randomness every bit would map to one of only two ciphertexts, so some sort of randomness is needed.
  In a probabilistic encryption scheme the encrypted message depends on the public key and on a random parameter r:
    $c_1 = E_{pk}[x, r_1]$, $c_2 = E_{pk}[x, r_2]$
  however decryption does not depend on r:
    $x = D_{sk}[c_1]$, $x = D_{sk}[c_2]$
Probabilistic encryption
  Strange as it may seem, homomorphic probabilistic schemes exist.
  The space of the encrypted signals must be much larger than that of the plaintext.
    Expansion factor: huge in the first schemes, improved recently.
  The first probabilistic encryption scheme was described in S. Goldwasser and S. Micali, Probabilistic Encryption, JCSS, vol. 28, no. 2, pp. 270-299, 1984.
  The most popular one is due to Paillier: P. Paillier, Public-key cryptosystems based on composite degree residuosity classes, in Proceedings of Eurocrypt 99, Lecture Notes in Computer Science, vol. 1592, pages 223-238, Springer-Verlag, 1999.
Quadratic Residuosity Problem
  Given x ($1 \le x \le n$), decide whether there exists y such that $x = y^2 \bmod n$.
  This is believed to be as hard as finding a factorization of n.
  If such a y exists, x is said to be a quadratic residue modulo n.
  One can define $E_{pk}: \{0, 1\} \to [0, n]$ as the function that maps 0s to random squares and 1s to random non-squares,
  and $D_{sk}: [0, n] \to \{0, 1\}$ as the function that decides whether x is a square or not.
Example
  Alice's public key is (y, n)
    n is a composite integer: n = pq
    y is a random quadratic non-residue in [1, n]
  Alice's private key is (p, q)
  Note: if x is a QR and y is a NQR, xy is a NQR
  Encoding: Bob selects a random x in [1, n]
    If m = 0, $c = x^2 \bmod n$ (c is a QR)
    Else $c = y x^2 \bmod n$ (c is a NQR)
    Bob sends c to Alice
  Decoding: Alice decides whether c is a QR or a NQR (that is easy knowing the factorization of n)
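The encoding and decoding steps above can be tried out with a toy implementation. The sketch assumes tiny demo primes (499 and 547) and picks y as the smallest value that is a non-residue modulo both p and q, so that residuosity modulo p alone is enough for decoding. Both choices, and the function names, are assumptions for illustration. The last loop of the demo also shows that multiplying two ciphertexts yields an encryption of the XOR of the two bits.

```python
import math
import random


def is_qr(c, prime):
    """Euler's criterion: c is a non-zero quadratic residue modulo an odd prime."""
    return pow(c, (prime - 1) // 2, prime) == 1


def keygen(p=499, q=547):
    """Toy keys: public (y, n), private (p, q). Tiny primes, for illustration only."""
    n = p * q
    # Assumption: y is chosen as a non-residue modulo both p and q.
    y = next(v for v in range(2, n) if not is_qr(v, p) and not is_qr(v, q))
    return (y, n), (p, q)


def encrypt(public_key, bit, rng):
    """Bob: a random square for 0, y times a random square for 1."""
    y, n = public_key
    while True:
        x = rng.randrange(1, n)
        if math.gcd(x, n) == 1:
            break
    c = x * x % n
    return c if bit == 0 else y * c % n


def decrypt(private_key, c):
    """Alice: residuosity is easy to decide once the factor p is known."""
    p, _ = private_key
    return 0 if is_qr(c, p) else 1


if __name__ == "__main__":
    rng = random.Random(7)
    pk, sk = keygen()
    print([decrypt(sk, encrypt(pk, b, rng)) for b in (0, 1, 1, 0)])
    for b1 in (0, 1):
        for b2 in (0, 1):
            product = encrypt(pk, b1, rng) * encrypt(pk, b2, rng) % pk[1]
            print(b1, b2, decrypt(sk, product))
```

Since x is drawn afresh for every bit, two encryptions of the same bit almost never coincide, which is the probabilistic property discussed on the previous slides.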
Is interaction the key?
  s.p.e.d. possibilities greatly increase if we allow interaction between the untrusted parties.
  ZK proofs rely on this principle.
  Ali Baba's cave
    A secret door can be opened by a password.
    Peggy knows the password of the door and wants to convince Victor that she knows it, but doesn't want Victor to learn the password itself.
  [Figure: the cave, with the entry, the right branch and the secret door.]
Is interaction the key?
  Ali Baba's cave
    Peggy goes into a random branch, which Victor doesn't know.
    Victor calls out the branch from which Peggy should come out.
    If Peggy knows the secret, she can come out the right way every time.
    If Peggy doesn't know the password, she has a 50% chance of initially going into the wrong branch, so Victor can call her bluff.
  [Figure: the cave, with the entry, the left branch and the secret door.]
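Repeating the cave experiment makes cheating unlikely: a prover who does not know the password survives k independent rounds with probability 2^-k. The short simulation below is a sketch of that argument. The function names and the use of a seeded random generator are choices made here for illustration.

```python
import random


def cave_round(knows_password, rng):
    """One round: Peggy enters a random branch, Victor calls out a random branch."""
    entered = rng.choice(("left", "right"))
    called = rng.choice(("left", "right"))
    if knows_password:
        return True           # she can open the door and exit from either side
    return entered == called  # otherwise she must have guessed the branch


def run_protocol(knows_password, rounds, rng):
    """Victor accepts only if Peggy succeeds in every round."""
    return all(cave_round(knows_password, rng) for _ in range(rounds))


def cheat_probability(rounds):
    """Probability that a prover without the password passes all rounds."""
    return 0.5 ** rounds


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 2000
    passed = sum(run_protocol(False, 10, rng) for _ in range(trials))
    print(passed / trials, cheat_probability(10))
```

An honest Peggy always passes, while Victor learns nothing beyond the fact that she can open the door.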
Zero knowledge protocols
  Zero knowledge protocols belong to the class of interactive proof systems.
  [Figure: Victor sends a challenge and Peggy returns an answer.]
  At the end of the proof, Victor will accept or reject, depending on whether or not Peggy successfully answered his challenges.
Is interaction the key?
  Multi-party computation (MPC) is another way to exploit interaction to process encrypted data.
  To give an idea of how it may work, consider the Millionaire's problem. Suppose that:
    RD has I millions (say 5), US has J (say 6)
    PK and SK are RD's RSA keys
    Wealth is an integer in [0, 10]
The millionaire problem
  US takes a random number X (say 1234), computes $C = E_{PK}[X]$ and transmits to RD the value C - J (say 896).
  RD generates the 10 numbers C - J + U (U = 1, ..., 10) and decrypts them with his SK, obtaining $Y_U$.
  RD computes $Z_U = Y_U \bmod p$, where p is a prime chosen by RD, and adds 1 from the (I+1)-th position on, obtaining $W_U$. Then he sends the table to US.

    U     C - J + U   Y_U (decryption with SK)   Z_U    W_U
    1     897         1059                       1945   1945
    2     898         1156                       1345   1345
    3     899         2502                       1190   2501
    ...
    6     902         1234                       4352   4353 (+1)
    ...
    10    906         1311                       1967   1968 (+1)

  US computes G = X mod p and compares it with the J-th (6-th) position in the table. If $W_J > G$, then US is richer; otherwise US is NOT richer. US then tells the result to RD.
The millionaire problem: remarks
  To prevent US from cheating, some modifications must be made.
    Many MC schemes assume semi-honest players.
    A ZK protocol may be required to ensure that the players correctly apply the protocol.
  The complexity grows with the required resolution.
    It is an example of the classical trade-off between flexibility and complexity.
MP computation
  Interestingly, it has been shown that MPC can be applied to any function $f(x_1, x_2, \dots, x_n)$.
  It suffices to show that an MPC protocol exists to securely compute the output of a universal logic gate, e.g. the NAND gate (O. Goldreich, S. Micali, and A. Wigderson, How to play any mental game or a completeness theorem for protocols with honest majority, in STOC, pages 218-229, ACM, 1987).
  The challenge is to develop efficient MPC protocols.
MP computation
  Efficient MPC protocols exist for the following functions:
    Solution of the equation $(M_1 + M_2) x = b_1 + b_2$
    Mean and standard deviation of concatenated vectors (x, y)
    Distance metrics, e.g. $\|x - y\|_2$
    Scalar product between vectors, $\sum_{i=1}^{n} x_i y_i$
Conclusions and Research challenges
  There is no doubt that s.p.e.d. is a very hot research topic.
  Analyse the potential of the various s.p.e.d. paradigms
    What can and what cannot be done
  Develop efficient s.p.e.d. tools
    Basic tools
    Protocols
  Application to real scenarios
    Efficient (and secure) application of cryptographic tools to real-valued signals
    Let perception play a role
Conclusions and Research challenges
  Investigate the trade-off between the various corners of the problem
    Flexibility vs security vs complexity
  Develop a general s.p.e.d. theory
  SPEED Project, VI FP, FET scheme (2006-2009)
    Università degli Studi di Siena (UNISI)
    Delft University of Technology (TUD)
    Ruhr-Universitaet Bochum (RUB)
    Katholieke Universiteit Leuven (KUL)
    Università degli Studi di Firenze (UNIFI)
    Philips Electronics Nederland B.V. (Philips)