ZQL Work in progress a cryptographic compiler for processing private data George Danezis Joint work with Cédric Fournet, Markulf Kohlweiss, Zhengqin Luo Microsoft Research and Joint INRIA-MSR Centre
Data Privacy Privacy at odds with big data produced, processed, and stored Private data? Personal, medical, financial, legal A controversial trust issue Show-stopper when deploying new technology High potential for negative press Strong EU regulations Wanted: reliable tools for privacy-friendly data processing (as in dolphin-friendly tuna)
Example 1: Smart Metering
Privacy-Preserving Smart Metering Only the monthly fee needs to be sent back to the utility (not the detailed meter readings) Smart Meter Utility Provider variable policy & rates greenenergyoptions.co.uk certified readings (private data) User, paying her monthly bill price to pay + crypto evidence
Example 2: Pay-how-you-drive insurance
Pay-how-you-drive insurance Only the premium needs to be communicated to the insurance company certified pricing policy http://www.coverbox.co.uk/ Tracker Insurance company certified location, speed and distance (private data) User, paying her insurance fee fee to pay + crypto evidence
Many similar problems (Partly) private user data Public pricing policy Conflicting goals: Privacy for the user Concealing meter readings Concealing locations, speed and distance Integrity for the verifier Only the correct price can be proved
So far, ad hoc cryptographic solutions Bespoke privacy-preserving protocols using a mix of cryptographic mechanisms. Linear policies: homomorphic commitments. Cumulative policies: zero-knowledge proofs on CL-signatures. Penalty-based policies (Pay-as-you-go)? Also many protocols for anonymous credentials, e-cash, e-voting. These protocols are hard to design, implement, and deploy. We cannot involve cryptographers and security experts each time we change the query or revise the service policy!
Goal: private data processing 1. A high-level language for querying data 2. An optimizing, verifying query compiler: automatically selects cryptographic constructions; generates code for different platforms; verifies its security before deployment. [Diagram labels: Reliable Data Providers (certified public data, certified private data); agreement on a data query; Client, or Prover, in control of her data; query results + crypto evidence; Service, or Verifier, in need of valid results.]
Integrity and Privacy (Ideal) [Diagram labels: Trusted Third Party; Reliable Data Providers; "Ok."; Client, or Prover, in control of her data; Service, or Verifier, in need of valid results.]
ZQL: a language for querying private data SQL [Structured Query Language, 1970]: a fine declarative domain-specific language for querying relational DBs. ZQL [Zero-Knowledge Query Language]: a subset of SQL extended for cryptographic processing: privacy annotations; random sampling; hash, sign; big numbers for keys, exponentials, exponents. The SQL theory carries over to ZQL, despite unusual data: useful algebraic properties; efficient evaluation plans and representations (e.g. indexing).
Compiler Architecture We generate a query for each participant, with matching I/Os. We emit code for each participant. [Diagram labels: data privacy specification; source query Q(T1 ... Tn) expressed in ZQL; ZQL compiler; zero-knowledge proofs; Tsign, Qprove, Qverify, queries expressed in ZQL+crypto; F# generator; runtime libraries ZQL.fs, crypto.fs; T1.fs, Qp.fs, Qv.fs, reference high-level code in F#; T1.c, Qp.c, Qv.c, fast, portable low-level code in C.]
Zero-Knowledge Crash Course! (1) Alice has a secret $x$ and gives Bob a commitment to $x$, $C_x$. Commitments: binding and hiding properties. Public: prime $p$, generators $g, h$ in $\mathbb{Z}_p^*$. Random $o$ (opening). Pedersen commitment: $C_x = g^x h^o \bmod p$. Note: If $C_x$ is signed, this is an identification scheme. All our inputs are signed, and certified. Homomorphic property: $C_a \cdot C_b = C_{a+b}$ and $(C_a)^b = C_{ab}$.
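The commitment and its homomorphic property can be checked numerically. The following is a minimal sketch, not the ZQL implementation: the parameters P, Q, G, H and the helper names `commit` and `random_opening` are assumptions chosen for illustration, and the group is far too small to be secure. It also assumes that nobody knows the discrete logarithm of H with respect to G, which a toy group cannot guarantee.

```python
import secrets

# Toy parameters (assumptions for illustration only; not secure).
P = 2039   # prime modulus, P = 2*Q + 1
Q = 1019   # prime order of the subgroup of squares mod P
G = 4      # generator of the order-Q subgroup
H = 9      # second generator; log_G(H) is assumed unknown to the committer

def commit(x, o):
    """Pedersen commitment C_x = g^x * h^o mod p."""
    return (pow(G, x % Q, P) * pow(H, o % Q, P)) % P

def random_opening():
    """Fresh random opening o, uniform in [0, Q)."""
    return secrets.randbelow(Q)

a, oa = 12, random_opening()
b, ob = 30, random_opening()
Ca, Cb = commit(a, oa), commit(b, ob)

# Additive homomorphism: C_a * C_b commits to a + b under opening oa + ob.
assert (Ca * Cb) % P == commit(a + b, oa + ob)

# Raising to a public constant k: (C_a)^k commits to k * a under opening k * oa.
k = 7
assert pow(Ca, k, P) == commit(k * a, k * oa)
```

Exponents are reduced modulo Q because both generators have order Q in this toy group.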
Zero-Knowledge Crash Course! (2) Alice proves to Bob that she knows a secret $x$ in $C_x$ (Schnorr): Alice chooses random $w_x, w_o$ and sends to Bob $C_{w_x} = g^{w_x} h^{w_o} \bmod p$. Bob sends a challenge $c$ to Alice. Alice replies with $r_x = w_x - c \cdot x$ and $r_o = w_o - c \cdot o$. Bob checks $C_{w_x} = (g^{r_x} h^{r_o}) \, C_x^{c} \bmod p$. Subtleties: Family: Sigma-protocols. Proof: simulation (privacy) and extraction (integrity). To prove equality between commitments, reuse $w_x$ and $r_x$. Chain zero-knowledge proofs together to prove whole computations correct.
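The three-move exchange above can be traced in a few lines. This is a hedged sketch under assumed toy parameters (P, Q, G, H) and hypothetical function names; responses are reduced modulo the group order Q, a detail the slide leaves implicit.

```python
import secrets

# Toy parameters (assumptions for illustration only; not secure).
P, Q, G, H = 2039, 1019, 4, 9

def commit(x, o):
    """Pedersen commitment g^x * h^o mod p."""
    return (pow(G, x % Q, P) * pow(H, o % Q, P)) % P

def prover_announce():
    """Alice picks random w_x, w_o and sends C_w = g^w_x * h^w_o."""
    wx, wo = secrets.randbelow(Q), secrets.randbelow(Q)
    return (wx, wo), commit(wx, wo)

def prover_respond(x, o, w, c):
    """Alice answers the challenge c with r_x = w_x - c*x and r_o = w_o - c*o."""
    wx, wo = w
    return (wx - c * x) % Q, (wo - c * o) % Q

def verifier_check(Cx, Cw, c, rx, ro):
    """Bob accepts iff C_w == g^r_x * h^r_o * C_x^c mod p."""
    return Cw == (commit(rx, ro) * pow(Cx, c, P)) % P

# One honest run of the protocol.
x, o = 42, secrets.randbelow(Q)
Cx = commit(x, o)
w, Cw = prover_announce()
c = secrets.randbelow(Q)          # Bob's random challenge
rx, ro = prover_respond(x, o, w, c)
assert verifier_check(Cx, Cw, c, rx, ro)
```

The check succeeds because $g^{w_x - cx} h^{w_o - co} \cdot (g^x h^o)^c = g^{w_x} h^{w_o}$.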
ZQL By example: queries and arithmetic A query is defined as an ML function (in F# syntax). Parameters are public or secret integers. The returned value is implicitly declassified. Addition and multiplication on public values and secrets. Example: Basic translation to a sigma protocol: All inputs are signed commitments. Encode all secrets as (values, Pedersen commitments), e.g. $(x, C_x = g^x h^{o_x} \bmod p)$. Prove linear operations using homomorphisms of commitments. Prove multiplications of secrets as: $C_{ab} = C_a^{b} h^{o_{ab}} = g^{ab} h^{o}$.
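The two algebraic facts used in this translation can be checked directly. The sketch below assumes toy parameters and hypothetical helper names (`linear`, `product_commitment`). It only verifies the commitment equations; the accompanying sigma-protocol proof that the exponent used is the committed secret b is not shown.

```python
import secrets

# Toy parameters (assumptions for illustration only; not secure).
P, Q, G, H = 2039, 1019, 4, 9

def commit(x, o):
    """Pedersen commitment g^x * h^o mod p."""
    return (pow(G, x % Q, P) * pow(H, o % Q, P)) % P

def linear(Cx, k, m):
    """Commitment to k*x + m for public k and m, computed from C_x alone."""
    return (pow(Cx, k % Q, P) * pow(G, m % Q, P)) % P

def product_commitment(Ca, b, o_ab):
    """C_ab = (C_a)^b * h^o_ab, computed by the prover who knows b."""
    return (pow(Ca, b % Q, P) * pow(H, o_ab % Q, P)) % P

a, oa = 6, secrets.randbelow(Q)
b, o_ab = 7, secrets.randbelow(Q)
Ca = commit(a, oa)

# Linear operation: anyone can derive a commitment to 3*a + 5 (opening 3*oa).
assert linear(Ca, 3, 5) == commit(3 * a + 5, 3 * oa)

# Multiplication: C_ab commits to a*b under the opening o = oa*b + o_ab.
Cab = product_commitment(Ca, b, o_ab)
assert Cab == commit(a * b, oa * b + o_ab)
```

The opening $o$ in $g^{ab} h^{o}$ works out to $o_a \cdot b + o_{ab}$, which is why the prover can later open or reuse $C_{ab}$.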
ZQL By example: map & fold Inputs can be tables of records containing public or secret integers. Map: takes a table, applies an expression to all its rows, and returns a table. Fold: takes an accumulator and a table, and applies a function to accumulate a value over the table. Example: Sum of the product of two columns of a table. Note: optimize to not use sigma-protocols.
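The plaintext meaning of the map and fold operators can be sketched as follows. This is an illustration of the source-level semantics only, with no cryptography; the names `zql_map` and `zql_fold` and the sample table are hypothetical, and the ZQL surface syntax itself is in F# style rather than Python.

```python
from functools import reduce

def zql_map(f, table):
    """Apply f to every row of the table and return the resulting table."""
    return [f(row) for row in table]

def zql_fold(f, acc, table):
    """Accumulate a value over the table, starting from acc."""
    return reduce(f, table, acc)

# Hypothetical two-column table of integers.
table = [(2, 10), (3, 20), (4, 30)]

# Sum of the product of the two columns.
products = zql_map(lambda row: row[0] * row[1], table)
total = zql_fold(lambda acc, v: acc + v, 0, products)
assert total == 200
```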
ZQL By example: findt Lookup tables can be used in the query. Restriction: they need to be certified by a third party. Non-linear ops and quantized functions. Findt operator: allows a lookup into a table; it matches and returns the first valid row. Example: Compilation: Lookup is implemented as a proof of knowledge of a re-randomizable (CL) signature.
ZQL By example: pay-as-you-go
Compilation Example: Given a table of (times, readings) from a meter, and a table of (readings, prices) from a tariff policy: Compute the bill as the sum of the prices corresponding to the readings for all times. Source Query from programmer Input tables for R and T For each reading Lookup its price Accumulate the sum Return sum
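In plaintext terms, the source query described above amounts to a lookup inside a fold. The sketch below is a reference semantics only, with hypothetical names (`findt`, `bill`) and made-up sample tables; it contains none of the cryptographic evidence the compiled prover would add.

```python
def findt(table, key):
    """Plaintext stand-in for a table lookup: first row whose key column matches."""
    for row in table:
        if row[0] == key:
            return row
    raise KeyError(key)

def bill(readings, tariff):
    """Sum the prices that the tariff assigns to each reading."""
    total = 0
    for _time, reading in readings:        # for each reading
        _, price = findt(tariff, reading)  # look up its price
        total += price                     # accumulate the sum
    return total                           # return the sum

# Hypothetical inputs: R as (time, reading) rows, T as (reading, price) rows.
readings = [(0, 1), (1, 3), (2, 1), (3, 2)]
tariff = [(1, 10), (2, 25), (3, 45)]
assert bill(readings, tariff) == 90
```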
ZQL compiled prover for Q Source Query + Cryptographic evidence that the computation is correct Input tables for R and T Accumulate the sum For each reading Lookup its price Return sum
Expressiveness & Performance When operating on secrets, we support a fragment of SQL 1. Tables of public / secret values 2. Linear expressions 3. Polynomial expressions 4. Table lookups 5. Inequalities Performance dominated by bignum multiplications! Some numbers Current limitation: The shape of intermediate tables must be public
Preliminary performance If the RSA modulus is larger than 1024 bits, use pairing-based crypto. Verification is up to twice as expensive as proof. Pairing-based proofs are shorter. Symbolic evaluator should guide compilation.
Security Verification Our compiler automatically generates complex protocol implementations, which are hard to test or review. How can the user tell whether her privacy is preserved? Our compiler then calls independent, automated verification tools to prove that these implementations are secure. Privacy: the service learns nothing more than the query result. Integrity: the user can build evidence only for the correct result. [Diagram labels: Q(T1 ... Tn); ZQL compiler; F# generator; T1.fs, Qv.fs, Qp.fs, reference implementation in F#.]
Security Verification (2003 ) We develop a cryptographic verification kit for verifying new implementations of security protocols [with A.D. Gordon, K. Bhargavan, and MSR-INRIA] Tools: ProVerif, FS2PV, FS2CV, F7, F* Mostly for F#, with experiments for C and C# We can automatically verify large implementations against precise cryptographic security assumptions Probabilistic information security: no secret information flows to the adversary Computational security: except with a negligible probability, the adversary cannot Verification case studies: Web services security TLS 1.2 Internet Standard DKM [with T. Acar and D.Shumow] for cloud data security
Towards a certifying ZQL Compiler We also generate proof goals and type annotations to keep track of query evaluations. We use F7 and Z3 to automatically prove that implementations conform with their specifications. We get either a compile-time error (bug) or strong integrity & privacy theorems. [Diagram labels: Q(T1 ... Tn); ZQL compiler; F7 generator; ZQL.fs7, Crypto.fs7; T1.fs7, Qv.fs7, Qp.fs7, typed specification in F7; F# generator; T1.fs, Qv.fs, Qp.fs, reference implementation in F#; F7 typing.]
Theorems Correctness. For any given source inputs, the sequential composition of the cryptographic queries for the data providers, the user, and the verifier yields the same result as the source query. Integrity. An adversary given the capabilities of the user cannot get the verifier to accept any other result except with a negligible probability. Privacy. An adversary given the capabilities of the verifier, able to choose any two collections of input tables such that the source query yields the same result, and given the result of the user's cryptographic query, cannot tell which of the two inputs was used except with a negligible advantage.
Where next? Intermediate table lookups. Flexible data providers. Endorsed computations. Optimization: secret / public propagation; cost-based primitive selection. Support for modularity and functions. Batch proofs & verification. Compilation to JavaScript. Beyond sigma protocols.
ZQL (Summary) 1. With ZQL, clients process their own private data, services still get correct results 2. The programmer specifies SQL-like queries & privacy goals; we compile them into zero-knowledge cryptographic protocols 3. The security of fresh protocol implementations can be automatically verified (at compile-time) under standard cryptographic assumptions (Technical report due out in January)