Probabilistic Assertions



Similar documents
Expressing and Verifying Probabilistic Assertions

EMSCRIPTEN - COMPILING LLVM BITCODE TO JAVASCRIPT (?!)

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.

Bayesian networks - Time-series models - Apache Spark & Scala

Programming the Internet of Uncertain <T>hings

Boogie: A Modular Reusable Verifier for Object-Oriented Programs

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

Compiling Object Oriented Languages. What is an Object-Oriented Programming Language? Implementation: Dynamic Binding

Not agree with bug 3, precision actually was. 8,5 not set in the code. Not agree with bug 3, precision actually was

Uncertain<T>: A First-Order Type for Uncertain Data

Formal Verification and Linear-time Model Checking

Understand the role that hypothesis testing plays in an improvement project. Know how to perform a two sample hypothesis test.

Basics of Statistical Machine Learning

A) B) C) D)

Towards practical reactive security audit using extended static checkers 1

MPI-Checker Static Analysis for MPI

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Software Testing & Analysis (F22ST3): Static Analysis Techniques 2. Andrew Ireland

Database Security. The Need for Database Security

Regression Verification: Status Report

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

InvGen: An Efficient Invariant Generator

A Static Analyzer for Large Safety-Critical Software. Considered Programs and Semantics. Automatic Program Verification by Abstract Interpretation

A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases

Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory

Tachyon: a Meta-circular Optimizing JavaScript Virtual Machine

Project Report BIG-DATA CONTENT RETRIEVAL, STORAGE AND ANALYSIS FOUNDATIONS OF DATA-INTENSIVE COMPUTING. Masters in Computer Science

Software Metrics in Static Program Analysis

Unified Static and Runtime Verification of Object-Oriented Software

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

Polarization codes and the rate of polarization

IKOS: A Framework for Static Analysis based on Abstract Interpretation (Tool Paper)

Clock Synchronization

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

CIS 631 Database Management Systems Sample Final Exam

Big Data and Big Analytics

Supplement to Call Centers with Delay Information: Models and Insights

Technical paper review. Program visualization and explanation for novice C programmers by Matthew Heinsen Egan and Chris McDonald.

Static Analysis. Find the Bug! : Analysis of Software Artifacts. Jonathan Aldrich. disable interrupts. ERROR: returning with interrupts disabled

Practice problems for Homework 12 - confidence intervals and hypothesis testing. Open the Homework Assignment 12 and solve the problems.

Hypothesis Testing for Beginners

Institut für Parallele und Verteilte Systeme. Abteilung Anwendersoftware. Universität Stuttgart Universitätsstraße 38 D Stuttgart

RevoScaleR Speed and Scalability

Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes

A Case Study in Software Enhancements as Six Sigma Process Improvements: Simulating Productivity Savings

Lecture 1: Oracle Turing Machines

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

Independent t- Test (Comparing Two Means)

Eurovent Certification for Air Handling Units : Five Energy Efficiency classes to make the right choice

Learning outcomes. Knowledge and understanding. Competence and skills

Wave Analytics Data Integration

CARAMEL: Detecting and Fixing Performance Problems That Have Non-Intrusive Fixes

Detection of changes in variance using binary segmentation and optimal partitioning

StaRVOOrS: A Tool for Combined Static and Runtime Verification of Java

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

Analysis report examination with CUBE

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

6.088 Intro to C/C++ Day 4: Object-oriented programming in C++ Eunsuk Kang and Jean Yang

1 Abstract Data Types Information Hiding

HIGH PERFORMANCE BIG DATA ANALYTICS

Database Design Patterns. Winter Lecture 24

Network Algorithms for Homeland Security

An Open Framework for Reverse Engineering Graph Data Visualization. Alexandru C. Telea Eindhoven University of Technology The Netherlands.

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

A Brief Introduction to Static Analysis

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Life of A Knowledge Base (KB)

Data Structures and Algorithms Written Examination

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

Fully Automated Static Analysis of Fedora Packages

Cloud Computing. Up until now

Why? A central concept in Computer Science. Algorithms are ubiquitous.

THEMIS: Fairness in Data Stream Processing under Overload

The C Programming Language course syllabus associate level

HiBench Installation. Sunil Raiyani, Jayam Modi

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new

When COTS is not SOUP Commercial Off-the-Shelf Software in Medical Systems. Chris Hobbs, Senior Developer, Safe Systems

Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction to Pig

Techniques for Supporting Prediction of Security Breaches in. Critical Cloud Infrastructures Using Bayesian Network and. Markov Decision Process

Part 2: Community Detection

3. The Junction Tree Algorithms

APPROACHES TO SOFTWARE TESTING PROGRAM VERIFICATION AND VALIDATION

Scalability. Microsoft Dynamics GP Benchmark Performance: Advantages of Microsoft SQL Server 2008 with Compression.

Simulation-Based Security with Inexhaustible Interactive Turing Machines

Integration Technologies Group (ITG) ITIL V3 Service Asset and Configuration Management Assessment Robert R. Vespe Page 1 of 19

Outline Introduction Circuits PRGs Uniform Derandomization Refs. Derandomization. A Basic Introduction. Antonis Antonopoulos.

A Statistical Framework for Operational Infrasound Monitoring

bigdata Managing Scale in Ontological Systems

Project management: a simulation-based optimization method for dynamic time-cost tradeoff decisions

KS3 Computing Group 1 Programme of Study hours per week

Transcription:

Expressing and Verifying Probabilistic Assertions Adrian Sampson Pavel Panchekha Todd Mytkowicz Kathryn S. McKinley Dan Grossman Luis Ceze University of Washington Microsoft Research University of Washington PLDI 2014

Probabilistic assertions express correctness properties in modern software. Our verifier checks them efficiently and accurately.

assert file!= NULL test verify check

assert efile!= NULL e must hold on every execution

Approximate Computing this approximate image is close to its precise version k-means clustering is likely to converge even on unreliable hardware assert e e Obfuscation for Data Privacy obfuscated data is still useful in aggregate Mobile and Sensing sensor error does not render the app s conclusions useless

Approximate Computing this approximate image is close to its precise version k-means clustering is likely to converge even on unreliable hardware assert e Traditional assertions are insufficient e for programs with probabilistic behavior. Obfuscation for Data Privacy obfuscated data is still useful in aggregate Mobile and Sensing sensor error does not render the app s conclusions useless

Assertions are insufficient for private-data obfuscation true_avg = average(salaries)! private_avg =! average(obfuscate(salaries))! assert true_avg - private_avg! <= 10,000

Assertions are insufficient for private-data obfuscation true_avg = average(salaries)! private_avg =! average( obfuscate(salaries))! assert true_avg - private_avg! <= 10,000 probability distribution

Assertion assert e

Probabilistic assertion p assert e, p, c

Probabilistic assertion p assert e, p, c e must hold with probability p at confidence c

Probabilistic assertion p assert e, p, c test? verify? check?

How to verify a probabilistic assertion probabilistic program float obfuscated(float n) {! return n + gaussian(0.0, 1000.0);! }! float average_salary(float* salaries) {! total = 0.0;! for (int i = 0; i < COUNT; ++i)! total += obfuscated(salaries[i]);! avg = total / len(salaries);! p_avg =...; passert e, p, c }?

How to verify a probabilistic assertion naively probabilistic program float obfuscated(float n) {! return n + gaussian(0.0, 1000.0);! }! float average_salary(float* salaries) {! total = 0.0;! for (int i = 0; i < COUNT; ++i)! total += obfuscated(salaries[i]);! avg = total / len(salaries);! p_avg =...; passert e, p, c? }

How to verify a probabilistic assertion with statistical reasoning queries & inference passert for statistical models for probabilistic software Church Infer.NET [Sankaranarayanan+ PLDI 2013] [Hur+ PLDI 2014]?

How to verify a probabilistic assertion efficiently and accurately distribution extraction via symbolic execution statistical verification optimizations float obfuscated(float n) {! return n + gaussian(0.0, 1000.0);! }! float average_salary(float* salaries) {! total = 0.0;! for (int i = 0; i < COUNT; ++i)! total += obfuscated(salaries[i]);! avg = total / len(salaries);! p_avg =...; passert e, p, c } Bayesian network IR

How to verify a probabilistic assertion efficiently and accurately distribution extraction via symbolic execution statistical verification optimizations float obfuscated(float n) {! return n + gaussian(0.0, 1000.0);! }! float average_salary(float* salaries) {! total = 0.0;! for (int i = 0; i < COUNT; ++i)! total += obfuscated(salaries[i]);! avg = total / len(salaries);! p_avg =...; passert e, p, c } Bayesian network IR implementation for LLVM & Clang

How to verify a probabilistic assertion efficiently and accurately distribution extraction via symbolic execution statistical verification optimizations float obfuscated(float n) {! return n + gaussian(0.0, 1000.0);! }! float average_salary(float* salaries) {! total = 0.0;! for (int i = 0; i < COUNT; ++i)! total += obfuscated(salaries[i]);! avg = total / len(salaries);! p_avg =...; passert e, p, c } Bayesian network IR implementation for LLVM & Clang

Distribution extraction: random draws are symbolic symbolic heap a 4.2 b = a + gaussian(0.0, 1.0) a 4.2 b 4.2 + G0,1

Concrete vs. symbolic semantics program + input nondeterministic concrete execution outputs

Concrete vs. symbolic semantics program + input nondeterministic concrete execution outputs program + input deterministic symbolic execution nondeterministic sampling outputs

a 4.2 b G 0,1 input: a = 4.2! b = gaussian(0.0, 1.0)

a 4.2 b G 0,1 input: a = 4.2! b = gaussian(0.0, 1.0)! c = a + b c +

a 4.2 b G 0,1 input: a = 4.2! b = gaussian(0.0, 1.0)! c = a + b! d = c + b c + d +

input: a = 4.2! b = gaussian(0.0, 1.0)! c = a + b! d = c + b d + c + a 4.2 b G 0,1

input: a = 4.2! b = gaussian(0.0, 1.0)! c = a + b! d = c + b! if b > 0.5! e = 2.0! else! e = 4.0 d + if c + > a 4.2 G 0,1 0.5 b e? then 2.0 else 4.0

input: a = 4.2! b = gaussian(0.0, 1.0)! c = a + b! d = c + b! if b > 0.5! e = 2.0! else! e = 4.0! passert e <= 3.0,! 0.9, 0.9 d + 3.0 e? c + > if then else a 4.2 G 0,1 0.5 2.0 b 4.0

input: a = 4.2! b = gaussian(0.0, 1.0)! c = a + b! d = c + b! if b > 0.5! e = 2.0! else! e = 4.0! passert e <= 3.0,! 0.9, 0.9 + 3.0? if then + > 4.2 G 0,1 0.5 2.0 else 4.0

input: a = unif(2.0, 4.2! 9.0) b = gaussian(0.0, 1.0)! c = a + b! d = c + b! if b > 0.5! e = 2.0! else! e = 4.0! passert e <= 3.0,! 0.9, 0.9 + 3.0? if + > then else 4.2 G 0,1 0.5 2.0 4.0

concrete input salary = $24,000 input distribution salary = uniform( ) testing static analysis

More in the paper Arrays & pointers Loops External code Probabilistic path pruning

Distribution extraction produces an expression dag Bayesian network > + + 4.2 0.5 G 0,1

Distribution extraction produces an expression dag Bayesian network 4.2 G 0,1 + + 0.5 >

Distribution extraction produces an expression dag Bayesian network nodes: random variables 4.2 G 0,1 edges: dependence + directed & acyclic + 0.5 random draws only at leaves > sample in a single pass

distribution extraction via symbolic execution statistical verification optimizations float obfuscated(float n) {! return n + gaussian(0.0, 1000.0);! }! float average_salary(float* salaries) {! total = 0.0;! for (int i = 0; i < COUNT; ++i)! total += obfuscated(salaries[i]);! avg = total / len(salaries);! p_avg =...; passert e, p, c } Bayesian network IR implementation for LLVM & Clang

statistical property passert verifier optimization

Bayesian-network IR enables new optimizations G Gʹ Gʹʹ + X G(µ X, Y G(µ Y, Z = X + Y 2 X) 2 Y ) ) Z G(µ X + µ Y, 2 X + 2 Y )

Bayesian-network IR enables new optimizations c U Uʹ X U(a, b) Y = cx ) Y U(ca, cb)

Bayesian-network IR enables new optimizations U c B X U(a, b) Y X apple c a apple c apple b c ) Y B b a a

Central Limit Theorem collapses large sums D D D D D D D G + X 1,X 2,...,X n D Y = X i X i ) Y G(nµ D,n 2 D)

distribution extraction via symbolic execution statistical verification optimizations float obfuscated(float n) {! return n + gaussian(0.0, 1000.0);! }! float average_salary(float* salaries) {! total = 0.0;! for (int i = 0; i < COUNT; ++i)! total += obfuscated(salaries[i]);! avg = total / len(salaries);! p_avg =...; passert e, p, c } Bayesian network IR implementation for LLVM & Clang

Verification via direct evaluation D D D D D D D + c B

Verification via hypothesis testing D3 G 0,1 +, p, c p μ D 2 c >

distribution extraction via symbolic execution statistical verification optimizations float obfuscated(float n) {! return n + gaussian(0.0, 1000.0);! }! float average_salary(float* salaries) {! total = 0.0;! for (int i = 0; i < COUNT; ++i)! total += obfuscated(salaries[i]);! avg = total / len(salaries);! p_avg =...; passert e, p, c } Bayesian network IR implementation for LLVM & Clang

Probabilistic assertions for C and C++.c LLVM IR LLVM IR Native Code strawman stress-tester

Probabilistic programs used in the evaluation sensing gpswalk salary privacy salary-abs kmeans approximate computing sobel hotspot inversek2j

Running time vs. stress testing analyze sample time relative to baseline 1.2 1.0 0.8 0.6 0.4 0.2 0.0 B B B B B B B B gpswalk salary salary-abs kmeans sobel hotspot inversek h.mean baseline

Running time vs. stress testing analyze sample time relative to baseline 1.2 1.0 0.8 0.6 0.4 0.2 0.0 B N B N B N B N B N B N B N B N gpswalk salary salary-abs kmeans sobel hotspot inversek h.mean baseline no statistical optimizations

Running time vs. stress testing analyze sample time relative to baseline 1.2 1.0 0.8 0.6 0.4 0.2 0.0 gpswalk salary salary-abs kmeans sobel hotspot inversek h.mean B N O B N O B N O B N O B N O B N O B N O B N O optimized 24 faster than baseline verifier on average Mostly analysis time

Probabilistic assertions express correctness properties in modern software. Our verifier checks them efficiently and accurately.