Detecting Pattern-Match Failures in Haskell. Neil Mitchell and Colin Runciman York University www.cs.york.ac.uk/~ndm/catch

Similar documents

Haskell Programming With Tests, and Some Alloy

Termination Checking: Comparing Structural Recursion and Sized Types by Examples

Visualizing Information Flow through C Programs

The countdown problem

Type Classes with Functional Dependencies

Chapter 7: Functional Programming Languages

Automated Program Behavior Analysis

The Clean programming language. Group 25, Jingui Li, Daren Tuzi

ML for the Working Programmer

Regular Expressions and Automata using Haskell

Programming Languages

Statically Checking API Protocol Conformance with Mined Multi-Object Specifications Companion Report

Introduction to Static Analysis for Assurance

CSc 372. Comparative Programming Languages. 21 : Haskell Accumulative Recursion. Department of Computer Science University of Arizona

Functional Programming

CS 103X: Discrete Structures Homework Assignment 3 Solutions

Functional Programming in C++11

Gluing things together with Haskell. Neil Mitchell

Software Testing & Analysis (F22ST3): Static Analysis Techniques 2. Andrew Ireland

SQL INJECTION ATTACKS By Zelinski Radu, Technical University of Moldova

Software Engineering Techniques

Regression Verification: Status Report

Outline. 1 Denitions. 2 Principles. 4 Implementation and Evaluation. 5 Debugging. 6 References

CSE 308. Coding Conventions. Reference

Platform-independent static binary code analysis using a metaassembly

POLYTYPIC PROGRAMMING OR: Programming Language Theory is Helpful

Art of Code Front-end Web Development Training Program

Software Engineering

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

Lecture 10: Multiple Clock Domains

Algorithms and Data Structures

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

recursion, O(n), linked lists 6/14

Computing Concepts with Java Essentials

Static Analysis. Find the Bug! : Analysis of Software Artifacts. Jonathan Aldrich. disable interrupts. ERROR: returning with interrupts disabled

Testing and Tracing Lazy Functional Programs using QuickCheck and Hat

Software Verification and Testing. Lecture Notes: Temporal Logics

Affdex SDK for Android. Developer Guide For SDK version 1.0

Towards practical reactive security audit using extended static checkers 1

A Labeling Algorithm for the Maximum-Flow Network Problem

Factoring & Primality

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

Programming Languages CIS 443

Lecture 12: Abstract data types

A Static Analyzer for Large Safety-Critical Software. Considered Programs and Semantics. Automatic Program Verification by Abstract Interpretation

Sources: On the Web: Slides will be available on:

Data Structure with C

Persistent Data Structures

Moving from CS 61A Scheme to CS 61B Java

Journal of Functional Programming, volume 3 number 4, Dynamics in ML

Programming by Contract. Programming by Contract: Motivation. Programming by Contract: Preconditions and Postconditions

Testing and Inspecting to Ensure High Quality

Course: Programming II - Abstract Data Types. The ADT Stack. A stack. The ADT Stack and Recursion Slide Number 1

Big Data & Scripting Part II Streaming Algorithms

Party Time! Computer Science I. Signature Examples PAIR OBJ. The Signature of a Procedure. Building a Set. Testing for Membership in a Set

C Compiler Targeting the Java Virtual Machine

Mathematical Induction

RUnit - A Unit Test Framework for R

Bounded Cost Algorithms for Multivalued Consensus Using Binary Consensus Instances

Rigorous Software Development CSCI-GA

Adding GADTs to OCaml the direct approach

Chapter 6: Episode discovery process

The C Programming Language course syllabus associate level

Inflation. Chapter Money Supply and Demand

Searching Algorithms

How To Understand The Theory Of Computer Science

Random vs. Structure-Based Testing of Answer-Set Programs: An Experimental Comparison

Programming Language Rankings. Lecture 15: Type Inference, polymorphism & Type Classes. Top Combined. Tiobe Index. CSC 131! Fall, 2014!

PHPUnit and Drupal 8 No Unit Left Behind. Paul Mitchum

InvGen: An Efficient Invariant Generator

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

Variable Base Interface

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

CS 6371: Advanced Programming Languages

What's new in 3.0 (XSLT/XPath/XQuery) (plus XML Schema 1.1) Michael Kay Saxonica

Paillier Threshold Encryption Toolbox

Recursion. Definition: o A procedure or function that calls itself, directly or indirectly, is said to be recursive.

MPI-Checker Static Analysis for MPI

2! Multimedia Programming with! Python and SDL

Semester Review. CSC 301, Fall 2015

Programming and Reasoning with Side-Effects in IDRIS

The Goldberg Rao Algorithm for the Maximum Flow Problem

Replication on Virtual Machines

Cloud Haskell. Distributed Programming with. Andres Löh. 14 June 2013, Big Tech Day 6 Copyright 2013 Well-Typed LLP. .Well-Typed

Deterministic Discrete Modeling

TA: Matt Tong Tues EBU3b B225 - Office hour WLH Discussion Thurs EBU3b B250/B250A - Lab/Office hour

Optimizing Generation of Object Graphs in Java PathFinder

Binary Trees and Huffman Encoding Binary Search Trees

Increasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.

Loop Invariants and Binary Search

Recursion and Recursive Backtracking

PROBLEM SOLVING SEVENTH EDITION WALTER SAVITCH UNIVERSITY OF CALIFORNIA, SAN DIEGO CONTRIBUTOR KENRICK MOCK UNIVERSITY OF ALASKA, ANCHORAGE PEARSON

Transcription:

Detecting Pattern-Match Failures in Haskell Neil Mitchell and Colin Runciman York University www.cs.york.ac.uk/~ndm/catch

Does this code crash? risers [] = [] risers [x] = [[x]] risers (x:y:etc) = if x y then (x:s) : ss else [x] : (s:ss) where (s:ss) = risers (y:etc) > risers [1,2,3,1,2] = [[1,2,3],[1,2]]

Does this code crash? risers [] = [] risers [x] = [[x]] risers (x:y:etc) = if x y then (x:s) : ss else [x] : (s:ss) where (s:ss) = risers (y:etc) Potential crash > risers [1,2,3,1,2] = [[1,2,3],[1,2]]

Does this code crash? risers [] = [] risers [x] = [[x]] risers (x:y:etc) = if x y then (x:s) : ss else [x] : (s:ss) where (s:ss) = risers (y:etc) Potential crash > risers [1,2,3,1,2] = [[1,2,3],[1,2]] Property: risers (_:_) = (_:_)

Overview The problem of pattern-matching A framework to solve patterns Constraint languages for the framework The Catch tool A case study: HsColour Conclusions

The problem of Pattern-Matching head (x:xs) = x head x_xs = case x_xs of x:xs x [] error head [] Problem: can we detect calls to error

Haskell programs go wrong Well-typed programs never go wrong But... Incorrect result/actions requires annotations Non-termination cannot always be fixed Call error not much research done

My Goal Write a tool for Haskell 98 GHC/Haskell is merely a front-end issue Check statically that error is not called Conservative, corresponds to a proof Entirely automatic No annotations = Catch

Preconditions Each function has a precondition If the precondition to a function holds, and none of its arguments crash, it will not crash pre(head x) = x {(:) } pre(assert x y) = x {True} pre(null x) = True pre(error x) = False

Properties A property states that if a function is called with arguments satisfying a constraint, the result will satisfy a constraint x {(:) } (null x) {True} x {(:) [] _} (head x) {[]} x {[]} (head x) {True} Calculation direction

Checking a Program (Overview) Start by calculating the precondition of main If the precondition is True, then program is safe Calculate other preconditions and properties as necessary Preconditions and properties are defined recursively, so take the fixed point

Checking risers risers r = case r of [] [] x:xs case xs of [] (x:[]) : [] y:etc case risers (y:etc) of [] error pattern match s:ss case x y of True (x:s) : ss False [x] : (s:ss)

Checking risers risers r = case r of [] [] x:xs case xs of [] (x:[]) : [] y:etc case risers (y:etc) of [] error pattern match s:ss case x y of True (x:s) : ss False [x] : (s:ss)

Checking risers risers r = case r of [] [] x:xs case xs of [] (x:[]) : [] y:etc case risers (y:etc) of [] error pattern match s:ss case x y of True (x:s) : ss False [x] : (s:ss) r {[]} xs {[]} risers (y:etc) {(:) }

Checking risers risers r = case r of [] [] x:xs case xs of [] (x:[]) : [] r {(:) } [] {(:) } y:etc case risers (y:etc) of [] error pattern match s:ss case x y of True (x:s) : ss False [x] : (s:ss)... (x:[]) : [] {(:) }... {(:) }... (x:s) : ss {(:) }... [x] : (s:ss) {(:) }

Checking risers risers r = case r of [] [] x:xs case xs of Property: r {(:) } risers r {(:) } [] (x:[]) : [] r {(:) } [] {(:) } y:etc case risers (y:etc) of [] error pattern match s:ss case x y of True (x:s) : ss False [x] : (s:ss)... (x:[]) : [] {(:) }... {(:) }... (x:s) : ss {(:) }... [x] : (s:ss) {(:) }

Checking risers risers r = case r of [] [] x:xs case xs of Property: r {(:) } risers r {(:) } [] (x:[]) : [] y:etc case risers (y:etc) of [] error pattern match s:ss case x y of True (x:s) : ss False [x] : (s:ss) r {[]} xs {[]} risers (y:etc) {(:) }

Checking risers risers r = case r of [] [] x:xs case xs of Property: r {(:) } risers r {(:) } [] (x:[]) : [] y:etc case risers (y:etc) of [] error pattern match s:ss case x y of True (x:s) : ss False [x] : (s:ss) r {[]} xs {[]} y:etc {(:) }

Calculating Preconditions Variables: pre(x) = True Always True Constructors: pre(a:b) = pre(a) pre(b) Conjunction of the children Function calls: pre(f x) = x pre(f) pre(x) Conjunction of the children Plus applying the preconditions of f Note: precondition is recursive

Calculating Preconditions (case) pre(case on of [] a x:xs b) = pre(on) (on {[]} pre(a)) (on {(:) } pre(b)) An alternative is safe, or is never reached

Extending Constraints ( ) risers r = case r of [] [] x:xs case xs of [] (x:[]) : [] y:etc... xs {(:) }... r<(:)-2> {(:) } r {(:) _ ((:) )} <(:)-2> {(:) } {(:) _ ((:) )} <(:)-1> {True} {(:) True _}

Splitting Constraints ( ) risers r = case r of [] [] x:xs case xs of [] (x:[]) : [] y:etc... (x:[]):[] {(:) }... True ((:) 1 2) {(:) } True ((:) 1 2) {[]} False ((:) 1 2) {(:) True []} 1 {True} 2 {[]}

Summary so far Rules for Preconditions How to manipulate constraints Extend ( ) for locally bound variables Split ( ) for constructor applications Invoke properties for function application Can change a constraint on expressions, to one on function arguments

Algorithm for Preconditions set all preconditions to True set error precondition to False while any preconditions change recompute every precondition end while Fixed Point! Algorithm for properties is very similar

Fixed Point To ensure a fixed point exists demand only a finite number of possible constraints At each stage, ( ) with the previous precondition Ensures termination of the algorithm But termination useable speed!

The Basic Constraints These are the basic ones I have introduced Not finite but can bound the depth A little arbitrary Can t represent infinite data structures But a nice simple introduction!

A Constraint System Finite number of constraints Extend operator ( ) Split operator ( ) notin creation, i.e. x {(:) )} Optional simplification rules in a predicate

Regular Expression Constraints Based on regular expressions x r c r is a regular expression of paths, i.e. <(:)-1> c is a set of constructors True if all r paths lead to a constructor in c Split operator ( ) is regular expression differentiation/quotient

RE-Constraint Examples head xs xs (1 {:}) map head xs xs (<(:)-2>* <(:)-1> {:}) map head (reverse xs) xs (<(:)-2>* <(:)-1> {:}) xs (<(:)-2>* {:})

RE-Constraint Problems They are finite (with certain restrictions) But there are many of them! Some simplification rules Quite a lot (19 so far) Not complete This fact took 2 years to figure out! In practice, too slow for moderate examples

Multipattern Constraints Idea: model the recursive and non-recursive components separately Given a list Say something about the first element Say something about all other elements Cannot distinguish between element 3 and 4

MP-Constraint Examples head xs xs ({(:) _} {[], (:) _}) xs must be (:) xs.<(:)-1> must be _ All recursive tails are unrestricted Use the type s to determine recursive bits

More MP-Constraint Examples map head xs {[], (:) ({(:) _} {[], (:) _})} {[], (:) ({(:) _} {[], (:) _})} An infinite list {(:) _} {(:) _}

MP-Constraint semantics MP = {set Val} Val = _ {set Pat} {set Pat} Element must satisfy at least one pattern Each recursive part must satisfy at least one pattern Pat = Constructor [(non-recursive field, MP)]

MP-Constraint Split ((:) 1 2) {(:) _} {(:) {True}} An infinite list whose elements (after the first) are all true 1 _ 2 {(:) {True}} {(:) {True}}

MP-Constraint Simplification There are 8 rules for simplification Still not complete... But! x a x b = x c union of two sets x a x b = x c cross product of two sets

MP-Constraint Currying We can merge all MP s on one variable We can curry all functions so each has only one variable MP-constraint Predicate MP-constraint ( ) a b ( ) (a, b)

MP vs RE constraints Both have different expressive power Neither is a subset/superset RE-constraints grow too quickly MP-constraints stay much smaller Therefore Catch uses MP-constraints

Numbers data Int = Neg Zero One Pos Checks Is positive? Is natural? Is zero? Operations (+1), (-1) Work s very well in practice

Summary so far Rules for Preconditions and Properties Can manipulate constraints in terms of three operations MP and RE Constraints introduced Have picked MP-Constraints

Making a Tool (Catch) Haskell Core First-order Core Yhc In draft paper, see website Curried Analyse This talk

Testing Catch The nofib benchmark suite, but main = do [arg] getargs print $ primes!! (read arg) Benchmarks have no real users Programs without real users crash

Nofib/Imaginary Results (14 tests) Trivially Safe Perfect Answer Good Failures Bad Failures Good failure: Did not get perfect answer, but neither did I!

Bad Failure: Bernouilli tail (tail x) Actual condition: list is at least length 2 Inferred condition: list must be infinite drop 2 x

Bad Failure: Paraffins radical_generator n = f undefined where f unused = big_memory_result array :: Ix a (a, a) [(a, b)] Array a b Each index must be in the given range Array indexing also problematic

Perfect Answer: Digits of E2 e = ( 2. ++) $ tail concat $ map (show head) $ iterate (carrypropagate 2 map (10*) tail) $ 2 : [1,1..]

Performance of Catch Time (Seconds) 8 7 6 5 4 3 2 1 0 0 200 400 600 800 1000 1200 1400 Source Code

Case Study: HsColour Takes Haskell source code and prints out a colourised version 4 years old, 6 contributors, 12 modules, 800+ lines Used by GHC nightly runs to generate docs Used online by http://hpaste.org

HsColour: Bug 1 FIXED data Prefs =... deriving (Read,Show) Uses read/show serialisation to a file readfile prefs, then read result Potential crash if the user has modified the file Real crash when Pref s structure changed!

HsColour: Bug 1 Catch > Catch HsColour.hs Check Prelude.read: no parse Partial Prelude.read$252 Partial Language.Haskell.HsColour....Colourise.parseColourPrefs Partial Main.main Full log is recorded All preconditions and properties

FIXED HsColour: Bug 2 The latex output mode had: outtoken ( \ :xs) = `` ++ init xs ++ file.hs: hscolour latex file.hs Crash

HsColour: Bug 3 FIXED The html anchor output mode had: outtoken ( ` :xs) = <a> ++ init xs ++ </a> file.hs: (`) hscolour html anchor file.hs Crash

CHANGED HsColour: Problem 4 A pattern match without a [] case A nice refactoring, but not a crash Proof was complex, distributed and fragile Based on the length of comment lexemes! End result: HsColour cannot crash Or could not at the date I checked it... Required 2.1 seconds, 2.7Mb

Case Study: FiniteMap library Over 10 years old, was a standard library 14 non-exhaustive patterns, 13 are safe delfromfm (Branch key..) del_key del_key > key =... del_key < key =... del_key key =...

Case Study: XMonad Haskell Window Manager Central module (StackSet) Checked by Catch as a library No bugs, but suggested refactorings Made explicit some assumptions about Num

Catch s Failings Weakest Area: Yhc Conversion from Haskell to Core requires Yhc Can easily move to using GHC Core (once fixed) 2 nd Weakest Area: First-order transform Still working on this Could use supercompilation

??-Constraints Could solve more complex problems Could retain numeric constraints precisely Ideally have a single normal form MP-constraints work well, but there is room for improvement

Alternatives to Catch Reach, SmallCheck Matt Naylor, Colin R Enumerative testing to some depth ESC/Haskell - Dana Xu Precondition/postcondition checking Dependent types Epigram, Cayenne Push conditions into the types

Conclusions Pattern matching is an important area that has been overlooked Framework separate from constraints Can replace constraints for different power Catch is a good step towards the solution Practical tool Has found real bugs www.cs.york.ac.uk/~ndm/catch