Gold Standard Method for Benchmarking C Source Code Static Analysis Tools

Gold Standard Method for Benchmarking C Source Code Static Analysis Tools
Cyber Security Division 2012 Principal Investigators Meeting, October 11, 2012
Henny Sipma, Sr. Computer Scientist, Kestrel Technology, LLC
sipma@kestreltechnology.com, (650) 320 8474

Kestrel Technology
Founded: 2000. Location: Palo Alto, CA
Core activity: sound static analysis (mathematical proof of safety)
Tool: CodeHawk. Languages supported: C, Java, x86 executables
Underlying technology: abstract interpretation (Cousot & Cousot, 1977)
Properties, C: language-level properties such as memory safety and null dereference (application-independent, mathematically well-defined properties); CWEs 119-127, 129, 131, 242, 252, 391, 466-469, 476, 682, 786-788, 805, 806, 824, 839
Properties, Java: taint analysis, loop-bound analysis (CWE-606), integer overflow (CWE-190/191)
Properties, x86: memory safety, information extraction

Our Tool: CodeHawk
- Java byte code front end (.class, .jar, .war): sound abstraction from Java byte code into CHIF
- C source code front end (.c, via CIL): sound abstraction from preprocessed CIL code into CHIF
- x86 binary front end (.exe, via disassembly): abstraction from x86 binary code into CHIF
All three front ends feed the abstract interpretation engine (iterators).
Abstract domains: constants, intervals, strided intervals, linear equalities, polyhedra, symbolic sets, value sets, taint

Proving safety of C programs
Starting point: mathematically well-defined properties. Proving absence of CWEs 119-127, 129, 131, 242, 252, 391, 466-469, 476, 682, 786-788, 805, 806, 824, 839
- Create proof obligations: safety conditions that state that the property holds at every relevant location in the program
- Generate invariants: assertions at all locations that are true for all inputs, for all program executions
- Discharge proof obligations: a location is safe from the targeted vulnerabilities if the generated invariants imply the safety conditions

Proving safety of C programs: feasible?
Not automatic: undecidable problem.
Not easy: Klein et al., seL4: Formal Verification of an OS Kernel, SOSP 2009
- Program size: 8,700 lines of C, 600 lines of assembly code
- Proof effort: 11 person-years
Our own experience with CodeHawk:
- 1,100 small test programs (SAMATE): fully automatic, a few minutes
- Larger benchmark programs (up to 1,200 LOC): full verification in 1-2 person-weeks
- Real-world applications (up to 100,000 LOC): 55-67% of proof obligations discharged automatically (~5 minutes)

Proving safety of C programs: our proposal (TTA #1: Software Assurance)
Perform a full verification of 6 real-world C applications
- ranging in size from 50,000 to 200,000 LOC
- for CWEs 119, 120, 121, 122, 123, 124, 125, 126, 127, 129, 131, 134, 170, 242, 252, 391, 415, 416, 457, 466, 467, 468, 469, 476, 682, 786, 787, 788, 805, 806, 824, 839
Ambitious? Interesting research? Immediate operational capability? Yes, a very important one. But first, a brief interlude on the current static analysis landscape.

Static Analysis Landscape: Bug-finders
Current reality: bug-finders are the first line of defense against software vulnerabilities (but how effective are these bug-finders?)
Two important terms:
- False positive: a bug report that turns out not to be a bug. Makes software developers and managers very unhappy.
- False negative: a bug that is not reported. Invisible, until exploited.

Static Analysis Landscape: Bug-finders
False positive: a bug report that turns out not to be a bug. Makes software developers and managers very unhappy.
Reality: naturally, software developers heavily favor bug-finders with a low false-positive rate.
A low false-positive rate is very easy to obtain for a bug-finder: only report a bug if it is 99% likely to be a bug; otherwise keep quiet. The result is a very high false-negative rate (but that is invisible to the developer).

The problem: Evaluating bug-finders
The hated false positives; the invisible false negatives
[chart comparing the false positives and false negatives of Tool A and Tool B]

The problem: Evaluating bug-finders
[chart: Tool A shows fewer false positives than Tool B]
The developer will choose Tool A. Tool Vendor B is forced to lower its false-positive rate to stay in business, and the only cost-effective way to do so is to increase false negatives.
Strong economic incentive for tools that provide less assurance. Society suffers.

The problem: Evaluating bug-finders
Maybe dealing with the extra false positives can be justified by knowing that a much higher level of assurance is achieved.
Make these false negatives visible! Give Tool B due credit for its low false-negative rate.
[chart comparing Tool A and Tool B]

Evaluating Bug-finders
The problem has long been recognized by NIST (Dr. Paul Black):
Small synthetic benchmarks:
- SAMATE (1,163 test cases by MIT/Lincoln Labs), 2005
- Juliet Test Suite v1.0 (45,324 test cases), Dec 2010
- Juliet Test Suite v1.1 (57,099 test cases), Sep 2012
SATE competition (real-world programs), measuring false positives and false negatives:
- 2008: Nagios, Lighttpd, Naim
- 2009: Irssi, Pvm
- 2010: Dovecot, ...
- 2011: ...
What is missing: a gold standard + measurement tool.

SATE competition: participation
[chart of SATE participation by year]

Tasks
1. Definition of metrics (02/13): redefine false-positive rate / false-negative rate relative to ground truth
2. Develop analysis support for CWEs 457, 415/416, 170, and 134 (11/13): uninitialized variables, use-after-free, double-free, improper null termination, uncontrolled format string
3. Increase the level of proof obligations proven automatically to > 80% (06/14): increase precision by improved pointer analysis, limited shape analysis, and a higher degree of context sensitivity
[chart: preliminary results for proving memory safety (2010)]

Tasks (cont'd)
4. Specialize the analyzer for the six applications to achieve 100% (06/14): e.g., introduce application-specific data-structure invariants and environment assumptions
5. Develop measurement tool (08/14): evaluate static analysis tool results against the ground truth report, exposing the application-type-specific strengths and weaknesses of the bug-finder tool evaluated
6. Apply the measurement tool to published results and SWAMP tools (08/14)

Technology Transfer
Commercialization: software assurance centers (Lockheed Martin, GE, ...), static analysis tool developers, developers of high-security software. Evaluation kit available Q1 2013.
Government customers: NIST
- Provide prototype to NIST, early 2013
- Incorporate feedback from NIST throughout the program
- Provide support for future SATE competitions
Interested in other collaborations.

Contribution to SWAMP
Evaluate tools for false positives and false negatives on benchmark programs

Operational Capability
Better assessment of vulnerabilities in critical cyber infrastructure
Measurement tool: provide advice on which tool, or combination of tools (Tool X, Tool Y, Tool Z), is best suited for each application area

Benefits to society
Create a bigger market opportunity for high-quality static analysis tool vendors by enabling differentiation on false negatives.
Let the market do its work to create higher and higher levels of software assurance.

Gold Standard Method for Benchmarking C Source Code Static Analysis Tools
Kestrel Technology, LLC, TTA-01-0012-1

Proposed Technical Approach
Goal: provide a Gold Standard benchmarking method for SWAMP-resident static analysis tools
Tasks:
- Define metrics as the basis for measurement
- Develop analysis support for CWEs 134, 415/6, 457, 170
- Increase the percentage of safety conditions proven automatically
- Full reference analysis of 6 NIST SATE benchmark applications
- Develop a scoring tool to compare tool results to the reference
Current status of proposed technology: working prototype C analyzer based on abstract interpretation that automatically proves 55-67% of memory safety conditions for NIST benchmark applications

Operational Capability
- Evaluate SATE results against the ground truth report
- Create market opportunity for high-quality software and high-quality tools
- Let the market increase software assurance levels
Quantitative performance targets:
- Complete metrics definition for all targeted CWEs
- Increase percentage automation of proof to 80%
- 100% analysis of 6 NIST SATE applications for targeted CWEs
Cost of ownership: all deliverables provided under research contract at no additional cost

Benefits to DHS and society at large
- Allows measurement of false negatives of bug-finding tools
- Allows DHS to more accurately assess vulnerability exposure
- Feedback aids developers of bug-finders to improve their tools
- Advances the state of the art in scalable sound static analysis

Schedule and Milestones (24-month period of performance)
- Month 6: definition of metrics
- Month 15: analysis coverage for CWEs 457, 415/6, 134, 170
- Month 22: 80% level of automation on NIST benchmarks
- Month 22: 100% analysis of NIST benchmarks
- Month 24: measurement tool; evaluation of SATE participants' tools

Deliverables
- Gold Standard analysis of NIST benchmarks
- Measurement tool to compare SWAMP results with the reference
- Tool (as executable) with government rights

Corporate Contact: Doug Smith, smith@kestreltechnology.com, 3260 Hillview Ave, Palo Alto, CA 94304, (650) 320 8474