Visualizing Information Flow through C Programs

Joe Hurd, Aaron Tomb and David Burke
Galois, Inc.
{joe,atomb,davidb}@galois.com
Systems Software Verification Workshop, 7 October 2010

Talk Plan
1. Introduction
2. Information Flow Analysis
3. C Information Flow Tool (Cift)
4. Summary

Software Security Evaluation
Evaluating the security of a program generally focuses on:
- The attack surface (e.g., the interface to the user or network).
- The critical data (e.g., crypto keys, database queries).
Typical Question: What are the possible effects of changes at the attack surface on the critical data?
Answering this requires an understanding of how information flows through the program.

Information Flow Diagnostic Tool
Tight time constraints mean that evaluators often cannot look at every line of the codebase.
Project Goal: Develop an interactive diagnostic tool that allows an evaluator to scan for anomalies in the program information flows.

Information Flow and Security Properties
Many program security properties can be expressed in terms of potential information flow between program variables.
- Confidentiality: There are no information flows from secret variables to public variables.
- Integrity: There are no information flows from tainted variables to critical variables.
- Availability: Not expressible in this way, because there is no way to specify that a flow must happen.

Information Flow and Confidentiality
Use Case: Ensure Bell-LaPadula properties hold for a cross domain application.
Annotate: Program variables with their sensitivity level.
Check: There are no flows from higher sensitivity to lower sensitivity variables.
Example: Code (Confidentiality Bug)
    void f() {
        int k = get_secret_key();
        publish_to_internet(k);
    }
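The check on this slide can be sketched as a scan over a list of candidate flows. This is only an illustration: the `Level` enum, the `Flow` struct and `count_violations` are hypothetical names, not part of Cift or its annotation language, and two sensitivity levels are assumed.

```c
#include <stddef.h>

/* Hypothetical sensitivity levels, ordered from least to most sensitive. */
typedef enum { PUBLIC = 0, SECRET = 1 } Level;

/* One possible information flow between two annotated variables. */
typedef struct {
    const char *from;
    Level from_level;
    const char *to;
    Level to_level;
} Flow;

/* Count the flows that go from a higher sensitivity level to a lower one. */
size_t count_violations(const Flow *flows, size_t n) {
    size_t violations = 0;
    for (size_t i = 0; i < n; i++) {
        if (flows[i].from_level > flows[i].to_level) {
            violations++;
        }
    }
    return violations;
}
```

In the example above, the flow from `k` (secret) to the argument of `publish_to_internet` (public) would be counted as a violation.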

Information Flow and Integrity
Use Case: Defend against SQL injection attacks.
Annotate: Tainted data variables; critical data variables; and validation functions.
Check: All flows from tainted variables to critical variables go through validation functions.
Example: Code (Integrity Check Succeeds)
    void f() {
        int data = get_user_input();
        data = validate_input(data);
        query_sql_database(data);
    }
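One way to read the check is as a reachability question: starting from a tainted node, can a critical node be reached without passing through the output of a validation function? The sketch below assumes a small adjacency-matrix graph; `FlowGraph`, `validated` and `has_unvalidated_flow` are hypothetical names chosen for illustration, not Cift's representation.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical flow graph: edge[i][j] means information may flow from i to j. */
typedef struct {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
    bool validated[MAX_NODES]; /* node holds the output of a validation function */
} FlowGraph;

/* Depth-first search that refuses to continue through validated nodes. */
static bool reaches(const FlowGraph *g, int node, int target, bool *seen) {
    if (node == target) return true;
    if (seen[node] || g->validated[node]) return false;
    seen[node] = true;
    for (int next = 0; next < g->n; next++) {
        if (g->edge[node][next] && reaches(g, next, target, seen)) return true;
    }
    return false;
}

/* True if some flow from tainted to critical bypasses every validated node. */
bool has_unvalidated_flow(const FlowGraph *g, int tainted, int critical) {
    bool seen[MAX_NODES] = { false };
    return reaches(g, tainted, critical, seen);
}
```

For the example above, the only path from the user input to the query passes through the result of `validate_input`, so the function would return false.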

User Centered Design
"All tools are user interfaces" (Clark Dodsworth)
Evaluator/Tool Workflow:
1. The evaluator seeds the analysis by annotating some program variables as sensitive data or dangerous user input.
2. The tool uses the annotations to find candidate insecure information flows.
3. The evaluator examines the flows, and removes false positives by providing additional annotations so that the tool can make a more precise analysis.
Tool Requirements:
1. Scalable analysis of program information flow.
2. Intuitive visualization of information flow in terms of source code.

Analysis Evidence
Evidence of Security Bugs: Insecure information flows are presented as a sequence of assignments on the control flow.
- False Positives: The evaluator uses the tool to browse the insecure information flows, and adds annotations to eliminate false positives.
Evidence of Assurance: The analysis computes a conservative over-approximation of information flow on a subset of the programming language.
- False Negatives: The tool will emit a warning message when the analysis detects that the program is outside of the conservative subset, allowing the evaluator to assess the residual risk.

Static Analysis
Static analysis is a program verification technique that is complementary to testing.
- Testing works by executing the program and checking its run-time behavior.
- Static analysis works by examining the text of the program.
Driven by new techniques, static analysis tools have recently made great improvements in scope.
- Example 1: Modern type systems can check data integrity properties of programs at compile time.
- Example 2: Abstract interpretation techniques can find memory problems such as buffer overflows or dangling pointers.
- Example 3: The TERMINATOR tool developed by Microsoft Research can find infinite loops in Windows device drivers that would cause the OS to hang.

Information Flow Static Analysis: Requirements
Evidence: Generating evidence of assurance relies on the information flow static analysis being sound:
1. Define a sound static analysis on a simple flow language.
2. Implement a conservative translator from the target programming language to the simple flow language.
Scalability: To help the static analysis scale up to realistically sized codebases, we design it to be compositional.
- Preserve function calls in the flow language.
Program Understanding: The analysis result must help an evaluator understand how information flows through the program source code.
- Link each step in the analysis to the program source code.

Information Flow Static Analysis of C Code
The following front end processing is performed to translate C code to the flow language:
1. Preprocessing: The C preprocessor.
2. Parsing: The Haskell Language.C package.
3. Simplification: Normalizing expressions (like CIL).
4. Variable Classification: Special handling for address-taken locals and dynamically allocated memory.
5. Pointer Analysis: Andersen's algorithm replaces each indirect reference with a set of direct references.
Key Property: The front end processing is conservative. Every information flow in the C code is translated to an information flow in the flow language.
Assumption: The C code is memory safe.

The Flow Language
Variables
- Global variables.
- Local variables of a function.
Statements
- Simple variable assignment v1 ← v2.
- Function call v ← f(v1, ..., vn).
Functions
- Special local variables representing input arguments $arg1, ..., $argn and return value $ret.
- A function contains a set of statements (flow insensitive).
Programs
- A set of functions, including a distinguished main function where execution begins.

The Flow Language as a C Subset
Code (Example Program)
    /* High global variables */
    int high_in;
    int high_out;

    /* Low global variables */
    int low_in;
    int low_out;

    int f(int x) {
        return x + 1;
    }

    int main() {
        high_in = 42;
        low_in = 35;
        high_out = f(high_in);
        low_out = f(low_in);
        return 0;
    }
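To make the flow language concrete, the sketch below encodes the example program as flow-language statements in C data structures. The types `Stmt` and `Function`, the four-argument limit, and the particular translation shown (constant assignments dropped because they carry no variable-to-variable flow, `return x + 1` becoming `$ret ← $arg1`) are assumptions made for illustration; the talk does not show Cift's internal representation.

```c
#include <stddef.h>

typedef enum { STMT_ASSIGN, STMT_CALL } StmtKind;

#define MAX_ARGS 4

/* A flow-language statement: either dst <- src or dst <- callee(args...). */
typedef struct {
    StmtKind kind;
    const char *dst;
    const char *src;            /* STMT_ASSIGN only */
    const char *callee;         /* STMT_CALL only */
    const char *args[MAX_ARGS]; /* STMT_CALL only */
    size_t nargs;
} Stmt;

/* A function is an unordered set of statements (flow insensitive). */
typedef struct {
    const char *name;
    const Stmt *stmts;
    size_t nstmts;
} Function;

/* Assumed translation of: int f(int x) { return x + 1; } */
static const Stmt f_stmts[] = {
    { STMT_ASSIGN, "$ret", "$arg1", NULL, { NULL }, 0 },
};

/* Assumed translation of main: only the two calls move variable data. */
static const Stmt main_stmts[] = {
    { STMT_CALL, "high_out", NULL, "f", { "high_in" }, 1 },
    { STMT_CALL, "low_out",  NULL, "f", { "low_in" },  1 },
};

const Function example_program[] = {
    { "f",    f_stmts,    sizeof f_stmts / sizeof f_stmts[0] },
    { "main", main_stmts, sizeof main_stmts / sizeof main_stmts[0] },
};

/* Count the statements of a given kind in a function. */
size_t count_statements(const Function *fn, StmtKind kind) {
    size_t count = 0;
    for (size_t i = 0; i < fn->nstmts; i++) {
        if (fn->stmts[i].kind == kind) count++;
    }
    return count;
}
```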

Step 1/4: Compute Function Transformers
For a function f, the transformer T_f is the subset of global variables and argument variables that can flow into the return value.
Transformers can be efficiently computed by a bottom-up traversal of the call graph (using Bourdoncle's algorithm).
Analysis (Example Function Transformers)
    T_f = {$arg1}
    T_main = ∅
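A minimal sketch of the transformer computation for the example program follows. It is not Cift's implementation: it swaps Bourdoncle's algorithm for a naive repeat-until-stable iteration over all functions, represents variable sets as bitmasks, and assumes every function has at most one argument. All names (`VarSet`, `Stmt`, `compute_transformers`, and so on) are hypothetical.

```c
#include <stdbool.h>

/* Variables of the example: four globals, then $arg1 and $ret. */
enum { HIGH_IN, HIGH_OUT, LOW_IN, LOW_OUT, NUM_GLOBALS,
       ARG1 = NUM_GLOBALS, RET, NUM_VARS };
enum { FN_F, FN_MAIN, NUM_FNS };

typedef unsigned VarSet;          /* bit v set means variable v is in the set */
#define BIT(v) (1u << (v))

typedef struct {
    int dst;
    int callee;                   /* -1 for a plain assignment dst <- arg */
    int arg;                      /* source variable, or the single call argument */
} Stmt;

typedef struct { const Stmt *stmts; int nstmts; } Function;

static const Stmt f_stmts[] = { { RET, -1, ARG1 } };
static const Stmt main_stmts[] = {
    { HIGH_OUT, FN_F, HIGH_IN },
    { LOW_OUT,  FN_F, LOW_IN  },
};
const Function example[NUM_FNS] = {
    [FN_F]    = { f_stmts, 1 },
    [FN_MAIN] = { main_stmts, 2 },
};

/* Globals and arguments that may reach $ret of f, given transformers T. */
static VarSet ret_sources(const Function *f, const VarSet *T) {
    VarSet src[NUM_VARS];
    for (int v = 0; v < NUM_VARS; v++) src[v] = (v == RET) ? 0 : BIT(v);
    bool changed = true;
    while (changed) {             /* statements are a set, so iterate to a fixpoint */
        changed = false;
        for (int i = 0; i < f->nstmts; i++) {
            const Stmt *s = &f->stmts[i];
            VarSet in = 0;
            if (s->callee < 0) {
                in = src[s->arg];
            } else {              /* apply the callee's transformer to the actual argument */
                VarSet t = T[s->callee];
                if (t & BIT(ARG1)) in |= src[s->arg];
                for (int g = 0; g < NUM_GLOBALS; g++)
                    if (t & BIT(g)) in |= src[g];
            }
            if ((src[s->dst] | in) != src[s->dst]) {
                src[s->dst] |= in;
                changed = true;
            }
        }
    }
    return src[RET];
}

/* Naive fixpoint over all functions; T must have room for nfns entries. */
void compute_transformers(const Function *fns, int nfns, VarSet *T) {
    for (int i = 0; i < nfns; i++) T[i] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < nfns; i++) {
            VarSet t = ret_sources(&fns[i], T);
            if (t != T[i]) { T[i] = t; changed = true; }
        }
    }
}
```

On the example, this yields the set containing only `$arg1` for `f` and the empty set for `main`, matching the slide.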

Step 2/4: Compute Function Contexts
For a function f, the context C_f is a mapping from each argument of f to the subset of global variables that can flow into the argument.
Contexts can be efficiently computed by a top-down traversal of the call graph, starting with main (using Bourdoncle's algorithm and the transformers).
Analysis (Example Function Contexts)
    C_f = {$arg1 ↦ {low_in, high_in}}
    C_main = ∅
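The context computation can be sketched in the same style. Again this is an illustration under assumptions rather than Cift's code: it uses a naive fixpoint over every function instead of a top-down traversal from main with Bourdoncle's algorithm, it allows at most one argument per function, and it takes the transformers `T` as an input. All identifiers are hypothetical.

```c
#include <stdbool.h>

/* Variables of the example: four globals, then $arg1 and $ret. */
enum { HIGH_IN, HIGH_OUT, LOW_IN, LOW_OUT, NUM_GLOBALS,
       ARG1 = NUM_GLOBALS, RET, NUM_VARS };
enum { FN_F, FN_MAIN, NUM_FNS };

typedef unsigned VarSet;          /* bit v set means variable v is in the set */
#define BIT(v) (1u << (v))

typedef struct {
    int dst;
    int callee;                   /* -1 for a plain assignment dst <- arg */
    int arg;                      /* source variable, or the single call argument */
} Stmt;

typedef struct { const Stmt *stmts; int nstmts; } Function;

static const Stmt f_stmts[] = { { RET, -1, ARG1 } };
static const Stmt main_stmts[] = {
    { HIGH_OUT, FN_F, HIGH_IN },
    { LOW_OUT,  FN_F, LOW_IN  },
};
const Function example[NUM_FNS] = {
    [FN_F]    = { f_stmts, 1 },
    [FN_MAIN] = { main_stmts, 2 },
};

/* Globals that may reach each variable of f when $arg1 carries arg_ctx. */
static void propagate(const Function *f, const VarSet *T, VarSet arg_ctx,
                      VarSet *src) {
    for (int v = 0; v < NUM_VARS; v++) src[v] = (v < NUM_GLOBALS) ? BIT(v) : 0;
    src[ARG1] = arg_ctx;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < f->nstmts; i++) {
            const Stmt *s = &f->stmts[i];
            VarSet in = 0;
            if (s->callee < 0) {
                in = src[s->arg];
            } else {
                VarSet t = T[s->callee];
                if (t & BIT(ARG1)) in |= src[s->arg];
                for (int g = 0; g < NUM_GLOBALS; g++)
                    if (t & BIT(g)) in |= src[g];
            }
            if ((src[s->dst] | in) != src[s->dst]) {
                src[s->dst] |= in;
                changed = true;
            }
        }
    }
}

/* C[f] receives the globals that may flow into $arg1 of f at any call site. */
void compute_contexts(const Function *fns, int nfns, const VarSet *T, VarSet *C) {
    for (int i = 0; i < nfns; i++) C[i] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < nfns; i++) {
            VarSet src[NUM_VARS];
            propagate(&fns[i], T, C[i], src);
            for (int j = 0; j < fns[i].nstmts; j++) {
                const Stmt *s = &fns[i].stmts[j];
                if (s->callee < 0) continue;
                VarSet c = C[s->callee] | src[s->arg];
                if (c != C[s->callee]) { C[s->callee] = c; changed = true; }
            }
        }
    }
}
```

Because it visits every function rather than only those reachable from main, this sketch can over-approximate the contexts on programs with dead code; on the example it gives the same result as the slide.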

Step 3/4: Compute Function Information Flow Graphs
For a function f, the information flow graph G_f is a directed graph between global variables, where an edge x → y indicates that f enables a possible information flow from x to y.
The function information flow graphs can be efficiently computed from the transformers and contexts.
Key Property: The information flow analysis is context sensitive.
Analysis (Example Function Information Flow Graphs)
    G_f = ∅
    G_main = {low_in → low_out, high_in → high_out}

Step 4/4: Compute Program Information Flow Graph
The program information flow graph G is a directed graph between global variables, where an edge x → y indicates that the program enables a possible information flow from x to y.
The program information flow graph is the union of all the function information flow graphs G_f where f is reachable from the main function.
Key Property: The information flow analysis is sound.
Analysis (Example Program Information Flow Graph)
    G = {low_in → low_out, high_in → high_out}
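Steps 3 and 4 can be sketched together: build each function's graph between globals from the transformers and contexts, then take the union over the functions reachable from main. The code is an illustration under assumptions (bitmask sets, at most one argument per function, transformers `T` and contexts `C` supplied by the caller), and every identifier in it is hypothetical rather than taken from Cift.

```c
#include <stdbool.h>

enum { HIGH_IN, HIGH_OUT, LOW_IN, LOW_OUT, NUM_GLOBALS,
       ARG1 = NUM_GLOBALS, RET, NUM_VARS };
enum { FN_F, FN_MAIN, NUM_FNS };

typedef unsigned VarSet;          /* bit v set means variable v is in the set */
#define BIT(v) (1u << (v))
#define GLOBALS (BIT(NUM_GLOBALS) - 1u)

typedef struct { int dst; int callee; int arg; } Stmt; /* callee < 0: assignment */
typedef struct { const Stmt *stmts; int nstmts; } Function;

/* into[y] is the set of globals x with an edge x -> y. */
typedef struct { VarSet into[NUM_GLOBALS]; } Graph;

static const Stmt f_stmts[] = { { RET, -1, ARG1 } };
static const Stmt main_stmts[] = {
    { HIGH_OUT, FN_F, HIGH_IN },
    { LOW_OUT,  FN_F, LOW_IN  },
};
const Function example[NUM_FNS] = {
    [FN_F]    = { f_stmts, 1 },
    [FN_MAIN] = { main_stmts, 2 },
};

/* Globals that may reach each variable of f when $arg1 carries arg_ctx. */
static void propagate(const Function *f, const VarSet *T, VarSet arg_ctx,
                      VarSet *src) {
    for (int v = 0; v < NUM_VARS; v++) src[v] = (v < NUM_GLOBALS) ? BIT(v) : 0;
    src[ARG1] = arg_ctx;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < f->nstmts; i++) {
            const Stmt *s = &f->stmts[i];
            VarSet in = 0;
            if (s->callee < 0) {
                in = src[s->arg];
            } else {              /* only the actual argument is used: context sensitive */
                VarSet t = T[s->callee];
                if (t & BIT(ARG1)) in |= src[s->arg];
                for (int g = 0; g < NUM_GLOBALS; g++)
                    if (t & BIT(g)) in |= src[g];
            }
            if ((src[s->dst] | in) != src[s->dst]) {
                src[s->dst] |= in;
                changed = true;
            }
        }
    }
}

/* Step 3: add the edges of G_f to g. */
static void add_function_graph(const Function *f, const VarSet *T, VarSet ctx,
                               Graph *g) {
    VarSet src[NUM_VARS];
    propagate(f, T, ctx, src);
    for (int y = 0; y < NUM_GLOBALS; y++)
        g->into[y] |= src[y] & GLOBALS & ~BIT(y);
}

/* Step 4: union of G_f over the functions reachable from main_fn. */
Graph program_graph(const Function *fns, int main_fn,
                    const VarSet *T, const VarSet *C) {
    Graph g = { { 0 } };
    bool reachable[NUM_FNS] = { false };
    reachable[main_fn] = true;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < NUM_FNS; i++) {
            if (!reachable[i]) continue;
            for (int j = 0; j < fns[i].nstmts; j++) {
                int callee = fns[i].stmts[j].callee;
                if (callee >= 0 && !reachable[callee]) {
                    reachable[callee] = true;
                    changed = true;
                }
            }
        }
    }
    for (int i = 0; i < NUM_FNS; i++)
        if (reachable[i]) add_function_graph(&fns[i], T, C[i], &g);
    return g;
}
```

On the example the result has exactly two edges, from `high_in` to `high_out` and from `low_in` to `low_out`. The two calls to `f` do not mix high and low data because each call site applies the transformer to its own actual argument.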

Cift Architecture
The C Information Flow Tool (Cift) allows evaluators to examine information flows in C code using a standard web browser.
[Architecture diagram: C source files feed the information flow analysis inside a Haskell analysis server; the server's web server communicates over HTTP with a JavaScript visualization running in a standard web browser.]
The architecture is designed to support multiple simultaneous users browsing code and sharing annotations.

Visualizing Information Flow
A program information flow consists of many assignments distributed across the codebase.
Tracking a long information flow across source code involves much tedious opening, closing and searching of files.
"Evaluating software is like frying 1,000 eggs"
A different visualization solution is needed.

Right-Angle Fractal Call Trees

Demo

Restricting to Variables of Interest
Problem: A typical large program contains many variables V in the program information flow graph: information overload.
Solution: Allow the user to specify a subset X ⊆ V of interesting variables.
- Remove the uninteresting global variables V \ X from the program information flow graph one by one.
- When removing a variable v, an extra edge x → y must be added between every pair of variables x, y satisfying x → v and v → y.
- This amounts to computing the transitive closure of the program information flow graph on demand.
A first step towards information flow annotations.
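The removal step described above can be sketched on an adjacency-matrix graph. The `FlowGraph` type, the fixed bound `NUM_VARS` and the function name `remove_variable` are hypothetical choices for this illustration.

```c
#include <stdbool.h>

#define NUM_VARS 8

/* edge[x][y] means information may flow from variable x to variable y. */
typedef struct { bool edge[NUM_VARS][NUM_VARS]; } FlowGraph;

/* Remove variable v, bridging every pair x -> v, v -> y with an edge x -> y. */
void remove_variable(FlowGraph *g, int v) {
    for (int x = 0; x < NUM_VARS; x++) {
        for (int y = 0; y < NUM_VARS; y++) {
            if (x != v && y != v && g->edge[x][v] && g->edge[v][y]) {
                g->edge[x][y] = true;
            }
        }
    }
    /* Drop all edges into and out of v. */
    for (int i = 0; i < NUM_VARS; i++) {
        g->edge[i][v] = false;
        g->edge[v][i] = false;
    }
}
```

Removing the middle variable of a chain 0 → 1 → 2 leaves the single edge 0 → 2, so flows between the remaining variables of interest are preserved.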

Emphasizing Call Tree Paths of Interest
Problem: Functions with too many function calls result in an uninformative hairy spike (left graphic).
Solution: Emphasize function calls contributing to information flows between variables of interest (right graphic).

Open Source Benchmarks
All experiments were carried out on a MacBook Pro 2.2 GHz Core 2 Duo with 4 GB of RAM, using GHC 6.12.1.
- Analyzing the 67 KLoC C implementation of OpenSSH takes 1:53s of CPU time and consumes 1.6 GB of RAM.
- Analyzing the 94 KLoC C implementation of the SpiderMonkey JavaScript interpreter takes 6:49s of CPU time and consumes 1.3 GB of RAM.

Cift Development Plan
Milestones:
- Develop an automatic information flow analysis that scales up to realistic C codebases.
- Develop a visualization technique for program information flow that is grounded in the source code.
- Implement a research prototype tool to examine information flows in C programs.
- Develop an annotation language for information flow properties of C functions and variables.
- Allow users to edit annotations through the browser interface and see the resulting effects on the analysis.

Future Plans
Extend the scope of the information flow analysis.
- Supporting array sensitivity to distinguish the elements of an array or cells in a memory block.
- Adding flow sensitivity and a clobber analysis to detect failures to sanitize confidential data after use.
- Target LLVM to extend the analysis to C++/Ada/etc.
Support higher-level information flow specifications.
- Derive program specifications from higher-level security policies.
- Track information flow across module and language barriers.

Summary
This Talk: We have presented a research prototype static analysis tool that an evaluator can use to visualize how information flows through C programs.
Feedback Welcome: Please let us know what features you'd like to see in a program understanding tool.
joe@galois.com