How I Learned to Stop Fuzzing and Find More Bugs Jacob West Fortify Software August 3-5, 2007 Las Vegas
Agenda Introduction to fuzzing What is fuzzing? Challenges with fuzzing Introduction to static analysis How static analysis works Examples of bugs static analysis is good at finding Untapped potential: Customization Experiment Fuzzing versus static analysis Conclusion
What is Fuzzing? Encompasses runtime testing that attempts to induce faults in software systems by inputting random or semi-random values Introduced by Barton Miller at the University of Wisconsin, Madison in 1990 (cs.wisc.edu/~bart/fuzz/)
How Fuzzing Works Identify sources of input to a program Permute or generate pseudorandom input Monitor the program for failures Record the input and program state combinations that generate faults Repeat for desired duration
Input Sources: File Formats Identify all valid file formats (e.g. JPG, TIFF, PDF, DOC, XLS) Collect a library of valid files Malform a file Consume the file and observe the application
Input Sources: Protocols Create bogus messages (e.g. TCP/IP, RPC, SOAP, HTTP) Record-fuzz-replay Enable a sniffer Collect a few thousand messages Fuzz the messages Replace the fuzzed messages Fuzz messages at program boundary
Intelligence Dumb fuzzing = modify data randomly Most input will be entirely invalid Can make for good test cases Takes a long time to enumerate valid test cases May test the validation logic of high-level protocols instead of the underlying application Smart fuzzing = aware of data structure Altering content size Replacing null-terminated strings Altering numeric values or flipping signs 0, 2^n +/- 1 Adding invalid headers, altering header values, duplicate headers
Challenges: Nebulous File Formats / Protocols No problem for a standard Web application What about? Proprietary Web Services interfaces Network servers Thick client software Difficult to enumerate input sources to fuzz Even harder to generate valid input Requires customization
Challenges: Program Semantics / Reachability Example: if (!strcmp(input1, static_string ) { strcpy(buffer2, input2); } Need to provide value of input1 equal to static_string and large value of input2 Requires N*M random inputs to reach bug guarded by two-variable conditions Requires customization
Challenges: Identifying Errors Error reporting conventions differ between programs Good design guidelines require programs to mask errors and error details Requires customization
Challenges: Completeness / Coverage Microsoft SDL mandates that you run 100,000 iterations per file format/parser. If you find a bug, you reset to 0 and start running another 100,000 with a new random seed. Why? How many input sources were missed? How much of the program was tested? How good were the tests?
Advantages Verifiable and reproducible at runtime Scalable to across programs that utilize the same protocol Least effort to find a bug, impossible to ensure completeness, very costly to approach
Tools Open Source / Free SPIKE Scratch Peach Commercial Cenzic idefense SPI Dynamics (http://www.hacksafe.com.au/blog/2006/08 /21/fuzz-testing-tools-and-techniques/)
chainsaw
Static Source Code Analysis Benefits 1000x faster than code review Security knowledge built in Consistent Limitations Does not understand architecture Does not understand application semantics Does not understand social context
The Many Faces of Static Analysis Type checking Style checking Program understanding Program verification / Property checking Bug finding Security review
Type Checking Taken for granted Imperfect: short s = 0; int i = s; /* the type checker allows this */ short r = i; /* false positive: this will cause a type checking error at compile time. */ /* false negative: passes type checking, fails at runtime */ Object[] objs = new String[1]; objs[0] = new Object();
Style Checking Pickier than type checker, might look at Whitespace Naming Deprecated functions gcc -Wall does some style checking typedef enum { red, green, blue } Color; char* getcolorstring(color c) { char* ret = NULL; switch (c) { case red: printf("red"); } return ret; } Tools Lint, PMD
Program Understanding Help make sense of a large codebase Tools: Fujaba Klockwork CAST Systems
Program Verification / Property Checking Prove that a program has particular properties Partial specification -> property checking Often focuses on temporal safety properties Example: allocated memory must be freed inbuf = (char*) malloc(bufsz); if (inbuf == NULL) return -1; outbuf = (char*) malloc(bufsz); if (outbuf == NULL) return -1; /* memory leak */ Soundness Aspires to Sound WRT the specification : reports all bugs Tools: Praxis, PolySpace, GrammaTech
Bug Finding More sophisticated than a style checker Less ambitious than program verification Search code for bug idioms Find high-confidence, low noise results (low false positives) Soundness Aspires to Sound WRT counterexample : never reports a bug that isn t a bug Example: double checked locking if (fitz == null) { synchronized (this) { if (fitz == null) { fitz = new Fitzer(); } } } Tools: FindBugs, Coverity, Klocwork, Prefast
Security Review Focus on finding exploitable code Find high-risk code constructs for review (low false negatives) Example int main(int argc, char* argv[]) { char buf1[1024]; char buf2[1024]; char* shortstring = "a short string"; strcpy(buf1, shortstring); /* innocuous use of strcpy */ strcpy(buf2, argv[0]); /* dangerous use of strcpy */... Tools: RATS, ITS4, FlawFinder; Fortify Software and Ounce Labs
Security Example: Dataflow Analysis Trace potentially tainted data through the program Report locations where an attacker could take advantage of a vulnerable function or construct buff = getinputfromnetwork(); copybuffer( newbuff, buff ); exec( newbuff ); (command injection vulnerability)
A Peek Inside a Static Analysis Tool Analyzer src Front End System Model Analyzer Results Viewer Analyzer Modeling Rules Security Properties
Parsing Language support One language/parser is straightforward Lots of combinations is harder Could analyze compiled code Everybody has the binary No need to guess how the compiler works No need for rules but Decompilation can be difficult Loss of context hurts Want to report line numbers
Analysis / Rules: Structural Identify bugs in the program's structure Example: calls to gets() FunctionCall: function is [name == "gets"] Structural rule:
Analysis / Rules: Structural Identify bugs in the program's structure Example: memory leaks caused by realloc() buf = realloc(buf, 256); Structural rule: FunctionCall c1: ( c1.function is [name == "realloc"] and c1 in [AssignmentStatement: rhs is c1 and lhs == c1.arguments[0] ] )
Analysis / Rules: Dataflow Source Rule Following interesting values through the program Example: Command injection vulnerability buff = getinputfromnetwork(); copybuffer( newbuff, buff ); exec( newbuff ); Source rule: Function: getinputfromnetwork() Postcondition: return value is tainted
Analysis / Rules: Dataflow Pass-Through Rule Following interesting values through the program Example: Command injection vulnerability buff = getinputfromnetwork(); copybuffer( newbuff, buff ); exec( newbuff ); Pass-through rule: Function: copybuffer() Postcondition: if the second argument is tainted, then the first argument becomes tainted
Analysis / Rules: Dataflow Sink Rule Following interesting values through the program Example: Command injection vulnerability buff = getinputfromnetwork(); copybuffer( newbuff, buff ); exec( newbuff ); Sink rule: Function: exec() Precondition: the first argument must not be tainted
Analysis / Rules: Control Flow Look for dangerous sequences Example: Double-free while ((node = *ref)!= NULL) { *ref = node->next; free(node); if (!unchain(ref)) { break; } } if (node!= 0) { free(node); return UNCHAIN_FAIL; } (other operations) (other operations) start initial state freed error free(x) free(x)
Analysis / Rules: Control Flow Look for dangerous sequences Example: Double-free while ((node = *ref)!= NULL) { *ref = node->next; free(node); if (!unchain(ref)) { break; } } if (node!= 0) { free(node); return UNCHAIN_FAIL; } (other operations) (other operations) start initial state freed error free(x) free(x)
Analysis / Rules: Control Flow Look for dangerous sequences Example: Double-free while ((node = *ref)!= NULL) { *ref = node->next; free(node); if (!unchain(ref)) { break; } } if (node!= 0) { free(node); return UNCHAIN_FAIL; } (other operations) (other operations) start initial state freed error free(x) free(x)
Common Problems False positives Incomplete/inaccurate model Conservative analysis Missing rules False negatives Incomplete/inaccurate model Forgiving analysis Missing rules
Untapped Potential: Customization Improve tool understanding of the program Model the behavior of third-party libraries Describe program semantics Identify program-specific vulnerabilities Enforce specific coding standards Find vulnerabilities in custom interfaces Design for testability Write code knowing that it will be checked
Advantages of Static Analysis over Fuzzing Speed Doesn t require running the code Customization has almost no impact on performance Thoroughness Considers every path through the program,
Experiment Comparison of fuzzing and static analysis on an open-source code base Without customization With customization
Results TBA
Summary Static analysis is spot-on for security Important attributes Language support Analysis techniques Rule set Performance Results management Customization Better return on investment with static analysis
<end> PDF for talk available here: http://www.fortify.com/presentations Send me e-mail! Jacob West jacob@fortify.com Secure Programming with Static Analysis