Leveraging User Interactions for In-Depth Testing of Web Applications

Sean Mc Allister, Technical University Vienna (sean@iseclab.org)
Christopher Kruegel, University of California, Santa Barbara (chris@iseclab.org)
Engin Kirda, Institute Eurecom, France (ek@iseclab.org)
Overview
  1. Challenges
  2. Improved Fuzzing
  3. Evaluation
Vulnerability Scanners
  - black-box testing tools that detect vulnerabilities in web applications
  - 3 main phases in the workflow:
    - Discovery
      - find new URLs (+ input parameters) to be used as attack vectors
      - follow links and analyze forms
    - Audit (Fuzzing Phase)
      - fuzz parameters, send request and analyze response
    - Crawling (optional)
      - used to detect persistent vulnerabilities
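The discovery phase described above (follow links, analyze forms) can be sketched with Python's standard library. This is a minimal illustration, not the code of any scanner discussed in the talk; the class name `DiscoveryParser`, the function `discover`, and the example URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DiscoveryParser(HTMLParser):
    """Collects link targets and form descriptions from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # absolute URLs found in <a href=...>
        self.forms = []   # dicts with "action", "method", "fields"
        self._current_form = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form":
            self._current_form = {
                "action": urljoin(self.base_url, attrs.get("action") or ""),
                "method": (attrs.get("method") or "get").upper(),
                "fields": [],
            }
        elif tag in ("input", "textarea", "select") and self._current_form is not None:
            # Only named fields are submitted, so only they are attack vectors.
            if attrs.get("name"):
                self._current_form["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current_form is not None:
            self.forms.append(self._current_form)
            self._current_form = None


def discover(base_url, html):
    """Return (links, forms) found in the given HTML document."""
    parser = DiscoveryParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.forms
```

Each discovered form field is a candidate input parameter for the audit phase. Server-side validation of those fields is what the following slides identify as the obstacle.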
Current Problems
  - complex forms: server-side validation prevents tools from finding new attack vectors
    - current solutions often guess values
    - some tools let the user supply values for certain forms (e.g., login credentials)
  - the lack of valid input keeps tools from finding vulnerabilities embedded deeper within the application
  - yet persistent attacks (such as stored XSS) require malicious input that the application accepts as valid
Current Problems
  - importance of workflow within the web application
    - login required before interaction with the application
      - supply credentials and block logout links
    - the order of steps matters
Our Solution
  - to fill out complex forms correctly, real user input is necessary
  - to follow workflows within the application, the scanner needs some form of guidance
  - build black-box test cases from real user interaction with the application by monitoring user behavior and capturing POST / GET data
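As an illustration of capturing POST / GET data, the sketch below turns observed requests into an ordered list of steps, which is one plausible representation of a use case. The `Step` type, the `record_step` function, and the example paths are assumptions made for this example and are not taken from the talk.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit, parse_qsl


@dataclass
class Step:
    """One recorded user request (hypothetical representation)."""
    method: str
    path: str
    params: dict = field(default_factory=dict)


def record_step(method, url, body=""):
    """Build a Step from a request line and an optional form-encoded body."""
    parts = urlsplit(url)
    # GET data: parameters carried in the query string.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    # POST data: parameters carried in the request body.
    if method.upper() == "POST":
        params.update(parse_qsl(body, keep_blank_values=True))
    return Step(method.upper(), parts.path, params)


# A use case is the ordered list of steps the user performed.
use_case = [
    record_step("GET", "/shop/item?id=7"),
    record_step("POST", "/shop/cart/add", "id=7&qty=2"),
]
```

Because the values come from a real user session, they are likely to pass server-side validation, which guessed values often do not.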
Overview
  1. Challenges
  2. Improved Fuzzing
  3. Evaluation
Guided Fuzzing
  - use cases can reach deep into the application
  - user-supplied input is (often) valid
  - drawback: less breadth than traditional fuzzers (depending on the number of use cases)
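A minimal sketch of the idea, assuming a use case is a list of `(method, path, params)` tuples: each recorded request is replayed to advance the application state, and then each of its parameters is fuzzed one at a time. The payload list and the replay-then-fuzz ordering are illustrative assumptions, not the authors' implementation.

```python
PAYLOADS = ["<script>alert(1)</script>", "' OR '1'='1"]  # illustrative only


def guided_fuzz_plan(use_case, payloads=PAYLOADS):
    """Yield (kind, method, path, params) tuples in the order they would be sent."""
    for method, path, params in use_case:
        # Replay the recorded request first so the application advances
        # to the state the user reached at this step.
        yield ("replay", method, path, dict(params))
        # Then fuzz one parameter at a time, keeping the others at their
        # recorded (presumably valid) values.
        for name in params:
            for payload in payloads:
                mutated = dict(params)
                mutated[name] = payload
                yield ("fuzz", method, path, mutated)
```

Keeping all but one parameter at its recorded value raises the chance that the malicious request still passes validation. The breadth limit is visible here too, since only requests that appear in a use case are ever fuzzed.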
Extended, Guided Fuzzing
  - depth of the application is reached by following user interactions (1. & 3.)
  - testing breadth is increased by alternating crawling phases (2.)
  - drawback: fuzzing phases might break the use case
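The alternation of use-case steps and crawling phases can be sketched on a toy site model. Here the site is assumed to be a dictionary that maps each URL to the URLs it links to, and a bounded breadth-first crawl starts from every step of the use case. The function names and the depth limit are hypothetical choices for this example.

```python
from collections import deque


def crawl(site, start, max_depth):
    """Breadth-first crawl over {url: [linked urls]}, returning pages in visit order."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in site.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


def extended_guided_plan(site, use_case_urls, max_depth=1):
    """Alternate use-case steps with a crawl phase started at each step."""
    tested = []
    for url in use_case_urls:
        for page in crawl(site, url, max_depth):
            if page not in tested:
                tested.append(page)
    return tested
```

Pages that are only reachable after a use-case step, such as a checkout page behind a filled cart, become crawl starting points. This is how the approach combines the depth of use cases with the breadth of crawling.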
Problems
  - in these workflows, the fuzzing phase can in some cases break the replay of the use case
    - e.g., logging out of the web application, or deleting all items from the shopping cart before proceeding to checkout
    - or, even worse, deleting content generated by the fuzzing component
  - these shortcomings create the need for stateful testing
  - the state of a web application is controlled by
    (1) the client (cookie values)
    (2) the server (database)
  - a black-box scanner can control the client side but not the server side
Stateful Fuzzing
Implementation
  - request capturing component
    - runs as middleware between the server and the web application
  - replay component (HTTP protocol driver)
  - server-side implementation of the state machine
    - intercepts and records all data manipulation originating from a request
    - rolls back all changes after the fuzzing phase
  - fuzzing component
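The rollback step can be illustrated with database savepoints. The sketch below uses SQLite savepoints as a stand-in for the server-side state tracking named on the slide. It does not reproduce the authors' component, and the `fuzzing_phase` helper and the `cart` table are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def fuzzing_phase(conn, name="fuzz_phase"):
    """Undo every database change made inside the with-block."""
    conn.execute(f"SAVEPOINT {name}")
    try:
        yield conn
    finally:
        # Restore the state captured at the savepoint, then discard it.
        conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
        conn.execute(f"RELEASE SAVEPOINT {name}")


# isolation_level=None puts the connection in autocommit mode, so the
# SAVEPOINT statements above are the only transaction control in effect.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE cart (item TEXT)")
conn.execute("INSERT INTO cart VALUES ('book')")   # state created by the use case

with fuzzing_phase(conn):
    conn.execute("DELETE FROM cart")               # side effect of a fuzzing request

remaining = conn.execute("SELECT item FROM cart").fetchall()
```

After the fuzzing phase the cart holds its original item again, so the next step of the use case (for instance, checkout) can still be replayed.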
Overview
  1. Challenges
  2. Improved Fuzzing
  3. Evaluation
Tested Applications
  - 3 common web applications were tested
  1. blog
    - browse entries and create comments
    - forced preview of comment
  2. forum application
    - create threads and replies
  3. e-commerce application
    - large number of pages
    - browsing of articles, adding to shopping cart, checkout
    - registration of new users, login, logout
    - comment on articles
Tools tested and compared
  1. w3af
    - open source vulnerability scanner
    - many modules available for various attacks
  2. Acunetix Web Vulnerability Scanner
    - commercial tool
    - claims high success rates
    - large number of different attack strings, including advanced XSS attacks
  3. Burp Suite Spider Component
    - not really a vulnerability scanner, but a manual penetration testing tool
    - simple form filling algorithms and web spider capabilities
Measuring the Effectiveness
  - coverage of an application (number of pages found and tested)
    - high coverage of an application is desirable
    - but questionable as a metric for sites with a large amount of content that all derives from the same base template
  - measuring generated content
    - does the scanner have any effect on the content displayed by the web application?
    - both in terms of generated pages (new threads in bulletin boards) and content on existing pages (replies and comments on existing content)
    - on the data level: how many objects have been generated by the scanner?
Results
  1. blog
    - no other scanner managed to generate a comment on the blog
    - Acunetix and w3af both found more pages by requesting the root directories of each URL found
    - persistent XSS vulnerability found after successfully posting a comment
  2. forum application
    - because the number of test strings varies between scanners, some scanners generated more objects in the database
      - Acunetix: 687 threads
      - w3af: 29
      - ours: 1 to 36, depending on method
    - number of vulnerabilities found (1) was identical for all scanners
Results (2)
  3. e-commerce application
    - due to the complexity of this application, the evaluated scanners failed to supply valid input data for most forms (even after configuration with username/password) and could not find more than a single vulnerability; the use-case-based approaches found up to 8 more
    - a crawling and attacking phase breaks the use case immediately
      - spider logs out
      - deletes content from shopping cart
      - etc.
    - coverage was high with all presented approaches, but depth could only be reached with use cases
    - stateful fuzzing was the only feasible approach to reach both depth and breadth for security testing of this application
Conclusions
  - The workflow of vulnerability scanners cannot cope with the demand for extensive testing of web applications, because the scanners are unable to reach certain end points.
  - Use cases offer a good approach to increase the coverage of scanners within a complex web application.
  - A lack of extensive use cases creates demand for alternative approaches that increase the testing breadth of the application.
  - In an application that depends strongly on actions being performed in the right order, additional effort is needed to ensure high coverage.
Any questions?