Bayesian Classification for SQL Injection Detection Brandon Skari College of Engineering and Applied Science University of Wyoming Laramie, Wyoming 82070 brandon.skari@gmail.com April 6, 2011
Overview SQL Injection Types of attacks Prevention techniques SQLassie Brandon Skari (UWyo) Bayesian Classification April 2011 2 / 34
Dynamic web sites Change based on user interaction Web mail Online stores Search engines Social networks Banking sites All of the top 10 most popular[1] websites are dynamic Brandon Skari (UWyo) Bayesian Classification April 2011 3 / 34
SQL Injection Injection attacks are one of the foremost threats facing dynamic web applications[23, 13, 24, 5] Allow attackers to: Access, modify and delete data Bypass authentication Denial of service Compromise the server Brandon Skari (UWyo) Bayesian Classification April 2011 4 / 34
Queries Queries usually come in 4 types: Select Insert Update Delete A typical query may look like the following[6]: SELECT name FROM U s e r s WHERE name = Brandon AND password = 1234 Brandon Skari (UWyo) Bayesian Classification April 2011 5 / 34
Queries Brandon Skari (UWyo) Bayesian Classification April 2011 6 / 34
Queries A developer has written: SELECT name FROM U s e r s WHERE name = $name AND password = $password What happens when the user enters the following for the username? name = The query is run as the following: SELECT name FROM U s e r s WHERE name = AND password = Brandon Skari (UWyo) Bayesian Classification April 2011 7 / 34
Types of attacks Halfond[13] classified attacks based on their intention Identifying injectable parameters Performing database fingerprinting Determining database schema Adding or modifying data Denial of service Accessing data Avoiding detection Bypassing authentication Executing remote commands Performing privilege escalation Brandon Skari (UWyo) Bayesian Classification April 2011 8 / 34
Attack techniques There are a number of techniques that attackers use to attack a database[6, 13, 20, 21] Multiple queries Unions Invalid queries Tautologies Inference Detection evasion Second order attacks Brandon Skari (UWyo) Bayesian Classification April 2011 9 / 34
Multiple queries The easiest attacks to implement SQL allows multiple queries to be run in sequence, if they are separated by semicolons Attackers can run essentially query, allowing full access to the database Most database engines include special commands to access files or other information from the operating system Some frameworks disallow multiple queries Brandon Skari (UWyo) Bayesian Classification April 2011 10 / 34
Multiple queries SELECT name FROM U s e r s WHERE name = $name AND password = $password What if the user enters: ; DROP TABLE U s e r s The query is run as: SELECT name FROM U s e r s WHERE name = ; DROP TABLE Users -- AND password = Brandon Skari (UWyo) Bayesian Classification April 2011 11 / 34
Unions Normal SELECT commands retrieve a number of rows from a database UNION can be used to append data from other tables and even from unrelated applications This can expose data that would normally be inaccessible Brandon Skari (UWyo) Bayesian Classification April 2011 12 / 34
Unions SELECT name FROM U s e r s WHERE name = $name AND password = $password What if the user enters: UNION SELECT Admin The query is run as SELECT name FROM U s e r s WHERE name = UNION SELECT Admin -- AND password = Brandon Skari (UWyo) Bayesian Classification April 2011 13 / 34
Unions SELECT a. f o r u m i d, a. a u t h v i e w, a. auth mod FROM p h p b b a u t h a c c e s s a, p h p b b u s e r g r o u p ug WHERE ug. u s e r i d = $ i d AND ug. u s e r p e n d i n g = 0 AND a. g r o u p i d = ug. g r o u p i d What if the user enters the following: 0 UNION SELECT TABLE SCHEMA, TABLE NAME, COLUMN NAME FROM i n f o r m a t i o n s c h e m a.columns This would return the database name, table name, and column name of everything that s stored in the database engine. wiki - Users - name wiki - Users - password store - Customer - creditcardnumber Brandon Skari (UWyo) Bayesian Classification April 2011 14 / 34
Invalid queries When presented with an invalid query, most database engines will provide detailed information that can aid in debugging Attackers can use this information to determine: Database vendor and version Operating system Data types Schema information Most databases are configured by default to run in developer mode You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near AND password = Brandon Skari (UWyo) Bayesian Classification April 2011 15 / 34
Tautologies Tautology attacks work by modifying conditionals to make the statement always true Either remove conditionals or add new conditionals SELECT name FROM U s e r s WHERE name = $name AND password = $password SELECT name FROM U s e r s WHERE name = Brandon OR 1 = 1 AND password = Brandon Skari (UWyo) Bayesian Classification April 2011 16 / 34
Inference Applications that are in production typically disable debugging to the point that if attacks succeed, no usable feedback is provided Used against mysql.com on March 27 2011 Normal blind attacks Cannot see error messages but can see the results of modified queries Use IF statements to selectively change a value to infer information Totally blind attacks Attackers get no visual feedback Use statements that selectively slow a query to infer information SELECT name FROM U s e r s WHERE name = Brandon AND IF((SELECT SUBSTR(password, 0, 1) FROM Users WHERE name = admin ) < ASCII( L ), 1, 0) -- AND password = Brandon Skari (UWyo) Bayesian Classification April 2011 17 / 34
Detection evasion Many intrusion detection systems rely on string comparisons or check for illegal characters Detection evasion tries to evade these detection patterns Build strings using concatenation, insertion, substring Build strings from hashes or constant string function MySQL lets you specify strings in hex: 0x4272616E646F6E Functions to encode numbers to ASCII characters Brandon Skari (UWyo) Bayesian Classification April 2011 18 / 34
Second order attacks All of the attacks shown thus far have taken input directly from the an attacker and used it to modify queries Second order attacks work by inserting data into a database and then retrieving it later and using it in attacks These are much more difficult to predict because the database and previous results are usually considered a safe source All data that could have ever originated from a user need to be sanitized every time it s used in a query Brandon Skari (UWyo) Bayesian Classification April 2011 19 / 34
Second order attacks Searching for posts by users whose name matches a particular pattern SELECT username FROM p h p b b u s e r s WHERE username LIKE $ r e g e x SELECT p o s t i d, post name FROM p h p b b p o s t s WHERE username = What if a user registers the following username? f a k e UNION SELECT username, u s e r p a s s w o r d FROM p h p b b u s e r s Brandon Skari (UWyo) Bayesian Classification April 2011 20 / 34
Prevention So what can you, as a developer, do to prevent injection?[13] Sanitize input Parameterizing input Input type checking Positive pattern matching The problem with all these techniques is that they must be applied consistently by hand Restrict privileges to minimize potential damage Brandon Skari (UWyo) Bayesian Classification April 2011 21 / 34
Input sanitization $ c o n n e c t i o n = new m y s q l i ( 1 2 7. 0. 0. 1, Brandon, 1234, w i k i ) ; $name = $ POST [ name ] ; $password = $ POST [ password ] ; $query = SELECT name FROM U s e r s WHERE name = $name AND password = $password ; $ r e s u l t = $ c o n n e c t i o n >q u e r y ( $query ) o r d i e ( $ c o n n e c t i o n >c o n n e c t e r r o r ( ) ) ; So what happens when the user enters: Brandon ; DROP TABLE Users -- The problem is that single quote changes the meaning of the query. By escaping that quote, we can remove its significance. $name = $ c o n n e c t i o n >m y s q l r e a l e s c a p e s t r i n g ( $ POST [ name ] ) ; $password = $ c o n n e c t i o n >m y s q l r e a l e s c a p e s t r i n g ( $ POST [ password ] ) ; Now, name will be set to Brandon \ ; DROP TABLE Users -- This must be done every time you use values that originated from a user! Brandon Skari (UWyo) Bayesian Classification April 2011 22 / 34
Input sanitization Unfortunately, you have to do more than just escape strings! SQL lets you leave numeric constants unquoted, so this is ok: SELECT username FROM Users WHERE id = 5 What happens if we do this with PHP? $ i d = $ c o n n e c t i o n >m y s q l r e a l e s c a p e s t r i n g ( $ POST [ i d ] ) ; $ query = SELECT username FROM U s e r s WHERE i d = $ i d ; $ r e s u l t = $ c o n n e c t i o n >q u e r y ( $query ) ; What if the user enters: 5 ; DROP TABLE Users When this string is escaped, it is not changed! Always quote your variables inside of queries $ query = SELECT username FROM U s e r s WHERE u s e r i d = $ i d ; The database will either ignore the second part or give an error Brandon Skari (UWyo) Bayesian Classification April 2011 23 / 34
Input sanitization User input can come from unexpected places[19] HTTP POST requests and GET requests Hidden fields HTTP headers (referrer, user-agent, etc.) Cookies In PHP, arrays are stored as key-value pairs: id 1, email foo@bar. The extract function moves an array s key-value pairs into variable scope e x t r a c t ( $ POST ) ; is equivalent to $ i d = $ POST [ i d ] ; $ e m a i l = $ POST [ e m a i l ] ; $newpassword = ; f o r ( $ i = 0 ; $ i < 8 ; $ i ++) $newpassword.= chr ( rand ( 9 7, 122) ) ; e x t r a c t ( $ POST ) ; $connect >q u e r y ( UPDATE u s e r s SET password = $newpassword WHERE i d = $ i d ) ; Brandon Skari (UWyo) Bayesian Classification April 2011 24 / 34
Parameterizing input Easiest, most reliable way to prevent injection attacks Queries are prepared with placeholders where data will go User input is bound and sent to the database separately The database treats input as values so there is no possibility of injection Parameterized queries are slower than dynamically built ones Brandon Skari (UWyo) Bayesian Classification April 2011 25 / 34
Parameterizing input In PHP, when you bind a parameter, you have to specify its type i integer d double s string b binary blob $ c o n n e c t i o n = new m y s q l i ( 1 2 7. 0. 0. 1, Brandon, 1234, w i k i ) ; $ i d = $ POST [ i d ] ; $ t o p i c = $ POST [ t o p i c ] ; $stmt = $ c o n n e c t i o n >p r e p a r e ( SELECT postcount FROM p o s t s WHERE u s e r i d =? AND t o p i c =? ) ; $stmt >bind param ( i s, $id, $ t o p i c ) ; $stmt >e x e c u t e ( ) ; $stmt >c l o s e ( ) ; Brandon Skari (UWyo) Bayesian Classification April 2011 26 / 34
Input type checking $ i d = $ c o n n e c t i o n >m y s q l r e a l e s c a p e s t r i n g ( $ POST [ i d ] ) ; $ query = SELECT username FROM U s e r s WHERE i d = $ i d ; $ r e s u l t = $ c o n n e c t i o n >q u e r y ( $query ) ; An attacker could enter: 5 ; DROP TABLE USERS -- What if you just typecast as you receive it from the user? $ i d = ( i n t ) $ POST [ i d ] ; Your framework will either reject the input, or ignore the bad part No injection is possible through integers! Brandon Skari (UWyo) Bayesian Classification April 2011 27 / 34
Positive pattern matching It s difficult to predict all the bad things that an attacker can do It s easy to predict what kinds of things are normal Use regular expressions to match what is expected i f (1 == preg match ( ˆ\w+$, $name ) ) Brandon Skari (UWyo) Bayesian Classification April 2011 28 / 34
Black box testing Black box testing tests an application without examining source code SQL Inject Me[25] WAVES[15] SQLiX[7] HP WebInspect[14] IBM Rational AppScan[16] Black box testing is limited by its list of prebuilt attacks Will probably not find second order attacks Brandon Skari (UWyo) Bayesian Classification April 2011 29 / 34
Static analysis Analyzes the source code for problems No runtime overhead JDBC checker[8] LAPSE[19] PHP-Checker[27] Pixy[17] Source code is not always available Can present false positives because they can t determine if a source is safe[27] Fixing the source can be difficult Brandon Skari (UWyo) Bayesian Classification April 2011 30 / 34
Static and dynamic analysis Source code is analyzed and then queries are monitored at runtime A model of normal queries is built and queries that don t match are rejected AMNESIA[11] DFA web filter[22] Source code is not always available Fixing the source can be difficult If queries are not identified during the static analysis, there will be false positives Brandon Skari (UWyo) Bayesian Classification April 2011 31 / 34
Taint tracking User input is marked as tainted as it enters an application Tainted data are tracked Queries that are generated using input are checked to make sure that tainted data do not modify the query SQLGuard[4] Java String class[10] WASP[12] SQLrand[3] CANDID[2] Metacharacter tracking[26] Hamming distance[18] Ignore second order attacks False sense of security Brandon Skari (UWyo) Bayesian Classification April 2011 32 / 34
Blacklisting Reject queries based on certain features Effectiveness is proportional to the sophistication of the detection technique Doesn t require access to source code Requires no interaction from the user or modification of code GreenSQL[9] SQLassie Blacklists must be updated to deal with new attacks Runtime overhead Best used when writing an application Brandon Skari (UWyo) Bayesian Classification April 2011 33 / 34
Permissions MySQL lets you define permissions on a per user/host basis Each user/host combination can have a separate password and permission set Most web applications shouldn t need to DROP TABLES, so remove those permissions Give each application a separate account and only give that user access to one particular database Make sure the passwords are strong! Limit connections to machines you trust Brandon Skari (UWyo) Bayesian Classification April 2011 34 / 34
[1] Alexa. Alexa top 500 global sites. http://www.alexa.com/topsites, December 2010. [2] Bandhakavi, S., Bisht, P., Madhusudan, P., and Venkatakrishnan, V. N. CANDID: preventing SQL injection attacks using dynamic candidate evaluations. In CCS 07: Proceedings of the 14th ACM conference on Computer and communications security (New York, NY, USA, 2007), ACM, pp. 12 24. [3] Boyd, S. W., and Keromytis, A. D. SQLrand: Preventing SQL injection attacks. In In Proceedings of the 2nd Applied Cryptography and Network Security (ACNS) Conference (2004), pp. 292 302. [4] Buehrer, G., Weide, B. W., and Sivilotti, P. A. G. Using parse tree validation to prevent SQL injection attacks. Brandon Skari (UWyo) Bayesian Classification April 2011 34 / 34
In Proceedings of the 5th international workshop on Software engineering and middleware (New York, NY, USA, 2005), SEM 05, ACM, pp. 106 113. [5] Christey, S., and Martin, R. A. CVE - vulnerability type distributions in CVE. http://cve.mitre.org/docs/vuln-trends/, May 2007. [6] Clarke, J. SQL Injection Attacks and Defense, first ed. Syngrass, May 2009. [7] Cochin, C. SQLiX project. http://www.owasp.org/index.php/category: OWASP_SQLiX_Project, August 2006. [8] Gould, C., Su, Z., and Devanbu, P. JDBC checker: a static analysis tool for SQL/JDBC applications. In Software Engineering, 2004. ICSE 2004. Proceedings. 26th International Conference on (May 2004), pp. 697 698. Brandon Skari (UWyo) Bayesian Classification April 2011 34 / 34
[9] GreenSQL. GreenSQL open source SQL database security, SQL injection prevention. http://www.greensql.net, December 2010. [10] Haldar, V., Chandra, D., and Franz, M. Dynamic taint propagation for Java. In In Proceedings of the 21st Annual Computer Security Applications Conference (2005), pp. 303 311. [11] Halfond, W. G. J., and Orso, A. Combining static analysis and runtime monitoring to counter SQL-injection attacks. In Proceedings of the third international workshop on Dynamic analysis (New York, NY, USA, 2005), WODA 05, ACM, pp. 1 7. [12] Halfond, W. G. J., Orso, A., and Manolios, P. Using positive tainting and syntax-aware evaluation to counter SQL injection attacks. Brandon Skari (UWyo) Bayesian Classification April 2011 34 / 34
In Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering (New York, NY, USA, 2006), SIGSOFT 06/FSE-14, ACM, pp. 175 185. [13] Halfond, W. G. J., Viegas, J., and Orso, A. A classification of SQL-injection attacks and countermeasures. In Proc. of the International Symposium on Secure Software Engineering (2006). [14] Hewlett Packard. HP WebInspect Software. http://www.hp.com/go/appsec, December 2010. [15] Huang, Y.-W., Huang, S.-K., Lin, T.-P., and Tsai, C.-H. Web application security assessment by fault injection and behavior monitoring. In Proceedings of the 12th international conference on World Wide Web (New York, NY, USA, 2003), WWW 03, ACM, pp. 148 159. [16] IBM. IBM software - AppScan product line. Brandon Skari (UWyo) Bayesian Classification April 2011 34 / 34
http://www-01.ibm.com/software/awdtools/appscan/, December 2010. [17] Jovanovic, N. Pixy: XSS and SQLI scanner for PHP. http://pixybox.seclab.tuwien.ac.at/pixy/, July 2007. [18] Liu, A., Yuan, Y., Wijesekera, D., and Stavrou, A. SQLProb: a proxy-based architecture towards preventing SQL injection attacks. In Proceedings of the 2009 ACM symposium on Applied Computing (New York, NY, USA, 2009), SAC 09, ACM, pp. 2054 2061. [19] Livshits, V. B., and Lam, M. S. Finding security vulnerabilities in Java applications with static analysis. In Proceedings of the 14th conference on USENIX Security Symposium - Volume 14 (Berkeley, CA, USA, 2005), USENIX Association, pp. 18 18. [20] Mackay, C. A. Brandon Skari (UWyo) Bayesian Classification April 2011 34 / 34
SQL injection attacks and some tips on how to prevent them. http://www.codeproject.com/kb/database/ SqlInjectionAttacks.aspx, January 2005. [21] Mavituna, F. SQL injection cheat sheet. http: //ferruh.mavituna.com/sql-injection-cheatsheet-oku, March 2007. [22] Muthuprasanna, M., Wei, K., and Kothari, S. Eliminating SQL injection attacks - a transparent defense mechanism. In Web Site Evolution, 2006. WSE 06. Eighth IEEE International Symposium on (2006), pp. 22 32. [23] OWASP. Open web application security project (OWASP) top ten project. http://www.owasp.org/index.php/category: OWASP_Top_Ten_Project, December 2010. Brandon Skari (UWyo) Bayesian Classification April 2011 34 / 34
[24] SANS Institute. SANS: The top cyber security risks. http://www.sans.org/top-cyber-security-risks/, September 2009. [25] SecCom. SQL inject me 0.4.5. http://labs.securitycompass.com/index.php/exploit-me/ sql-inject-me/, May 2010. [26] Su, Z., and Wasserman, G. The essence of command injection attacks in web applications. ACM Press, pp. 372 382. [27] Xie, Y., and Aiken, A. Static detection of security vulnerabilities in scripting languages. In Proceedings of the 15th conference on USENIX Security Symposium - Volume 15 (Berkeley, CA, USA, 2006), USENIX Association. Brandon Skari (UWyo) Bayesian Classification April 2011 34 / 34