Abstract parsing: static analysis of dynamically generated string output using LR-parsing technology



Similar documents
DISTRIBUTED DATA PARALLEL TECHNIQUES FOR CONTENT-MATCHING INTRUSION DETECTION SYSTEMS

DISTRIBUTED DATA PARALLEL TECHNIQUES FOR CONTENT-MATCHING INTRUSION DETECTION SYSTEMS. G. Chapman J. Cleese E. Idle

Partial optimal labeling search for a NP-hard subclass of (max,+) problems

A Spam Message Filtering Method: focus on run time

Performance of a Browser-Based JavaScript Bandwidth Test

Optical Illusion. Sara Bolouki, Roger Grosse, Honglak Lee, Andrew Ng

Unit 11 Using Linear Regression to Describe Relationships

Report b Measurement report. Sylomer - field test

A note on profit maximization and monotonicity for inbound call centers

Group Mutual Exclusion Based on Priorities

Two Dimensional FEM Simulation of Ultrasonic Wave Propagation in Isotropic Solid Media using COMSOL

MSc Financial Economics: International Finance. Bubbles in the Foreign Exchange Market. Anne Sibert. Revised Spring Contents

Scheduling of Jobs and Maintenance Activities on Parallel Machines

CASE STUDY BRIDGE.

Project Management Basics

DUE to the small size and low cost of a sensor node, a

A technical guide to 2014 key stage 2 to key stage 4 value added measures

MECH Statics & Dynamics

Name: SID: Instructions

Queueing systems with scheduled arrivals, i.e., appointment systems, are typical for frontal service systems,

Support Vector Machine Based Electricity Price Forecasting For Electricity Markets utilising Projected Assessment of System Adequacy Data.

Cluster-Aware Cache for Network Attached Storage *


6. Friction, Experiment and Theory

v = x t = x 2 x 1 t 2 t 1 The average speed of the particle is absolute value of the average velocity and is given Distance travelled t

Availability of WDM Multi Ring Networks

TRADING rules are widely used in financial market as

Mixed Method of Model Reduction for Uncertain Systems

Health Insurance and Social Welfare. Run Liang. China Center for Economic Research, Peking University, Beijing , China,

Chapter 10 Stocks and Their Valuation ANSWERS TO END-OF-CHAPTER QUESTIONS

Bi-Objective Optimization for the Clinical Trial Supply Chain Management

Bidding for Representative Allocations for Display Advertising

Profitability of Loyalty Programs in the Presence of Uncertainty in Customers Valuations

2. METHOD DATA COLLECTION

Solution of the Heat Equation for transient conduction by LaPlace Transform

Assigning Tasks for Efficiency in Hadoop

Testing Documentation for CCIH Database Management System By: John Reeves, Derek King, and Robert Watts

Bio-Plex Analysis Software

12.4 Problems. Excerpt from "Introduction to Geometry" 2014 AoPS Inc. Copyrighted Material CHAPTER 12. CIRCLES AND ANGLES

Mobile Network Configuration for Large-scale Multimedia Delivery on a Single WLAN

Pekka Helkiö, 58490K Antti Seppälä, 63212W Ossi Syd, 63513T

Senior Thesis. Horse Play. Optimal Wagers and the Kelly Criterion. Author: Courtney Kempton. Supervisor: Professor Jim Morrow

CHARACTERISTICS OF WAITING LINE MODELS THE INDICATORS OF THE CUSTOMER FLOW MANAGEMENT SYSTEMS EFFICIENCY

A New Optimum Jitter Protection for Conversational VoIP

Online story scheduling in web advertising

Assessing the Discriminatory Power of Credit Scores

INFORMATION Technology (IT) infrastructure management

CASE STUDY ALLOCATE SOFTWARE

FEDERATION OF ARAB SCIENTIFIC RESEARCH COUNCILS

Linear energy-preserving integrators for Poisson systems

AN OVERVIEW ON CLUSTERING METHODS

TIME SERIES ANALYSIS AND TRENDS BY USING SPSS PROGRAMME

Towards Control-Relevant Forecasting in Supply Chain Management

Performance of Multiple TFRC in Heterogeneous Wireless Networks

Benchmarking Bottom-Up and Top-Down Strategies for SPARQL-to-SQL Query Translation

A Note on Profit Maximization and Monotonicity for Inbound Call Centers

A Review On Software Testing In SDlC And Testing Tools

Control of Wireless Networks with Flow Level Dynamics under Constant Time Scheduling

Tap Into Smartphone Demand: Mobile-izing Enterprise Websites by Using Flexible, Open Source Platforms

SRA SOLOMON : MUC-4 TEST RESULTS AND ANALYSI S

Algorithms for Advance Bandwidth Reservation in Media Production Networks

INTERACTIVE TOOL FOR ANALYSIS OF TIME-DELAY SYSTEMS WITH DEAD-TIME COMPENSATORS

Incline and Friction Examples

Solutions to Sample Problems for Test 3

Research Article An (s, S) Production Inventory Controlled Self-Service Queuing System

Ohm s Law. Ohmic relationship V=IR. Electric Power. Non Ohmic devises. Schematic representation. Electric Power

User Behavior Analysis Using Alignment Based Grammatical Inference from Web Server Access Log

Module 8. Three-phase Induction Motor. Version 2 EE IIT, Kharagpur

Risk Management for a Global Supply Chain Planning under Uncertainty: Models and Algorithms

January 21, Abstract

NETWORK TRAFFIC ENGINEERING WITH VARIED LEVELS OF PROTECTION IN THE NEXT GENERATION INTERNET

Mimicry Attacks on Host-Based Intrusion Detection Systems

Warehouse Security System based on Embedded System

Queueing Models for Multiclass Call Centers with Real-Time Anticipated Delays

1) Assume that the sample is an SRS. The problem state that the subjects were randomly selected.

How Enterprises Can Build Integrated Digital Marketing Experiences Using Drupal

Apigee Edge: Apigee Cloud vs. Private Cloud. Evaluating deployment models for API management

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

How To Understand The Hort Term Power Market

Morningstar Fixed Income Style Box TM Methodology

1 Introduction. Reza Shokri* Privacy Games: Optimal User-Centric Data Obfuscation

COMPANY BALANCE MODELS AND THEIR USE FOR PROCESS MANAGEMENT

QUANTIFYING THE BULLWHIP EFFECT IN THE SUPPLY CHAIN OF SMALL-SIZED COMPANIES

MANAGING DATA REPLICATION IN MOBILE AD- HOC NETWORK DATABASES (Invited Paper) *

Parallel Adaptive Multigrid Methods in Plane Linear Elasticity Problems

REDUCTION OF TOTAL SUPPLY CHAIN CYCLE TIME IN INTERNAL BUSINESS PROCESS OF REAMER USING DOE AND TAGUCHI METHODOLOGY. Abstract. 1.

σ m using Equation 8.1 given that σ

A Resolution Approach to a Hierarchical Multiobjective Routing Model for MPLS Networks

Turbulent Mixing and Chemical Reaction in Stirred Tanks

Redesigning Ratings: Assessing the Discriminatory Power of Credit Scores under Censoring

Multi-Objective Optimization for Sponsored Search

Growing Self-Organizing Maps for Surface Reconstruction from Unstructured Point Clouds

HUMAN CAPITAL AND THE FUTURE OF TRANSITION ECONOMIES * Michael Spagat Royal Holloway, University of London, CEPR and Davidson Institute.

Solution to Problem Set 1

1. Introduction. C. Camisullis 1, V. Giard 2, G. Mendy-Bilek 3

Trusted Document Signing based on use of biometric (Face) keys

Achieving Quality Through Problem Solving and Process Improvement

SPECIFICATIONS FOR PERIMETER FIREWALL. APPENDIX-24 Complied (Yes / No) Remark s. S.No Functional Requirements :

RISK MANAGEMENT POLICY

SELF-MANAGING PERFORMANCE IN APPLICATION SERVERS MODELLING AND DATA ARCHITECTURE

Transcription:

Abtract paring: tatic analyi of dynamically generated tring output uing LR-paring technology Kyung-Goo Doh 1, Hyunha Kim 1, David A. Schmidt 2 1 Hanyang Univerity, Anan, South Korea 2 Kana State Univerity, Manhattan, Kana, USA Abtract. We combine LR(k)-paring technology and data-flow analyi to analyze, in advance of execution, the document generated dynamically by a program. Baed on the document language context-free reference grammar and the program control tructure, the analyi predict how the document will be generated and pare the predicted document. Our trategy remember context-free tructure by computing abtract LR-pare tack. The technique i implemented in Objective Caml and ha tatically validated a uite of PHP program that dynamically generate HTML document. 1 Introduction Scripting language like PHP, Perl, Ruby, and Python ue tring a a univeral data tructure to communicate value, command, and program. For example, one might write a PHP cript that aemble within a tring variable an SQL query or an HTML page or an XML document. Typically, the well-formedne of the aembled tring i verified when the tring i upplied a input to it intended proceor (databae, web brower, or interpreter), and an incorrectly aembled tring might caue proceor failure. Wore till, a maliciou uer might deliberately upply mileading input that generate a document that attempt a cro-ite-cripting or injection attack. A a firt tep toward preventing failure and attack, the well-formedne of a dynamically generated, grammatically tructured tring (document) hould be checked with repect to the document context-free reference grammar (for SQL or HTML or XML) before the document i upplied to it proceor. Better till, the document generator program itelf hould be analyzed to validate that all it generated document are well formed with repect to the reference grammar, like an application program i type checked in advance of execution. In thi paper, we employ LR(k)-paring technology and data-flow analyi to analyze tatically a program that dynamically generate document a tring, and at the ame time, pare the dynamically generated tring with the context-free reference grammar for the document language. We compute abtract pare tack that remember the context-free tructure of the tring.

x = a r = ] while... x = [. x. r print x X0 = a R = ] X1 = X0 X2 X2 = [ X1 R X3 = X1 (Read. a an infix tring-append operation.) Fig.1. Sample program and it data-flow equation 2 Motivating example Say that a cript mut generate an output tring that conform to thi grammar, S a [S] where S i the only nonterminal. (HTML, XML, and SQL are uch bracket language.) The grammar i LR(0), but it can be difficult to enforce even for imple program, like the one in Figure 1, left column. Perhap we require thi program to print only well-formed S-phrae the occurrence of x at print x i a hot pot and we mut analyze x poible value. An analyi baed on type checking aign type (reference-grammar nonterminal) to the program variable. The occurrence of x can indeed be data-typed a S, but r ha no data type that correpond to a nonterminal. An analyi baed on regular expreion (Chritenen [2], Minamide [6], Waerman [8]) olve flow equation hown in Figure 1 right column in the domain of regular expreion, determining that the hot pot (X3 ) value conform to the regular expreion, [ a ], but thi doe not validate the aertion. A grammar-baed analyi (Thiemann [7]) treat the flow equation a a et of grammar rule. The type of x at the hot pot i X3. Next, a language-incluion check trie to prove that all X3-generated tring are S-generable. Our approach olve the flow equation in the domain of pare tack X3 meaning i the et of LR-pare of the tring that might be denoted by x.

. S [ 0 1 S [. S] S. [S ] S. a S. a S [ S. S [S ] 5 S. 2 S a. a a 3 4 S [S. ] ] S [S]. pare tack (top lie at right) input equence (front lie at left) 0 [[a]] 0 :: 1 [a]] (becaue goto( 0,[) = 1) 0 :: 1 :: 1 a]] 0 :: 1 :: 1 :: 2 ]] (reduce:s a) 0 :: 1 :: 1 S ]] 0 :: 1 :: 1 :: 3 ]] (becaue goto( 1, S) = 3) 0 :: 1 :: 1 :: 3 :: 4 ] (reduce:s [S]) 0 :: 1 S ] 0 :: 1 :: 3 ] 0 :: 1 :: 3 :: 4 (reduce:s [S]) 0 S 0 :: 5 (finihed) Fig.2. goto controller for S [S] a and an example pare of [[a]] Aume that the reference grammar i LR(k); we firt calculate it LR-item and build it pare ( goto ) controller. We interpret the flow equation in Figure 1 a function that map an input pare tate to (a et of) output pare tack.

x = a r = ] while... x = [. x. r print x X0 = a R = ] X1 = X0 X2 X2 = [ X1 R X3 = X1. S 0 1 S [. S] S. [S ] S. a S. a S [ S. S [S ] 5 S. 2 S a. a a [ 3 4 S [S. ] ] S [S]. To analyze the hot pot at X3, we generate the function call, X3( 0 ), where 0 i the tart tate for paring an S-phrae. The flow equation, X3 = X1, generate X3( 0 ) = X1( 0 ) which itelf demand a pare of the tring generated at point X1 from tate 0 : X1( 0 ) = X0( 0 ) X2( 0 ) The union of the pare from X0 and X2 mut be computed. 3 Conider X0( 0 ): X0( 0 ) = goto( 0,a) = 2 (reduce:s a) goto( 0, S) = 5 A pare of tring a from 0 generate 2, a final tate, that reduce to nonterminal S, which generate 5 an S-phrae ha been pared. (The ignifie a reduce tep to a nonterminal.) The completed tack i therefore 0 :: 5. The remaining call, X2( 0 ), goe X2( 0 ) = ([ X1 R)( 0 ) = goto( 0,[) (X1 R) = 1 (X1 R) = 1 :: (X1( 1 ) R) The operator equence the pare tep: for pare tack, t, and function, E, t E = t :: E(top(t)), that i, the tack made by appending t to the tack returned by E(top(t)). Then, X1( 1 ) = X0( 1 ) X2( 1 ) compute to 3, and X2( 0 ) = 1 :: (X1( 1 ) R) = 1 :: ( 3 R) = 1 :: 3 :: R( 3 ) = 1 :: 3 :: 4 (reduce:s [S]) goto( 0, S) = 5 That i, X2( 0 ) built the tack, 1 :: 3 :: 4, denoting a pare of [S], which reduced to S, giving 5. 3 In general, the function compute et of pare tack. In thi example, all the et are ingleton.

x = a r = ] while... x = [. x. r print x X0 = a R = ] X1 = X0 X2 X2 = [ X1 R X3 = X1 Here i the complete lit of olved function call: X3( 0 ) = X1( 0 ) X1( 0 ) = X0( 0 ) X2( 0 ) = = 5 5 = 5 X0( 0 ) = goto( 0,a) = 2 goto( 0, S) = 5 X2( 0 ) = goto( 0,[) (X1 R) = 1 :: X1( 1 ) R = = 1 :: 3 :: R( 3 ) = 1 :: 3 :: 4 goto( 0, S) = 5 R( 3 ) = goto( 3,]) = 4 X1( 1 ) = X0( 1 ) X2( 1 ) = = 3 3 = 3 (ee comment below) X0( 1 ) = goto( 1,a) = 2 goto( 1, S) = 3 X2( 1 ) = goto( 1,[) (X1 R) = 1 :: (X1( 1 ) R) = = 1 :: 3 :: R( 3 ) (ee comment below) = 1 :: 3 :: 4 goto( 1, S) = 3 The olution i X3( 0 ) = 5, validating that the tring printed at the hot pot mut be S-phrae. Each equation intance, X i ( j ) = E ij, i a firt-order data-flow equation. In the example, X1( 1 ) and X2( 1 ) are mutually recurively defined, and their olution are obtained by iteration-untilconvergence. The flow-equation et i generated dynamically while the equation are being olved. Thi i a demand-driven analyi [1, 3, 4], called minimal function-graph emantic [5], computed by a worklit algorithm.

Worklit, added and proceed from top to bottom: X3( 0) X1( 0) X0( 0) X2( 0) X1( 0) X1( 1) X3( 0) X0( 1) X2( 1) X1( 1) X2( 0) X2( 1) R( 3) X2( 0) X2( 1) X1( 0) X1( 1) Cache update, inerted from top to bottom, where X() P abbreviate Cache[X()] := P X3( 0) X1( 0) X0( 0) X2( 0) X0( 0) reduce( 0, goto( 0,a)) = reduce( 0, 2) = reduce( 0, goto( 0, S)) = reduce( 0, 5) = { 5} X1( 1) X1( 0) { 5} X0( 1) X2( 1) X3( 0) { 5} X0( 1) reduce( 1, goto( 1,a)) = { 3} X1( 1) { 3} R( 3) R( 3) reduce( 3, goto( 3,])) = { 4} X2( 0) ([ :: X1 :: R)( 0) = 1 (X1 :: R) = ( 1 :: X1( 1)) R = 1 :: 3 :: R( 3) = reduce( 0, 1 :: 3 :: 4) = reduce( 0, goto( 0, S)) = { 5} X2( 1) ([ :: X1 :: R)( 1) = { 3} Generated call graph: X3 ( ) 0 X1 ( 0 ) X0 ( 0 ) X2 ( 0 ) X1 ( 1 ) R ( 3 ) X0 ( 1 ) X2 ( 1 ) Fig.3. Worklit-algorithm calculation of call, X3( 0), in Figure 1 The initialization tep place initial call, X0( 0 ), into the worklit and into the call graph and aign to the cache the partial olution, Cache[X0( 0 )]. The iteration tep repeat the following until the worklit i empty: 1. Extract a call, X(), from the worklit, and for the correponding flow equation, X = E, compute E(), folding abtract tack a neceary. 2. While computing E(), if a call, X ( ) i encountered, (i) add the dependency, X ( ) X(), to the call graph (if it i not already preent); (ii) if there i no entry for X ( ) in the cache, then aign Cache[X ( )] and place X ( ) on the worklit. 3. When E() compute to an anwer et, P, and P contain an abtract pare tack not already lited in Cache[X()], then aign Cache[X()] (Cache[X()] P) and add to the worklit all X ( ) uch that the dependency, X() X ( ), appear in the flowgraph.

Concrete emantic: A ource program compute a tore that map variable to tring. The concrete collecting emantic compute a et of tore for each program point; the collecting emantic i then abtracted o that it compute, for each program point, a ingle tore that map each variable to a et of tring. The collecting emantic i overapproximated by the data-flow emantic, which ue flow equation to compute the et of tring denoted by each variable at each program point. In Figure 1, the data-flow emantic compute thee value of variable x at the program point: X0 = {a} X2 = {[ 1] 1 X1} R = {]} X1 = X0 X2 = X3 Let Σ name the tate in the parer goto-controller. A pare tack, t Σ +, model thoe tring that pare to t. Function γ : P(Σ + ) P(String) concretize a et of pare tack into a et of tring: γ(s) = {t String 0 :: 1 :: :: k S and pare( 0, t) = 0 :: 1 :: :: k } The abtract collecting interpretation, X, compute the et of pare tack denoted by a program variable. For flow equation, Xi = E i, the function, X i : Σ P(Σ ), i defined a X i() = [E i ](), where Σ and [t] = {reduce(, goto(, t))}, where t i a terminal ymbol [E 1 E 2 ] = [E 1 ] [E 2 ] [Xj ] = [E j ], where Xj = E j i the flow equation for Xj [E 1 E 2 ] = {reduce(, p ) p ([E 1 ]) [E 2 ]}, where S g = {p :: g(top(p)) p S} where reduce(,p) reduce the final tate within pare tack, :: p. reduce(, p) = t := top(p) if t = m, the final tate for item, T U 1U 2 U m, then p := pop(m,p) // pop m tate, correponding to U 1U 2 U m p := p :: goto(top( :: p ), T) return reduce(,p ) // repeat till finihed ele return p // t wa not a final tate, o nothing to reduce Fig.4. Abtract collecting interpretation: X i() = [E i ] denote the et of pare tack generated by paring the tring denoted by E i, tarting from pare tate.

3 Abtract pare tack In the previou example, the reult for each X i ( j ) wa a ingle tack. In general, a et of pare tack can reult, e.g., for x = [ while... x = x. [ x = x. a. ] X0 = [ X1 = X0 X2 X2 = X1 [ X3 = X1 a ] at concluion, x hold zero or more left bracket and an S-phrae; X3( 0 ) i the infinite et, { 5, 1 :: 3, 1 :: 1 :: 3, 1 :: 1 :: 1 :: 3, }. To bound the et, we abtract it by folding it tack o that no pare tate repeat in a tack. Since Σ, the et of pare-tate name, i finite, folding produce a finite et of finite-ized tack (that contain cycle). A tack egment like p = 1 :: 1 i a linked lit, a graph, 1 1, where the tack top and bottom are marked by pointer; when we puh a tate, e.g., p :: 2, we get 1 1 2. The folded tack i formed by merging ame-tate object and retaining all link: 1 2. (Thi can be written a the regular expreion, + 1 :: 2.) Folding can apply to multiple tate, e.g., 6 7 6 7 6 8 fold to 6 For the above example, X3( 0 ) = { 5, + 1 :: 3}. 7 8.

Flow equation et generated from demand, X3( 0): X0( 0) = [( 0) X1( 0) = X0( 0) X2( 0) X2( 0) = X1( 0) [ X3( 0) = X1( 0) (a.]) Leat fixed-point olution expreed with abtract pare tack: X0( 0) = [( 0) = { 1} Becaue X1 and X2 are mutually defined, we iterate to a olution, where Xi value at iteration j i denoted Xi j: X1 1( 0) = { 1} = { 1} X2 1( 0) = X1 1( 0) [ = fold{ 1 :: 1} = { + 1 } X1 2( 0) = { 1} { + 1 } = {1, + 1 } = { + 1 }. (We can merge the two tack egment ince the firt i a prefix of the econd and ha the ame bottom and top tate.) X2 2( 0) = X1 2( 0) [ = { + 1 :: [(1)} = fold{+ 1 :: 1} = {+ 1 } X1 3( 0) = { 1} { + 1 } = {1, + 1 } = {+ 1 } = X12(0) X2 3( 0) = { + 1 } = X22(0) X3( 0) = { + 1 :: a(1) ]} Firt, + 1 :: a(1) = + 1 :: 2 + 1 :: goto(1, S) = + 1 :: 3. = { + 1 :: 3 :: ](3)} = {+ 1 :: 3 :: 4} The reduction, S [S], plit the tack into two cae: (i) there are multiple 1 within + 1 ; (ii) there i only one 1: = (i){ + 1 :: goto(1, S)} (ii){goto(0, S)} = { + 1 :: 3, 5} Fig.5. Iterative olution with folded pare tack, depicted a regular expreion The reult, X3( 0 ) = { 5, + 1 :: 3}, aert that the tring at X3 might be a well-formed S phrae or it might contain a urplu of unmatched left bracket. At the end of the calculation in Figure 5, the reduction of S [S] i done on the folded tack egment, + 1 :: 3 :: 4, that i, the complete tack i 0 1 3 4, meaning that three tate mut be popped: we travere 4, 3, and 1, and follow the link from the lat tate, 1, to ee what the remaining tack might be. There are two poibilitie: 0 1 reult for each cae, a hown in the Figure. and 0. We compute the

Thi tag not to be removed under penalty of law.

A et of pare tack can be oundly approximated by a ingle, abtract tack: For label et Σ, a Σ-labelled graph, g, i a tuple, node g,edge g,label g, where node g i a et of node, edge g node g node g i a et of directed edge (at mot one per ource, target node pair), and label g : node g Σ aign a label to each node. Let Graph Σ be the et of Σ-labelled graph. An abtract tack i a triple, (g,bot, top), uch that g Graph Σ and bot, top node g mark the bottom and top node of the tack. Let AbStack Σ be the et of abtract tack labelled with Σ-value. Example: the tack, 1 :: 1 :: 3, i modeled a ( {a, b, c}, {(c, b),(b, a)}, [a 1, b 1, c 3], a, c). An abtract tack, (g,bot, top) AbStack Σ, concretize to a et of pare tack: γ(g,bot, top) = {t P(Σ + ) t i a finite path through g from top to bot} Two abtract tack, G 1 = (g 1, bot 1, top 1) and G 2 = (g 2, bot 2, top 2), are compoed by :: into the dijoint union of g 1 and g 2 plu one new edge from bot 2 to top 1: G 1 :: G 2 = ( node g1 node g2, edge g1 edge g2 {(bot 2, top 1)}, label g1 + label g2, bot 1, top 2) An abtract tack i folded (widened) by merging all node that hare the ame label, in effect, equating the node with the label: fold(g,bot, top) = ( { Σ n node g,label g(n) = }, {(, ) (n, n ) edge g,label g(n) =,label g(n ) = }, λ., label g(bot), label g(top)) The abtract interpretation of flow equation, Xi = E i, i the function, X i : Σ P fin (AbStack Σ), defined a X i() = {fold(p) p [E i ]()}. Thi interpretation i ound for the abtract collecting emantic in Figure 4. A et of abtract tack can be further abtracted into a ingle tack of form, Graph Σ P(Σ) P(Σ), by unioning the tack node et, edge et, bot-value and top-value. The reulting tack i a ubgraph of the parer goto-controller. Fig.6. Abtract interpretation defined in term of abtract, folded, pare tack

hot pot reference grammar ocamlyacc LALR(1) table PHP program PHP String flow Analyzer data flow equation Abtract Parer pared OK paring ERR Fig. 7. Implementation 4 Implementation and experiment The implementation i in Objective Caml. The front end of Minamide analyzer for PHP [6] wa modified to accept a PHP program with a hot-pot location and to return data-flow equation with tring operation for the hot pot. ocamlyacc produce an LALR(1) paring table, and the abtract parer ue the data-flow equation and the paring table to pare tatically the tring generated by the PHP program. Abtract paring work directly on character (not token), o the reference grammar i written for cannerle paring. (Performance wa good enough for practical ue.) We applied our tool to a uite of PHP program that dynamically generate HTML document, the ame one tudied by Minamide [6], uing a MacOSX with an Intel Core 2 Duo Proceor (2.56GHz) and 4 GByte memory: webche faqforge phpwim timeclock choolmate file 21 11 30 6 54 line 2918 1115 6606 1006 6822 no. of hot pot 6 14 30 7 1 no. of paring 6 16 36 7 19 pared OK 5 1 19 0 1 pared ERR 1 15 17 7 18 no. of alarm 1 31 16 14 20 true poitive 1 31 13 14 17 fale poitive 0 0 3 0 3 time(ec) 0.224 0.155 1.979 0.228 2.077 We manually identified the hot pot and ran our abtract parer for each hot pot. Since we do not yet have pare-error recovery, each time a pare error wa identified by our analyzer, we located the ource of the error, fixed it, and tried again until no pare error were detected. All the fale-poitive alarm were caued by ignoring the tet within conditional command. The paring time hown in the table i the um of all execution time needed to find all paring error for all hot pot. The reference grammar pare table took 1.323 econd to contruct; thi i not included in the analyi time.

The alarm are claified below: webche faqforge phpwim timeclock choolmate file 21 11 30 6 54 line 2918 1115 6606 1006 6822 no. of hot pot 6 14 30 7 1 no. of paring 6 16 36 7 19 pared OK 5 1 19 0 1 pared ERR 1 15 17 7 18 no. of alarm 1 31 16 14 20 true poitive 1 31 13 14 17 fale poitive 0 0 3 0 3 time(ec) 0.224 0.155 1.979 0.228 2.077 claification occurrence open/cloe tag yntax error 11 open/cloe tag miing 45 uperfluou tag 5 improperly neted 14 miplaced tag 5 ecaped character yntax error 2 All in all, our abtract parer work without limiting the neting depth of tag, validate the yntax reaonably fat, and i guaranteed to find all paring error reducing inevitable fale alarm to a minimum.

Minamide excluded one PHP application, named tagit, from hi experiment [6], ince tagit generate an arbitrary neting depth of tag. In principle, our abtract parer hould be able to validate tagit, but we alo excluded tagit from our tudie becaue the current verion of our abtract parer check that tring-update operation atify an update-invariance property (a tring update mut preerve the exiting pare). Unexpectedly (to u!), o many tring update in tagit violated update invariance that our abtract parer generated too many fale-poitive to be helpful. We can eliminate thee fale poitive with ome eay finite-machine theory: every tring update, e.g., x =... replace(pattern, target, x) define a f..a.-tranducer (pattern i an f..a.; target i the f..a. output; x i the input). For each uch tranducer that appear in flow-equation generation, we compoe the tranducer with the LR-pare controller, itelf a tranducer. The compound tranducer direct the abtract pare. We are implementing thi technique, which can alo be ued for input validation and harpening the reult of tet in conditional command. We are alo implementing reference grammar for application that generate XML tructure.

5 Concluion Injection and cro-ite-cripting attack can be reduced by analyzing the program that dynamically generate document [9]. In thi paper, we have improved the preciion of uch analye by employing LR-paring technology to validate the context-free grammatical tructure of generated document. A pare tree i but the firt tage in calculating a tring meaning. The pared tring ha a emantic (a enforced by it interpreter), and one can encode thi emantic with emanticproceing function, like thoe written for ue with a parer-generator. (Tainting analyi tracking unanitized data i a implitic emantic property that i encoded thi way.) The emantic can then be approximated by the tatic analyi o that abtract paring and abtract emantic proceing proceed imultaneouly.

Thi talk i aved at www.ci.ku.edu/ chmidt/paper/hometalk.html The paper wa publihed at the 2009 Static Analyi Sympoium (Springer LNCS 5673). The paper i aved locally at www.ci.ku.edu/ chmidt/paper Reference 1. G. Agrawal. Simultaneou demand-driven data-flow and call graph analyi. In Proc. Int l. Conf. Software Maintenance, Oxford, 1999. 2. A.S. Chritenen, A. Møller, and M.I. Schwartzbach. Static analyi for dynamic XML. In Proc. PLAN- X-02, 2002. 3. E. Dueterwald, R. Gupta, and M.L. Soffa. A practical framework for demand-driven interprocedural data flow analyi. ACM TOPLAS, 19:992 1030, 1997. 4. S. Horwitz, T. Rep, and M. Sagiv. Demand interprocedural dataflow analyi. In Proc. 3rd ACM SIGSOFT Symp. Foundation of Software Engg., 1995. 5. N.D. Jone and A. Mycroft. Data flow analyi of applicative program uing minimal function graph. In Proc. 13th Symp. POPL, page 296 306. ACM Pre, 1986. 6. Y. Minamide. Static approximation of dynamically generated web page. In Proc. 14th ACM Int l Conf. on the World Wide Web, page 432 441, 2005. 7. P. Thiemann. Grammar-baed analyi of tring expreion. In Proc. ACM workhop Type in language deign and implementation, page 59 70, 2005. 8. G. Waermann, C. Gould, Z. Su, and P. Devanbu. Static checking of dymanically generated querie in databae application. ACM Tran. Software Engineering and Methodology, 16(4):14:1 27, 2007. 9. G. Waermann and Z. Su. Sound and precie analyi of web application for injection vulnerabilitie. In Proc. ACM PLDI, page 32 41, 2007.