SQLFlow: PL/SQL Multi-Diagrammatic Source Code Visualization Samir Tartir Department of Computer Science University of Georgia Athens, Georgia 30602 USA Email: startir@uga.edu ABSTRACT: A major problem in software maintenance is the lack of a well-documented source code in software applications. This has led to serious difficulties in software maintenance and evolution. In particular, for those developers who are faced with the task of fixing or modifying a piece of code they never even knew existed before. Database triggers and procedures are parts of almost every application, are typical examples of badly documented software components being the hidden back end of software applications. Source code visualization is one of the heavily used techniques in the literature so as to familiarize software developers with new source code, and compensate for the knowledge-sharing problem. Therefore, a new PL/SQL flowcharting reverse engineering tool, SQLFlow, has been designed and developed. SQLFlow is a three tiers architecture tool that dominates over the currently available flowcharting tools by its powerful multi-diagrammatic and source code metrics extraction capabilities. Finally, future work is planned to integrate SQLFlow with the UML modelling standards in the software industry. KEYWORDS: PL/SQL, Flowchart, Reverse engineering, legacy systems. 1. INTRODUCTION Databases (DBs) form a part of almost every computer application that is currently in use [11]. A huge effort is usually put in designing and building a database and its underlying stored packages, procedures, and triggers. As a part of a larger system, databases usually are called the "back end" of the system, and this has usually led to not paying enough attention to document code that is stored within, contrary to code of a the "front end" which is usually better documented [10]. In many cases, there is no documentation of the project source code and the people who originally wrote the code no longer work with the company. This situation might cause problems for programmers who are faced with the task of fixing or modifying a database code they never even knew existed before. The results of such a situation may be the introduction of more problems than the ones that were intended to fix, or Ayman Issa University of the West of England (UWE), Coldharbour Lane, Frenchay, Bristol BS16 1QY United Kingdom Email: Ayman2.Issa@uwe.ac.uk introducing new problems to other code that was perfectly working before. Therefore, educating programmers about the behaviors of the target code before start modifying it is not an easy task. Source code visualization is one of the heavily used techniques in the literature to overcome this lack of documentation problem, and its consequent knowledge-sharing problem [7, 9]. Therefore, this research aims at investigating the current database source code visualization tools and identifying the main open issues for research. Consequently, the mostly used PL/SQL database programming language has been selected for a flowcharting reverse engineering prototype tool called "SQLFlow". SQLFlow has been designed, implemented, and tested to parse the target PL/SQL code stored in database procedures, analyze its structure, and render its flow in a visual flowchart. Section 2 presents a brief summary for the most wellknown flowcharting tools. Section 3 discusses the high level architecture and the underlying components of SQLFlow. Section 4 contrasts the features of SQLFlow against the available literature. The validation and verification of SQLFlow is detailed in section 5. Finally, an evaluation including an outline of future research work is presented in section 6. 2. COMMERCIAL FLOWCHARTING TOOLS On the professional level, code visual to flowchart and Visutin [1, 3] are the most well known and commonly used commercial flowcharting tools in the literature. Both tools are so powerful to support source code visualization of several programming languages such as: C, C++, C#, VB, Java, and much more. But, they also share the same limitation of supporting very small number of DB scripting languages compared to the supported number of high level programming languages. The underlying operational principle of both tools is identical. They rely on the system user to define the physical location of the target source code to analyze and visualize. However, they differ in the way they present their output and how they integrate with other external systems such as: VISIO and MS Word [1, 3]. In that, Visutin seems to show more powerful integration with external systems. 1
Unfortunately, both tools concentrate on flowcharting program visualization and ignore the other useful diagrams that might be of interest to the users in the different stages of the software development life cycle (SDLC). Moreover, they also discard the role of several complexity metrics that could be extracted from the source code, which are highly beneficial to the different stakeholders in the different stages of the SDLC. On the other hand, several academic attempts [7] have been cited in the literature as prototyping tools to support more types of diagrams, but not metrics. Kita et al. [7] is a typical example for a prototype tool that that aims at educating Java novice programmers by generating flow charts and paths for their target Java files. This raises the flag for an urgent need for a new source code visualization tool that overcome the above major drawbacks of the most well known flowcharting tools. Correspondingly, SQLFlow has been designed to not only generate the flow chart of the target Oracle PL/SQL source code, but also the Flow graph. In addition, several metrics are now extracted from the source code to facilitate the subsequent analysis and testing phases. 3. SYSTEM ARCHITECTURE AND COMPONENTS SQLFlow is a light weight MS Visual Basic 6.0 tool designed to read an input file stored from an Oracle PL/SQL database procedure and generate the corresponding flowchart and flow graph that reflect its operational flow. A model-view-controller (MVC) three-tier architecture [10] has been adopted in SQLFlow implementation to separate the concerns of the different system components. The first tier, the view tier, is represented by a graphical user interface (GUI) Builder component that is responsible on analyzing the results of parsing the source code and generating the corresponding diagrams. The middle tier, the controller tier, holds the parsing and analyzing business logic component. And, the last tier, the persistence model tier, is represented by the file management component. Figure 1 depicts the visual view of the system architecture. Sections 3.1 and 3.2 discuss the details of the controller and view components, respectively. 3.1 SOURCE CODE PARSING COMPONENT Figure 1: SQLFlow Architecture. The main responsibility of the parser is to find the lexical tokens of the source code and identify the tokens that will change the flow of control. Therefore, it goes through the input PL/SQL source code file, line by line, and identifies the keywords that will determine its execution flow. For example, it will recognize a "Begin" keyword, as a start of a new bock. On the other hand, finding an "If" statement will start a branching statement that should be closed by an "End If" or "Else" keyword. Currently, there are three types of control flow statements; sequential, conditions and repetition. While accessing each token, the parser dynamically involves the GUI builder component that will handle any requirements to reflect the token's action on the chart. Figure 2 presents the pseudo code of the parser s component workflow. For each line in the input file Read current word; Case Word Case "--": single-line: ignore; Case "/*": multiple-line comment: find closing and ignore; Case "BEGIN": Signal a new program Case "DECLARE": ignore and step to the next "Begin"; Case "EXCEPTION": Signal an exception Case "ELSE": Signal else; Case "IF": Find the corresponding "THEN" and signal "IF" Case "LOOP": Signal "LOOP" Case "FOR": Find the corresponding "LOOP" statement and signal a "FOR" Case "WHILE": Find the corresponding "LOOP" statement and signal a "WHILE" Case "END": Get next word Case ";": Close last "BEGIN" Case "LOOP": Close last "LOOP, FOR, or WHILE" Case "IF": Close last "IF" Case "WHEN": Find the corresponding "THEN" and Signal a "WHEN" Case "RAISE": find the corresponding ";" and signal a statement; Otherwise: "one-word;", "SQL", "oneword...;", or...:=...;": Get next word: Case SQL: SQL Statement: find corresponding ";" and signal statement; Case Assignment: Assignment statement: Signal assignment statement; Otherwise: Procedure call: Signal procedure call statement; Figure 2: Parser Component Pseudo Code. 2
3.2 GUI BUILDER COMPONENT The main responsibility of the GUI builder component is to handle the actual generation of the flowchart and flow graph diagrams. It "listens" to signals by the parser, and processes each signal appropriately to draw the correct shapes to reflect the actions of the signal. For example, if the signal was for a loop, the drawer will draw the loop parallelogram shape and will keep a note of an open loop that needs an "end loop;" statement to close it. Figure 3 summarizes the parser signals caught by the GUI builder and its corresponding reaction. Hence, the GUI Builder builds the required diagrams dynamically by showing each shape that is being added to the diagram. This makes it easier to follow the progress of the parsing process and trace the generated diagrams to its source code fragment. 4. SQLFLOW VALIDATION AND EVALUATION Several DB applications have been targeted to validate and verify SQLFlow. An iterative validation and verification process has been adopted to evaluate the generated flowcharts, flow graphs, and source code metrics. The intention is to prioritize SQLFlow functionalities and get the core functionality, source code parsing and analyzing, delivered earlier. Then, the other supportive functionalities, GUI and file management, are delivered in subsequent iterations. Further, focusing on validating and verifying one component at a time had resulted in cleaner and more structured source code to facilitate future reusing and integration with other systems. On the other hand, earlier iterations functionalities tend to receive more testing with every new system version [5]. Figures 4, 5, and 6 present a sample PL/SQL source code, and its corresponding flowchart, and flow graph, respectively, generated using SQLFlow. The top right corner of the flowchart and flow graph diagrams is allocated for the resulting source code metrics. The flowchart related metrics [4] are: maximum nesting depth, total number of SQL statements, total number of conditional statements, and total number of repetition statements. On the other hand, the flow graph metrics are: cyclomatic complexity [8], number of regions, and number of predicates. A typical SDLC consists of 5 main phases: requirements analysis, design, implementation, testing, and deployment. The objective of the extracted source code metrics is to be used as a historical data to guide and inform current and future software development projects. Current projects could utilize the extracted metrics, e.g. cyclomatic complexity, conditional statements, etc, to inform the upcoming testing, deployment, and maintenance phases; while future projects could combine the extracted metrics with effort and productivity information to be reused in estimating new software development projects. "BEGIN": Draw a rectangle with the word "Begin". "EXCEPTION": Draw a rectangle with the word "Exception". "IF": Draw a diamond containing the "ELSE": Draw another branch for the last "IF" diamond and move drawing to that branch. "LOOP": Draw a parallelogram with the word "Loop". "FOR": Draw a parallelogram with the loop "WHILE": Draw a parallelogram with the loop "END": Draw a rectangle that closes the last "BEGIN" and draw a line between the "BEGIN" and "END" rectangles. "END LOOP": Draw a rectangle that closes the last "LOOP" and draw a line between the "LOOP" parallelogram and the "END" rectangle. "END IF": Draw a rectangle that closes the last "IF" and draw a line between the "IF" diamond and the "END" rectangle. "WHEN": Draw a diamond containing the "RAISE": Draw a rectangle with the raise statement. "SQL": Draw a rectangle with the SQL statement. "Assignment": Draw a rectangle with the assignment statement. "CALL": Draw a rectangle with the call. Figure 3: GUI Builder Caught Signals and itscorresponding Reaction. 5. SQLFLOW FEATURES COMPARISON Comparing SQLFlow features to the available commercial and academic flowcharting tools [1, 3, 7]; it has been found that SQLFlow dominates over them from different perspectives. In particular, the multidiagrammatic and metrics extraction appear to be unique in SQLFlow. However, further development is needed to extend SQLFlow scalability to consider large volumes DB triggers and procedures. Table 1 summarizes the 3
results of contrasting the different flowcharting tools with SQLFlow. declare a number; b number; c number; d number; e number; f number; g number; h number; i number; j number; begin Proc1;Proc2(a, b, c); raise form_trigger_failure; if a=1 or(b=2 and(c=3 and(d=4 or(e=5 and f=6)or g=7)and h=8)or(i=9 and j=10)) then message('condition met'); else message('condition Failed'); end if; FOR K IN 1..54 LOOP MESSAGE(I); END LOOP; end; Figure 4: Sample PL/SQL Source Code. Figure 6: Sample PL/SQL Source Code Flow Graph. Table 1: Flowcharting Tools Features Comparison. Feature Code visual to Visutin SQLFlow flowchart Support Function and No Yes Yes Procedure Call Provide Source No No Yes Code Metrics Support Error No No Yes Detection Operating System No Yes Yes Limitations Performance High High High Generated Diagrams High High High Clarity Support Multiple Diagrams Generation No No Yes 6. CONCLUSION AND FUTURE WORK A new PL/SQL flowcharting tool, SQLFlow, has been design, built, and presented. It aims at facilitating the process of knowledge sharing between the different software development team members. On the other hand, it eases the burden of understanding and maintaining the existing legacy systems. Figure 5: PL/SQL Source Code Sample Flowchart. SQLFlow is a light weight three-tier architecture tool that separates the concerns of the embodied diverse components. Contrasting SQLFlow against other professional and academic flowcharting tools, it has been concluded that SQLFlow dominates over them by its powerful multi-diagrammatic and metrics extraction features. However, further development is needed to extend SQLFlow scalability. 4
Future work is planned to upgrade SQLFlow to visualize the source code using the standard UML [2] activity diagrams. This paves the way to integrate SQLFlow into a UML design tool, such as rational rose, making the PL/SQL code a part of the formal model of the application. Finally, applying SQLFlow to several legacy DB systems will result in building large volume multi-purpose software metrics repository that could be utilized to inform the diverse SDLC phases [6], e.g. cost estimation and planning phase, of prospective software development projects. 7. ACKNOWLEDGEMENT Samir Tartir would like to express his gratitude to Prof. Talib Al-Sari for his useful feedback during SQLFlow development. REFERENCES [1] Aivosto, (2003). Visustin Flow Chart Generator [online]. Aivosto.com. Available from: http://www.aivosto.com/visustin.html [Accessed 21-5-2005]. [2] Booch, G., Rumbaugh, J. and Jacobson, I., (1999). The Unified Modeling Language User Guide. Reading, Mass. Harlow: Addison-Wesley. [3] Fatesoft (1997). Code Visual to Flowchart [online]. USA: Available from: http://www.fatesoft.com/s2f/ [Accessed 21-5- 2005]. [4] Fenton, N. and Neil, M., (2000). Software Metrics:Roadmap in International Conference on Software Engineering New York, NY, USA ACM Press, pp.357-370. [5] Harrold, M., (2000). Testing:a Roadmap in International Conference on Software Engineering New York, NY, USA IEEE-CS: Computer Society, pp.61-72. [6] Issa, A., Odeh, M., and Coward, D., (2005). Using Use Case Models to Generate Object Points. in Kokol, P. Proceedings of the IASTED International Conference on Software Engineering Austria. ACTA Press, pp.468-473. [7] Kita, Y. Kawasoe T. Katayama T., (2005). Prototype Of An Automatic Visualization Tool For Java To Educate Novice Programmers in Kokol, P. Proceedings of the IASTED International Conference on Software Engineering Austria. ACTA Press, pp.307-312. [8] McCABE, T., (1976). A Complexity Measure IEEE Transactions on Software Engineering, 2 (4), pp. 308-320. [9] Quatrani, T., (2000). Visual Modeling With Rational Rose 2000 and UML. Rev. ed. Boston, Mass. London: Addison Wesley. [10] Sommerville, I., (2001). Software Engineering. 6th ed. Harlow, England ; New York : Addison- Wesley. [11] Sommerville, Ian and Kotonya, Gerald, (1998). Requirements Engineering: Processes and Techniques. Chichester, New York : J. Wiley. 5