Experiment 5.1 How to measure performance of database applications?

CSCI315 Database Design and Implementation

Experimented and described by Dr. Janusz R. Getta, School of Computer Science and Software Engineering, University of Wollongong, Australia, Bldg. 3, room 210, phone +61 02 42214339, fax +61 02 42214170, e-mail: jrg@uow.edu.au, Web: http://www.uow.edu.au/ jrg, Msn: jgetta, Skype: jgetta007

These are the specifications of the homeworks in the subject Database Design and Implementation (CSCI315) delivered in January 2009 at Singapore Institute of Management by Dr. Janusz R. Getta.

Table of contents

Step 0 How to begin and what you need to know before you start?
Step 1 How to roughly measure time spent on the execution of SQL statement?
Step 2 How to use TIMING option of SQL*Plus client?
Step 3 How to find an execution plan of SQL statement?
Step 4 How to use automatic tracing of SQL statements?
Step 5 How to find an impact of ARRAYSIZE parameter on performance?
Step 6 How to clean up after the experiment?
References

Actions

Step 0 How to begin and what you need to know before you start?

A printable copy of this experiment in pdf format is available here.

Turn the system on. Download and uncompress the SQL scripts used in Homework 5. Use the cd command to navigate to the folder where the downloaded and uncompressed SQL scripts are located. Start SQL*Plus client in the way described in either Experiment 1.1 for XP operating system or Experiment 1.2 for Linux operating system. Connect as a user STUDENT. The password is: student.

In the next few experiments we will use a sample database that contains information about books, customers, orders submitted by customers, etc. The database is an implementation of the TPC-W benchmark database ( www.tpc.org ). A user CSCI315 owns the relational tables of the sample database. The total size of all relational tables included in the database is over 300 Mbytes. The database contains synthetic data generated by the wgen program. A conceptual schema of the database is available here. A script dbcreate5.sql has been used to create the relational tables of the sample database. Access to the relational tables of the sample database has been granted to you by the user CSCI315.

It is important to familiarize yourself with the relational structures of the database, i.e. to browse the schemas of the relational tables and to be aware of the primary and foreign key constraints. A good idea is to read through the script dbcreate5.sql first.

Connect to a database server and execute a script file listtabs.sql. The script lists the names of all relational tables included in the sample database.

While connected as a user STUDENT, execute a script file listconstr.sql to list the descriptions of attributes and the consistency constraints imposed on the relational tables in the sample database.

Remain connected as a user STUDENT and execute a script file listcount.sql to find the total number of rows in each one of the relational tables of the sample database. Counting all rows will take some time, so please be patient.

As the relational tables are quite large, it is not recommended to SELECT * from them: scrolling through all rows selected from the larger relational tables takes a very long time. If it is necessary, this experiment shows later on how to turn off the listing of the rows retrieved by a SELECT statement.
This experiment requires a locally managed tablespace to store the execution plans. A script creatbs40k.sql contains a statement that creates a tablespace TBS40K. When ready, execute the script.

Execute a script grantquota.sql to grant a 5 Mbytes quota on the tablespace TBS40K to a user STUDENT.

Execute a script makedef.sql to make the tablespace a default tablespace of a user STUDENT.

Step 1 How to roughly measure time spent on the execution of SQL statement?

While connected as a user STUDENT, execute a script file timecnt.sql. The script roughly measures time spent on the execution of a SELECT statement that counts the total number of rows in a relational table ORDERS. The script displays the timestamps obtained from the system through an access to the SYSTIMESTAMP pseudo-column before and after the execution of the SELECT statement that counts the rows, see below.

    SELECT TO_CHAR(SYSTIMESTAMP, 'DD-MON-YYYY HH24:MI:SS.FF4') STARTED FROM DUAL;
    SELECT count(*) FROM CSCI315.ORDERS;
    SELECT TO_CHAR(SYSTIMESTAMP, 'DD-MON-YYYY HH24:MI:SS.FF4') COMPLETED FROM DUAL;

The technique used in the script provides only a rough estimation of time, because it includes statement processing time, data transmission time, and the time spent by SQL*Plus client on listing the results. In this experiment we exercise more sophisticated techniques to measure the execution time and the total number of data blocks read/written by the system while processing an SQL statement.

To compare time spent on counting all rows in a relational table with counting of selected rows, execute a script file timesel.sql. The script counts the rows in a relational table ORDERS that satisfy a given condition, see the statements below.

    SELECT TO_CHAR(SYSTIMESTAMP, 'DD-MON-YYYY HH24:MI:SS.FF4') STARTED FROM DUAL;
    SELECT count(*) FROM CSCI315.ORDERS WHERE O_TAX > 100;
    SELECT TO_CHAR(SYSTIMESTAMP, 'DD-MON-YYYY HH24:MI:SS.FF4') COMPLETED FROM DUAL;

The results generated by the script show that counting the rows that satisfy a given condition takes more time than counting all rows. How is it possible? It is possible because counting all rows in a relational table does not need access to the relational table! Instead, the system traverses the leaf level of an index built over the primary key. On the other hand, to count the rows that satisfy a given condition, the system has to perform a full scan of the relational table to evaluate the condition against the contents of each row (unless an index can be used to pick the rows).

Step 2 How to use TIMING option of SQL*Plus client?

To get a more precise estimation of execution time, a user should set the SQL*Plus variable TIMING to ON. While connected as a user STUDENT, execute the following statement:

    SET TIMING ON

Next, execute a statement:

    SELECT COUNT(*) FROM CSCI315.ORDERS;

Then, type / to re-execute the same statement for the second time.

Setting the TIMING option causes SQL*Plus client to display the amount of time elapsed after the execution of every SQL statement. Time elapsed is the amount of real time spent on the processing of an SQL statement. Note that the numbers listed by SQL*Plus client denote real time and not processor time.
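The distinction between real (elapsed) time and processor time can be illustrated outside of SQL*Plus. The following is a minimal sketch in Python, not part of the experiment scripts: the helper name measure is hypothetical, and a short sleep stands in for the time a client spends waiting on a server or a network.

```python
import time


def measure(action):
    """Return (elapsed_seconds, cpu_seconds) for one call of action()."""
    wall_start = time.perf_counter()   # real (wall clock) time
    cpu_start = time.process_time()    # processor time used by this process
    action()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


# Sleeping consumes real time but almost no processor time, much like a
# client that waits for rows to arrive from a busy server.
elapsed, cpu = measure(lambda: time.sleep(0.05))
print(f"elapsed {elapsed:.3f} s, cpu {cpu:.3f} s")
```

The elapsed figure includes all waiting, which is the kind of number that SET TIMING ON reports; the processor figure does not.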
Because the reported time is real time, the amounts of elapsed time also depend on the current load of the system and on the size of a buffer used by SQL*Plus to receive data.

It is important to note that the second and each next execution of an SQL statement takes less time than the first one. This is because the first execution loads the data blocks from the leaf level of the primary key index into the data buffer cache, and each next execution accesses these blocks from the data buffer cache and not from a disk drive.

To turn the timing of SQL statements off, execute a statement:

    SET TIMING OFF

Step 3 How to find an execution plan of SQL statement?

An important feature of a query processing system is its ability to report the execution plan prepared for a given SQL statement. A query execution plan consists of the elementary steps performed by the system at run time, e.g. sorting of a relational table, vertical scan of an index, horizontal scan of an index, join of the relational tables, etc. Analysis of a query execution plan allows for the identification of performance problems.

While connected as a user STUDENT, execute the following statement:

    EXPLAIN PLAN FOR SELECT * FROM CSCI315.ORDERS;

The EXPLAIN PLAN statement listed above stores an execution plan of the SQL statement provided in the FOR clause in a relational table PLAN_TABLE.

Remain connected as a user STUDENT and execute a script file showplan.sql that displays the contents of PLAN_TABLE in a readable format (see a printout below).

    Id  Operation          Name    Rows  Bytes  Cost (%CPU)  Time
     0  SELECT STATEMENT           259K  13M    715 (4)      00:00:09
     1  TABLE ACCESS FULL  ORDERS  259K  13M    715 (4)      00:00:09

The system plans to implement the SELECT statement through a full scan of a relational table ORDERS (see the contents of the Operation column). The contents of the columns Rows and Bytes are the estimated numbers of rows and bytes to be read when computing an operation, e.g. the system estimates that a full scan of a relational table ORDERS needs to read 259,000 rows, which is equivalent to reading 13 Mbytes of persistent storage. The columns Cost and Time contain the estimated cost of query computation and the estimated time spent by the system on implementation of the execution plan.

Execute the following statement:

    EXPLAIN PLAN FOR SELECT * FROM CSCI315.ORDERS WHERE O_ID = 1234567;

Again, execute a script showplan.sql to list the execution plan prepared by the system for the SELECT statement above. The system should display the following plan:

    Id   Operation                    Name         Rows  Bytes  Cost (%CPU)  Time
      0  SELECT STATEMENT                          1     56     2 (0)        00:0
      1  TABLE ACCESS BY INDEX ROWID  ORDERS       1     56     2 (0)        00:0
    * 2  INDEX UNIQUE SCAN            ORDERS_PKEY  1            1 (0)        00:0

    Predicate Information (identified by operation id):
       2 - access("o_id"=1234567)

To compute the query above, the system plans to vertically traverse the index ORDERS_PKEY built over the primary key O_ID of a relational table ORDERS, and then to access the relational table using the row identifier assigned to the value of the index key found at the leaf level of the index.

Step 4 How to use automatic tracing of SQL statements?

Tracing of SQL statements submitted at the SQL> prompt provides the execution plans and statistics of what has happened during the executions. An output from the tracing of a single SQL statement includes an execution plan, the total number of physical read operations, the total number of bytes transmitted over a network, the total number of sorts performed, the total number of rows processed, etc.

Remain connected as a user STUDENT. To turn on the automatic tracing of SQL statements, execute the statements:

    SET AUTOTRACE ON
    SET AUTOTRACE TRACEONLY

Starting from now, all SQL statements submitted at the SQL> prompt are followed by the execution plans and detailed statistics from the executions. No results of SQL statements are listed due to the TRACEONLY option set by the SET AUTOTRACE statement. The TRACEONLY option suppresses the display of results coming from the executions of SQL statements. It makes possible a safe execution of a SELECT * FROM large-table statement without spending a long time on listing the contents of large-table.

The AUTOTRACE option of SQL*Plus has the following parameters:

    ON EXPLAIN     display the execution plan,
    ON STATISTICS  display I/O, CPU, and network statistics,
    ON             display the execution plan and statistics,
    TRACEONLY      display the execution plan and statistics and suppress the results of SQL statements,
    OFF            turn off the AUTOTRACE option.

While connected as a user STUDENT, execute the following statement (make sure that the TRACEONLY option suppresses the listing of a relational table ORDERS):

    SELECT * FROM CSCI315.ORDERS;

Execution of the statement requires a full scan of a relational table ORDERS and because of that it will take some time, so please be patient. When ready, type / to repeat the execution after some data have been loaded into the data buffer cache. The system should reply with the following messages:

    Id  Operation          Name    Rows  Bytes  Cost (%CPU)  Time
     0  SELECT STATEMENT           259K  13M    715 (4)      00:00:09
     1  TABLE ACCESS FULL  ORDERS  259K  13M    715 (4)      00:00:09

    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
          19650  consistent gets
           2448  physical reads
              0  redo size
       19390082  bytes sent via SQL*Net to client
         190449  bytes received via SQL*Net from client
          17281  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
         259200  rows processed

The Execution Plan section of the printout tells how the system plans to execute a statement. In the case above, the system plans to scan the entire table ORDERS (TABLE ACCESS (FULL) OF ORDERS). The execution plan is very important because it shows whether the system plans to use an index or not.

The number of recursive calls is the total number of additional SQL statements executed by the system in order to process a given SQL statement. For example, these are the additional SELECT statements executed to examine the contents of the data dictionary to find whether the relational tables used in the original statement exist in the database and whether a user has the right to access these tables.

The number of db block gets is the total number of times a data block was requested and read, either from disk or from the data buffer cache, without verification of the transactional consistency of the block, i.e. without checking whether the block has been updated by another transaction.

The number of consistent gets is the total number of times a consistent read, either from disk or from the data buffer cache, was requested. The difference between db block gets and consistent gets is that consistent gets may require additional operations on the rollback or undo segments when the block has already been updated by a transaction that logically follows the traced statement. The sum of db block gets and consistent gets is frequently called logical I/O.

The number of physical reads is the total number of data blocks read from disk storage. This number equals the total number of physical reads directly into the application's transient memory plus all reads into the data buffer cache. Physical reads are frequently called physical I/O.
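The definitions of logical I/O and physical I/O given above can be checked with a little arithmetic. The following is a minimal sketch in Python, not part of the experiment scripts: the function names logical_io and disk_share are hypothetical, and the figures are copied from the statistics printed above.

```python
def logical_io(db_block_gets: int, consistent_gets: int) -> int:
    """Logical I/O is the sum of db block gets and consistent gets."""
    return db_block_gets + consistent_gets


def disk_share(physical_reads: int, logical: int) -> float:
    """Fraction of logical block requests that needed a physical read."""
    if logical == 0:
        return 0.0
    return physical_reads / logical


# Figures from the AUTOTRACE statistics of the full scan of ORDERS.
logical = logical_io(db_block_gets=0, consistent_gets=19650)
share = disk_share(physical_reads=2448, logical=logical)
print(logical)          # 19650
print(round(share, 3))  # 0.125
```

In this trace roughly one logical block request in eight had to be served from disk; the remaining requests were served from the data buffer cache.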
The total number of reads satisfied by the contents of the data buffer cache is equal to (logical I/O - physical I/O), and the data buffer cache hit ratio is equal to:

    (logical I/O - physical I/O) / logical I/O

Redo size denotes the total amount of redo data, measured in bytes, generated by the traced statement.

The total number of bytes sent via SQL*Net to client is the total number of bytes sent to the client from the server processes during the processing of the statement.

The total number of bytes received via SQL*Net from client is the total number of bytes received from the client during the processing of the statement.

The total number of SQL*Net round-trips to/from client is the total number of messages sent to and received from the client during the processing of the statement.

The total number of sorts (memory) is the total number of sort operations performed completely in memory, without any disk write operation.

The total number of sorts (disk) is the total number of sort operations that required at least one disk write operation.

The total number of rows processed is the number of rows processed during the processing of the traced statement.

Note that the value of physical reads has decreased in the second execution. This is because the first full scan through a relational table ORDERS loaded and kept some of the data blocks in the data buffer cache, such that the second scan did not need to read them from a disk drive. Also note that the value of consistent gets remained the same in both executions. We will use this parameter to determine the complexity of processed SQL statements.

Step 5 How to find an impact of ARRAYSIZE parameter on performance?

The statistics reported by the AUTOTRACE option of SQL*Plus seem to be a very useful tool for the estimation of performance of an SQL statement. However, watch out! The results reported by the AUTOTRACE option strongly depend on the value of the ARRAYSIZE parameter of SQL*Plus. It means that execution of the same statement against the same database server and against the same database returns different results for different values of the parameter ARRAYSIZE. Here are the examples.

While connected as a user STUDENT, execute the following statement:

    SET ARRAYSIZE 15

The statement above sets the total number of rows fetched by SQL*Plus client from a database in one go. The valid values of the parameter ARRAYSIZE are from 1 to 5000. Next, execute the following statements several times and record the results of the last execution:

    SET AUTOTRACE ON
    SET AUTOTRACE TRACEONLY
    SELECT * FROM CSCI315.ORDERS;

The results of the last execution are as follows (except the execution plan):

    Statistics
    ----------------------------------------------------------
            323  recursive calls
              0  db block gets
          19785  consistent gets
              0  physical reads
              0  redo size
       19074195  bytes sent via SQL*Net to client
         121191  bytes received via SQL*Net from client
          17281  SQL*Net roundtrips to/from client
              5  sorts (memory)
              0  sorts (disk)
         259200  rows processed

Next, execute a statement that sets the largest possible value of the parameter ARRAYSIZE:

    SET ARRAYSIZE 5000

Finally, execute the following statements several times and record the results of the last execution:

    SET AUTOTRACE ON
    SET AUTOTRACE TRACEONLY
    SELECT * FROM CSCI315.ORDERS;

The results of the latest trace are as follows:

    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
           2594  consistent gets
            523  physical reads
              0  redo size
       17159907  bytes sent via SQL*Net to client
            941  bytes received via SQL*Net from client
             53  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
         259200  rows processed

The comparison of the last two traces of the same statement provides quite disturbing results. The total number of consistent gets dropped from 19785 to 2594 after changing the value of the parameter ARRAYSIZE! A clue to this mystery is hidden in the values of the parameters: bytes sent via SQL*Net to client, bytes received via SQL*Net from client, and SQL*Net roundtrips to/from client.

The total number of SQL*Net round-trips to/from client dropped from 17281 to 53! It means that fewer transmissions were needed to bring the results from a server to a client. The lower values of the parameters bytes sent via SQL*Net to client and bytes received via SQL*Net from client confirm this hypothesis. If fewer transmissions bring the same number of rows, then the transmission unit is larger and fewer consistent gets are needed to read all data! If the value of ARRAYSIZE is too small to transfer the contents of a database block in one fetch, then the number of consistent gets of the same block is larger, because the same block must be read several times to transfer its contents piece by piece to a client.

The comparisons of the values of the parameter consistent gets obtained from the executions of different SQL statements are relevant only as long as the tracing of all executions is performed with the same value of the parameter ARRAYSIZE.
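The drop in round-trips can be reproduced with simple arithmetic. The following is a minimal sketch in Python, not part of the experiment scripts. It assumes that each fetch returns at most ARRAYSIZE rows and that one additional round-trip is spent on the statement itself; this model is an assumption that happens to match both traces above, not a documented formula. The function name estimated_roundtrips is hypothetical.

```python
def estimated_roundtrips(rows: int, arraysize: int) -> int:
    """Estimate SQL*Net round-trips needed to fetch `rows` rows.

    Assumed model: ceil(rows / arraysize) fetches plus one extra
    round-trip for the statement itself.
    """
    if not 1 <= arraysize <= 5000:
        raise ValueError("ARRAYSIZE must be between 1 and 5000")
    fetches = (rows + arraysize - 1) // arraysize  # integer ceiling
    return fetches + 1


ROWS_IN_ORDERS = 259200  # "rows processed" in both traces

print(estimated_roundtrips(ROWS_IN_ORDERS, 15))    # 17281
print(estimated_roundtrips(ROWS_IN_ORDERS, 5000))  # 53
```

Both estimates agree with the SQL*Net roundtrips to/from client values reported in the two traces, which supports the explanation that a larger ARRAYSIZE means fewer and larger transmissions.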

Step 6 How to clean up after the experiment?

While connected as a user STUDENT, execute a script makedef.sql to make a tablespace USERS a default tablespace of a user STUDENT.

At the end of the experiment we remove the database objects created so far. While connected as a user STUDENT, execute a script clean5-1.sql to drop the tablespace created in this experiment. The script contains the following statement.

    DROP TABLESPACE TBS40K INCLUDING CONTENTS AND DATAFILES;

References

SQL Reference, SELECT statement
SQL Reference, CREATE TABLESPACE statement
SQL Reference, DROP TABLESPACE statement
SQL Reference, ALTER USER statement
SQL Reference, EXPLAIN PLAN statement
SQL Reference, DROP TABLE statement
SQL Reference, SYSTIMESTAMP function
SQL*Plus Reference, SET TIMING statement
SQL*Plus Reference, SET AUTOTRACE statement
SQL*Plus Reference, SET ARRAYSIZE statement
Reference, DBA_TABLES view
Reference, DBA_CONSTRAINTS view
Reference, DBA_CONS_COLUMNS view
Reference, PLAN_TABLE table