WRITING EFFICIENT SQL By Selene Bainum
About Me
- Extensive SQL & database development since 1995
- ColdFusion developer since 1996
- Author & speaker
- Co-founder, RiteTech LLC
  - IT & web company in the Washington, DC area
  - Created ModernHOA.com
    - Web site platform for HOAs and small businesses
Overview
- Why Improve SQL Efficiency
- Design Considerations
- Worst Technique
- Joins & Subqueries
- Other Techniques & Tips
- Conclusion / Q&A
Efficiency, Who Cares?
- System administrators
  - Memory costs money and resources
  - They hate being paged at night
- Site visitors
  - Slow site = lost customers
- Your boss!
- You: the personal satisfaction of writing good code
Database Efficiency
- Decreases total page processing time
- Reduces load on the database
- Allows more time for other processes:
  - Code logic
  - HTML/JavaScript returned to browser
  - Site overhead tasks
Sample Database
- USDA National Nutrient Database
  - 13 tables
  - Thousands of rows
  - Available at: http://ndb.nal.usda.gov/
Design Considerations
- Keys/Indexes
  - Primary keys
  - Foreign keys
  - Indexes
    - Create an index for each foreign key field in your table
    - Create an index for other fields commonly used for filters
    - Speed up queries significantly
    - Can increase time to insert/update
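
The effect of indexing a foreign key can be checked in the query plan. The sketch below is only an illustration: it uses an in-memory SQLite database through Python's `sqlite3` module rather than the database behind these slides, the rows are made up, and the index name `idx_food_des_fdgrp` is hypothetical. Other database engines expose query plans through their own tools.

```python
import sqlite3

# In-memory SQLite stand-in; table and column names follow the slides,
# but the rows and the index name are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fd_Group (FdGrp_Cd TEXT PRIMARY KEY, FdGrp_Desc TEXT);
    CREATE TABLE Food_Des (
        NDB_No    TEXT PRIMARY KEY,
        FdGrp_Cd  TEXT REFERENCES Fd_Group (FdGrp_Cd),
        Long_Desc TEXT
    );
""")
conn.execute("INSERT INTO Fd_Group VALUES ('1400', 'Beverages')")
conn.executemany(
    "INSERT INTO Food_Des VALUES (?, ?, ?)",
    [(f"{n:05d}", "1400", f"Food {n}") for n in range(1000)],
)


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)


query = "SELECT Long_Desc FROM Food_Des WHERE FdGrp_Cd = '1400'"

plan_before = plan(query)  # no index on FdGrp_Cd yet: full table scan
conn.execute("CREATE INDEX idx_food_des_fdgrp ON Food_Des (FdGrp_Cd)")
plan_after = plan(query)   # the filter can now use the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies between SQLite versions, so the sketch prints it rather than assuming a fixed wording.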
Worst Technique
Avoid: SELECT *
- Database must query column names from system table
- Takes more time than selecting column names, even if all columns are needed
- Does not allow for column aliasing
- ColdFusion queries may break if columns are added to the table
- Better yet: query only the columns you need!
ColdFusion Metrics
Food_Des table; 8194 rows; average over 10 queries
- SELECT *
    SELECT *
    FROM Food_Des
  Avg. time: 60.8 ms
- SELECT [All Column Names]
    SELECT NDB_No, FdGrp_Cd, Long_Desc, Shrt_Desc, ComName, ManufacName,
           Survey, Ref_Desc, Refuse, SciName, N_Factor, Pro_Factor,
           Fat_Factor, CHO_Factor
    FROM Food_Des
  Avg. time: 56.4 ms
- SELECT [Needed Columns]
    SELECT NDB_No, FdGrp_Cd
    FROM Food_Des
  Avg. time: 6.1 ms
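
A comparison like this can be repeated against your own data with a small timing harness. The following is a minimal sketch, assuming an in-memory SQLite table with a reduced column set and 8194 made-up rows; the helper name `avg_ms` is hypothetical, and the numbers it prints will not match the ColdFusion figures above.

```python
import sqlite3
import time

# Made-up data in an in-memory SQLite table; only the table name, row count
# and a subset of the column names are taken from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Food_Des (
        NDB_No TEXT PRIMARY KEY, FdGrp_Cd TEXT, Long_Desc TEXT,
        Shrt_Desc TEXT, ComName TEXT, ManufacName TEXT
    )
""")
conn.executemany(
    "INSERT INTO Food_Des VALUES (?, ?, ?, ?, ?, ?)",
    [(f"{n:05d}", "0100", "long description " * 5, "short", "common", "maker")
     for n in range(8194)],
)


def avg_ms(sql, runs=10):
    """Average wall-clock milliseconds to run sql and fetch every row."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        total += time.perf_counter() - start
    return total / runs * 1000


timings = {
    "SELECT *": avg_ms("SELECT * FROM Food_Des"),
    "all column names": avg_ms(
        "SELECT NDB_No, FdGrp_Cd, Long_Desc, Shrt_Desc, ComName, ManufacName "
        "FROM Food_Des"
    ),
    "needed columns": avg_ms("SELECT NDB_No, FdGrp_Cd FROM Food_Des"),
}
for label, ms in timings.items():
    print(f"{label}: {ms:.2f} ms")
```

Timings depend on the engine, driver and hardware, so treat the output as relative rather than absolute.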
Joins & Subqueries
Joins & Subqueries
- Are absolutely necessary and useful
- Increase processing time
- Can be optimized
- Don't use more than necessary
Join Types
- Inner join
  - Only returns rows where values exist and match in both tables
- Outer joins
  - Return rows from one table even if no matching value exists in the other table
- Cross join
  - Returns the Cartesian product of rows: every row in the left table combined with every row in the right table
  - Don't do!
- Self join
  - Joining a table to itself, as either an inner or an outer join
Join Options: WHERE Clause
- INNER JOIN
    FROM T1, T2
    WHERE T1.ID = T2.ID
- LEFT OUTER JOIN
    FROM T1, T2
    WHERE T1.ID *= T2.ID
- RIGHT OUTER JOIN
    FROM T1, T2
    WHERE T1.ID =* T2.ID
Join Options: FROM Clause
- INNER JOIN
    FROM T1 INNER JOIN T2 ON T1.ID = T2.ID
- LEFT OUTER JOIN
    FROM T1 LEFT OUTER JOIN T2 ON T1.ID = T2.ID
- RIGHT OUTER JOIN
    FROM T1 RIGHT OUTER JOIN T2 ON T1.ID = T2.ID
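
To see the two inner join styles side by side, here is a small sketch using an in-memory SQLite database with made-up rows in `T1` and `T2`. The `*=` and `=*` operators are not available in SQLite, so only the FROM clause form of the outer join is shown.

```python
import sqlite3

# Tiny made-up tables; T1 has one row (ID 2) with no match in T2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE T2 (ID INTEGER PRIMARY KEY, Detail TEXT);
    INSERT INTO T1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO T2 VALUES (1, 'x'), (3, 'z');
""")

# Inner join written in the WHERE clause.
where_join = conn.execute(
    "SELECT T1.ID, T2.Detail FROM T1, T2 "
    "WHERE T1.ID = T2.ID ORDER BY T1.ID"
).fetchall()

# The same inner join written in the FROM clause.
from_join = conn.execute(
    "SELECT T1.ID, T2.Detail FROM T1 INNER JOIN T2 ON T1.ID = T2.ID "
    "ORDER BY T1.ID"
).fetchall()

# Left outer join: every T1 row is kept, with NULL where T2 has no match.
left_join = conn.execute(
    "SELECT T1.ID, T2.Detail FROM T1 LEFT OUTER JOIN T2 ON T1.ID = T2.ID "
    "ORDER BY T1.ID"
).fetchall()

print(where_join)  # [(1, 'x'), (3, 'z')]
print(from_join)   # [(1, 'x'), (3, 'z')]
print(left_join)   # [(1, 'x'), (2, None), (3, 'z')]
```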
Join Issues
- Standard SQL encourages joins in the FROM clause
- FROM clause is processed before the WHERE clause
- Every database handles WHERE clause joins differently
  - Check your DB documentation for specific examples
- Microsoft SQL Server WHERE clause joins
  - Future versions may not support them at all
WHERE vs. FROM
- WHERE clause:
    SELECT FD.NDB_No, FD.FdGrp_Cd, FD.Long_Desc, FD.Shrt_Desc, FG.FdGrp_Desc
    FROM Food_Des FD, Fd_Group FG
    WHERE FD.FdGrp_Cd = FG.FdGrp_Cd
  Average time: 121 ms
- FROM clause:
    SELECT FD.NDB_No, FD.FdGrp_Cd, FD.Long_Desc, FD.Shrt_Desc, FG.FdGrp_Desc
    FROM Food_Des FD
        INNER JOIN Fd_Group FG ON FD.FdGrp_Cd = FG.FdGrp_Cd
  Average time: 110 ms
LEFT OUTER Joins - Filtering
- Goal: return all foods and their level of alcohol
- Tables
  - Food_Des: food definitions
  - Nut_Data: nutritional levels for foods
  - Nutr_Def: nutritional definitions
- Not all foods will have a nutritional record for alcohol
LEFT OUTER Joins - Filtering
Add filter to the WHERE clause:
    SELECT FD.NDB_No, FD.Long_Desc, ND.Nutr_Val
    FROM Food_Des FD
        LEFT OUTER JOIN (
            Nut_Data ND
            INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
        ) ON FD.NDB_No = ND.NDB_No
    WHERE NR.NutrDesc LIKE '%alcohol%'
    ORDER BY FD.Long_Desc
- Returns 4887 records (not 8194)
- WHERE clause filter turns the LEFT OUTER join into an INNER join
LEFT OUTER Joins - Filtering
Add filter to the FROM clause:
    SELECT FD.NDB_No, FD.Long_Desc, ND.Nutr_Val
    FROM Food_Des FD
        LEFT OUTER JOIN (
            Nut_Data ND
            INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
        ) ON FD.NDB_No = ND.NDB_No
            AND NR.NutrDesc LIKE '%alcohol%'
    ORDER BY FD.Long_Desc
- Returns 8194 records
- All foods returned, some with NULL for level of alcohol
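
The difference between filtering in the WHERE clause and in the join condition can be reproduced on a handful of rows. This sketch uses an in-memory SQLite database with invented foods and values; as an assumption for portability, the nested join from the slides is written as a derived table aliased `N` (a hypothetical name), which should behave the same way for this purpose.

```python
import sqlite3

# Three made-up foods; only Beer and Wine have an alcohol record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Food_Des (NDB_No TEXT PRIMARY KEY, Long_Desc TEXT);
    CREATE TABLE Nutr_Def (Nutr_No TEXT PRIMARY KEY, NutrDesc TEXT);
    CREATE TABLE Nut_Data (NDB_No TEXT, Nutr_No TEXT, Nutr_Val REAL);
    INSERT INTO Food_Des VALUES ('A', 'Beer'), ('B', 'Bread'), ('C', 'Wine');
    INSERT INTO Nutr_Def VALUES ('203', 'Protein'), ('221', 'Alcohol, ethyl');
    INSERT INTO Nut_Data VALUES
        ('A', '221', 3.9), ('A', '203', 0.5),
        ('B', '203', 9.0),
        ('C', '221', 10.6);
""")

# Derived table standing in for the nested (Nut_Data INNER JOIN Nutr_Def).
nutrients = """
    (SELECT ND.NDB_No, ND.Nutr_Val, NR.NutrDesc
     FROM Nut_Data ND
         INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No) AS N
"""

# Filter in WHERE: rows whose N columns are NULL are discarded afterwards,
# so the outer join effectively becomes an inner join.
where_rows = conn.execute(f"""
    SELECT FD.Long_Desc, N.Nutr_Val
    FROM Food_Des FD
        LEFT OUTER JOIN {nutrients} ON FD.NDB_No = N.NDB_No
    WHERE N.NutrDesc LIKE '%alcohol%'
    ORDER BY FD.Long_Desc
""").fetchall()

# Filter in the join condition: every food is kept, NULL where no match.
on_rows = conn.execute(f"""
    SELECT FD.Long_Desc, N.Nutr_Val
    FROM Food_Des FD
        LEFT OUTER JOIN {nutrients} ON FD.NDB_No = N.NDB_No
            AND N.NutrDesc LIKE '%alcohol%'
    ORDER BY FD.Long_Desc
""").fetchall()

print(where_rows)  # [('Beer', 3.9), ('Wine', 10.6)]
print(on_rows)     # [('Beer', 3.9), ('Bread', None), ('Wine', 10.6)]
```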
Join vs. Subquery
- Joins
  - Typically used when you want to return the data from the tables included
- Subqueries
  - Typically used when you want to return data from one or more tables
  - Results are dependent on data in other tables
  - Data from those other tables is not desired
Subquery
- Goal: return all food groups that contain foods with alcohol
- Tables:
  - Fd_Group: distinct food groups
  - Food_Des: food definitions
  - Nut_Data: nutritional levels for foods
  - Nutr_Def: nutritional definitions
- Will only return 2 rows
WHERE IN
    SELECT FG.FdGrp_Cd, FG.FdGrp_Desc
    FROM Fd_Group FG
    WHERE FG.FdGrp_Cd IN (
        SELECT FD.FdGrp_Cd
        FROM Food_Des FD
            INNER JOIN Nut_Data ND ON FD.NDB_No = ND.NDB_No
            INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
        WHERE NR.NutrDesc LIKE '%alcohol%'
            AND ND.Nutr_Val > 0
    )
- Returns entire subquery for each row
- Very inefficient
- Avg. time: 465 ms
WHERE EXISTS
    SELECT FG.FdGrp_Cd, FG.FdGrp_Desc
    FROM Fd_Group FG
    WHERE EXISTS (
        SELECT 1
        FROM Food_Des FD
            INNER JOIN Nut_Data ND ON FD.NDB_No = ND.NDB_No
            INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
        WHERE NR.NutrDesc LIKE '%alcohol%'
            AND FG.FdGrp_Cd = FD.FdGrp_Cd
            AND ND.Nutr_Val > 0
    )
- Returns much smaller subset for each row
- Faster than WHERE IN
- Columns in sub-select are irrelevant
- Avg. time: 449 ms - 3.44% more efficient than WHERE IN
Why Not JOIN?
    SELECT DISTINCT FG.FdGrp_Cd, FG.FdGrp_Desc
    FROM Fd_Group FG
        INNER JOIN Food_Des FD ON FG.FdGrp_Cd = FD.FdGrp_Cd
        INNER JOIN Nut_Data ND ON FD.NDB_No = ND.NDB_No
        INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
            AND NR.NutrDesc LIKE '%alcohol%'
            AND ND.Nutr_Val > 0
- DISTINCT must be used to limit rows
- Much slower than both WHERE IN and WHERE EXISTS
- Avg. time: 2786 ms - 499% less efficient than WHERE IN
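
The three approaches (WHERE IN, WHERE EXISTS and JOIN with DISTINCT) should return the same food groups, and the sketch below checks that on a few made-up rows in an in-memory SQLite database. It says nothing about relative speed, which depends on the engine and data size; an ORDER BY is added here only to make the results comparable.

```python
import sqlite3

# Made-up rows: only the Beverages group has foods with alcohol > 0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fd_Group (FdGrp_Cd TEXT PRIMARY KEY, FdGrp_Desc TEXT);
    CREATE TABLE Food_Des (NDB_No TEXT PRIMARY KEY, FdGrp_Cd TEXT, Long_Desc TEXT);
    CREATE TABLE Nutr_Def (Nutr_No TEXT PRIMARY KEY, NutrDesc TEXT);
    CREATE TABLE Nut_Data (NDB_No TEXT, Nutr_No TEXT, Nutr_Val REAL);
    INSERT INTO Fd_Group VALUES
        ('0900', 'Fruits'), ('1400', 'Beverages'), ('1800', 'Baked Products');
    INSERT INTO Food_Des VALUES
        ('A', '1400', 'Beer'), ('B', '1800', 'Bread'),
        ('C', '1400', 'Wine'), ('D', '0900', 'Apple');
    INSERT INTO Nutr_Def VALUES ('203', 'Protein'), ('221', 'Alcohol, ethyl');
    INSERT INTO Nut_Data VALUES
        ('A', '221', 3.9), ('B', '221', 0.0), ('B', '203', 9.0),
        ('C', '221', 10.6), ('D', '203', 0.3);
""")

in_rows = conn.execute("""
    SELECT FG.FdGrp_Cd, FG.FdGrp_Desc
    FROM Fd_Group FG
    WHERE FG.FdGrp_Cd IN (
        SELECT FD.FdGrp_Cd
        FROM Food_Des FD
            INNER JOIN Nut_Data ND ON FD.NDB_No = ND.NDB_No
            INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
        WHERE NR.NutrDesc LIKE '%alcohol%' AND ND.Nutr_Val > 0
    )
    ORDER BY FG.FdGrp_Cd
""").fetchall()

exists_rows = conn.execute("""
    SELECT FG.FdGrp_Cd, FG.FdGrp_Desc
    FROM Fd_Group FG
    WHERE EXISTS (
        SELECT 1
        FROM Food_Des FD
            INNER JOIN Nut_Data ND ON FD.NDB_No = ND.NDB_No
            INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
        WHERE NR.NutrDesc LIKE '%alcohol%'
            AND FG.FdGrp_Cd = FD.FdGrp_Cd
            AND ND.Nutr_Val > 0
    )
    ORDER BY FG.FdGrp_Cd
""").fetchall()

join_sql = """
    SELECT {distinct} FG.FdGrp_Cd, FG.FdGrp_Desc
    FROM Fd_Group FG
        INNER JOIN Food_Des FD ON FG.FdGrp_Cd = FD.FdGrp_Cd
        INNER JOIN Nut_Data ND ON FD.NDB_No = ND.NDB_No
        INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
            AND NR.NutrDesc LIKE '%alcohol%'
            AND ND.Nutr_Val > 0
    ORDER BY FG.FdGrp_Cd
"""
join_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()
# Without DISTINCT the join repeats the group once per matching food.
join_rows_no_distinct = conn.execute(join_sql.format(distinct="")).fetchall()

print(in_rows)                     # [('1400', 'Beverages')]
print(exists_rows)                 # [('1400', 'Beverages')]
print(join_rows)                   # [('1400', 'Beverages')]
print(len(join_rows_no_distinct))  # 2
```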
Other Techniques & Tips
Views
- Saved query
- Great for repeatable logic
- Removes need to join multiple tables
- SLOW!
Eliminate Views
- Create tables that hold the desired data
  - Complete with primary key, foreign keys and indexes
- Create a script or procedure to query the data used in the view
  - Truncate the table and insert new data
- Repeat as necessary
  - Refresh rate based on dynamic nature of data
  - Even repopulating the table every 5 minutes can be more efficient
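
The refresh step can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database, the table `Food_Summary` and the function `refresh_food_summary` are hypothetical names, and a Python function stands in for the script or stored procedure. SQLite has no TRUNCATE TABLE, so the sketch uses DELETE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fd_Group (FdGrp_Cd TEXT PRIMARY KEY, FdGrp_Desc TEXT);
    CREATE TABLE Food_Des (NDB_No TEXT PRIMARY KEY, FdGrp_Cd TEXT, Long_Desc TEXT);
    INSERT INTO Fd_Group VALUES ('1400', 'Beverages'), ('1800', 'Baked Products');
    INSERT INTO Food_Des VALUES ('A', '1400', 'Beer'), ('B', '1800', 'Bread');

    -- Hypothetical table that holds what a view would otherwise compute,
    -- complete with primary key, foreign key and index.
    CREATE TABLE Food_Summary (
        NDB_No     TEXT PRIMARY KEY,
        FdGrp_Cd   TEXT REFERENCES Fd_Group (FdGrp_Cd),
        Long_Desc  TEXT,
        FdGrp_Desc TEXT
    );
    CREATE INDEX idx_food_summary_fdgrp ON Food_Summary (FdGrp_Cd);
""")


def refresh_food_summary(conn):
    """Empty and repopulate Food_Summary inside a single transaction."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute("DELETE FROM Food_Summary")  # stand-in for TRUNCATE
        conn.execute("""
            INSERT INTO Food_Summary (NDB_No, FdGrp_Cd, Long_Desc, FdGrp_Desc)
            SELECT FD.NDB_No, FD.FdGrp_Cd, FD.Long_Desc, FG.FdGrp_Desc
            FROM Food_Des FD
                INNER JOIN Fd_Group FG ON FD.FdGrp_Cd = FG.FdGrp_Cd
        """)


def summary_count(conn):
    return conn.execute("SELECT COUNT(NDB_No) FROM Food_Summary").fetchone()[0]


refresh_food_summary(conn)
count_first = summary_count(conn)

# New source data is not visible until the next refresh.
conn.execute("INSERT INTO Food_Des VALUES ('C', '1400', 'Wine')")
count_stale = summary_count(conn)
refresh_food_summary(conn)
count_refreshed = summary_count(conn)

print(count_first, count_stale, count_refreshed)  # 2 2 3
```

The trade-off is staleness: readers see data as of the last refresh, so the refresh interval has to match how quickly the source data changes.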
Unions
- The good:
  - Combine results of multiple queries
  - Allows multiple result sets to be merged into one query
- The bad:
  - Multiple unions in the same query can be inefficient
  - The more rows returned, the slower it can be
  - Results could take minutes
Slow Union Workaround
- Create a stored procedure
  - Create a temporary table that contains all the columns returned
  - Have each query insert its records into the table
  - Query all the columns (not using *)
  - Drop the temporary table
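
The steps above can be sketched like this. SQLite has no stored procedures, so a Python function (`combined_foods`, a hypothetical name) plays that role here; the temporary table `Tmp_Foods`, the three filters and the rows are also invented. Note that plain inserts keep duplicate rows, like UNION ALL; if the original query relied on UNION removing duplicates, the final SELECT would need DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Food_Des (NDB_No TEXT PRIMARY KEY, FdGrp_Cd TEXT, Long_Desc TEXT);
    INSERT INTO Food_Des VALUES
        ('A', '1400', 'Beer'), ('B', '1800', 'Bread'),
        ('C', '1400', 'Wine'), ('D', '0900', 'Apple'),
        ('E', '0100', 'Butter');
""")

# Hypothetical filters, one per query that would otherwise be UNIONed.
FILTERS = ("FdGrp_Cd = '1400'", "FdGrp_Cd = '1800'", "Long_Desc LIKE 'App%'")


def combined_foods(conn):
    """Collect each query's rows in a temp table, read it back, drop it."""
    conn.execute("CREATE TEMP TABLE Tmp_Foods (NDB_No TEXT, Long_Desc TEXT)")
    try:
        for condition in FILTERS:
            conn.execute(
                "INSERT INTO Tmp_Foods (NDB_No, Long_Desc) "
                f"SELECT NDB_No, Long_Desc FROM Food_Des WHERE {condition}"
            )
        # Name the columns explicitly instead of using SELECT *.
        return conn.execute(
            "SELECT NDB_No, Long_Desc FROM Tmp_Foods ORDER BY NDB_No"
        ).fetchall()
    finally:
        conn.execute("DROP TABLE Tmp_Foods")


temp_table_rows = combined_foods(conn)

# The equivalent single query, for comparison.
union_rows = conn.execute(
    " UNION ALL ".join(
        f"SELECT NDB_No, Long_Desc FROM Food_Des WHERE {c}" for c in FILTERS
    ) + " ORDER BY NDB_No"
).fetchall()

print(temp_table_rows)
# [('A', 'Beer'), ('B', 'Bread'), ('C', 'Wine'), ('D', 'Apple')]
```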
Excessive Joins & Columns
- The culprits:
  - Joining many tables
  - Returning many columns
  - Thousands of rows
- The solution:
  - Create a stored procedure
  - Break one query into multiple queries that create temporary tables
  - One query to join the temporary tables
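
As a rough sketch of that solution, the example below splits a four-table join into two staging queries and one final join. It assumes an in-memory SQLite database with made-up rows; the temporary table names `Tmp_Alcohol` and `Tmp_Food_Groups` and the function `foods_with_alcohol` are hypothetical, and the function stands in for a stored procedure. Whether this beats the single query depends on the engine and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fd_Group (FdGrp_Cd TEXT PRIMARY KEY, FdGrp_Desc TEXT);
    CREATE TABLE Food_Des (NDB_No TEXT PRIMARY KEY, FdGrp_Cd TEXT, Long_Desc TEXT);
    CREATE TABLE Nutr_Def (Nutr_No TEXT PRIMARY KEY, NutrDesc TEXT);
    CREATE TABLE Nut_Data (NDB_No TEXT, Nutr_No TEXT, Nutr_Val REAL);
    INSERT INTO Fd_Group VALUES ('1400', 'Beverages'), ('1800', 'Baked Products');
    INSERT INTO Food_Des VALUES
        ('A', '1400', 'Beer'), ('B', '1800', 'Bread'), ('C', '1400', 'Wine');
    INSERT INTO Nutr_Def VALUES ('203', 'Protein'), ('221', 'Alcohol, ethyl');
    INSERT INTO Nut_Data VALUES
        ('A', '221', 3.9), ('B', '221', 0.0), ('B', '203', 9.0), ('C', '221', 10.6);
""")


def foods_with_alcohol(conn):
    """Stage two smaller result sets in temp tables, then join them."""
    # Stage 1: only the nutrient rows of interest.
    conn.execute("""
        CREATE TEMP TABLE Tmp_Alcohol AS
        SELECT ND.NDB_No, ND.Nutr_Val
        FROM Nut_Data ND
            INNER JOIN Nutr_Def NR ON ND.Nutr_No = NR.Nutr_No
        WHERE NR.NutrDesc LIKE '%alcohol%' AND ND.Nutr_Val > 0
    """)
    # Stage 2: foods with their group description.
    conn.execute("""
        CREATE TEMP TABLE Tmp_Food_Groups AS
        SELECT FD.NDB_No, FD.Long_Desc, FG.FdGrp_Desc
        FROM Food_Des FD
            INNER JOIN Fd_Group FG ON FD.FdGrp_Cd = FG.FdGrp_Cd
    """)
    try:
        # Final step: one query joining the two temporary tables.
        return conn.execute("""
            SELECT F.Long_Desc, F.FdGrp_Desc, A.Nutr_Val
            FROM Tmp_Food_Groups F
                INNER JOIN Tmp_Alcohol A ON F.NDB_No = A.NDB_No
            ORDER BY F.Long_Desc
        """).fetchall()
    finally:
        conn.execute("DROP TABLE Tmp_Alcohol")
        conn.execute("DROP TABLE Tmp_Food_Groups")


staged_rows = foods_with_alcohol(conn)
print(staged_rows)
# [('Beer', 'Beverages', 3.9), ('Wine', 'Beverages', 10.6)]
```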
Troubleshooting Tips
- Run the query directly in the database
- Comment out sections and add logic back in line by line
  - Helps isolate which portion or line is causing issues
- Try creative and unique ideas
  - Just because it doesn't seem like it would be faster or better doesn't mean it isn't
One Size Does Not Fit All
- A best practice may not always be best for your database
  - Data types, table sizes, etc. all play a role
- Try different techniques
  - Select what works best
- Queries may need to be rewritten as the database grows
Questions & Answers
Thank You Selene Bainum http://www.ritetech.net sbainum@ritetech.net