Top Ten Qlik Performance Tips
Rob Wunderlich
Panalytics, Inc

About Me
Rob Wunderlich
- QlikView Consultant and Trainer
- Using QlikView since 2006
- Author of Document Analyzer and other tools
- Founder of QlikView Components script library
- Qlik Luminary and MVP in QlikCommunity
- Blogger at QlikviewCookbook.com
- Presenter at Masters Summit for Qlik
- Instructor at q-on.bi
- Tweets as @QVCookbook

Copyright 2016 Rob Wunderlich

Please ask questions. Don't assume you are the only one wondering.

Define Performance
- Response time after a click
  - What is fast and what is slow? It depends on who you ask.
- Reload time
- Utilization of hardware
- Cost of purchase, upgrade and management
- Development effort

When (Not) to Performance Tune
- "premature optimization is the root of all evil" (Donald Knuth)
- Performance tuning takes time; time is usually money.
- Best practices are frequently free.
- Have a problem to solve.

The Tuning Volume Curve
Rows | Data Model, Expressions | Hardware | Required Knowledge
< Few million | Doesn't matter | Unimportant | Get the numbers right!
Few million to tens of millions | Best practices | Has an impact | Best practices
Many tens of millions to low hundred millions | Intentional | Very important | Senior Consultant
Many hundreds of millions | Critical | Critical | Expert
Billions | Specialized techniques | Custom planning | QV internals, custom tooling

Measuring Performance
- Script: Document Log + ScriptLogAnalyzer from QvCookbook
- Charts: Sheet, Object Properties, Calc Time
  - Understand the impacts of cache and multi-processing on these numbers
- Charts: Document Analyzer from QvCookbook

Remove Unused Fields
- Remove Fields that are not being referenced in the front end.
- Use Document Analyzer to identify unused Fields.
- Don't get obsessive about this. Focus on the fields with high cardinality.

DROP FIELDS
Question: From a performance perspective, is there a difference between:
1.  DROP FIELD [AccountNumber];
    DROP FIELD [BillToAddressID];
    DROP FIELD [City];
2.  DROP FIELD [AccountNumber], [BillToAddressID], [City];

Remove Unneeded Fact Rows
- Don't load rows that are not required for analysis.
- Limit in SQL where possible:
    SQL SELECT * FROM Orders WHERE OrderDate >= '2012-01-01';

Limiting QVD Fact Rows
- This will be slow:
    LOAD * FROM data.qvd(qvd) WHERE Date > MakeDate(2012);
- When limiting rows from a QVD, WHERE Exists() is usually the fastest choice:
    TempDates:
    LOAD MakeDate(2012) + RecNo()-1 as Date AutoGenerate 10000;

    data:
    LOAD * FROM data.qvd(qvd) WHERE Exists(Date);

    DROP TABLE TempDates;

QVD Subset Performance
[Chart comparing QVD subset load performance in QV11 and QV12]

Segment QVDs
- Four years of facts in a single QVD: ~815M rows, ~80 fields, 17M rows per month
- Loading one year of data took 40 minutes:
    LOAD * FROM OneBig.qvd(qvd) WHERE Date >= '2015-01-01';
- Modified the extract to create one QVD per month (Facts_YYYYMM). The same year now loads in 6 minutes:
    LOAD * FROM Facts_2015*.qvd(qvd);

Remove Unneeded Dimension Rows
- A subset ratio of less than 100% indicates an opportunity to eliminate some Dimension rows.
- Use the Table Viewer to identify.
- Limit Dimensions using WHERE Exists() or KEEP:
    Product:
    LOAD * FROM Product.qvd (qvd) WHERE exists(productid);
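
The KEEP alternative mentioned for limiting Dimension rows could look roughly like the sketch below. The table name OrderDetail and its QVD file are assumptions made for illustration; only Product.qvd appears in the slides.

```qlik
// Hypothetical fact table, loaded first. It is assumed to share the key
// field productid with the Product dimension.
OrderDetail:
LOAD * FROM OrderDetail.qvd (qvd);

// LEFT KEEP reduces Product to the rows whose key values also occur in
// OrderDetail, while still storing Product as a separate table.
Product:
LEFT KEEP (OrderDetail)
LOAD * FROM Product.qvd (qvd);
```

After a reload, the subset ratio of productid in the Table Viewer is a quick way to check whether the reduction took effect.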

Reduce Cardinality
- Fields with a high number of values impact RAM footprint and Document Save/Open time.
- Split Timestamps into two fields: Date and Time.
- AutoNumber Keys and non-display Id fields.
  - Always use the second AutoNumber parameter to ensure sequential integers for optimum efficiency:
    AutoNumber(%KeyField, '$KeyField') as %KeyField

Cardinality Challenge: Excessive Detail
- Complex fields such as browser UserAgent can have a lot of unique values:
    Mozilla/5.0 (Linux; Android 4.4.2; Lenovo X2-EU Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Mobile Safari/537.36
- May be used for filtering like:
    =Sum({<UserAgent-={'*Bot*','*bot*','*Spider*'}>}Clicks)
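
The timestamp split suggested under Reduce Cardinality might be scripted along these lines. This is a minimal sketch: the field OrderTimestamp and the file Orders.qvd are hypothetical names, not taken from the slides.

```qlik
// Hypothetical source with a second-resolution timestamp field.
Orders:
LOAD
    *,
    Date(Floor(OrderTimestamp)) as OrderDate,   // whole-day part
    Time(Frac(OrderTimestamp))  as OrderTime    // time-of-day part
FROM Orders.qvd (qvd);

// Remove the high-cardinality original once both parts exist.
DROP FIELD OrderTimestamp;
```

As a rough worked figure, one year of second-resolution timestamps can hold up to 365 * 86,400 = 31,536,000 distinct values. The split fields hold at most 365 distinct dates and 86,400 distinct times.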

Cardinality Challenge: Excessive Detail
- Lots of unique values, but not many unique components.
- Pre-parse in the script to extract components such as OS, Browser, Device.

FieldName | #Values | Avg Width
UserAgent | 234,173 | 156
UserAgentDevice | 6 | 16
UserAgentOS | 99 | 19
UserAgentType | 11 | 16
UserAgentVersion | 2,180 | 16
* 36M rows, 4 months

Beware the Cost of Preceding Load
Single LOAD:
    LOAD *, A&B as B2 FROM data.qvd(qvd);
Preceding LOAD:
    LOAD *, A&B as B2;
    LOAD * FROM data.qvd(qvd);

Understand Caching
- The results of a chart calculation are stored and remembered in the Cache.
- If the "same calculation" is called for again in that chart or another chart, the results are retrieved from the cache and the computation is skipped, thus saving time.
- Cache is global on the server.
- A calculation is "the same" if:
  - The same selections are in force
  - The chart has the same Dimensions
  - The expression is identical

Weird Cache Facts
- Expression text must be exactly the same to be considered equivalent. These pairs are not treated as the same expression:
    sum(linetotal)    vs.    SUM(LineTotal)
    sum(linetotal)    vs.    sum(linetotal) // a comment
- Document Analyzer can help identify logically equivalent expressions.
- Avoid Expression Variations by utilizing:
  - Variables or Master Measures
  - Linked Objects

Weird Cache Facts 2
- A calculation is "the same" if the same selections are in force.
  - Any change in selections, even in data not connected to the chart, will cause recalculation.
- You must close QlikView, not just the Doc, to reset the cache.
- Caching can be turned off via an easter egg, but your computer will turn to dust.

Data Island Objects
- Data Islands are tables that have no linking field to the main model.
- Commonly used for UI switches like currency selection or "select your metrics".

Impact of Data Island Objects
- Selections made in the Pick Dimensions listbox update the Reports chart.
- This is a data change that causes every object on every sheet to be recalculated. Every object in the same state, that is.

Put Data Island Objects in Alt State
- Put the listbox in an Alternate State and reference that State in objects or expressions as required.
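
Referencing an Alternate State from an expression could look like the sketch below. The state name Switches and the island field ReportDimension are assumptions for illustration; the slides do not name them.

```qlik
// The "Pick Dimensions" listbox is assigned to a hypothetical alternate
// state named Switches. A chart left in the default state can read that
// selection by using the state name as the set identifier:
=Only({Switches} ReportDimension)

// All values selected in that state, as a comma-separated string:
=Concat({Switches} DISTINCT ReportDimension, ',')
```

Because the selection is made in Switches rather than the default state, objects in the default state do not see a selection change.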

Measures Should Be In Same Table
- An expression that requires values from two tables will generally run slower than if all values are available in the same table.
- Expressions that utilize one table are called "single row operations". The advantage is that QV can process the existing data row by row, and avoid the phase 2 step of assembling intermediate composite tables.
- ListPrice is in the Product table. Can we get all Fields in a single table?
    Sum(OrderQty * ListPrice) - Sum(LineTotal)

If Necessary, Make a Copy of the Field
- ListPrice cannot be JOINed to SaleOrderHeader without losing some Product rows. What to do?
- We can create an additional copy of the ListPrice field for use in this chart:
    Sum(OrderQty * SalesListPrice) - Sum(LineTotal)
* Note: Pre-calculating the expression in the script is an alternative (and preferred) solution.

Control Detail Table Objects
- Straight and Pivot tables with millions of output rows will take a long time to calculate, and impact the entire sheet.
- You generally can't make them calculate faster, but you can make design choices about when you will calculate them.

Control Detail Table Objects
- Hidden objects don't get calculated.
- Minimized objects don't get calculated.
- All Container objects get calculated.
- Export Objects can stay minimized.
- Use a Calculation Condition to limit to a reasonable number of rows.
- Calculation Condition is an expression, so you can always override it with a button and variable.
- Prefer Straight table to Pivot table when possible.
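
A Calculation Condition with a button override might be written as in this sketch. The field OrderID, the variable vShowDetail and the 10,000 threshold are illustrative assumptions, not values from the slides.

```qlik
// Calculation Condition for a detail table.
// The table calculates when the current selection leaves at most 10,000
// possible OrderID values, or when a button has set the hypothetical
// variable vShowDetail to 1 to override the limit.
GetPossibleCount(OrderID) <= 10000 or vShowDetail = 1
```

The threshold should reflect whatever row count is still reasonable for the users and the hardware in question.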

No Short Circuit
- Expression evaluation does not terminate false branches:
    if(1=1,
        sum(x)   // Always evaluated
       ,sum(y)   // Always evaluated
    )
- Both expressions are always evaluated!
- Usually not a big deal, except when there are many big choices that have only one truth for the chart, such as choosing an expression to match UI switches.

Short Circuit Optimization
- Use the Expression Conditional property
- Or move the if() into a variable calculation:
    =if(ui_currency='usd','sum(sales_usd)','sum(sales_eur)')

Further Reading
http://qlikviewcookbook.com/2014/12/how-not-to-choose-an-expression/
http://qlikviewcookbook.com/2014/12/how-to-choose-an-expression/
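
The two Short Circuit Optimization options could be wired up as in the sketch below. The variable name vSalesExpr is an assumption; the field and switch names follow the slide.

```qlik
// Option 1: two chart expressions, each with a Conditional property.
//   Expression: sum(sales_usd)    Conditional: ui_currency='usd'
//   Expression: sum(sales_eur)    Conditional: ui_currency='eur'

// Option 2: a hypothetical variable vSalesExpr, defined in the Variable
// Overview with a leading "=" so that it evaluates to a string holding
// only the chosen expression text:
//   =if(ui_currency='usd','sum(sales_usd)','sum(sales_eur)')

// The chart expression then expands the variable, so only the selected
// sum() is present in the chart:
$(vSalesExpr)
```

In option 2 the if() is evaluated once, outside the chart, instead of once per chart cell.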

Sum(If()), the Performance Killer
- Extremely slow. The IF condition is performed for every row in the dataset.
- Better alternatives:
  - Move the IF condition to the load script, generate a Flag (0/1)
  - In 99% of cases, Set Analysis can be used
  - In rare cases when Set Analysis is not possible, multiply by a flag
  - As the last resort, use a numeric condition

Prefer Numeric to String Comparison
- Numeric comparisons are faster (double or better) than String comparisons:
    sum(if(ExpressShip=1, LineTotal))
    sum(if(ExpressShip='yes', LineTotal))
- (Even better, use Set Analysis when possible!)
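
Generating a 0/1 flag in the load script, as the first alternative suggests, might look like this. The file SalesOrderDetail.qvd and the exact field casing (ExpressShip, ExpressShipNum, LineTotal) are assumptions for illustration.

```qlik
// Hypothetical source table. The string comparison runs once per row at
// reload time instead of on every chart calculation.
SalesOrderDetail:
LOAD
    *,
    If(ExpressShip = 'yes', 1, 0) as ExpressShipNum   // 0/1 flag
FROM SalesOrderDetail.qvd (qvd);
```

A chart can then use the flag in Set Analysis, for example sum({<ExpressShipNum={1}>} LineTotal), or as a multiplier, sum(LineTotal * ExpressShipNum).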

Set Analysis Modifier
- In a Set Analysis Modifier, there is no performance difference between String and Numeric:
    sum({<ExpressShipNum={1}>} LineTotal)
    sum({<ExpressShipText={'yes'}>} LineTotal)
- Set Analysis uses Search Logic, which is always string based.

Using Flags
- Flag Fields with a value of "0" or "1" can improve the performance of expressions:
    sum(if(ExpressShipNum=1, LineTotal))    // Wrong!
    sum(LineTotal * ExpressShipNum)         // Right!
    sum({<ExpressShipNum={1}>} LineTotal)   // Right also!
- Which is faster, Multiplication or Set Analysis? It depends.
  - Is the time required to create the Set offset by the savings of a faster calculation?

Pre-calculate in Script When Possible
- Replace Calculated Dims.
  - =Date(Floor(ShipTime)) should be done in the script
  - FirstName & LastName as Name
- Calculated Dims don't cache well!
- Do business logic in script. Instead of:
    Expr: Sum({<OrderID={"=ShipDate=OrderDate"}>}Sales)
  create a flag in the script and use it in the expression:
    If(ShipDate=OrderDate, 1, 0) as Flag_SameDayShip
    Expr: Sum(Sales * Flag_SameDayShip)

Challenge: "I can't pre-aggregate."
- Evaluate all use cases, not just the lowest granularity.
- Example: Web Advertising, 100M Facts per month
  - Measures: #Impressions (Views), #Clicks
  - Dimensions: AdId, Date, Category, Publisher

Pre-Aggregation Example
Requirements:
- I need to see overall Click% (#Clicks / #Impressions) for my selections.
- I need to trend measures over Months and Category.
- I may want to filter to a single AdId, Publisher or Category.
- I need to show the day by day activity for a single AdId.

There is only one requirement for Day level data!
- We can pre-aggregate additional fields, Sum(#Impressions) as Monthly#Impressions and Sum(#Clicks) as Monthly#Clicks, at the AdId, Month level.
- Assuming an average lifetime of 100 days per Ad, aggregation in the front end will use fewer rows: about 4 monthly rows per Ad instead of 100 daily rows (4/100).
- Imagine the pre-aggregation opportunity if the monthly data did not need to filter by AdId!

Q & A
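
The monthly pre-aggregation described in the Pre-Aggregation Example could be sketched as follows. The resident table name Facts and the presence of a Month field in it are assumptions; the measure and dimension names follow the slides.

```qlik
// Hypothetical day-level table Facts, assumed to be already loaded with
// the fields AdId, Month, [#Impressions] and [#Clicks].
MonthlyFacts:
LOAD
    AdId,
    Month,
    Sum([#Impressions]) as [Monthly#Impressions],
    Sum([#Clicks])      as [Monthly#Clicks]
RESIDENT Facts
GROUP BY AdId, Month;
```

Monthly trend charts can then sum the Monthly fields, while the day by day view for a single AdId keeps using the detail rows. How MonthlyFacts is keyed to the rest of the model, for example to avoid a synthetic key on AdId and Month, is a modelling choice the slides do not cover.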