OLAP
To improve performance, SQL has been extended with some new operations that support OLAP analysis. They are:
- roll up: aggregates events to reduce the level of detail used to describe them (moving to higher levels of the hierarchy); it is performed by joining the fact table with the dimension tables and then grouping by the attribute of the desired level; a dimension can also be dropped entirely if none of its attributes appears in the GROUP BY clause;
- drill down: the opposite of roll up; the level of detail is increased by moving to lower levels of the hierarchy, and new dimensions may be added;
- slice & dice: slice is a selection with an equality predicate, while dice is a selection whose predicate is a more complex expression (range or equality conditions combined with logical operators); the granularity is not modified, since only a subset of the data is selected;
- pivot: reorganizes the current data to display it in a different way (for example, by swapping the dimensions shown on rows and on columns); pivot can be applied at any level of analysis.
Extension of SQL language
The first extension discussed is the window clause, which is composed of the following steps:
- partitioning: similar to GROUP BY, but no information is lost; the window clause divides the tuples into partitions (groups of tuples sharing the same value of the partitioning attributes), yet every single tuple is still available, whereas GROUP BY collapses each group into a single row;
- aggregation window: for each row in the partition, a set of tuples (the window) is defined, over which a given aggregate function is computed;
- row ordering: a preliminary operation, similar to the ORDER BY clause, that sorts the tuples inside each partition, so that the aggregation window can be defined relative to the position of the current row.
Now, starting from a fact table, some examples of SQL syntax and window definitions are shown.
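The difference between GROUP BY and window partitioning is easiest to see on a tiny table. The following is a minimal sketch using Python's built-in sqlite3 module, assuming SQLite 3.25 or later (earlier versions lack window functions); the table name Sales and all of its values are hypothetical and are not the fact table of these notes. It also shows roll up in its simplest form (the Month dimension dropped from the GROUP BY), a slice and a dice.

```python
import sqlite3

# Hypothetical sample data (City, Month, Amount), invented for illustration.
ROWS = [
    ("Milan", 1, 10), ("Milan", 2, 20), ("Milan", 3, 30),
    ("Turin", 1, 5), ("Turin", 2, 15),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (City TEXT, Month INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", ROWS)

# Roll up: Month is not in the GROUP BY, so that dimension is dropped
# and each city collapses into a single row.
rollup = conn.execute(
    "SELECT City, SUM(Amount) FROM Sales GROUP BY City ORDER BY City"
).fetchall()

# Slice: selection with an equality predicate on one dimension.
sliced = conn.execute(
    "SELECT City, Month, Amount FROM Sales WHERE Month = 2 ORDER BY City"
).fetchall()

# Dice: selection whose predicate combines a range and an equality.
diced = conn.execute(
    "SELECT City, Month, Amount FROM Sales "
    "WHERE Month BETWEEN 1 AND 2 AND City = 'Milan' ORDER BY Month"
).fetchall()

# Partitioning: every detail row is kept, and each row also carries
# the total of its own partition (its city).
partitioned = conn.execute(
    "SELECT City, Month, Amount, SUM(Amount) OVER (PARTITION BY City) "
    "FROM Sales ORDER BY City, Month"
).fetchall()

print(rollup)            # [('Milan', 60), ('Turin', 20)]
print(len(partitioned))  # 5: no detail row is lost
```

A roll up towards a higher level of a hierarchy (for example, from city to region) would additionally join Sales with a dimension table before grouping, as described in the roll up item.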
Fact table
[Table: City, Month, Amount]
The table has a primary key made of two attributes, City and Month; each of them is also a foreign key and represents a different dimension.
Example 1
For each city and month, report the sale amount and the average sale amount of the current month and the two previous months.
Solution
First of all, the average is computed separately for each city, so the table has to be partitioned on that attribute. The average is computed over the current month plus the 2 preceding ones (this is the window size), so inside each partition the rows have to be sorted on the Month attribute; notice that this sort is internal to the window computation, so it does not affect the order of the final result. This is a moving average, because the window moves with the current row: it normally contains 3 rows, but if only 1 or 2 rows are available (as for the first months of a partition), the average is computed on those rows anyway.
[Table: City, Month, Amount, MovingAvg]
There are two equivalent SQL formulations. The first uses a named window, defined in a WINDOW clause placed after the FROM (and WHERE, GROUP BY, HAVING) clauses of the query:
AVG(Amount) OVER Wavg AS MovingAvg
WINDOW Wavg AS (PARTITION BY City ORDER BY Month ROWS 2 PRECEDING)
The second defines the window inline:
AVG(Amount) OVER (PARTITION BY City ORDER BY Month ROWS 2 PRECEDING) AS MovingAvg
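As a minimal sketch, assuming SQLite 3.28 or later through Python's sqlite3 module (RANGE frames with a numeric offset need that version), the query below runs Example 1 with the named window Wavg on hypothetical data (table name and values invented). It also contrasts the physical frame ROWS 2 PRECEDING with the logical frame RANGE 2 PRECEDING: in the sample data Turin has no row for month 3, so for month 4 the two frames cover different rows.

```python
import sqlite3

# Hypothetical data: Turin has no row for month 3.
ROWS = [
    ("Milan", 1, 10), ("Milan", 2, 20), ("Milan", 3, 30), ("Milan", 4, 40),
    ("Turin", 1, 6), ("Turin", 2, 12), ("Turin", 4, 30),
]

QUERY = """
SELECT City, Month, Amount,
       -- physical frame: the current row and the two rows before it
       AVG(Amount) OVER Wavg AS MovingAvg,
       -- logical frame: rows whose Month lies in [Month - 2, Month]
       AVG(Amount) OVER (PARTITION BY City ORDER BY Month
                         RANGE 2 PRECEDING) AS RangeAvg
FROM Sales
WINDOW Wavg AS (PARTITION BY City ORDER BY Month ROWS 2 PRECEDING)
ORDER BY City, Month
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (City TEXT, Month INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()

for row in result:
    print(row)
```

For Turin, month 4, MovingAvg is (6 + 12 + 30) / 3 = 16.0, while RangeAvg is (12 + 30) / 2 = 21.0. The first rows of each partition show the shrinking window: for month 1 the average is taken over a single row, for month 2 over two rows.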
Aggregation window
The window can be defined in two different ways:
- at the physical level: the window is defined by counting rows (for example, the current row and the two preceding rows);
- at the logical level: the window is defined by a range of values of the sort attribute (for example, all rows whose month lies within two months of the current one).
In the previous example the window is defined at the physical level, since the clause specifies ROWS 2 PRECEDING. To define it at the logical level, the clause becomes RANGE 2 PRECEDING, with the offset applied to the values of Month (if Month is a date, an interval is used instead, as in RANGE INTERVAL '2' MONTH PRECEDING). The two definitions differ when a month is missing: ROWS always takes the two previous rows, whereas RANGE only takes the rows whose Month falls within the range.
Example 2: cumulative computation
For each city and month, report the sale amount and the cumulative amount over all months, in increasing month order.
Solution
As before, the table has to be partitioned by City to compute the cumulative amount; then the months have to be sorted in ascending order, otherwise the result has no meaning. The window grows by one row each time a new row of the same city is considered. The SQL statement is:
SUM(Amount) OVER (PARTITION BY City ORDER BY Month ROWS UNBOUNDED PRECEDING) AS CumeTot
The result obtained by this code is shown in the following table.
[Table: City, Month, Amount, CumeTot]
Example 3: total data
For each city and month, report the sale amount and the total sale amount over all months.
Solution
As before, the table has to be partitioned by City to compute the total sale amount but, unlike the previous example, no sort is required. The window is simply the whole partition of the city considered. The SQL statement is:
SUM(Amount) OVER (PARTITION BY City) AS TotAmount
[Table: City, Month, Amount, TotAmount]
Example 4: comparison between detailed data and total data
For each city and month, the objective is to show:
- the sale amount;
- the ratio between the current row and the grand total;
- the ratio between the current row and the total amount by city;
- the ratio between the current row and the total amount by month.
Solution
In this case a more complex SQL statement is needed, since there are several requirements:
- for the grand total, no partition is needed;
- for the total by city and the total by month, a partition on the corresponding attribute is needed.
Since the totals do not depend on the order in which the rows are aggregated, no sort operation is required, and the window is still the whole partition. The SQL statement is:
SELECT City, Month, Amount,
Amount/SUM(Amount) OVER () AS TotFract,
Amount/SUM(Amount) OVER (PARTITION BY City) AS CityFract,
Amount/SUM(Amount) OVER (PARTITION BY Month) AS MonthFract
[Table: City, Month, Amount, TotFr, CityFr, MonFr (each ratio shown as Amount divided by the corresponding total)]
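A minimal sketch of Examples 2, 3 and 4 in a single query, again with Python's sqlite3 module and hypothetical data (table name and values invented; SQLite 3.25 or later assumed). One detail the fragments above leave open: if Amount is an integer column, several systems (SQLite, PostgreSQL and SQL Server among them) perform integer division, so the sketch casts Amount to REAL before dividing.

```python
import sqlite3

# Hypothetical data (City, Month, Amount).
ROWS = [
    ("Milan", 1, 10), ("Milan", 2, 30), ("Milan", 3, 20),
    ("Turin", 1, 30), ("Turin", 2, 10),
]

QUERY = """
SELECT City, Month, Amount,
       -- Example 2: the window grows from the first month to the current one
       SUM(Amount) OVER (PARTITION BY City ORDER BY Month
                         ROWS UNBOUNDED PRECEDING) AS CumeTot,
       -- Example 3: the window is the whole partition, so no ORDER BY
       SUM(Amount) OVER (PARTITION BY City) AS TotAmount,
       -- Example 4: ratios against the grand, city and month totals
       CAST(Amount AS REAL) / SUM(Amount) OVER () AS TotFract,
       CAST(Amount AS REAL) / SUM(Amount) OVER (PARTITION BY City) AS CityFract,
       CAST(Amount AS REAL) / SUM(Amount) OVER (PARTITION BY Month) AS MonthFract
FROM Sales
ORDER BY City, Month
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (City TEXT, Month INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()

for row in result:
    print(row)
```

With these values the grand total is 100, the city totals are 60 (Milan) and 40 (Turin), and the month totals are 40, 40 and 20, so each ratio can be checked by hand.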
Example 5: group by
The GROUP BY clause can be combined with window functions, which are then applied to the aggregated rows. For example, show the sale amount and the average sale of the current month and the two preceding months, separately for each city.
Solution
The SQL statement is close to the one of Example 1; the only difference is that the query also contains a GROUP BY clause (on City and Month), so the window function is applied to the grouped results, that is, to the sum of each month. The code is:
AVG(SUM(Amount)) OVER (PARTITION BY City ORDER BY Month ROWS 2 PRECEDING) AS MovingAvg
Ranking functions
Inside a partition, the rank can be computed with two functions:
- RANK(): computes the rank leaving a gap after a tie (for example, after two tuples in first position the next rank is third);
- DENSE_RANK(): computes the rank without leaving a gap after a tie (for example, after two tuples in first position the next rank is second).
Example 6: ranking
For each city, report the sale amount in December and the rank on that amount.
Solution
In this case no partition is required, but the tuples have to be ordered by amount to compute the rank: notice that the sort is mandatory. DESC orders the tuples in descending order, so that the highest amount gets the first rank. The code is:
SELECT City, Amount, RANK() OVER (ORDER BY Amount DESC) AS Ranking
WHERE Month =
[Table: City, Amount, Ranking]
Example 7: sorting the result
The result can be sorted by means of the ORDER BY clause; for example, the previous result can be sorted by city.
Solution
The SQL statement becomes:
SELECT City, Amount, RANK() OVER (ORDER BY Amount DESC) AS Ranking
WHERE Month =
ORDER BY City
The final ORDER BY only changes the order in which the rows are returned: the ranking is still computed according to the ORDER BY inside the OVER clause.
[Table: City, Amount, Ranking]
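The last examples can be sketched in the same way (Python's sqlite3 module, SQLite 3.25 or later assumed, hypothetical data and table name). The first query is Example 5: the rows are grouped by City and Month, and the window function then averages the monthly sums. The second query computes RANK() and DENSE_RANK() side by side for one month, with a tie on the highest amount, and sorts the output by city as in the sorting example; months are assumed here to be stored as integers, with 12 standing for December.

```python
import sqlite3

def make_db(rows):
    """Create an in-memory Sales(City, Month, Amount) table holding rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (City TEXT, Month INTEGER, Amount INTEGER)")
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", rows)
    return conn

# Example 5: several hypothetical sales per city and month.
detail = make_db([
    ("Milan", 1, 10), ("Milan", 1, 20), ("Milan", 2, 30), ("Milan", 3, 60),
    ("Turin", 1, 10), ("Turin", 1, 14), ("Turin", 2, 36),
])
# GROUP BY runs first; the window then averages the monthly sums.
moving = detail.execute("""
    SELECT City, Month, SUM(Amount) AS Total,
           AVG(SUM(Amount)) OVER (PARTITION BY City ORDER BY Month
                                  ROWS 2 PRECEDING) AS MovingAvg
    FROM Sales
    GROUP BY City, Month
    ORDER BY City, Month
""").fetchall()

# Examples 6 and 7: hypothetical monthly amounts with a tie at the top.
monthly = make_db([
    ("Milan", 12, 150), ("Turin", 12, 150), ("Rome", 12, 100),
    ("Genoa", 12, 50), ("Milan", 11, 999),  # excluded by the WHERE clause
])
ranking = monthly.execute("""
    SELECT City, Amount,
           RANK()       OVER (ORDER BY Amount DESC) AS Ranking,
           DENSE_RANK() OVER (ORDER BY Amount DESC) AS DenseRanking
    FROM Sales
    WHERE Month = 12
    ORDER BY City
""").fetchall()

print(moving)
print(ranking)
```

Milan and Turin share first place, so RANK() jumps to 3 for Rome while DENSE_RANK() gives it 2.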