OLAP Solutions using Pentaho Analysis Services
Gabriele Pozzani
PAS
Pentaho Analysis Services (PAS) provides OLAP capabilities for interactively analyzing data through a cross-tab interface, with no need to define a query. A front-end provides the interface to retrieve and format data, supporting drill-down, drill-up, slicing, and dicing.
PAS components (I)
PAS consists of four components:
1. Mondrian OLAP Engine: receives MDX queries from JPivot and returns a multidimensional result set. It is included in the Pentaho Server.
2. Schema Workbench: designs and tests Mondrian cube schemas. Mondrian uses these schemas to interpret MDX and translate it into SQL queries on an RDBMS.
PAS components (II)
3. JPivot analysis front-end: a Java-based analysis tool that serves as a front-end for OLAP cubes.
4. Aggregation Designer: a tool for generating aggregate tables that speed up the analysis engine.
Schemas
Mondrian schemas are XML documents that describe multidimensional cubes and the mapping between the multidimensional and the relational model. Mondrian uses this mapping to translate MDX into SQL.
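The sketch below makes the mapping concrete. It is an illustration, not Mondrian's code. It parses a tiny hand-written schema fragment with Python's `xml.etree.ElementTree` and derives a simplified SQL query for "one measure grouped by one level". The element and attribute names follow the Mondrian schema format. All table and column names (`sales_fact`, `time_by_day`, `unit_sales`, `the_year`) are hypothetical. The SQL that Mondrian actually generates will differ in form.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment: table and column names are illustrative only.
SCHEMA_XML = """
<Schema name="SalesSchema">
  <Cube name="Sales">
    <Table name="sales_fact"/>
    <Dimension name="Time" foreignKey="time_id" type="TimeDimension">
      <Hierarchy hasAll="true" primaryKey="time_id">
        <Table name="time_by_day"/>
        <Level name="Year" column="the_year" levelType="TimeYears"/>
      </Hierarchy>
    </Dimension>
    <Measure name="Unit Sales" column="unit_sales" aggregator="sum"/>
  </Cube>
</Schema>
"""


def sql_for(schema_xml: str, cube: str, measure: str, dimension: str, level: str) -> str:
    """Build a simplified SQL query for one measure grouped by one level."""
    root = ET.fromstring(schema_xml.strip())
    cube_el = root.find(f"Cube[@name='{cube}']")
    fact_table = cube_el.find("Table").get("name")           # cube -> fact table
    meas = cube_el.find(f"Measure[@name='{measure}']")        # measure -> column + aggregator
    dim = cube_el.find(f"Dimension[@name='{dimension}']")     # dimension -> foreign key
    hier = dim.find("Hierarchy")
    dim_table = hier.find("Table").get("name")                # hierarchy -> dimension table
    level_col = hier.find(f"Level[@name='{level}']").get("column")  # level -> column
    agg = meas.get("aggregator").upper()
    return (
        f"SELECT d.{level_col}, {agg}(f.{meas.get('column')}) "
        f"FROM {fact_table} f JOIN {dim_table} d "
        f"ON f.{dim.get('foreignKey')} = d.{hier.get('primaryKey')} "
        f"GROUP BY d.{level_col}"
    )


print(sql_for(SCHEMA_XML, "Sales", "Unit Sales", "Time", "Year"))
```

Each XML attribute answers one mapping question: which fact table backs the cube, which column and aggregator back a measure, and how a dimension table joins to the fact table.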
MDX
MDX (Multi-Dimensional eXpressions) is a language designed for querying OLAP databases. It is a de facto standard developed by Microsoft: http://msdn.microsoft.com/en-us/library/ms145506.aspx
Pentaho Schema Workbench
Pentaho Schema Workbench
PSW is a graphical tool for creating Mondrian schemas and publishing them to the Pentaho Server.
Connect to DB
The first step is to establish a connection to the database: Options > Connections...
JDBC Explorer
Once the connection has been established, you can explore the database: File > New > JDBC Explorer
Create a new schema
The schema editor can create a new schema (File > New > Schema), save the schema to disk as an .xml file, edit object attributes, and switch to a view of the schema's XML representation (view only, no editing).
Main tasks
The basic tasks for defining a schema are:
1. Create a schema
2. Create cubes
   2.1. Choose a fact table
   2.2. Add measures
3. Create dimensions
   3.1. Edit the default hierarchy and choose a dimension table
   3.2. Define hierarchy levels
4. Associate dimensions with cubes
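The tasks above correspond to nested elements of the XML file that PSW saves. For illustration only, the minimal sketch below builds such a document directly with Python's `xml.etree.ElementTree`. The element and attribute names (`Schema`, `Cube`, `Table`, `Dimension`, `Hierarchy`, `Level`, `DimensionUsage`, `Measure`, `foreignKey`, `primaryKey`, `hasAll`, `levelType`) follow the Mondrian 3 schema format as we understand it. Every schema, cube, table, and column name is hypothetical. In practice PSW writes this XML for you.

```python
import xml.etree.ElementTree as ET


def build_schema() -> ET.Element:
    # 1. Create a schema
    schema = ET.Element("Schema", name="SalesSchema")

    # 3. A shared dimension, declared at schema level so several cubes can use it
    store = ET.SubElement(schema, "Dimension", name="Store")
    # 3.1. Its hierarchy, with the dimension table (hypothetical names)
    store_h = ET.SubElement(store, "Hierarchy", hasAll="true", primaryKey="store_id")
    ET.SubElement(store_h, "Table", name="store")
    # 3.2. Hierarchy levels, from coarsest to finest
    ET.SubElement(store_h, "Level", name="Country", column="store_country", uniqueMembers="true")
    ET.SubElement(store_h, "Level", name="City", column="store_city")

    # 2. Create a cube and 2.1. choose its fact table
    cube = ET.SubElement(schema, "Cube", name="Sales")
    ET.SubElement(cube, "Table", name="sales_fact")

    # 3. A private dimension (known only to this cube) of type TimeDimension
    time_dim = ET.SubElement(cube, "Dimension", name="Time",
                             type="TimeDimension", foreignKey="time_id")
    time_h = ET.SubElement(time_dim, "Hierarchy", hasAll="true", primaryKey="time_id")
    ET.SubElement(time_h, "Table", name="time_by_day")
    ET.SubElement(time_h, "Level", name="Year", column="the_year",
                  levelType="TimeYears", uniqueMembers="true")
    ET.SubElement(time_h, "Level", name="Quarter", column="quarter",
                  levelType="TimeQuarters")

    # 4. Associate the shared dimension with the cube
    ET.SubElement(cube, "DimensionUsage", name="Store", source="Store",
                  foreignKey="store_id")

    # 2.2. Measures, emitted after the dimensions as in Mondrian's sample schemas
    ET.SubElement(cube, "Measure", name="Unit Sales", column="unit_sales",
                  aggregator="sum", formatString="#,###")
    return schema


xml_text = ET.tostring(build_schema(), encoding="unicode")
print(xml_text)
```

The shared `Store` dimension is declared once at schema level and linked into the cube through `DimensionUsage`. The `Time` dimension is private to the `Sales` cube.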
1. Create a schema
File > New > Schema
2. Create cubes
Once the cube has been added, continue with step 2.1.
2.1. Choose a fact table
Set the DB schema and the name of the fact table in that schema.
2.2. Add measures
3. Create dimensions (I)
Dimensions can be added to a cube, as "private dimensions" known only to the cube that contains them, or to a schema, as "shared dimensions" that can be associated with multiple cubes.
3. Create dimensions (II)
Each dimension specifies the fact table foreign key that references it. A date/time-related dimension has the TimeDimension type.
3. Create dimensions (III)
Ordinary dimensions have the StandardDimension type. Continue with step 3.1.
3.1. Add/edit hierarchies
A default hierarchy is created for each dimension, and further hierarchies can be added. Each hierarchy must have a table node and one or more levels.
3.1. Dimension table
The dimension table takes the same settings as the fact table.
3.2. Add hierarchy levels
4. Associate shared dimensions
A shared dimension is associated with a cube by adding a "Dimension usage" to the cube that refers to the shared dimension.
Testing and deployment
Once schemas have been defined, they can be tested using the MDX query tool included in PSW and then published to the Pentaho Server.
MDX query tool (I)
File > New > MDX Query
If a schema editor is open, the MDX query tool attempts to connect to the underlying DB and load the schema definition.
MDX query tool (II)
A query is entered in the upper pane, and the result is shown in the lower pane.
Publishing the cube (I)
File > Publish...
The dialog asks for the server URL, the publish password specified in publisher_config.xml, and a user with publishing privileges.
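Publishing is refused when the password entered in the dialog does not match the one configured on the server. This minimal sketch reads that value so you can check that one is set. It assumes that publisher_config.xml (typically under the server's pentaho-solutions/system folder) has a `publisher-config` root element with a `publisher-password` child. That matches the files we have seen, but it is an assumption to verify against your server version.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ASSUMPTION: this is the layout of publisher_config.xml; verify it on your server.
SAMPLE_CONFIG = """<publisher-config>
    <publisher-password>change-me</publisher-password>
</publisher-config>"""


def publish_password(config_xml: str) -> Optional[str]:
    """Return the configured publish password, or None if missing or empty."""
    root = ET.fromstring(config_xml)
    node = root.find("publisher-password")
    text = (node.text or "").strip() if node is not None else ""
    return text or None


if publish_password(SAMPLE_CONFIG) is None:
    print("No publish password configured on the server.")
```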
Publishing the cube (II)
If the connection succeeds, a dialog appears in which you choose the location in the server's solution repository where the schema is saved. You also specify the server-side data source used to execute the SQL queries generated from the MDX queries.
JPivot
JPivot
Once a cube has been published, it can be used to build analysis applications. Pentaho provides the JPivot front-end in the Pentaho User Console.
Analysis View
Create a new analysis view
Select the schema to use and a cube defined in that schema.
New analysis view
JPivot toolbar
Drilling
Drilling allows the user to navigate from one level of aggregation to another.
Drilling flavors
There are four ways to drill, each producing a different result, and the drill mode is selected in the toolbar. Drill member, drill position, and drill replace apply to dimensions. Drill through applies to measures.
Drill member & Drill position
Drill member: drilling on one instance of a member is also applied to all other instances of that member.
Drill position: drilling applies only to the selected member instance, not to other instances of the same member.
Drill replace
The drilled member is replaced by the drill result.
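The following toy model contrasts the three dimension-oriented flavors. It is an illustration in Python, not JPivot code. A row axis is a list of tuples holding one member per dimension, and a hypothetical `CHILDREN` map gives each member's children. For simplicity, `drill_replace` here replaces every instance of the member. That is an assumption of the sketch, not a statement about JPivot's behavior with repeated members.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, ...]

# Hypothetical hierarchy: the children of each drillable member.
CHILDREN: Dict[str, List[str]] = {"Drink": ["Beer", "Wine"]}


def _children_rows(row: Row, dim: int) -> List[Row]:
    """Rows obtained by replacing row[dim] with each of its children."""
    return [row[:dim] + (child,) + row[dim + 1:] for child in CHILDREN.get(row[dim], [])]


def drill_member(rows: List[Row], dim: int, member: str) -> List[Row]:
    """Expand every instance of `member` in axis position `dim`."""
    out: List[Row] = []
    for row in rows:
        out.append(row)
        if row[dim] == member:
            out.extend(_children_rows(row, dim))
    return out


def drill_position(rows: List[Row], index: int, dim: int) -> List[Row]:
    """Expand only the member instance in row `index`."""
    return rows[: index + 1] + _children_rows(rows[index], dim) + rows[index + 1:]


def drill_replace(rows: List[Row], dim: int, member: str) -> List[Row]:
    """Replace rows holding `member` with their children (all instances, by assumption)."""
    out: List[Row] = []
    for row in rows:
        out.extend(_children_rows(row, dim) if row[dim] == member else [row])
    return out


rows = [("CA", "Drink"), ("OR", "Drink")]
print(drill_member(rows, 1, "Drink"))
# [('CA', 'Drink'), ('CA', 'Beer'), ('CA', 'Wine'), ('OR', 'Drink'), ('OR', 'Beer'), ('OR', 'Wine')]
print(drill_position(rows, 0, 1))
# [('CA', 'Drink'), ('CA', 'Beer'), ('CA', 'Wine'), ('OR', 'Drink')]
print(drill_replace(rows, 1, "Drink"))
# [('CA', 'Beer'), ('CA', 'Wine'), ('OR', 'Beer'), ('OR', 'Wine')]
```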
Drill through
Drill through applies to measures: it retrieves the detail rows behind a rolled-up measure value and shows them in a separate table.
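Conceptually, a pivot-table cell is an aggregate over the fact rows that match the cell's coordinates, and drill through returns exactly those rows. The minimal Python sketch below shows that relationship on a hypothetical in-memory fact table that has already been joined with its dimensions. A real engine issues SQL against the database instead.

```python
from typing import Dict, List

Fact = Dict[str, object]

# Hypothetical detail rows (fact table already joined with its dimensions).
FACTS: List[Fact] = [
    {"year": 1997, "state": "CA", "product": "Beer", "unit_sales": 10},
    {"year": 1997, "state": "CA", "product": "Wine", "unit_sales": 5},
    {"year": 1997, "state": "OR", "product": "Beer", "unit_sales": 7},
    {"year": 1998, "state": "CA", "product": "Beer", "unit_sales": 3},
]


def drill_through(facts: List[Fact], coords: Dict[str, object]) -> List[Fact]:
    """Detail rows that contribute to the cell identified by `coords`."""
    return [r for r in facts if all(r[k] == v for k, v in coords.items())]


def cell_value(facts: List[Fact], coords: Dict[str, object], measure: str) -> float:
    """Rolled-up (summed) measure value displayed in that cell."""
    return sum(r[measure] for r in drill_through(facts, coords))


cell = {"year": 1997, "state": "CA"}
print(cell_value(FACTS, cell, "unit_sales"))  # 15
for row in drill_through(FACTS, cell):        # the two CA rows of 1997
    print(row)
```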
The OLAP Navigator (I)
The OLAP Navigator is a GUI that lets the user control the mapping between the cube and the pivot table: which dimension is mapped to which axis, how multiple dimensions on one axis are ordered, and which slice of the cube is used in the analysis.
The OLAP Navigator (II)
The navigator has three sections: Columns, Rows, and Filter.
Controlling placement of dimensions on axes
Clicking the small square before a dimension moves it from Rows to Columns, or from Columns to Rows.
Slicing with the OLAP Navigator (I)
A slicer corresponds to the MDX WHERE clause and is used to show only a subset (slice) of the data. Clicking the funnel icon moves a dimension to the Filter section.
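The navigator's three sections map onto the three parts of an MDX SELECT: the Columns axis, the Rows axis, and the WHERE slicer. The Python sketch below composes such a query from set expressions. It is only an illustration of that correspondence. Combining several dimensions on one axis with `CROSSJOIN` is one standard MDX way to do it, and the MDX that JPivot itself generates will look different (for example, it may add `NON EMPTY` or `Hierarchize`). The cube and member names are hypothetical.

```python
from typing import Sequence


def _axis(sets: Sequence[str]) -> str:
    """Combine the sets on one axis, in order; this sketch needs at least one set."""
    expr = sets[0]
    for s in sets[1:]:
        expr = f"CROSSJOIN({expr}, {s})"
    return expr


def mdx_query(cube: str, columns: Sequence[str], rows: Sequence[str],
              slicer: Sequence[str] = ()) -> str:
    """Compose an MDX SELECT from the Columns, Rows and Filter sections."""
    query = f"SELECT {_axis(columns)} ON COLUMNS, {_axis(rows)} ON ROWS FROM [{cube}]"
    if slicer:
        # The Filter section becomes the WHERE clause: a tuple of members.
        query += f" WHERE ({', '.join(slicer)})"
    return query


print(mdx_query(
    "Sales",
    columns=["{[Measures].[Unit Sales]}"],
    rows=["[Time].[Year].Members", "[Product].[Product Family].Members"],
    slicer=["[Store].[USA].[CA]"],
))
```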
Slicing with the OLAP Navigator (II)
Specifying member sets
It is also possible to place specific members on the column and row axes.
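In MDX, a hand-picked list of members is written as an explicit set in braces, with each member identified by its bracketed path. The small Python helpers below build that syntax and double any `]` inside a name, which is how MDX escapes a closing bracket in a bracketed identifier. The member names are hypothetical.

```python
from typing import Sequence


def quote(name: str) -> str:
    """Bracket an MDX name; a ']' inside the name is escaped by doubling it."""
    return "[" + name.replace("]", "]]") + "]"


def member(*path: str) -> str:
    """Unique member name built from its path, e.g. [Time].[1997].[Q1]."""
    return ".".join(quote(p) for p in path)


def member_set(members: Sequence[str]) -> str:
    """Explicit set of members, usable on the column or row axis."""
    return "{" + ", ".join(members) + "}"


print(member_set([member("Time", "1997", "Q1"), member("Time", "1997", "Q2")]))
# {[Time].[1997].[Q1], [Time].[1997].[Q2]}
```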
MDX query pane
You can also view the MDX query that represents the current state of the analysis view, which is useful for learning MDX syntax.
Export
The analysis view can be printed to PDF or exported in MS Excel format.
Charts
JPivot can display data in a chart, and the chart can be configured.
Alternative to JPivot
Pentaho has a modular structure and can be extended with new plugins.
Saiku provides a plugin for Pentaho offering lightweight OLAP features. It also provides a RESTful server that can connect to any OLAP system: http://analytical-labs.com
Saiku
Saiku runs OLAP analyses on any cube already defined. The analysis is defined by specifying which dimensions and measures we want on columns, rows, and filters, using a drag-and-drop UI.
Defining the analysis (I)
Once a cube has been selected, its available dimensions (with hierarchies) and measures are listed.
Defining the analysis (II)
Then we can drag and drop dimensions and measures onto columns, rows, and filters as needed. The only restriction is that measures cannot be placed on both columns and rows. After each change, the query is updated and executed automatically.
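The placement rule above is simple enough to state as a check. The Python sketch below encodes it over a hypothetical layout dictionary that maps each axis to the names placed on it. It mirrors the rule as described here and is not Saiku's own code.

```python
from typing import Dict, List, Set


def is_valid_layout(layout: Dict[str, List[str]], measures: Set[str]) -> bool:
    """False if measures appear on both the columns and the rows axis."""
    on_columns = any(name in measures for name in layout.get("columns", []))
    on_rows = any(name in measures for name in layout.get("rows", []))
    return not (on_columns and on_rows)


measures = {"Unit Sales", "Store Sales"}
print(is_valid_layout({"columns": ["Unit Sales"], "rows": ["Time"]}, measures))         # True
print(is_valid_layout({"columns": ["Unit Sales"], "rows": ["Store Sales"]}, measures))  # False
```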
Defining the analysis (III)
Filtering
Filters can be applied both to visible dimensions (on columns and rows) and to invisible ones (in the filter area).
Ordering
Each dimension and measure can be used to order data, but not all combinations are allowed: we cannot order both by a measure on columns and by a dimension on rows (or vice versa).
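In MDX, ordering a set by a value is done with the `ORDER` function. Its flags are `ASC`/`DESC`, which keep members grouped under their parents, and `BASC`/`BDESC`, which break the hierarchy. The Python helper below wraps a set expression accordingly. It illustrates the MDX construct and is not necessarily the exact query Saiku generates. The set and measure names are hypothetical.

```python
def order_set(set_expr: str, by: str, descending: bool = True,
              break_hierarchy: bool = False) -> str:
    """Wrap a set in MDX ORDER; the B* flags ignore the hierarchy when sorting."""
    flag = ("B" if break_hierarchy else "") + ("DESC" if descending else "ASC")
    return f"ORDER({set_expr}, {by}, {flag})"


print(order_set("[Product].[Product Family].Members", "[Measures].[Unit Sales]"))
# ORDER([Product].[Product Family].Members, [Measures].[Unit Sales], DESC)
```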
Popup menus
Options for quick filtering and for adding or removing dimension levels are available by clicking on column and row headers.
Charts
Data can also be shown in a chart.
Statistics
Saiku can also show some statistics about column values.
Other commands
Other available commands include: show the MDX query, drill through on a cell, export the drill-through on a cell to CSV, export to XLS, and export to CSV.
Saiku remarks
Saiku is still in development: some JPivot features are missing, and some features (charts, drill through) have bugs or malfunctions.