Efficient Data Access and Data Integration Using Information Objects
Mica J. Block
Director, ACES
Actuate Corporation
mblock@actuate.com
Agenda
  - Information Objects Overview
  - Best practices
    - Modeling
    - Security
  - Performance and optimization
    - Cost Based Optimization
    - Query Trimming
    - Best Practices
  - Information Object Caching
  - Questions
Business Challenges
  - Demand for data has never been higher
    - Wider range of users and data needs
    - More decision makers at every level
    - Data needed for strategic & operational decision making
  - Need accurate and timely information
    - Real-time data
    - Single view of the truth
    - Self-service access to data
  - Reduce costs
    - Lower development and maintenance costs
    - Leverage skill-sets across functions
Actuate Information Objects: Technology to Address Data Challenges
  - Enterprise Information Integration (EII)
    - Easily integrate data from multiple sources
    - For real-time data access
    - Distributed, parallel and optimized queries
  - Meta-data layer hides data complexity
    - Enables end-user empowerment
    - Enables skill-set partitioning
    - Isolates data from formatting
  - Boost developer productivity
    - Reusable building blocks
    - Reduces costs and time to market
    - Supports incremental development
  [Diagram: Enterprise Customer Object on Actuate 10 iserver, built from Customer Accounts, Investment History and Customer Profile; sources shown: ATM transactions, Bank Account, Operational Data, Investment Account, Customer Information, Data Warehouse]
Actuate Information Objects: What are Information Objects?
  - Virtual Data Views on heterogeneous data sources, reducing inconsistencies
  - Information Assets that are reusable across multiple Actuate applications
  - Metadata layer to tag data with business information
  [Diagram: Information Object on Actuate iserver]
Information Object Designer (IOD)
  - Intuitive tool for designing and publishing metadata
    - Graphical tool to create information objects
    - Drag and drop capabilities to build new objects
    - Rich set of transformations
    - Ability to manage and deploy information objects to iserver
    - Can now create Cache Objects using Designer
Data Integration Service
  - Data access layer
    - Support for standard SQL sources
    - Open Data Access Interface for XML, SAP, Salesforce etc.
    - Security integration
  - Available across all Actuate products
    - Access through web services
    - Public API
  - Powerful data transformation
    - Complex transformation of data
    - Rich set of SQL functions
    - User friendly column categorization
    - Ability to preview using Business Reports
  [Diagram: Information Access (Reports, Spreadsheets, Query, Applications, Web Services); Data Transformation (Transform, Calculate, Security) over objects such as Regional Sales, Customer Billing, Sales Orders, Customer Profile, Shipments; Data Access through Data Adapters (Relational, Flat Files, Legacy, ERP, XML, EJB)]
Best Practice: Layered Modeling Approach
  - Application Layer
    - Types of objects: Report or Query specific Data Views, e.g. Order Invoice
    - Metadata: JOIN Business Layer objects to create Data Views that can be consumed by Report Developers or Actuate Query users
  - Generic Business Layer
    - Types of objects: Business Entities, e.g. Customer, Order
    - Metadata: Additional metadata such as data cleansing, business transforms, filter criteria etc.
  - Base Layer
    - Types of objects: 1-1 correspondence with database objects
    - Metadata: Column selection and basic metadata, including name etc.
Advantages of Layered Modeling Approach
  - Manage DB Changes at Base Layer
    - DB Schema can be managed at the Map Layer
    - Redundant or unused columns can be dropped
    - Basic column properties can be defined and propagated throughout the project
  - Business Layer
    - Create standard definition of Business Entities
    - Provide all column properties required for Reporting
  - Application Layer
    - Provide commonly used Reporting Objects
    - Implement Query Trimming to create large objects that can answer a range of questions
  [Diagram: HR, Customer, Order, Order Invoice, Products, Financials, Distribution and Employee objects on Actuate 10 iserver]
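
The three layers can be pictured with ordinary SQL views. The sketch below is only an analogy written in Python with SQLite, not Information Object Designer output; every table, view and column name in it is a made-up assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical tables standing in for the source database (hypothetical)
    CREATE TABLE tbl_cust (cust_id INTEGER, cust_nm TEXT, legacy_flag TEXT);
    CREATE TABLE tbl_ord  (ord_id INTEGER, cust_id INTEGER, amt REAL, status TEXT);
    INSERT INTO tbl_cust VALUES (1, ' acme corp ', 'x'), (2, 'Globex', 'y');
    INSERT INTO tbl_ord  VALUES (10, 1, 250.0, 'OPEN'), (11, 1, 99.5, 'VOID'),
                                (12, 2, 40.0, 'OPEN');

    -- Base layer: 1-1 with database objects, column selection only
    CREATE VIEW base_customer AS SELECT cust_id, cust_nm FROM tbl_cust;
    CREATE VIEW base_order    AS SELECT ord_id, cust_id, amt, status FROM tbl_ord;

    -- Business layer: business entities with cleansing and filter criteria
    CREATE VIEW customer AS
        SELECT cust_id AS CustomerID, UPPER(TRIM(cust_nm)) AS CompanyName
        FROM base_customer;
    CREATE VIEW orders AS
        SELECT ord_id AS OrderID, cust_id AS CustomerID, amt AS Amount
        FROM base_order WHERE status <> 'VOID';

    -- Application layer: report-specific view joining business-layer objects
    CREATE VIEW order_invoice AS
        SELECT c.CompanyName, o.OrderID, o.Amount
        FROM customer c JOIN orders o ON o.CustomerID = c.CustomerID;
""")

invoice_rows = conn.execute(
    "SELECT CompanyName, OrderID, Amount FROM order_invoice ORDER BY OrderID"
).fetchall()
print(invoice_rows)
```

A schema change such as a renamed source column would only touch the base views here, which mirrors the "manage DB changes at the base layer" advantage.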
Column Categories
  - Problem Statement
    - It's difficult for Business Users to find the columns they need in Information Objects with a large number of columns
  - Column Categorization allows modelers to
    - Categorize columns in folders, including nested folders
    - Display columns in alphabetical order within a folder
Example: GL Expense.iob
  - What the Business User sees
Modeling Categories in GL Expense.iob
  - Organize column categories
Basics: Specify Descriptive Business Metadata
  - Information Objects are consumed by
    - Business users
    - Report developers
    - e.spreadsheet developers
    - Data modelers
  - Enhance readability by specifying
    - Descriptive object names
    - Help text and description, e.g. source or computed expression description
  - Information Objects with full, complete column attributes will make report development and ad-hoc reporting easy!
Metadata Used by Design and Query Tools
  Modeling properties and the tools that use them:
  - Analysis Type: Dimension, Measure or Attribute
    - Used by: BusinessReports, Actuate Query & e.analysis
  - Display: Name, format, length etc.
    - Used by: BusinessReports, Actuate Query, e.rdpro, e.ss
  - Source: Index, data type
    - Used by: Optimizer
  - Prompt: Display properties and Autosuggest hints for prompts
    - Used by: BusinessReports, BIRT, Actuate Query, e.rdpro, e.ss
  - Filter: Whether filters should be prompted or disabled
    - Used by: Actuate Query, e.rdpro, e.ss
Parameters, Filters and Filter Property
  - Parameters
    - Variable in ASQL statement
    - Used anywhere in ASQL
    - Always Required type
    - Variables can be used in Calculations or Filters
    - Scalar values or expressions involving literals and scalar parameters are permitted
    - Can be used in the WHERE clause as well to filter values
    - Supports NULL values
  - Filters
    - Filters define the WHERE Clause of an Information Object
    - Set at design-time
    - Values can be Static or Parameters
  - Filter Property
    - Property of an IO column
    - Values are Optional, Pre-defined or Disabled
    - Controls whether this field is exposed as a pre-defined or optional ad hoc parameter in the calling application
    - Can accept multiple values using IN operator
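
The difference between a required parameter and an optional, multi-valued filter can be sketched in plain SQL. The example below uses Python with SQLite rather than ASQL; `build_query`, the `sales` table and its columns are hypothetical names introduced only for illustration.

```python
import sqlite3

def build_query(base_sql, optional_filters):
    """Append one IN (...) predicate for each optional filter that has values."""
    clauses, params = [], []
    for column, values in optional_filters.items():
        if not values:  # filter left empty by the user: no restriction
            continue
        placeholders = ", ".join("?" for _ in values)
        # Column names come from the (trusted) model, values are bound safely.
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    if clauses:
        base_sql += " WHERE " + " AND ".join(clauses)
    return base_sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Region TEXT, Amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 100.0), ("West", 200.0), ("North", 50.0)])

rate = 2.0  # required parameter, used in a calculation
sql, filter_params = build_query(
    "SELECT Region, Amount * ? AS Converted FROM sales",
    {"Region": ["East", "West"]},  # optional filter with multiple values
)
rows = conn.execute(sql + " ORDER BY Region", [rate] + filter_params).fetchall()
print(rows)
```

Passing an empty list for `Region` leaves the WHERE clause out entirely, which is roughly what an unset optional filter does.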
Configuring Prompt Editor
Auto-populate Pick Lists from the Database
  - Benefits
    - Populate prompt list at runtime
    - Personalize list of values
    - Configure Autosuggest settings
  - Key Features
    - Graphical UI to populate the list from a database table
    - Personalize filters based on the user-id
    - Available in e.rdpro, e.spreadsheet and Actuate Query
Proxy versus Pass-through Security
  - What is Proxy Security?
    - Information Objects use the same database UID and PWD for all users
    - Benefits: Allows Actuate to use a generic DB UID and PWD for all Reports
  - What is Pass-through Security?
    - Information Objects use a different UID and PWD for each user
    - Benefits
      - Pass user credentials to underlying data source
      - Leverage data source security to access data
  [Diagram, proxy: User 1 (UID X, PWD Y) and User 2 (UID A, PWD B) reach the database through Actuate iserver as UID C, PWD D]
  [Diagram, pass-through: Actuate iserver connects to the database with per-user credentials; UID C, PWD D is shown for User 1]
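
The choice between the two modes comes down to which database credentials are used for a given Actuate user. A minimal Python sketch, assuming a hypothetical `resolve_db_credentials` helper and reusing the UID/PWD letters from the diagram as placeholder values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credentials:
    uid: str
    pwd: str

def resolve_db_credentials(mode, actuate_user, proxy_account, user_accounts):
    """Return the database credentials to use for this user's query."""
    if mode == "proxy":
        # One generic database account shared by every user.
        return proxy_account
    if mode == "pass-through":
        # Each user's own credentials are handed to the data source.
        return user_accounts[actuate_user]
    raise ValueError(f"unknown security mode: {mode!r}")

PROXY = Credentials("C", "D")
USERS = {"user1": Credentials("X", "Y"), "user2": Credentials("A", "B")}

print(resolve_db_credentials("proxy", "user1", PROXY, USERS))
print(resolve_db_credentials("pass-through", "user2", PROXY, USERS))
```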
Row Level Security
  - Situation
    - Organization maintains customer data in a single database
    - Customers need ad hoc access to data
    - Data shared in on-demand environments
  - Problem
    - Database has data for all customers
    - Needs a mechanism to filter data based on customer
  - Solution
    - Row-level security
    - Filter data based on user-id
    - Use an ASQL function called Current_user() in WHERE clause
Example: Row Level Security
  - Scenario
    - ABC Corporation uses SSN as the Actuate User-ID
    - Need to restrict access to Portfolios based on SSN
  - Sample Implementation
      SELECT *
      FROM SECURITIES
      WHERE SECURITIES.OWNER = Current_User()
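
The sample implementation can be tried outside Actuate by emulating `Current_User()`. The sketch below uses Python with SQLite and registers a user-defined function under that name; the `SYMBOL` and `QTY` columns, the sample rows and the `portfolio_for` helper are assumptions, and the identifiers are fake values, not real SSNs.

```python
import sqlite3

SAMPLE_ROWS = [
    ("000-00-0001", "AAA", 10),
    ("000-00-0001", "BBB", 5),
    ("000-00-0002", "CCC", 7),
]

def portfolio_for(user_id):
    """Run the row-level-security query as the given user."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for the ASQL Current_User() function: returns the session user.
    conn.create_function("Current_User", 0, lambda: user_id)
    conn.execute("CREATE TABLE SECURITIES (OWNER TEXT, SYMBOL TEXT, QTY INTEGER)")
    conn.executemany("INSERT INTO SECURITIES VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn.execute(
        "SELECT SYMBOL, QTY FROM SECURITIES "
        "WHERE SECURITIES.OWNER = Current_User() ORDER BY SYMBOL"
    ).fetchall()

print(portfolio_for("000-00-0001"))
print(portfolio_for("000-00-0002"))
```

The same SQL text returns different rows for each user, so one Information Object can serve every customer.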
Encryption Plug-in Support for Connection Properties
  - Support for customized encryption to encrypt connection properties of a DCD
  - Drop the customized plugin into the IOD plugins and iserver plugins directories
  - The plugin needs to exist in both the designer and iserver environments for connection properties to be handled correctly
  - The plugin needs to implement encrypt and decrypt functions, which will be invoked for every masked connection property
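
The encrypt/decrypt contract can be illustrated with a toy round trip. This is not the Actuate plugin interface, which the slide does not show; `ToyCipher`, `apply_to_masked` and the property names are hypothetical, and the XOR-plus-Base64 scheme is a placeholder that provides no real security.

```python
import base64

class ToyCipher:
    """Placeholder cipher: XOR with a key, then Base64. NOT secure."""

    def __init__(self, key: bytes):
        self.key = key

    def _xor(self, data: bytes) -> bytes:
        return bytes(b ^ self.key[i % len(self.key)] for i, b in enumerate(data))

    def encrypt(self, plain: str) -> str:
        return base64.b64encode(self._xor(plain.encode("utf-8"))).decode("ascii")

    def decrypt(self, cipher_text: str) -> str:
        return self._xor(base64.b64decode(cipher_text)).decode("utf-8")

def apply_to_masked(properties, masked_names, transform):
    """Call transform once for every masked connection property."""
    return {
        name: transform(value) if name in masked_names else value
        for name, value in properties.items()
    }

cipher = ToyCipher(b"demo-key")
MASKED = {"password"}
props = {"server": "dbhost", "username": "report_user", "password": "s3cret"}

stored = apply_to_masked(props, MASKED, cipher.encrypt)     # designer side
restored = apply_to_masked(stored, MASKED, cipher.decrypt)  # server side
print(stored["password"] != props["password"], restored == props)
```

Both sides need the same plugin, otherwise the value written by one environment cannot be read by the other.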
Agenda
  - Information Objects Overview
  - Best practices
    - Modeling
    - Security
    - Usage and What's New for Cache option
  - Performance and optimization
    - Cost Based Optimization
    - Query Trimming
    - Best Practices
  - Information Object Caching
    - Getting started
  - Questions
Cost-based Optimization
  - Definition
    - A type of optimization used by Databases and EII technologies where an optimized query plan is generated based on the cost of each operation
  - How is the cost calculated? Cost of the query is determined by
    - Number of rows
    - Presence of an Index
    - Distinct values in a column
    - Expected % of rows that match the JOIN criteria
Cost-based Optimization
  - How does the query plan change with cost information? Query cost determines
    - Which Index to use (if any)
    - JOIN Order
    - JOIN Strategy (Merge, Dependent, Nested Loop)
  - Shouldn't databases perform such optimizations?
    - Yes, for a single database; however, the EII engine needs to do the same when data comes from multiple databases
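
How row counts and distinct values can drive JOIN order is sketched below in Python. This is a textbook-style illustration, not Actuate's optimizer: the estimate rows(A) x rows(B) / max(distinct_A, distinct_B), the "sum of intermediate result sizes" cost, and all table names and statistics are assumptions.

```python
from itertools import permutations

def plan_cost(order, rows, join_keys):
    """Sum of estimated intermediate result sizes for a left-deep join order.

    Returns None when the order would need a cross product.
    """
    joined = [order[0]]
    current = float(rows[order[0]])
    total = 0.0
    for table in order[1:]:
        divisor, connected = 1, False
        for other in joined:
            key = frozenset((table, other))
            if key in join_keys:
                divisor *= max(join_keys[key])  # larger distinct count of the pair
                connected = True
        if not connected:
            return None
        current = current * rows[table] / divisor
        total += current
        joined.append(table)
    return total

def best_join_order(rows, join_keys):
    best_order, best_cost = None, None
    for order in permutations(rows):
        cost = plan_cost(order, rows, join_keys)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_order, best_cost = order, cost
    return best_order, best_cost

# Hypothetical statistics: row counts per table, distinct join-key values per join.
rows = {"vip_customers": 10, "orders": 100_000, "employees": 50}
join_keys = {
    frozenset(("vip_customers", "orders")): (10, 1_000),
    frozenset(("orders", "employees")): (50, 50),
}

best_order, best_cost = best_join_order(rows, join_keys)
print(best_order, best_cost)
```

Joining the small table to `orders` first keeps the intermediate result at about 1,000 rows, whereas starting with `orders` and `employees` carries about 100,000 rows into the second join.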
How is Cost Information Accessed?
  - Step 1: Automatic Extraction of Metadata. Map creation retrieves technical metadata such as
    - Whether the column is indexed
    - Data Type
    - Primary Key, Foreign Key (if defined)
  - Step 2: Manually add more technical metadata. Modelers are encouraged to provide the following
    - Number of rows in a Map (table, view or EPR)
    - # of Distinct values in a column
    - ~Max value of a column
    - ~Min value of a column
    - JOIN Cardinality
  - Note: The additional metadata is not mandatory but can improve performance, especially when using multiple databases
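
The manually supplied values are the usual inputs to selectivity estimates. The Python sketch below shows two common textbook formulas under a uniform-distribution assumption; it is not taken from Actuate documentation, and `ColumnStats` and the sample numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    distinct: int      # number of distinct values in the column
    min_value: float   # (approximate) minimum value
    max_value: float   # (approximate) maximum value

def equality_selectivity(stats):
    """Fraction of rows expected to match column = constant."""
    return 1.0 / max(stats.distinct, 1)

def greater_than_selectivity(stats, value):
    """Fraction of rows expected to match column > value."""
    span = stats.max_value - stats.min_value
    if span <= 0:
        return 1.0 if value < stats.min_value else 0.0
    fraction = (stats.max_value - value) / span
    return min(1.0, max(0.0, fraction))

def estimated_rows(map_rows, selectivity):
    return round(map_rows * selectivity)

order_year = ColumnStats(distinct=10, min_value=2000, max_value=2010)
MAP_ROWS = 100_000  # "Number of rows in a Map"

print(estimated_rows(MAP_ROWS, equality_selectivity(order_year)))
print(estimated_rows(MAP_ROWS, greater_than_selectivity(order_year, 2008)))
```

Without these values an engine has to guess, which is why the extra metadata matters most when the tables live in different databases and no single database optimizer sees the whole query.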
Specify Map Property
  - Click Show Map Properties to view properties
  - Specify Cardinality (or number of rows)
Specify Column Map Properties
  - Additional values that a modeler can provide
Specify Join Cardinality
  - JOIN Cardinality can greatly improve query optimization
  - Helps re-order JOINs, create indexes etc.
What is Query Trimming?
  - Definition
    - Build relationships between Information Objects and automatically prune the SQL statement to retrieve just what's needed to answer the Business Question
  - Benefits
    - Better performance
    - Reduced proliferation of Information Objects
  - Caution
    - Query Trimming alters SQL Semantics
    - Do NOT USE Query Trimming for all objects
    - Factor additional testing time for IOs that use Query Trimming
Query Trimming
  - What parts of a Query can be trimmed?
    - Tables: Unused tables can be dropped
    - Columns: Unreferenced columns can be dropped
  - Factors that govern Query Trimming
    - Joins
    - Union
    - Sub-query
    - Filters
    - Aggregates
    - Distinct
    - Other ASQL constructs
    - Parameters
Join Hints: The Optional Join
  - A JOIN can be marked optional
  - Optional is a compiler hint that, under certain conditions, unused tables in a join can be dropped
Understanding Optional Joins
  - Left Optional
  - Right Optional
  - Both Optional
  - The optional table, if unused, can be dropped from the Query under certain conditions
Conditions that Impact Optional Joins
  - Rule 1: Column is selected
    - If even a single column from a table is selected, the table cannot be dropped
  - Rule 2: Filter, GROUP BY, HAVING, ORDER BY
    - If a column is used in any of the above SQL operations, the table cannot be dropped
  - Rule 3: Table is required for a JOIN
    - If a table is related to 2 other tables in a 3-table JOIN, then it cannot be dropped even if no columns from that table are selected
Example: Customer Orders.iob
  - For the two JOINs, both tables are marked Optional
  - Query 1: Select CustomerID, CompanyName from Customer Order.iob
    - Result: Orders and Employee tables are dropped
  - Query 2: Select OrderDate, RequiredDate from Customer Order.iob
    - Result: Customers and Employee tables are dropped
  - Query 3: Select CustomerID from Customer Order.iob WHERE OrderDate>05/05/05
    - Result: Employee table is dropped
  - Query 4: Select CustomerID, Employee.LastName from Customer Order.iob
    - Result: No tables are dropped
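
The three rules and the four example queries can be reproduced with a small pruning routine. The Python sketch below is one possible reading of the rules, not Actuate's implementation: it repeatedly drops an unreferenced table that takes part in exactly one remaining join and sits on an optional side. The `trim_tables` name, the join tuple format, and the assumption that "left optional" means the left table may be dropped are all illustrative.

```python
def trim_tables(tables, joins, referenced):
    """Return the set of tables that can be dropped.

    joins: list of (left, right, optional), optional in
           {"left", "right", "both", "none"}.
    referenced: tables used in SELECT, filters, GROUP BY, HAVING or ORDER BY
                (Rules 1 and 2).
    """
    kept = set(tables)
    changed = True
    while changed:
        changed = False
        for table in sorted(kept - set(referenced)):
            active = [j for j in joins
                      if j[0] in kept and j[1] in kept and table in j[:2]]
            if len(active) != 1:
                continue  # Rule 3: the table still links two other tables
            left, _right, optional = active[0]
            side = "left" if table == left else "right"
            if optional in (side, "both"):
                kept.remove(table)
                changed = True
    return set(tables) - kept

TABLES = ["Customers", "Orders", "Employee"]
JOINS = [("Customers", "Orders", "both"), ("Orders", "Employee", "both")]

query1 = trim_tables(TABLES, JOINS, {"Customers"})              # CustomerID, CompanyName
query2 = trim_tables(TABLES, JOINS, {"Orders"})                 # OrderDate, RequiredDate
query3 = trim_tables(TABLES, JOINS, {"Customers", "Orders"})    # filter on OrderDate
query4 = trim_tables(TABLES, JOINS, {"Customers", "Employee"})  # Employee.LastName
print(query1, query2, query3, query4)
```

In Query 4 the Orders table survives even though none of its columns are selected, because it is the only link between Customers and Employee.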
Query Trimming Rules for UNION Queries
  - Example: Union Query using Literals
      SELECT customers.custid, customers.customname, 'Customers' as RelationShip
      FROM CUSTOMERS
      UNION ALL
      SELECT shippers.shipperid, shippers.shipname, 'Shippers' as RelationShip
      FROM SHIPPERS
  - Rule: Literal Evaluations for Filters
    - If a literal is specified in the Filter, then any part of the UNION whose literal does not match it is dropped. E.g. in the above Query, if the user requests just Customers, then the Shippers table is dropped
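
The literal rule amounts to evaluating the filter against each branch's constant before any SQL is sent. A minimal Python sketch, in which the branch dictionaries and the `trim_union` helper are hypothetical:

```python
BRANCHES = [
    {
        "sql": "SELECT customers.custid, customers.customname, "
               "'Customers' AS RelationShip FROM CUSTOMERS",
        "literals": {"RelationShip": "Customers"},
    },
    {
        "sql": "SELECT shippers.shipperid, shippers.shipname, "
               "'Shippers' AS RelationShip FROM SHIPPERS",
        "literals": {"RelationShip": "Shippers"},
    },
]

def trim_union(branches, column, requested_value):
    """Keep only the branches whose literal for `column` can match the filter."""
    kept = [
        b for b in branches
        # A branch without a literal for this column cannot be ruled out.
        if b["literals"].get(column, requested_value) == requested_value
    ]
    return " UNION ALL ".join(b["sql"] for b in kept)

trimmed_sql = trim_union(BRANCHES, "RelationShip", "Customers")
print(trimmed_sql)
```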
Filters
  - Rule: If a filter is defined on a Column, then the table holding the column cannot be dropped
  - Example: Customer table cannot be dropped
Use of Aggregates
  - Rule: If Count(*) is used, query trimming rules will not be applied at that level even if Optional is defined
Modeling Guideline 1: Map Properties
  - Specify Map Cardinality Property
    - For all Map (.SMA) files, specify additional Map properties
    - Click on Map properties
    - Map properties include Cardinality (or # of rows)
  - How is this used?
    - Useful for pure SQL optimization since cardinality provides cost information
Modeling Guideline 2: Map Column Properties
  - Specify Map Column Properties
  - How is this used?
    - Useful for pure SQL optimization since additional Map column properties provide cost information
Modeling Guideline 3: Join Cardinality
  - Specify JOIN Cardinality
  - How is this used?
    - Useful for pure SQL optimization: JOIN re-ordering, indexes on temp tables etc.
Modeling Guideline 4: Query Trimming
  - Step 1: Identify Query Trimming Candidates
    - Do not enable Query Trimming for all Objects
    - Query Trimming alters SQL Semantics
    - Factor additional testing time for Objects that use Query Trimming
  - Step 2: Identify range of Business Questions
    - What kinds of questions will a given IO answer?
    - Test if dropping a table will result in meaningful queries
Modeling Guideline 4: Query Trimming (continued)
  - Step 3: Specify Query Trimming Hints
  - Step 4: Test IOs
    - Create an IO which is based on the IO with Query Trimming
    - Select columns, filters etc. to answer a business question
    - Review the query plan & native SQL
Information Objects Caching
  - Optional Information Object data caching for performance
  - iserver Cache Configuration and Management
    - Incremental setup and configuration
    - Flexible scheduling for cache data refresh
  - Example Usage Scenarios
    - Extract data from operational stores overnight
    - Reuse same dataset for multiple reports
    - More responsive on-demand reporting
  [Diagram: Customer Information Object served from an Information Object Data Cache; sources shown: Bank Account, Investment Account]
Information Objects Caching: When to Cache
  - Performance Problems
    - Complex queries take too long to run
    - Data sources are not always available
    - Excess load on operational sources
    - Frequent queries on operational sources
  - Data volatility
    - Relatively unchanging data
    - Lookups
  - Ability to access transformed cache database from other applications
  - In A10, a new UI to set up the cache is available in IOD
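
The caching idea, materializing an Information Object's result and refreshing it on a schedule, can be sketched generically. The Python example below uses SQLite and an injected clock so the refresh interval is easy to see; `CachedObject`, the table names and the interval are assumptions and say nothing about how the iserver cache is implemented.

```python
import sqlite3
import time

class CachedObject:
    """Materializes a query into a cache table and refreshes it when stale."""

    def __init__(self, conn, name, source_sql, max_age_seconds, clock=time.time):
        self.conn, self.name, self.source_sql = conn, name, source_sql
        self.max_age, self.clock = max_age_seconds, clock
        self.loaded_at = None
        self.refresh_count = 0

    def refresh(self):
        # Names are trusted identifiers from the model, not user input.
        self.conn.execute(f"DROP TABLE IF EXISTS {self.name}")
        self.conn.execute(f"CREATE TABLE {self.name} AS {self.source_sql}")
        self.loaded_at = self.clock()
        self.refresh_count += 1

    def rows(self):
        stale = self.loaded_at is None or self.clock() - self.loaded_at >= self.max_age
        if stale:
            self.refresh()
        return self.conn.execute(f"SELECT * FROM {self.name}").fetchall()

now = [0.0]  # fake clock so the example does not have to wait
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank_account (owner TEXT, balance REAL)")
conn.execute("INSERT INTO bank_account VALUES ('user1', 100.0)")

cache = CachedObject(conn, "customer_cache", "SELECT * FROM bank_account",
                     max_age_seconds=24 * 3600, clock=lambda: now[0])

first = cache.rows()                                   # loads the cache
conn.execute("INSERT INTO bank_account VALUES ('user2', 50.0)")
second = cache.rows()                                  # still served from cache
now[0] += 24 * 3600                                    # a day later
third = cache.rows()                                   # refreshed from source
print(len(first), len(second), len(third))
```

Reports between refreshes read `customer_cache` only, which is the "reduce load on operational sources" case; the trade-off is that new source rows stay invisible until the next refresh.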
Additional Performance Tuning Hints
  - Choosing the right JOIN algorithm
    - Nested Loop: Particularly effective if the outer input is quite small
    - Merge: The join itself is very fast, but it can be an expensive choice if the sort operations have to be performed in the Engine
  - Use of DISTINCT
    - Apply DISTINCT on the top-level IOB instead of intermediate ones
  - AIS Memory Configuration
    - Set per-query memory thresholds and global thresholds based on application requirements
  - RAM Disk
    - Use a RAM Disk, if available, for efficient materialize operations
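
The trade-off between the two JOIN algorithms is easier to see side by side. The Python sketch below shows generic versions of both, not the Actuate engine's code; the sample rows and key functions are hypothetical.

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    """Scan the inner input once per outer row: cheap when outer is small."""
    return [(o, i) for o in outer for i in inner if outer_key(o) == inner_key(i)]

def merge_join(left, right, left_key, right_key):
    """Sort both inputs, then merge them in a single pass."""
    # These sorts are the potentially expensive part if the inputs
    # do not already arrive in key order.
    left = sorted(left, key=left_key)
    right = sorted(right, key=right_key)
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left_key(left[i]) == lk:
                result.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return result

customers = [(1, "Acme"), (2, "Globex")]          # (CustomerID, CompanyName)
orders = [(10, 1), (11, 2), (12, 1), (13, 3)]     # (OrderID, CustomerID)

nested = nested_loop_join(customers, orders, lambda c: c[0], lambda o: o[1])
merged = merge_join(customers, orders, lambda c: c[0], lambda o: o[1])
print(sorted(nested) == sorted(merged), len(merged))
```

Both functions return the same pairs; they differ in how much work they do, which is what the choice of algorithm trades off.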
Thank You