Business Intelligence, Data Warehousing Concepts and Artifacts



Presented by: Jose Chinchilla, MCITP

Data Warehousing is the process of constructing and using a data warehouse. The data warehouse is built by integrating data from multiple heterogeneous sources, and it supports analytical reporting, structured and ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.

What is new in SQL Server 2012

SQL Server 2005 and 2008 provided the Unified Dimensional Model (UDM) for creating OLAP and data mining solutions. SQL Server 2008 R2 introduced the VertiPaq engine, which stores data in a highly compressed format in memory at runtime. This technology made analysis in Excel with PowerPivot significantly faster. Further, PowerPivot models could be deployed to SharePoint for collaboration, turning personal Business Intelligence (BI) solutions into team or organizational BI. SQL Server 2012 introduces a unified Business Intelligence Semantic Model (BISM), built on a combination of existing and new technologies. It is intended to act as one model for all end-user experiences: reporting, analytics, scorecards, dashboards, and so on, whether for personal, team, or organizational BI.

OLAP and OLTP

OLTP and OLAP are complementary technologies. You cannot live without OLTP: it runs your business day by day. Pulling strategic information directly from OLTP systems is usually the first quick-and-dirty approach, but it becomes limiting later. This section explores the key differences between the two technologies. OLTP stands for Online Transaction Processing and is a data modeling approach typically used to build and run everyday business applications. Most of the applications you see and use are OLTP based.

OLAP stands for Online Analytical Processing and is an approach to answering multidimensional queries. OLAP was conceived for Management Information Systems and Decision Support Systems but is still widely underused: every day I see too many people trying to derive business intelligence directly from OLTP data!

(Figure: Conceptual model difference between OLTP and OLAP)
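To make the contrast concrete, here is a minimal T-SQL sketch with hypothetical table and column names: an OLTP query touches individual transactions, while an OLAP-style query aggregates over many rows across dimensions.

-- OLTP: look up a single order (hypothetical schema)
SELECT OrderID, CustomerID, OrderDate, TotalAmount
FROM dbo.Orders
WHERE OrderID = 1001;

-- OLAP-style: aggregate sales by year and product category
SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON f.DateKey = d.DateKey
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.Category;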

MOLAP, ROLAP and HOLAP

In the OLAP world there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.

MOLAP

This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats. Advantages:

- Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
- Can perform complex calculations: all calculations are pre-generated when the cube is created, so complex calculations are not only doable, they return quickly.

Disadvantages:

- Limited in the amount of data it can handle: because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data; it can, but in that case only summary-level information will be included in the cube itself.
- Requires additional investment: cube technology is often proprietary and may not already exist in the organization, so adopting MOLAP usually requires additional investment in human and capital resources.

ROLAP

This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP slicing and dicing functionality. In essence, each slice-and-dice action is equivalent to adding a "WHERE" clause to the SQL statement (a sketch follows the advantages list below). Advantages:

- Can handle large amounts of data: the data size limitation of ROLAP is the data size limitation of the underlying relational database; ROLAP itself places no limit on the amount of data.
- Can leverage functionality inherent in the relational database: relational databases already come with a host of functionality, and ROLAP technologies, since they sit on top of the relational database, can leverage it.
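As noted above, each ROLAP slice or dice ultimately becomes a relational predicate. A minimal T-SQL sketch against a hypothetical sales schema: slicing to one year adds a WHERE condition, and dicing by region adds a grouping column.

-- Dice sales by region, sliced to calendar year 2012 (hypothetical schema)
SELECT r.RegionName, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON f.DateKey = d.DateKey
JOIN dbo.DimRegion AS r ON f.RegionKey = r.RegionKey
WHERE d.CalendarYear = 2012   -- the slice
GROUP BY r.RegionName;        -- the dice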

Disadvantages:

- Performance can be slow: because each ROLAP report is essentially a SQL query (or multiple SQL queries) against the relational database, query times can be long when the underlying data volume is large.
- Limited by SQL functionality: because ROLAP mainly relies on generating SQL statements to query the relational database, and SQL does not fit all needs (for example, complex calculations are difficult to express in SQL), ROLAP technologies are traditionally limited by what SQL can do. ROLAP vendors have mitigated this by building complex functions into their tools out of the box and by allowing users to define their own functions.

HOLAP

HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data.

In the data warehousing field, we often hear discussions about whether a person's or organization's philosophy falls into Bill Inmon's camp or Ralph Kimball's camp. The difference between the two is described below.

Bill Inmon's paradigm: the data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from that data warehouse. In the data warehouse, information is stored in third normal form.

Ralph Kimball's paradigm: the data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouse systems in most enterprises are closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort, and hence originated as data marts. Only when more data marts are built later do they evolve into a data warehouse.

Factless Fact Table

A factless fact table is a fact table that does not have any measures. It is essentially an intersection of dimensions. On the surface, a factless fact table does not make sense, since a fact table is, after all, about facts. However, there are situations where this kind of relationship makes sense in data warehousing. For example, think about a record of student attendance in classes. In this case, the fact table would consist of three dimensions: the student dimension, the time dimension, and the class dimension.

(Figure: Factless fact table example)

The only measure you could possibly attach to each combination is "1" to show the presence of that particular combination. However, adding a fact that is always 1 is redundant, because we can simply use the COUNT function in SQL to answer the same questions. Factless fact tables offer the most flexibility in data warehouse design. For example, one can easily answer the following questions with this factless fact table (see the sketch below):

How many students attended a particular class on a particular day?
How many classes on average does a student attend on a given day?

Without a factless fact table, we would need two separate fact tables to answer these two questions. With the factless fact table, it is the only fact table needed.
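A minimal T-SQL sketch of how the attendance questions above could be answered from such a factless fact table (table, column names, and key values are hypothetical):

-- How many students attended class 101 on a given day?
SELECT COUNT(*) AS StudentsAttending
FROM dbo.FactAttendance
WHERE ClassKey = 101
  AND DateKey = 20121016;   -- assuming an integer yyyymmdd date key

-- How many classes, on average, does a student attend on that day?
SELECT AVG(CAST(ClassesAttended AS FLOAT)) AS AvgClassesPerStudent
FROM (
    SELECT StudentKey, COUNT(*) AS ClassesAttended
    FROM dbo.FactAttendance
    WHERE DateKey = 20121016
    GROUP BY StudentKey
) AS PerStudent;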

Junk Dimension

In data warehouse design, we frequently run into situations where there are yes/no indicator fields in the source system. Through business analysis we know it is necessary to keep such information in the fact table. However, if we keep all those indicator fields in the fact table, not only do we need to build many small dimension tables, but the amount of information stored in the fact table also increases tremendously, leading to possible performance and management issues. A junk dimension is the way to solve this problem. In a junk dimension, we combine these indicator fields into a single dimension. This way, we only need to build a single dimension table, and both the number of fields in the fact table and the size of the fact table can be decreased. The content of the junk dimension table is the combination of all possible values of the individual indicator fields.

Let's look at an example. Assume we have the following fact table:

(Figure: Fact table before junk dimension)

In this example, the last three fields are all indicator fields, and in this format each one of them is a dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:

(Figure: Fact table with junk dimension)

Note that the number of dimensions in the fact table went from 7 to 5. The content of the junk dimension table would look like the following:

(Figure: Junk dimension example)

In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows in the junk dimension table; a sketch of how such a table could be generated appears below.
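A minimal T-SQL sketch of generating the junk dimension described above by cross joining the indicator values (the value lists are assumptions; only the field names come from the example):

-- Build the junk dimension as the cross product of all indicator values (3 x 2 x 2 = 12 rows)
SELECT ROW_NUMBER() OVER (ORDER BY t.TXN_CODE, c.COUPON_IND, p.PREPAY_IND) AS JunkKey,
       t.TXN_CODE, c.COUPON_IND, p.PREPAY_IND
INTO dbo.DimTransactionJunk
FROM (VALUES ('SALE'), ('RETURN'), ('VOID')) AS t(TXN_CODE)
CROSS JOIN (VALUES ('Y'), ('N')) AS c(COUPON_IND)
CROSS JOIN (VALUES ('Y'), ('N')) AS p(PREPAY_IND);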

By using a junk dimension to replace the three indicator fields, we have decreased the number of dimensions by 2 and also decreased the number of fields in the fact table by 2. The result is a data warehousing environment that offers better performance and is easier to manage.

Conformed Dimension

A conformed dimension is a dimension that has exactly the same meaning and content when referred to from different fact tables. A conformed dimension can be referenced by multiple tables in multiple data marts within the same organization. For two dimension tables to be considered conformed, they must either be identical or one must be a subset of the other; there cannot be any other type of difference between the two tables. For example, two dimension tables that are exactly the same except for the primary key are not considered conformed dimensions.

Why is a conformed dimension important? This goes back to the definition of a data warehouse as being "integrated." Integrated means that even if a particular entity had different meanings and different attributes in the source systems, there must be a single version of this entity once the data flows into the data warehouse.

The time dimension is a common conformed dimension in an organization. Usually the only questions to consider with the time dimension are whether there is a fiscal year in addition to the calendar year, and how a week is defined. Fortunately, both are relatively easy to resolve. In the case of fiscal vs. calendar year, one may go with either, or alternatively keep two separate conformed dimensions, one for the fiscal year and one for the calendar year. The definition of a week can also differ in large organizations: Finance may use Saturday to Friday, while Marketing may use Sunday to Saturday. In this case, we should decide on a definition and move on. The nice thing about the time dimension is that once these rules are set, the values in the dimension table will never change; October 16th will never become the 15th day of October.

Not all conformed dimensions are as easy to produce as the time dimension. An example is the customer dimension. In any organization with some history, there is a high likelihood that different customer databases exist in different parts of the organization. Achieving a conformed customer dimension means that those data sets must be compared against each other, rules must be set, and data must be cleansed. In addition, when doing incremental data loads into the data warehouse, we need to apply the same rules to the new values to make sure we are only adding truly new customers to the customer dimension. Building a conformed dimension is also part of the process of master data management, or MDM. In MDM, one must not only make sure the master data dimensions are conformed, but that conformity also needs to be brought back to the source systems.

Data Warehouse Design

After the tool and team personnel selections are made, the data warehouse design can begin. The following are the typical steps involved in the data warehousing project cycle:

1. Requirement Gathering
2. Physical Environment Setup
3. Data Modeling
4. ETL (Extract, Transform and Load)
5. OLAP Cube Design
6. Front End Development
7. Report Development
8. Performance Tuning
9. Query Optimization
10. Quality Assurance
11. Rolling out to Production
12. Production Maintenance
13. Incremental Enhancements

Each phase listed above is a typical data warehouse design phase and has several sections:

Task Description: describes what typically needs to be accomplished during this particular data warehouse design phase.

Time Requirement: a rough estimate of the amount of time this particular data warehouse task takes.

Deliverables: typically at the end of each data warehouse task, one or more documents are produced that fully describe the steps and results of that particular task. This is especially important for consultants to communicate their results to the clients.

Possible Pitfalls: things to watch out for. Some of them are obvious, some not so obvious; all of them are real.

Data Modeling - Conceptual, Logical, and Physical Data Models

The three levels of data modeling, the conceptual data model, the logical data model, and the physical data model, were discussed in prior sections. Here we compare these three types of data models. The table below compares the different features (an X indicates that the feature is represented at that level):

Feature                 Conceptual   Logical   Physical
Entity Names                X           X
Entity Relationships        X           X
Attributes                              X
Primary Keys                            X          X
Foreign Keys                            X          X
Table Names                                        X
Column Names                                       X
Column Data Types                                  X

Below we show the conceptual, logical, and physical versions of a single data model.

(Figures: Conceptual model design, logical model design, physical model design)

We can see that the complexity increases from conceptual to logical to physical. This is why we always start with the conceptual data model (so we understand at a high level what the different entities in our data are and how they relate to one another), then move on to the logical data model (so we understand the details of our data without worrying about how they will actually be implemented), and finally the physical data model (so we know exactly how to implement our data model in the database of choice). In a data warehousing project, the conceptual data model and the logical data model are sometimes considered a single deliverable.

Data Integrity

Data integrity refers to the validity of data, meaning data is consistent and correct. In the data warehousing field, we frequently hear the term "Garbage In, Garbage Out." If there is no data integrity in the data warehouse, any resulting report and analysis will not be useful. In a data warehouse or a data mart, there are three areas where data integrity needs to be enforced:

Database level

We can enforce data integrity at the database level. Common ways of enforcing data integrity include:

Referential integrity: the relationship between the primary key of one table and the foreign key of another table must always be maintained. For example, a primary key value cannot be deleted if there is still a foreign key that refers to it.

Primary key / unique constraint: primary keys and the UNIQUE constraint are used to make sure every row in a table can be uniquely identified.

Candidate key: any column or combination of columns that can guarantee uniqueness is called a candidate key. Sometimes a candidate key is a single column, and sometimes it is formed by joining multiple columns.

Composite key: a special type of candidate key formed by a combination of two or more columns that together uniquely identify each row in the table.

Natural key and surrogate key: a natural key is a value that has meaning to the user and ought to be unique for every row; a good example is a car's license plate number. A surrogate key is an artificial value that has no meaning to the user but is guaranteed to be unique by the database itself; an example would be an arbitrary, unique integer added to the license plate table to allow for the fact that a license number might be reissued.

NOT NULL vs. NULL-able: columns declared NOT NULL may not contain a NULL value.

Valid values: only allowed values are permitted in the database. For example, if a column may only hold positive integers, a value of -1 cannot be allowed.

ETL (Extract, Transform and Load) process

For each step of the ETL process, data integrity checks should be put in place to ensure that the source data is the same as the data in the destination. The most common checks are record counts and record sums (see the sketch below).
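A minimal T-SQL sketch of these ideas, using hypothetical staging and warehouse tables: the DDL enforces database-level integrity (surrogate key, natural key, NOT NULL, referential integrity, and valid values), and the final query is a simple ETL reconciliation check on record counts and sums.

-- Database-level integrity (hypothetical customer/order tables)
CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT IDENTITY(1,1) PRIMARY KEY,   -- surrogate key
    CustomerCode VARCHAR(20) NOT NULL UNIQUE,     -- natural key from the source system
    CustomerName VARCHAR(100) NOT NULL
);

CREATE TABLE dbo.FactOrders (
    OrderKey     INT IDENTITY(1,1) PRIMARY KEY,
    CustomerKey  INT NOT NULL
        REFERENCES dbo.DimCustomer (CustomerKey), -- referential integrity
    OrderAmount  DECIMAL(18,2) NOT NULL
        CHECK (OrderAmount >= 0)                  -- valid values only
);

-- ETL check: compare record counts and sums between the staging area and the warehouse
SELECT
    (SELECT COUNT(*) FROM staging.Orders)          AS SourceRows,
    (SELECT COUNT(*) FROM dbo.FactOrders)          AS TargetRows,
    (SELECT SUM(OrderAmount) FROM staging.Orders)  AS SourceAmount,
    (SELECT SUM(OrderAmount) FROM dbo.FactOrders)  AS TargetAmount;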

Access level

We need to ensure that data is not altered by any unauthorized means, either during the ETL process or in the data warehouse. To do this, there need to be safeguards against unauthorized access to data (including physical access to the servers), as well as logging of all data access history. Data integrity can only be ensured if there is no unauthorized access to the data.

Tabular Model Query and Cache Terminology

When deploying a Tabular model in SSAS, the deployment options screen offers four choices for Query Mode: DirectQuery, DirectQuery with In-Memory, In-Memory, and In-Memory with DirectQuery. Query Mode specifies the source from which query results are returned when you deploy the BISM project to the SSAS tabular model server. Here is a description of each, with its benefits and drawbacks:

DirectQuery Mode (SSAS Tabular)

Analysis Services lets you create tabular models and reports that retrieve data and aggregates directly from a relational database system, using DirectQuery mode. Commonly cited benefits of using DirectQuery mode include the ability to query very large data sets that cannot fit in memory, and having data refreshed in real time.

By default, tabular models use an in-memory cache to store and query data. Because tabular models use data that resides in memory, even complex queries can be incredibly fast. However, there are some drawbacks to using cached data:

- Data is not refreshed when the source data changes. You must process the model to get updates to the data.
- When you turn off the computer that hosts the model, the cache is saved to disk and must be reopened when you load the model or open the PowerPivot file. The save and load operations can be time-consuming.

In contrast, a tabular model in DirectQuery mode uses data that is stored in a SQL Server database or in a SQL Server PDW (Parallel Data Warehouse) data warehouse. At design time, you import all or a small sample of the data into the cache and build your model as usual. When you are ready to deploy the model, you change the operating mode to DirectQuery. After the change, any queries against the model use the specified relational data source (either SQL Server or SQL Server PDW), not the cached data. When you create reports or queries against the model, you use DAX, but the DAX queries are translated by Analysis Services into equivalent Transact-SQL statements against the specified relational data source. There are many advantages to deploying a model in DirectQuery mode:

- It is possible to have a model over data sets that are too large to fit in memory on the Analysis Services server.
- The data is guaranteed to be up to date, and there is no extra management overhead of maintaining a separate copy of the data. Changes to the underlying source data can be immediately reflected in queries against the data model.
- DirectQuery can take advantage of provider-side query acceleration, such as that provided by xVelocity memory-optimized columnstore indexes. xVelocity columnstore indexes are provided in both SQL Server 2012 and SQL Server PDW to support improved DirectQuery performance (see the sketch after this list).
- Any security enforced by the back-end database, such as row-level security, is guaranteed to be enforced. In contrast, if you are using cached data, it can be difficult to ensure that the cache is secured exactly as on the server.
- If the model contains complex formulas that might require multiple queries, Analysis Services can perform optimization to ensure that the query plan executed against the back-end database is as efficient as possible.
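As an illustration of the relational side of this, the following T-SQL sketch (hypothetical table and column names) creates an xVelocity columnstore index of the kind that can accelerate DirectQuery workloads, followed by a simple aggregate of the sort a DAX summarization might be translated into.

-- Nonclustered columnstore index over the fact table (SQL Server 2012 syntax)
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_ColumnStore
ON dbo.FactSales (DateKey, ProductKey, SalesAmount);

-- The kind of relational aggregate a simple DAX summarization translates into
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey;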

In-Memory paradigm

This is the default. The data in the tabular model is processed and compressed using the xVelocity in-memory analytics engine (formerly called VertiPaq). This in-memory columnar storage engine has been optimized for high-performance analysis and exploration of data, and it provides fast query times for aggregation queries. However, there are some drawbacks:

- The data is not updated when the source data changes, so the model needs to be processed to refresh the data.
- When you turn off the computer hosting the model, the cache is saved to disk and must be reopened when you load the model. The save and load operations can be time-consuming.
- The server needs a large amount of memory if you have a large amount of fact data.

In-Memory with DirectQuery: this is a hybrid mode. By default, queries are answered from the in-memory cache; however, the client's connection string can instead choose to use DirectQuery mode.

DirectQuery with In-Memory: this is a hybrid mode. By default, queries are answered using DirectQuery mode; however, the client's connection string can instead choose to use the in-memory cache.

A hybrid mode provides you with many options:

- When both the cache and the relational data source are available, you can set the preferred connection method, but ultimately the client controls which source is used, via the DirectQueryMode connection string property. This means you can serve clients that issue MDX queries and clients that issue DAX queries from the same model.
- You can also configure partitions on the cache in such a way that the primary partition used for DirectQuery mode is never processed and must always reference the relational source. There are many ways to use partitions to optimize the model design and the reporting experience.
- After the model has been deployed, you can change the preferred connection method. For example, you might use a hybrid mode for testing, and switch the model to DirectQuery-only mode only after thoroughly testing any reports or queries that use the model.

Query Mode: In-Memory with DirectQuery

The In-Memory with DirectQuery option means that In-Memory is the primary (or default) connection. However, when needed and if the client tool supports it, the secondary query mode, i.e. DirectQuery, can be used instead. This query mode is ideal for the following scenarios:

1. Users are mainly using Excel to perform analysis on the tabular model.
2. The processed data in memory will be used to serve Excel queries.
3. The processed data in memory will be used to serve Power View reports.
4. Only occasional real-time queries are required to access real-time data, for example from SSRS.

Query Mode: DirectQuery with In-Memory

The DirectQuery with In-Memory option means that DirectQuery is the primary (or default) connection. However, when needed and if the client tool supports it, the secondary query mode, i.e. In-Memory, can be used instead. This query mode is ideal for the following scenarios:

1. Users are mainly using Power View (or another DAX-issuing client tool) to perform analysis on the tabular model.
2. By default, queries always return real-time data.
3. Only occasionally does processed in-memory data need to be retrieved, for example from Excel.

FAQ

Can a dimension table also be a fact table? The short answer is no. The two types of tables are created for different reasons. However, from a database design perspective, a dimension table could have a parent table, just as a fact table always has one or more dimension tables as parents. Also, fact tables may be aggregated, whereas dimension tables are not. Another reason is that fact tables are not supposed to be updated in place, whereas dimension tables can be updated in place in some cases.

More details: fact and dimension tables appear in what is commonly known as a star schema. A primary purpose of the star schema is to simplify a complex normalized set of tables and consolidate data (possibly from different systems) into one database structure that can be queried in a very efficient way. In its simplest form, it contains a fact table (for example, StoreSales) and one or more dimension tables. Each dimension entry has zero, one, or more fact tables associated with it (examples of dimension tables: Geography, Item, Supplier, Customer, Time, etc.). It is also valid for a dimension to have a parent, in which case the model is a "snowflake"; however, designers try to avoid this kind of design because it causes more joins, which slow performance. In the StoreSales example, the Geography dimension could be composed of the columns (GeoID, ContinentName, CountryName, StateProvName, CityName, StartDate, EndDate). A sketch of such a star schema follows.
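A minimal T-SQL sketch of the StoreSales star schema described above (columns beyond those listed for the Geography dimension are assumptions for illustration):

-- Geography dimension from the example
CREATE TABLE dbo.Geography (
    GeoID         INT PRIMARY KEY,
    ContinentName VARCHAR(50),
    CountryName   VARCHAR(50),
    StateProvName VARCHAR(50),
    CityName      VARCHAR(50),
    StartDate     DATE,
    EndDate       DATE
);

-- Fact table referencing the dimension (SalesDate and SalesAmount are assumed columns)
CREATE TABLE dbo.StoreSales (
    SalesID     INT IDENTITY(1,1) PRIMARY KEY,
    GeoID       INT NOT NULL REFERENCES dbo.Geography (GeoID),
    SalesDate   DATE NOT NULL,
    SalesAmount DECIMAL(18,2) NOT NULL
);

-- A typical star-schema query: total sales by country
SELECT g.CountryName, SUM(s.SalesAmount) AS TotalSales
FROM dbo.StoreSales AS s
JOIN dbo.Geography AS g ON s.GeoID = g.GeoID
GROUP BY g.CountryName;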

In a snowflake model, you could instead have two normalized tables for the geography information, namely a Continent table and a Country table. You can find plenty of examples of star schemas, and the Inmon vs. Kimball discussion offers an alternative view on the star schema model; the Kimball Forum is also worth checking out.

To answer a follow-up question about examples for fourth normal form (4NF): an example of a fact table violating 4NF is SalesFact (ID, BranchID, SalesPersonID, ItemID, Amount, TimeID); an example of a fact table not violating 4NF is AggregatedSales (BranchID, TotalAmount), which is in 4NF.

Measures, Measure Groups and Dimensions

A measure represents a column that contains quantifiable data, usually numeric, that can be aggregated. A measure is generally mapped to a column in a fact table. You can also use a measure expression to define the value of a measure, based on a column in a fact table as modified by a Multidimensional Expression. A measure expression enables weighting of measure values; for example, currency conversion can be used to weight a sales measure by an exchange rate. Attribute columns from dimension tables can be used to define measures, but such measures are typically semiadditive or nonadditive in terms of their aggregation behavior. You can also define a measure as a calculated member by using Multidimensional Expressions (MDX) to provide a calculated value based on other measures in the cube. Calculated members add flexibility and analysis capability to a cube in Analysis Services.

A simple MeasureGroup object is composed of basic information, measures, dimensions, and partitions. Basic information includes the name of the measure group, the type of measures, the storage mode, the processing mode, and so on. Measures are the actual set of measures that compose the measure group; for each measure there is a definition of the aggregate function, the formatting attribute, the data item source, and more. Dimensions are a subset of cube dimensions that will be used to create the processed measure group. Partitions are the collection of physical splits of the processed measure group.

Linked Measure Groups and Linked Dimensions in SSAS Cubes

A linked measure group is based on another measure group in a different cube within the same database, or even in a different Analysis Services database in another solution, as long as both databases are deployed on the same server. You might use a linked measure group if you want to reuse a set of measures, and the corresponding data values, in multiple cubes.

Important: linked measure groups are read-only. To pick up the latest changes, you must delete and recreate all linked measure groups based on the modified source object. For this reason, copying and pasting measure groups between projects is an alternative approach you should consider if future modifications to the measure group are likely. The following list summarizes usage limitations:

- You cannot create a linked measure group from another linked measure group.
- You cannot add or remove measures in a linked measure group. Membership is defined only in the original measure group.
- Writeback is not supported in linked measure groups.
- Linked measure groups cannot be used in multiple many-to-many relationships, especially when those relationships are in different cubes. Doing so can result in ambiguous aggregations.

In order to define or use a linked measure group, the Windows service account for the Analysis Services instance must belong to an Analysis Services database role that has ReadDefinition and Read access rights on the source Analysis Services instance to the source cube and measure group, or must belong to the Analysis Services Administrators role for the source instance.

A linked dimension is based on a dimension created and stored in another Analysis Services database of the same version and compatibility level. By using a linked dimension, you can create, store, and maintain a dimension in one database while making it available to users of multiple databases. To users, a linked dimension appears like any other dimension.

Linked dimensions are read-only. If you want to modify the dimension or create new relationships, you must change the source dimension, then delete and recreate the linked dimension and its relationships. You cannot refresh a linked dimension to pick up changes from the source object. All related measure groups and dimensions must come from the same source database; you cannot create new relationships between local measure groups and the linked dimensions you add to your cube. After linked dimensions and measure groups have been added to the current cube, the relationships between them must be maintained in their source database.

What Are Perspectives?

Perspectives define viewable subsets of a data model that provide focused, business-specific, or application-specific viewpoints of the model. A perspective is a subset of the features of a cube, and perspectives are available in both the Multidimensional and Tabular versions of Analysis Services. A perspective enables administrators to create views of a cube, helping users focus on the data most relevant to them. A perspective contains subsets of the objects of a cube; it cannot include elements that are not defined in the parent cube.

A simple Perspective object is composed of basic information, dimensions, measure groups, calculations, KPIs, and actions. Basic information includes the name and the default measure of the perspective. The dimensions are a subset of the cube dimensions, the measure groups are a subset of the cube measure groups, the calculations are a subset of the cube calculations, the KPIs are a subset of the cube KPIs, and the actions are a subset of the cube actions. A cube has to be updated and processed before the perspective can be used.

Perspectives are an excellent option for reducing the complexity of a cube. Perspectives have some similarities to SQL Server views, which give you the ability to apply abstraction over the SSAS objects (measures, dimensions, KPIs, and named sets) available in an OLAP or Tabular cube (a small sketch appears at the end of this section). Perspectives do not require any additional storage beyond their definition and have no effect on the processing times of a cube.

What Perspectives Are Not?

Perspectives are not meant to be used to define security within the cube; security is inherited from the underlying OLAP or Tabular cube. Perspectives are read-only views of the cube, and objects cannot be renamed or changed by using a perspective.
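To illustrate the SQL Server view analogy mentioned above, here is a minimal T-SQL sketch reusing the hypothetical StoreSales tables from the star schema example: like a perspective, the view exposes a focused subset of the underlying objects without storing extra data or changing the source.

-- A view exposing only sales-related columns, analogous to a perspective
-- that surfaces only the measures and dimensions relevant to one audience
CREATE VIEW dbo.vSalesSummary
AS
SELECT g.CountryName,
       s.SalesDate,
       s.SalesAmount
FROM dbo.StoreSales AS s
JOIN dbo.Geography AS g ON s.GeoID = g.GeoID;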