Literature. Overview. Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction DW topics

Size: px
Start display at page:

Download "Literature. Overview. Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction DW topics"

Transcription

1 Introduction to Data Warehousing and Business Intelligence Slides kindly borrowed from the course Data Warehousing and Machine Learning Aalborg University, Denmark Christian S. Jensen Torben Bach Pedersen Christian Thomsen Course Structure Business intelligence Extract knowledge from large amounts of data collected in a modern enterprise Data warehousing, machine learning Purpose Acquire theoretical background in lectures and literature studies Obtain practical experience on (industrial) tools in practical exercises Data warehousing: construction of a database with only data analysis purpose Business Intelligence (BI) Machine learning: find patterns automatically in databases 2 1

2 Literature Multidimensional Databases and Data Warehousing, Christian S. Jensen, Torben Bach Pedersen, Christian Thomsen, Morgan & Claypool Publishers, 2010 Data Warehouse Design: Modern Principles and Methodologies, Golfarelli and Rizzi, McGraw-Hill, 2009 Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications, Elzbieta Malinowski, Esteban Zimányi, Springer, 2008 The Data Warehouse Lifecycle Toolkit, Kimball et al., Wiley 1998 The Data Warehouse Toolkit, 2 nd Ed., Kimball and Ross, Wiley, Overview Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction DW topics Multidimensional modeling ETL Performance optimization 4 2

3 What is Business Intelligence (BI)? From Encyclopedia of Database Systems: [BI] refers to a set of tools and techniques that enable a company to transform its business data into timely and accurate information for the decisional process, to be made available to the right persons in the most suitable form. 5 What is Business Intelligence (BI)? BI is different from Artificial Intelligence (AI) AI systems make decisions for the users BI systems help the users make the right decisions, based on available data Combination of technologies Data Warehousing (DW) On-Line Analytical Processing (OLAP) Data Mining (DM) 6 3

4 Why is BI Important? Worldwide BI revenue in 2005 = US$ 5.7 billion 10% growth each year A market where players like IBM, Microsoft, Oracle, and SAP compete and invest BI is not only for large enterprises Small and medium-sized companies can also benefit from BI The financial crisis has increased the focus on BI You cannot afford not to use the gold in your data 7 BI and the Web The Web makes BI even more useful Customers do not appear physically in a store; their behaviors cannot be observed by traditional methods A website log is used to capture the behavior of each customer, e.g., sequence of pages seen by a customer, the products viewed Idea: understand your customers using data and BI! Utilize website logs, analyze customer behavior in more detail than before (e.g., what was not bought?) Combine web data with traditional customer data 8 4

5 Case Study of an Enterprise Example of a chain (e.g., fashion stores or car dealers) Each store maintains its own customer records and sales records Hard to answer questions like: find the total sales of Product X from stores in Aalborg The same customer may be viewed as different customers for different stores; hard to detect duplicate customer information Imprecise or missing data in the addresses of some customers Purchase records maintained in the operational system for limited time (e.g., 6 months); then they are deleted or archived The same product may have different prices, or different discounts in different stores Can you see the problems of using those data for business analysis? 9 Data Analysis Problems The same data found in many different systems Example: customer data across different stores and departments The same concept is defined differently Heterogeneous sources Relational DBMS, On-Line Transaction Processing (OLTP) Unstructured data in files (e.g., MS Word) Legacy systems 10 5

6 Data Analysis Problems (cont ) Data is suited for operational systems Accounting, billing, etc. Do not support analysis across business functions Data quality is bad Missing data, imprecise data, different use of systems Data are volatile Data deleted in operational systems (6 months) Data change over time no historical information 11 Data Warehousing Solution: new analysis environment (DW) where data are Subject oriented (versus function oriented) Integrated (logically and physically) Time variant (data can always be related to time) Stable (data not deleted, several versions) Supporting management decisions (different organization) Data from the operational systems are Extracted Cleansed Transformed Aggregated (?) Loaded into the DW A good DW is a prerequisite for successful BI 12 6

7 DW: Purpose and Definition DW is a store of information organized in a unified data model Data collected from a number of different sources Finance, billing, website logs, personnel, Purpose of a data warehouse (DW): support decision making Easy to perform advanced analysis Ad-hoc analysis and reports We will cover this soon Data mining: discovery of hidden patterns and trends You will study this in another course 13 DW Architecture Data as Materialized Views Existing databases and systems (OLTP) Appl. DB Appl. DB Appl. DB Appl. DB Appl. DB Trans. (Global) Data Warehouse New databases and systems (OLAP) DW (Local) Data Marts DM DM DM OLAP Data mining Visualization Analogy: (data) producers warehouse (data) consumers 14 7

8 Function vs. Subject Orientation Function-oriented systems Appl. DB Subject-oriented systems Sales DM D-Appl. Appl. DB DM D-Appl. Appl. Appl. DB Trans. All subjects, integrated DW Costs Profit D-Appl. Appl. DB DB Selected subjects DM 15 Top-down vs. Bottom-up Appl. DB DM D-Appl. Appl. DB DM D-Appl. Appl. DB Trans. DW Appl. DB Appl. Top-down: DB 1. Design of DW 2. Design of DMs In-between: 1. Design of DW for DM1 2. Design of DM2 and integration with DW 3. Design of DM3 and integration with DW D-Appl. DM Bottom-up: 1. Design of DMs 2. Maybe integration of DMs in DW 3. Maybe no DW 16 8

9 Hard/Infeasible Queries for OLTP Why not use the existing databases (OLTP) for business analysis? Business analysis queries In the past five years, which product is the most profitable? Which public holiday we have the largest sales? Which week we have the largest sales? Does the sales of dairy products increase over time? Difficult to express these queries in SQL 3 rd query: may extract the week value using a function But the user has to learn many transformation functions 4 th query: use a special table to store IDs of all dairy products, in advance There can be many different dairy products; there can be many other product types as well The need of multidimensional modeling 17 Multidimensional Modeling Example: sales of supermarkets Facts and measures Each sales record is a fact, and its sales value is a measure Dimensions Group correlated attributes into the same dimension easier for analysis tasks Each sales record is associated with its values of Product, Store, Time Product Type Category Store City County Date Sales Top Beer Beverage Trøjborg Århus Århus 25 May, Product Store Time 18 9

10 Multidimensional Modeling How do we model the Time dimension? Hierarchies with multiple levels Attributes, e.g., holiday, event T tid day day # week # month # year work day Week Year Month Day 1 January 1st January 2nd No Yes Advantage of this model? Easy for query (more about this later) Disadvantage? Data redundancy (but controlled redundancy is acceptable) 19 Quick Review: Normalized Database Product ID Type Category Price 001 Beer Beverage Rice Cereal Beer Beverage Wheat Cereal 5.00 Product ID TypeID Price TypeID Type CategoryID 013 Beer Rice Wheat 099 CategoryID Category 042 Beverage 099 Cereal Normalized database avoids Redundant data Modification anomalies How to get the original table? (join them) No redundancy in OLTP, controlled redundancy in OLAP 20 10

11 OLTP vs. OLAP OLTP OLAP Target operational needs business analysis Data small, operational data large, historical data Model normalized denormalized/ multidimensional Query language SQL not unified but MDX is used by many Queries small large Updates frequent and small infrequent and batch Transactional recovery necessary not necessary Optimized for update operations query operations 21 OLAP Data Cube Data cube Useful data analysis tool in DW Generalized GROUP BY queries Aggregate facts based on chosen dimensions Product, store, time dimensions Sales measure of sale facts Why data cube? Good for visualization (i.e., text results hard to understand) Multidimensional, intuitive Support interactive OLAP operations How is it different from a spreadsheet? Store Product Time Sales Aalborg Bread Aalborg Milk Copenhagen Bread

12 On-Line Analytical Processing (OLAP) 102 On-Line Analytical Processing Interactive analysis Explorative discovery Fast response times required OLAP operations/queries Aggregation, e.g., SUM Starting level, (Year, City) Roll Up: Less detail Drill Down: More detail Slice/Dice: Selection, Year= All Time Advanced Multidimensional Modeling Changing dimensions Some dimensions are not static. They change over time. A store moves to a new location with more space The name of a product changes A customer moves from Aalborg Øst to Hasseris How do we handle these changes? Large-scale dimensional modeling How do we coordinate the dimensions in different data cubes and data marts? Dimensions Time Customer Product Supplier Bus architecture Data marts Sales Costs + + Profit

13 Extract, Transform, Load (ETL) Getting multidimensional data into the DW Problems 1. Data from different sources 2. Data with different formats 3. Handling of missing data and erroneous data 4. Query performance of DW ETL Extract (for problem #1) Transformations / cleansing (for problems #2, #3) Load (for problem #4) The most time-consuming process in DW development 80% of development time spent on ETL 25 Performance Optimization Sales tid pid locid sales The data warehouse contains GBytes or even TBytes of data! OLAP users require fast query response time They don t want to wait for the result for 1 hour! Acceptable: answer within 10 seconds billion rows Idea: precompute some partial result in advance and store it At query time, such partial result can be utilized to derive the final result very fast 26 13

14 Materialization Example Imagine 1 billion sales rows, 1000 products, 100 locations CREATE VIEW TotalSales (pid, locid, total) AS SELECT s.pid, s.locid, SUM(s.sales) FROM Sales s GROUP BY s.pid, s.locid The materialized view has 100,000 rows Sales tid pid locid sales Wish to answer the query: SELECT p.category, SUM(s.sales) FROM Products p, Sales s WHERE p.pid=s.pid GROUP BY p.category Rewrite the query to use the view: SELECT p.category, SUM(t.total) FROM Products p, TotalSales t WHERE p.pid=t.pid GROUP BY p.category Query becomes 10,000 times faster! 1 billion rows VIEW TotalSales pid locid sales ,000 rows 27 Data Warehouse Architecture Central Federated Tiered 28 14

15 Central DW Architecture Clients All data in one, central DW All client queries directly on the central DW Pros Simplicity Easy to manage Cons Bad performance due to no redundancy/workload distribution Source Central DW Source 29 Federated DW Architecture Data stored in separate data marts, aimed at special departments Logical DW (i.e., virtual) Data marts contain detail data Pros Performance due to distribution Finance mart Mrktng mart Clients Distr. mart Cons More complex Logical DW Source Source 30 15

16 Tiered Architecture Central DW is materialized Data is distributed to data marts in one or more tiers Only aggregated data in cube tiers Data is aggregated/reduced as it moves through tiers Pros Best performance due to redundancy and distribution Cons Most complex Hard to manage Central DW 31 Common DW Issues Metadata management Need to understand data = metadata needed Greater need in OLAP than in OLTP as raw data is used Need to know about: Data definitions, dataflow, transformations, versions, usage, security DW project management DW projects are large and different from ordinary SW projects months and US$ 1+ million per project Data marts are smaller and safer (bottom up approach) Reasons for failure Lack of proper design methodologies High HW+SW cost Deployment problems (lack of training) Organizational change is hard (new processes, data ownership,..) Ethical issues (security, privacy, ) 32 16

17 Topics not Covered in the Course Privacy/security of data during ETL Encryption may not work During extraction/transformation, someone may need to know original values in order to check whether ETL performs correctly Data Visualization (VIS) Decision Analysis (What-if) Customer Relationship Management (CRM) 33 Summary Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction DW Topics Multidimensional modeling ETL Performance optimization BI provide many advantages to your organization A good DW is a prerequisite for BI 34 17

18 Multidimensional Databases Overview Cubes: Dimensions, Facts, Measures OLAP Queries Relational Implementation Redundancy 2 1

19 ER Model vs. Multidimensional Model Why don t we use the ER model in data warehousing? ER model: a data model for general purposes All types of data are equal, difficult to identify the data that is important for business analysis No difference between: What is important What just describes the important Normalized databases spread information When analyzing data, the information must be integrated again Hard to overview a large ER diagram (e.g., over 100 entities/relations for an enterprise) 3 ER Model vs. Multidimensional Model The multidimensional model Its only purpose: data analysis It is not suitable for OLTP systems More built in meaning What is important What describes the important What we want to optimize Easy for query operations Recognized by OLAP/BI tools Tools offer powerful query facilities based on MD design Example: TARGIT Analysis 4 2

20 The multidimensional model Data is divided into: Facts Dimensions Facts are the important entity: a sale Facts have measures that can be aggregated: sales price Dimensions describe facts A sale has the dimensions Product, Store and Time Facts live in a multidimensional cube (dice) Think of an array from programming languages Goal for dimensional modeling: Surround facts with as much context (dimensions) as possible Hint: redundancy may be ok (in well-chosen places) But you should not try to model all relationships in the data (unlike E/R and OO modeling!) 5 Cube Example Milk Bread Aalborg Copenhagen

21 Cubes A cube may have many dimensions! More than 3 - the term hypercube is sometimes used Theoretically no limit for the number of dimensions Typical cubes have 4-12 dimensions But only 2-4 dimensions can be viewed at a time Dimensionality reduced by queries via projection/aggregation A cube consists of cells A given combination of dimension values A cell can be empty (no data for this combination) A sparse cube has few non-empty cells A dense cube has many non-empty cells Cubes become sparser for many/large dimensions 7 Dimensions Dimensions are the core of multidimensional databases Other types of databases do not support dimensions Dimensions are used for Selection of data Grouping of data at the right level of detail Dimensions consist of dimension values Product dimension have values milk, cream, Time dimension have values 1/1/2001, 2/1/2001, Dimension values may have an ordering Used for comparing cube data across values Example: percent sales increase compared with last month Especially used for Time dimension 8 4

22 Dimensions Dimensions have hierarchies with levels Typically 3-5 levels (of detail) Dimension values are organized in a tree structure Product: Product->Type->Category Store: Store->Area->City->County Time: Day->Month->Quarter->Year Dimensions have a bottom level and a top level (ALL) Levels may have attributes Simple, non-hierarchical information Day has Workday as attribute Dimensions should contain much information Time dimensions may contain holiday, season, events, Good dimensions have or more attributes/levels 9 Dimension Example T Location T Country USA Denmark City Schema Berkeley New York Aalborg Copenhagen Instance 10 5

23 Facts Facts represent the subject of the desired analysis The important in the business that should be analyzed A fact is identified via its dimension values A fact is a non-empty cell Generally, a fact should Be attached to exactly one dimension value in each dimension Only be attached to dimension values in the bottom levels Some models do not require this 11 Types of Facts Event fact (transaction) A fact for every business event (sale) Fact-less facts A fact per event (customer contact) No numerical measures An event has happened for a given dimension value combination Snapshot fact A fact for every dimension combination at given time intervals Captures current status (inventory) Cumulative snapshot facts A fact for every dimension combination at given time intervals Captures cumulative status up to now (sales in year to date) Every type of facts answers different questions Often both event facts and both kinds of snapshot facts exist 12 6

24 Granularity Granularity of facts is important What does a single fact mean? Level of detail Given by combination of bottom levels Example: total sales per store per day per product Important for number of facts Scalability Often the granularity is a single business transaction Example: sale Sometimes the data is aggregated (total sales per store per day per product) Might be necessary due to scalability Generally, transaction detail can be handled Except perhaps huge clickstreams etc. 13 Measures Measures represent the fact property that the users want to study and optimize Example: total sales price A measure has two components Numerical value: (sales price) Aggregation formula (SUM): used for aggregating/combining a number of measure values into one Measure value determined by dimension value combination Measure value is meaningful for all aggregation levels 14 7

25 Types of Measures Three types of measures Additive Can be aggregated over all dimensions Example: sales price Often occur in event facts Semi-additive Cannot be aggregated over some dimensions - typically time Example: inventory Often occur in snapshot facts Non-additive Cannot be aggregated over any dimensions Example: average sales price Occur in all types of facts 15 Schema Documentation Store dimension T Store Product dimension T Product Customer dimension T Customer Time dimension T Time County Category Cust. group Year Month No well-defined standard Our own notation T level corresponds to ALL Store Record the measures You could also use a UMLlike notation Modeling and OLAP tools may have their own notation Product Customer Sales price Count Avg. sales price Day 16 8

26 Why the schema cannot answer question X Possible reasons Certain measures not included in fact table Granularity of facts too coarse Particular dimensions not in DW Descriptive attributes missing from dimensions Meaning of attributes/measures deviate from the expectation of data analysts (users) 17 ROLAP Relational OLAP Data stored in relational tables Star (or snowflake) schemas used for modeling SQL used for querying Pros Leverages investments in relational technology Scalable (billions of facts) Flexible, designs easier to change New, performance enhancing techniques adapted from MOLAP Cons Indices, materialized views Storage use (often 3-4 times MOLAP) Response times Product ID Store ID Sales

27 MOLAP Multidimensional OLAP Data stored in special multidimensional data structures E.g., multidimensional array on hard disk Pros Less storage use ( foreign keys not stored) Faster query response times Cons Up till now not so good scalability MOLAP data cube d 2 \ d Less flexible, e.g., cube must be re-computed when design changes Does not reuse an existing investment (but often bundled with RDBMS) Not as open technology 19 HOLAP Hybrid OLAP Detail data stored in relational tables (ROLAP) Aggregates stored in multidimensional structures (MOLAP) Pros Scalable (as ROLAP) Fast (as MOLAP) Cons High complexity 20 10

28 Relational Implementation Goal for dimensional modeling: surround the facts with as much context (dimensions) as we can Granularity of the fact table is important What does one fact table row represent? Important for the size of the fact table Often corresponding to a single business transaction (sale) But it can be aggregated (sales per product per day per store) Some properties Many-to-one relationship from fact to dimension Many-to-one relationships from lower to higher levels in the hierarchies 21 Relational Design Product Type Category Store City County Date Sales Top Beer Beverage Trøjborg Århus Århus 25 May Product Store Time One completely de-normalized table Bad: inflexibility, storage use, bad performance, slow update Star schemas Snowflake schemas 22 11

29 Star Schema Example Star schemas One fact table De-normalized dimension tables One column per level/attribute ProductID Product Type Category 1 Top Beer Beverage TimeID Day Month Year Maj 1997 ProductId StoreId TimeId Sale StoreID Store City County 1 Trøjborg Århus Århus 23 Relational Implementation The fact table stores facts One column for each measure One column for each dimension (foreign key to dimension table) Dimensions keys make up composite primary key A dimension table stores a dimension What are the disadvantages of using production codes as the key? E.g., product dimension, production code: AABC1234 E.g., customer dimension, CPR number: Use surrogate key ( meaningless integer key), which only allows the linking between its dimension table and the fact table For Extract-Transform-Load, we need to keep a mapping from production key to surrogate key (more about this in lecture #4) 24 12

30 Snowflake Schema Example TypeID Type CategoryID 1 Beer 1 MonthID Month YearID 1 May 1 ProductID Product TypeID 1 Top 1 TimeID Day MonthID Snowflake schemas Dimensions are normalized One dimension table per level Each dimension table has integer key, level name, and one column per attribute ProductId StoreId TimeId Sale StoreID Store CityID 1 Trøjborg 1 CityID City CountyId 1 Århus 1 25 Question Time Suppose that we want to replace the original Store hierarchy A by a new hierarchy B How do we modify the star schema to reflect this? How do we modify the snowflake schema to reflect this? T T County City Country County City Store Store Schema A Store Store Schema B 26 13

31 Star vs Snowflake Star Schemas + Simple and easy overview ease-of-use + Relatively flexible + Dimension tables often relatively small + Recognized by many RDBMSes -> good performance - Hierarchies are hidden in the columns - Dimension tables are de-normalized Snowflake schemas + Hierarchies are made explicit/visible + Very flexible + Dimension tables use less space - Harder to use due to many joins - Worse performance 27 Redundancy in the DW Only very little or no redundancy in fact tables The same fact data only stored in one fact table Redundancy is mostly in dimension tables Star dimension tables have redundant entries for the higher levels Redundancy problems? Inconsistent data the central load process helps with this Update time the DW is optimized for querying, not updates Space use: dimension tables typically take up less than 5% of DW So: controlled redundancy is good Up to a certain limit 28 14

32 (Relational) OLAP Queries Two kinds of queries Navigation queries examine one dimension SELECT DISTINCT l FROM d [WHERE p] Aggregation queries summarize fact data SELECT d1.l1, d2.l2, SUM(f.m) FROM d1, d2, f WHERE f.dk1 = d1.dk1 AND f.dk2 = d2.dk2 [AND p] GROUP BY d1.l1,d2.l2 Fast, interactive analysis of large amounts of data Milk Bread Aalborg Copenhagen OLAP Queries Starting level (City, Year, Product) Aalborg Bread Milk Slice/Dice: Milk Bread Aalborg Copenhagen Copenhagen Roll-up: get overview What is this value? 2000 Drilll-down: more detail Bread Milk Bread Milk Aalborg Aalborg Copenhagen Copenhagen ALL Time /2000 /2000 /2001 /

33 OLAP Cube in MS Analysis Services Project drill down 31 Case Study: Grocery Store Stock Keeping Units (SKUs) Point Of Sale (POS) system Stores Promotions Task: Analyze how promotions affect sales 32 16

34 DW Design Steps Choose the business process(es) to model Sales Choose the granularity of the business process Sales by Product by Store by Promotion by Day Low granularity is needed Are individual transactions necessary/feasible? Choose the dimensions Time, Store, Promotion, Product Choose the measures Dollar_sales, unit_sales, dollar_cost, customer_count Resisting normalization and preserving browsing Flat dimension tables makes browsing easy and fast 33 The Grocery Store Dimensions Time dimension Explicit time dimension is needed (events, holidays,..) Product dimension Many-level hierarchy allows drill-down/roll-up Many descriptive attributes (often more than 50) Store dimension Many descriptive attributes Promotion dimension Example of a causal dimension Used to see if promotions work/are profitable Ads, price reductions, end-of-aisle displays, coupons 34 17

35 The Grocery Store Measures All additive across all dimensions Dollar_sales Unit_sales Dollar_cost Gross profit (derived) Computed from sales and cost: sales cost Additive Gross margin (derived) Computed from gross profit and sales: (sales cost)/cost Non-additive across all dimensions Customer_count Additive across time, promotion, and store Non-additive across product. Why? Semi-additive 35 Data Warehouse Size Estimated number of fact records: Time dimension: 2 years = 730 days Store dimension: 300 stores reporting each day Product dimension: 30,000 products, only 3000 sell per day Promotion dimension: 5000 combinations, but a product only appears in one combination per day 730*300*3000*1 = 657,000,000 Total data warehouse size: 657,000,000 facts* 8 fields/fact * 4 bytes/field = 21 GB Number of fields: 4 FKs + 4 measures = 8 fields Assuming sizes of dimensions negligible Small size (by today s standard), feasible to store at transaction level detail 36 18

36 Summary Cubes: Dimensions, Facts, Measures OLAP Queries Relational Implementation Star schema vs Snowflake schema Redundancy 37 19

37 Advanced MD Modeling and MD Database Implementation Overview Handling Changes in Dimensions Coordinating Data Cubes / Data Marts Multidimensional Database Implementation 2 1

38 Changing Dimensions In the last lecture, we assumed that dimensions are stable over time New rows in dimension tables can be inserted Existing rows do not change This is not a realistic assumption We now study techniques for handling changes in dimensions Slowly changing dimensions phenomenon Dimension information change, but changes are not frequent Still assume that the schema is fixed 4 Handling Changes in Dimensions Handling change over time Changes in dimensions 1. No special handling 2. Versioning dimension values 2A. Special facts 2B. Timestamping 3. Capturing the previous and the current value 4. Split into changing and constant attributes 5 2

39 Example Time dim. TimeID Weekday Week Month Quarter Year DayNo Holiday Sales fact TimeID StoreID ProductID ItemsSold Amount change?? Store dim. StoreID Address City District Size SCategory Product dim. ProductID Description Brand PCategory timeline Attribute values in dimensions vary over time A store changes Size A product changes Description Districts are changed Problems Dimensions not updated DW is not up-to-date Dimensions updated in a straightforward way incorrect information in historical data 6 Example Store dim. Time dim. StoreID Product dim. Sales fact TimeID StoreID ProductID ItemsSold Address City District Size SCategory The store in Aalborg has the size of 250 sq. metres. On a certain day, customers bought 2000 apples from that store. Amount Sales fact table Store dimension table StoreID ItemsSold StoreID Size

40 Solution 1: No Special Handling Sales fact table Store dimension table StoreID ItemsSold StoreID Size The size of a store expands StoreID ItemsSold StoreID Size A new fact arrives StoreID ItemsSold StoreID Size What s the problem here? 8 Solution 1 Solution 1: Overwrite the old values in the dimension tables Consequences Old facts point to rows in the dimension tables with incorrect information! New facts point to rows with correct information Pros Easy to implement Useful if the updated attribute is not significant, or the old value should be updated for error correction Cons Old facts may point to incorrect rows in dimensions 9 4

41 Solution 2 Solution 2: Versioning of rows with changing attributes The key that links dimension and fact table, identifies a version of a row, not just a row Surrogate keys make this easier to implement what if we had used, e.g., the shop s zip code as key? Always use surrogate keys!!! Consequences Pros Larger dimension tables Correct information captured in DW No problems when formulating queries Cons Cannot capture the development over time of the subjects the dimensions describe e.g., relationship between the old store and the new store not captured 10 Solution 2: Versioning of Rows StoreID ItemsSold StoreID Size different versions of a store StoreID ItemsSold StoreID Size A new fact arrives StoreID ItemsSold StoreID Size Which store does the new fact (old fact) refer to? 11 5

42 Solution 3 Solution 3: Create two versions of each changing attribute One attribute contains the current value The other attribute contains the previous value Consequences Two values are attached to each dimension row Pros Possible to compare across the change in dimension value (which is a problem with Solution 2) Such comparisons are interesting when we need to work simultaneously with two alternative values Example: Categorization of stores and products Cons Not possible to see when the old value changed to the new Only possible to capture the two latest values 12 Solution 3: Two versions of Changing Attribute StoreID ItemsSold StoreID DistrictOld DistrictNew StoreID ItemsSold versions of an attribute StoreID DistrictOld DistrictNew StoreID ItemsSold We cannot find out when the district changed. StoreID DistrictOld DistrictNew

43 Solution 2A Solution 2A: Use special facts for capturing changes in dimensions via the Time dimension Assume that no simultaneous, new fact refers to the new dimension row Insert a new special fact that points to the new dimension row, and through its reference to the Time dimension, timestamps the row Pros Possible to capture the development over time of the subjects that the dimensions describe Cons Larger fact table 14 Solution 2A: Inserting Special Facts StoreID TimeID ItemsSold StoreID Size special fact for capturing changes StoreID TimeID ItemsSold StoreID Size StoreID TimeID ItemsSold StoreID Size

44 Solution 2B Solution 2B: Versioning of rows with changing attributes like in Solution 2 + timestamping of rows Pros Correct information captured in DW Cons Larger dimension tables 16 Solution 2B: Timestamping StoreID TimeID ItemsSold StoreID Size From To attributes: From, To StoreID TimeID ItemsSold StoreID Size From To StoreID TimeID ItemsSold StoreID Size From To

45 Example of Using Solution 2B Product descriptions are versioned, when products are changed, e.g., new package sizes Old versions are still in the stores, new facts can refer to both the newest and older versions of products Time value for a fact not necessarily between From and To values in the fact s Product dimension row Unlike changes in Size for a store, where all facts from a certain point in time will refer to the newest Size value Unlike alternative categorizations that one wants to choose between 18 Rapidly Changing Dimensions Difference between slowly and rapidly is subjective Solution 2 is often still feasible The problem is the size of the dimension Example Assume an Employee dimension with 100,000 employees, each using 2K bytes and many changes every year Solution 2B is recommended Examples of (large) dimensions with many changes: Product and Customer The more attributes in a dimension table, the more changes per row are expected Example A Customer dimension with 100M customers and many attributes Solution 2 yields a dimension that is too large 19 9

46 Solution 4: Dimension Splitting Customer dimension (original) CustID Name PostalAddress Gender DateofBirth Customerside NoKids MaritialStatus CreditScore BuyingStatus Income Education CustID Name PostalAddress Gender DateofBirth Customerside DemographyID NoKids MaritialStatus CreditScoreGroup BuyingStatusGroup IncomeGroup EducationGroup Customer dimension (new): relatively static attributes Demographics dimension: often-changing attributes 20 Solution 4 Solution 4 Make a minidimension with the often-changing (demograhic) attributes Convert (numeric) attributes with many possible values into attributes with few discrete or banded values E.g., Income group: [0,10K), [0,20K), [0,30K), [0,40K) Why? Any Information Loss? Insert rows for all combinations of values from these new domains With 6 attributes with 10 possible values each, the dimension gets 10 6 =1,000,000 rows If the minidimension is too large, it can be further split into more minidimensions Here, synchronous/correlated attributes must be considered (and placed in the same minidimension) The same attribute can be repeated in another minidimension 21 10

47 Solution 4 (Changing Dimensions) Pros DW size (dimension tables) is kept down Changes in a customer s demographic values do not result in changes in dimensions Cons More dimensions and more keys in the star schema Navigation of customer attributes is more cumbersome as these are in more than one dimension Using value groups gives less detail The construction of groups is irreversible 22 Changing Dimensions - Summary Why are there changes in dimensions? Applications change The modeled reality changes Multidimensional models realized as star schemas support change over time to a large extent A number of techniques for handling change over time at the instance level was described Solution 2 and the derived 2A and 2B are the most useful Possible to capture change precisely 23 11

48 Coordinating Data Cubes / Data Marts 24 DW Bus Architecture What method for DW construction? Everything at once, top-down DW ( monoliths ) Separate, independent marts ( stovepipes, data islands ) None of these methods work in practice Both have different built-in problems Architecture-guided step-by-step method Combines the advantages of the first two methods A data mart can be built much faster than a DW ETL is always the hardest - minimize risk with a simple mart But: data marts must be compatible Otherwise, incomparable views of the enterprise result Start with single-source data marts Facts from only one source makes everything easier 25 12

49 DW Bus Architecture Data marts built independently by departments Good (small projects, focus, independence, ) Problems with stovepipes (reuse across marts impossible) Conformed dimensions and facts/measures Conformed dimensions Same structure and content across data marts Take data from the best source Dimensions are copied to data marts (not a space problem) Conformed fact definitions The same definition across data marts (price excl. sales tax) Observe units of measurement (also currency, etc.) Use the same name only if it is exactly the same concept Facts are not copied between data marts (facts > 95% of data) This allows several data marts to work together Combining data from several fact tables is no problem 26 DW Bus Architecture Dimension content managed by dimension owner The Customer dimension is made and published in one place Tools query each data mart separately Separate queries to each data mart Results combined by tool or OLAP server It is hard to make conformed dimensions and facts Organizational and political challenge, not technical Get everyone together and Get a top manager (CIO) to back the conformance decision. No-one must be allowed to escape Exception: if business areas are totally separate No common management/control 27 13

50 Large Scale Cube Design The design is never finished The dimensional modeler is always looking for new information to include in dimensions and facts A sign of success! New dimensions and measures introduced gracefully Existing queries will give same result Example: Location dimension can be added for old+new facts Can usually be done if data has sufficiently fine granularity Data mart granularity Always as fine as possible (transaction level detail) Makes the mart insensitive to changes 28 Coordinating Data Marts Multi-source data marts Not built initially due to too large complexity Combine several single-source data marts (building blocks) Built on top of several single-source marts Relatively simple due to conformed dimensions and facts Can be done physically or virtually (in OLAP server) Example: profitability data mart Important to have fine (single transaction?) granularity 29 14

51 Matrix Method DW Bus Architecture Matrix Two-dimensional matrix X-axis: dimensions Y-axis: data marts Planning Process Make list of data marts Make list of dimensions Mark co-occurrences (which marts have which dimensions) Time dimension occurs in (almost) all marts 30 Matrix Example Time Customer Product Supplier Sales Costs Profit

52 Multidimensional database implementation MS SQL Server MS Analysis Services 32 MS SQL Server 2008 Microsoft s RDBMS Runs on Windows OS only Nice features built-in Analysis Services Integration Services Reporting Services Easy to use Graphical Management Studio and BI Developer Studio Watch the demonstration videos from Microsoft to get a quick introduction 33 16

53 MS Analysis Services Cheap, easy to use, good, and widely used Support ROLAP, MOLAP, HOLAP technology Intelligent pre-aggregation (for improving query performance) Programming: MS OLE DB for OLAP interface Uses the query language MDX (MultiDimensional expressions) 34 Summary Handling Changes in Dimensions Coordinating Data Cubes / Data Marts Multidimensional Database Implementation: MS SQL Server and Analysis Services 35 17

54 Extract, Transform, Load (ETL) ETL Overview The ETL Process General ETL issues Building dimensions Building fact tables Extract Transformations/cleansing Load SQL Server Integration Services 2 1

55 The ETL Process The most underestimated process in DW development The most time-consuming process in DW development Up to 80% of the development time is spent on ETL! Extract Extract relevant data Transform Transform data to DW format Build DW keys, etc. Cleansing of data Load Load data into DW Build aggregates, etc. 3 Phases Design phase Modeling, DB design, source selection, Loading phase First load/population of the DW Based on all data in sources Refreshment phase Keep the DW up-to-date wrt. source data changes 4 2

56 ETL/DW Refreshment DM DW Integration phase Preparation phase 5 ETL in the Architecture Data sources ETL side Query side Query Services Data Staging Area - Extract - Transform - Load Data Warehouse Bus Data marts with aggregate-only data Conformed dimensions and facts -Warehouse Browsing -Access and Security -Query Management - Standard Reporting -Activity Monitor Reporting Tools Desktop Data Access Tools Data mining Data marts with atomic data Operational system 6 3

57 Data Staging Area (DSA) Transit storage for data in the ETL process Transformations/cleansing done here No user queries Sequential operations on large data volumes Performed by central ETL logic Easily restarted No need for locking, logging, etc. RDBMS or flat files? (DBMS have become better at this) Finished dimensions copied from DSA to relevant marts Allows centralized backup/recovery Backup/recovery facilities needed Better to do this centrally in DSA than in all data marts 7 ETL Construction Process Plan 1) Make high-level diagram of source-destination flow 2) Test, choose and implement ETL tool 3) Outline complex transformations, DW key generation and job sequence for every destination table Construction of dimensions 4) Construct and test building static dimension 5) Construct and test change mechanisms for one dimension 6) Construct and test remaining dimension builds Construction of fact tables and automation 7) Construct and test initial fact table build 8) Construct and test incremental update 9) Construct and test aggregate build 10) Design, construct, and test ETL automation 8 4

58 High-level diagram 1) Make high-level diagram of source-destination flow Mainly used for communication purpose One page only, highlight sources and destinations Steps: extract, transform, load Source Raw-Product (Spreadsheet) Raw-Sales (RDBMS) Add product type Check R.I. Aggregate sales per product per day Extract time Destination Product Sales Time 9 Building Dimensions Static dimension table DW key assignment: production keys to DW keys using table Check one-one and one-many relationships (using sorting) Handling dimension changes Described in last lecture Find the newest DW key for a given production key Table for mapping production keys to DW keys must be maintained and updated Load of dimensions Small dimensions: replace Large dimensions: load only changes Key mapping for the Product dimension pid DW_pid Time Product dimension of FClub vs. Product dimension of a supermarket 10 5

59 Building Fact Tables Two types of load Initial load ETL for all data up till now Done when DW is started the first time Very heavy - large data volumes Incremental update Move only changes since last load Years Done periodically (e.g., month or week) after DW start Less heavy - smaller data volumes Dimensions must be updated before facts The relevant dimension rows for new facts must be in place Special key considerations if initial load must be performed again Jan 09 Feb 09 March Extract 12 6

60 Types of Data Sources Non-cooperative sources Snapshot sources provides only full copy of source, e.g., files Specific sources each is different, e.g., legacy systems Logged sources writes change log, e.g., DB log Queryable sources provides query interface, e.g., RDBMS Cooperative sources Replicated sources publish/subscribe mechanism Call back sources calls external code (ETL) when changes occur Internal action sources only internal actions when changes occur DB triggers is an example Extract strategy depends on the source types 13 Extract Goal: fast extract of relevant data Extract from source systems can take long time Types of extracts: Extract applications (SQL): co-existence with other applications DB unload tools: faster than SQL-based extracts e.g., MS SQL Export Wizard, MySQL DB dump Extract applications the only solution in some scenarios Too time consuming to ETL all data at each load Can take days/weeks Drain on the operational systems and DW systems Extract/ETL only changes since last load (delta) 14 7

61 Computing Deltas Delta = changes since last load Store sorted total extracts in DSA Delta can easily be computed from current + last extract + Always possible + Handles deletions - High extraction time Put update timestamp on all rows (in sources) Updated by DB trigger - Source system must be changed, operational overhead Extract only where timestamp > time for last extract + Reduces extract time - Cannot (alone) handle deletions. Last extract time: 300 Timestamp DKK Changed Data Capture Messages Applications insert messages in a queue at updates + Works for all types of updates and systems - Operational applications must be changed+operational overhead DB triggers Triggers execute actions on INSERT/UPDATE/DELETE + Operational applications need not be changed + Enables real-time update of DW - Operational overhead Replication based on DB log Find changes directly in DB log which is written anyway + Operational applications need not be changed + No operational overhead - Not possible in some DBMS (SQL Server, Oracle, DB2 can do it) 16 8

62 Transform 17 Common Transformations Data type conversions EBCDIC ASCII/Unicode String manipulations Date/time format conversions E.g., Unix time = what time? Normalization/denormalization To the desired DW format Depending on source format Building keys Table matches production keys to surrogate DW keys Correct handling of history - especially for total reload 18 9

63 Data Quality Data almost never has decent quality Data in DW must be: Precise DW data must match known numbers Complete DW has all relevant data Consistent No contradictory data: aggregates fit with detail data Unique The same thing is called the same and has the same key Timely Data is updated frequently enough and the users know when 19 Cleansing Why cleansing? Garbage In Garbage Out BI does not work on raw data Pre-processing necessary for BI analysis Handle inconsistent data formats Spellings, codings, Remove unnecessary attributes Production keys, comments, Replace codes with text for easy understanding City name instead of ZIP code, e.g., Aalborg vs. DK-9000 Combine data from multiple sources with common key E.g., customer data from customer address, customer name, 20 10

64 Types of Cleansing Conversion and normalization Most common type of cleansing Text coding, date formats does 3/2 mean 3 rd February or 2 nd March? Special-purpose cleansing Look-up tables, dictionaries to find valid data, synonyms, abbreviations Normalize spellings of names, addresses, etc. Dorset Rd or Dorset Road? København or Copenhagen? Aalborg or Ålborg? Remove duplicates, e.g., duplicate customers Domain-independent cleansing Approximate, fuzzy joins on records from different sources E.g., two customers are regarded as the same if their respective values match for most of the attributes (e.g., address, phone number) Rule-based cleansing User-specified rules: if-then style Automatic rules: use data mining to find patterns in data Guess missing sales person based on customer and item 21 Cleansing Don t use special values (e.g., 0, -1) in your data They are hard to understand in query/analysis operations Mark facts with Data Status dimension Normal, abnormal, outside bounds, impossible, Facts can be taken in/out of analyses Uniform treatment of NULL Use NULLs only for measure values (estimates instead?) Use special dimension key (i.e., surrogate key value) for NULL dimension values E.g., for the time dimension, instead of NULL, use special key values to represent Date not known, Soon to happen Avoids problems in joins, since NULL is not equal to NULL Data Status Dimension SID Status 1 Normal 2 Abnormal 3 Out of bounds Sales fact table Sales SID

65 Improving Data Quality Appoint data steward Responsibility for data quality Includes manual inspections and corrections! DW-controlled improvement Default values Not yet assigned 157 note to data steward Source-controlled improvements Construct programs that check data quality Are totals as expected? Do results agree with alternative source? Number of NULL values? 23 Load 24 12

66 Load Goal: fast loading into DW Loading deltas is much faster than total load SQL-based update is slow Large overhead (optimization, locking, etc.) for every SQL call DB load tools are much faster Index on tables slows load a lot Drop index and rebuild after load Can be done per index partition Parallellization Dimensions can be loaded concurrently Fact tables can be loaded concurrently Partitions can be loaded concurrently 25 Load Relationships in the data Referential integrity and data consistency must be ensured before loading Because they won t be checked in the DW again Can be done by loader Aggregates Can be built and loaded at the same time as the detail data Load tuning Load without log Sort load file first Make only simple transformations in loader Use loader facilities for building aggregates 26 13

67 ETL Tools ETL tools from the big vendors Oracle Warehouse Builder IBM DB2 Warehouse Manager Microsoft SQL Server Integration Services (SSIS) Offers much functionality Data modeling ETL code generation Scheduling DW jobs but (some) have steep learning curves and high costs The best tool does not exist Choose based on your own needs You may also have to code your own 27 Issues Pipes Redirect output from one process to input of another process cat payments.dat grep 'payment' sort -r Files versus streams/pipes Streams/pipes: no disk overhead, fast throughput Files: easier restart, often only possibility Use ETL tool or write ETL code Code: easy start, co-existence with IT infrastructure, maybe the only possibility Tool: better productivity on subsequent projects, self-documenting Load frequency ETL time dependent of data volumes Daily load is much faster than monthly Applies to all steps in the ETL process 28 14

68 SQL Server Integration Services A concrete ETL tool Example ETL flow 29 Integration Services (IS) Microsoft s ETL tool Part of SQL Server 2008 Tools Import/export wizard - simple transformations BI Development Studio - advanced development Functionality available in several ways Through GUI - basic functionality Programming - advanced functionality 30 15

69 Packages A package is a collection of Data flows (Sources Transformations Destinations) Connections Control flow: Tasks, Workflows Variables A package may also invoke other packages and/or processes It is somehow similar to a program 31 A package Arrows show precendence constraints Constraint values: success (green) failure (red) completion (blue) Conditional expressions may also be given (A > B) 32 16

70 Package Control Flow Containers provide Structure to packages Services to tasks Control flow Foreach loop container Repeat tasks by using an enumerator For loop container Repeat tasks by testing a condition Sequence container Groups tasks and containers into control flows that are subsets of the package control flow Task host container An abstract container class which is used implicitly 33 Tasks A task is a unit of work Workflow Tasks Execute package execute other SSIS packages, good for structure! Execute Process run external application/batch file SQL Servers Tasks Bulk insert fast load of data Execute SQL execute any SQL query Data Preparation Tasks File System operations on files FTP up/download data Scripting Tasks Script execute.net code Maintenance Tasks DB maintenance Data Flow Tasks run data flows from sources through transformations to destinations (this is where the work is done) 34 17

71 Data Flow Elements Sources Make external data available All ODBC/OLE DB data sources: RDBMS, Excel, Text files, Transformations Update, summarize, cleanse, merge Destinations Write data to specific store Input, Output, Error output 36 Transformations Row Transformations Character Map - applies string functions to character data Derived Column populates columns using expressions Rowset Transformations (rowset = tabular data) Aggregate - performs aggregations Sort - sorts data Percentage Sampling - creates sample data set by setting % Split and Join Transformations Conditional Split - routes data rows to different outputs Merge - merges two sorted data sets Lookup Transformation - looks up ref values by exact match Other Transformations Export Column - inserts data from a data flow into a file Import Column - reads data from a file and adds it to a data flow Slowly Changing Dimension - configures update of a SCD 37 18

72 A Few Hints on ETL Design Don t implement all transformations in one step! Build first step and check that result is as expected Add second step and execute both, check result (How to check?) Add third step Test SQL statements before putting into IS Do one thing at the time Copy source data one-by-one to the data staging area (DSA) Compute deltas Only if doing incremental load Handle versions and DW keys Versions only if handling slowly changing dimensions Implement complex transformations Load dimensions Load facts 38 Summary The ETL Process Extract Transformations/cleansing Load 39 19

ETL and Advanced Modeling

ETL and Advanced Modeling ETL and Advanced Modeling 1. Extract-Transform-Load (ETL) The ETL process Building dimension and fact tables Extract, transform, load 2. Advanced Multidimensional Modeling Handling changes in dimensions

More information

Business Intelligence, Data Warehousing and Multidimensional Databases. Torben Bach Pedersen

Business Intelligence, Data Warehousing and Multidimensional Databases. Torben Bach Pedersen Business Intelligence, Data Warehousing and Multidimensional Databases Torben Bach Pedersen Business Intelligence Overview Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction

More information

ETL Overview. Extract, Transform, Load (ETL) Refreshment Workflow. The ETL Process. General ETL issues. MS Integration Services

ETL Overview. Extract, Transform, Load (ETL) Refreshment Workflow. The ETL Process. General ETL issues. MS Integration Services ETL Overview Extract, Transform, Load (ETL) General ETL issues ETL/DW refreshment process Building dimensions Building fact tables Extract Transformations/cleansing Load MS Integration Services Original

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong [email protected] Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge

More information

Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University

Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University Data Warehousing and Decision Support Torben Bach Pedersen Department of Computer Science Aalborg University Talk Overview Data warehousing and decision support basics Definition Applications Multidimensional

More information

Extraction Transformation Loading ETL Get data out of sources and load into the DW

Extraction Transformation Loading ETL Get data out of sources and load into the DW Lection 5 ETL Definition Extraction Transformation Loading ETL Get data out of sources and load into the DW Data is extracted from OLTP database, transformed to match the DW schema and loaded into the

More information

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration DW Source Integration, Tools, and Architecture Overview DW Front End Tools Source Integration DW architecture Original slides were written by Torben Bach Pedersen Aalborg University 2007 - DWML course

More information

Data Warehousing Systems: Foundations and Architectures

Data Warehousing Systems: Foundations and Architectures Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository

More information

Designing a Dimensional Model

Designing a Dimensional Model Designing a Dimensional Model Erik Veerman Atlanta MDF member SQL Server MVP, Microsoft MCT Mentor, Solid Quality Learning Definitions Data Warehousing A subject-oriented, integrated, time-variant, and

More information

DATA WAREHOUSING AND OLAP TECHNOLOGY

DATA WAREHOUSING AND OLAP TECHNOLOGY DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are

More information

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 2. What is a Data warehouse a. A database application

More information

Data Warehouse: Introduction

Data Warehouse: Introduction Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,

More information

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT BUILDING BLOCKS OF DATAWAREHOUSE G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT 1 Data Warehouse Subject Oriented Organized around major subjects, such as customer, product, sales. Focusing on

More information

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days Three Days Prerequisites Students should have at least some experience with any relational database management system. Who Should Attend This course is targeted at technical staff, team leaders and project

More information

The Data Warehouse ETL Toolkit

The Data Warehouse ETL Toolkit 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. The Data Warehouse ETL Toolkit Practical Techniques for Extracting,

More information

When to consider OLAP?

When to consider OLAP? When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: [email protected] Abstract: Do you need an OLAP

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2015/16 Unit 2 J. Gamper 1/44 Advanced Data Management Technologies Unit 2 Basic Concepts of BI and Data Warehousing J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements:

More information

ETL-EXTRACT, TRANSFORM & LOAD TESTING

ETL-EXTRACT, TRANSFORM & LOAD TESTING ETL-EXTRACT, TRANSFORM & LOAD TESTING Rajesh Popli Manager (Quality), Nagarro Software Pvt. Ltd., Gurgaon, INDIA [email protected] ABSTRACT Data is most important part in any organization. Data

More information

Week 13: Data Warehousing. Warehousing

Week 13: Data Warehousing. Warehousing 1 Week 13: Data Warehousing Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots of buzzwords, hype slice & dice, rollup,

More information

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina Data Warehousing Read chapter 13 of Riguzzi et al Sistemi Informativi Slides derived from those by Hector Garcia-Molina What is a Warehouse? Collection of diverse data subject oriented aimed at executive,

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Introduction to Data Warehousing. Ms Swapnil Shrivastava [email protected]

Introduction to Data Warehousing. Ms Swapnil Shrivastava swapnil@konark.ncst.ernet.in Introduction to Data Warehousing Ms Swapnil Shrivastava [email protected] Necessity is the mother of invention Why Data Warehouse? Scenario 1 ABC Pvt Ltd is a company with branches at Mumbai,

More information

14. Data Warehousing & Data Mining

14. Data Warehousing & Data Mining 14. Data Warehousing & Data Mining Data Warehousing Concepts Decision support is key for companies wanting to turn their organizational data into an information asset Data Warehouse "A subject-oriented,

More information

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design COURSE OUTLINE Track 1 Advanced Data Modeling, Analysis and Design TDWI Advanced Data Modeling Techniques Module One Data Modeling Concepts Data Models in Context Zachman Framework Overview Levels of Data

More information

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing Data Warehousing Outline Overview of data warehousing Dimensional Modeling Online Analytical Processing From OLTP to the Data Warehouse Traditionally, database systems stored data relevant to current business

More information

Microsoft Data Warehouse in Depth

Microsoft Data Warehouse in Depth Microsoft Data Warehouse in Depth 1 P a g e Duration What s new Why attend Who should attend Course format and prerequisites 4 days The course materials have been refreshed to align with the second edition

More information

Chapter 3 - Data Replication and Materialized Integration

Chapter 3 - Data Replication and Materialized Integration Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 [email protected] Chapter 3 - Data Replication and Materialized Integration Motivation Replication:

More information

SQL SERVER BUSINESS INTELLIGENCE (BI) - INTRODUCTION

SQL SERVER BUSINESS INTELLIGENCE (BI) - INTRODUCTION 1 SQL SERVER BUSINESS INTELLIGENCE (BI) - INTRODUCTION What is BI? Microsoft SQL Server 2008 provides a scalable Business Intelligence platform optimized for data integration, reporting, and analysis,

More information

University of Gaziantep, Department of Business Administration

University of Gaziantep, Department of Business Administration University of Gaziantep, Department of Business Administration The extensive use of information technology enables organizations to collect huge amounts of data about almost every aspect of their businesses.

More information

Data warehouse Architectures and processes

Data warehouse Architectures and processes Database and data mining group, Data warehouse Architectures and processes DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 1 Database and data mining group, Data warehouse architectures Separation between

More information

CS54100: Database Systems

CS54100: Database Systems CS54100: Database Systems Date Warehousing: Current, Future? 20 April 2012 Prof. Chris Clifton Data Warehousing: Goals OLAP vs OLTP On Line Analytical Processing (vs. Transaction) Optimize for read, not

More information

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved CHAPTER SIX DATA Business Intelligence 2011 The McGraw-Hill Companies, All Rights Reserved 2 CHAPTER OVERVIEW SECTION 6.1 Data, Information, Databases The Business Benefits of High-Quality Information

More information

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g.

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g. Overview Data Warehousing and Decision Support Chapter 25 Why data warehousing and decision support Data warehousing and the so called star schema MOLAP versus ROLAP OLAP, ROLLUP AND CUBE queries Design

More information

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical

More information

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Decision Support Chapter 23 Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful

More information

DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Introduction Increasingly, organizations are analyzing historical data to identify useful patterns and

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

CHAPTER 5: BUSINESS ANALYTICS

CHAPTER 5: BUSINESS ANALYTICS Chapter 5: Business Analytics CHAPTER 5: BUSINESS ANALYTICS Objectives The objectives are: Describe Business Analytics. Explain the terminology associated with Business Analytics. Describe the data warehouse

More information

MDM and Data Warehousing Complement Each Other

MDM and Data Warehousing Complement Each Other Master Management MDM and Warehousing Complement Each Other Greater business value from both 2011 IBM Corporation Executive Summary Master Management (MDM) and Warehousing (DW) complement each other There

More information

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA OLAP and OLTP AMIT KUMAR BINDAL Associate Professor Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data,

More information

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi [email protected]

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi kishorejaladi@yahoo.com Data Warehousing: Data Models and OLAP operations By Kishore Jaladi [email protected] Topics Covered 1. Understanding the term Data Warehousing 2. Three-tier Decision Support Systems 3. Approaches

More information

CHAPTER 4: BUSINESS ANALYTICS

CHAPTER 4: BUSINESS ANALYTICS Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the

More information

CS2032 Data warehousing and Data Mining Unit II Page 1

CS2032 Data warehousing and Data Mining Unit II Page 1 UNIT II BUSINESS ANALYSIS Reporting Query tools and Applications The data warehouse is accessed using an end-user query and reporting tool from Business Objects. Business Objects provides several tools

More information

OLAP and Data Warehousing! Introduction!

OLAP and Data Warehousing! Introduction! The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still

More information

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov Unlock your data for fast insights: dimensionless modeling with in-memory column store By Vadim Orlov I. DIMENSIONAL MODEL Dimensional modeling (also known as star or snowflake schema) was pioneered by

More information

Data Warehousing and OLAP

Data Warehousing and OLAP 1 Data Warehousing and OLAP Hector Garcia-Molina Stanford University Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots

More information

Dimodelo Solutions Data Warehousing and Business Intelligence Concepts

Dimodelo Solutions Data Warehousing and Business Intelligence Concepts Dimodelo Solutions Data Warehousing and Business Intelligence Concepts Copyright Dimodelo Solutions 2010. All Rights Reserved. No part of this document may be reproduced without written consent from the

More information

Data Integration and ETL Process

Data Integration and ETL Process Data Integration and ETL Process Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second

More information

Week 3 lecture slides

Week 3 lecture slides Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically

More information

Optimizing Your Data Warehouse Design for Superior Performance

Optimizing Your Data Warehouse Design for Superior Performance Optimizing Your Data Warehouse Design for Superior Performance Lester Knutsen, President and Principal Database Consultant Advanced DataTools Corporation Session 2100A The Problem The database is too complex

More information

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000 2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000 Introduction This course provides students with the knowledge and skills necessary to design, implement, and deploy OLAP

More information

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28 Data Warehousing - Essential Element To Support Decision- Making Process In Industries Ashima Bhasin 1, Mr Manoj Kumar 2 1 Computer Science Engineering Department, 2 Associate Professor, CSE Abstract SGT

More information

LearnFromGuru Polish your knowledge

LearnFromGuru Polish your knowledge SQL SERVER 2008 R2 /2012 (TSQL/SSIS/ SSRS/ SSAS BI Developer TRAINING) Module: I T-SQL Programming and Database Design An Overview of SQL Server 2008 R2 / 2012 Available Features and Tools New Capabilities

More information

Business Intelligence, Analytics & Reporting: Glossary of Terms

Business Intelligence, Analytics & Reporting: Glossary of Terms Business Intelligence, Analytics & Reporting: Glossary of Terms A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ad-hoc analytics Ad-hoc analytics is the process by which a user can create a new report

More information

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse? Outline Data Warehousing What is a data warehouse? Why a warehouse? Models & operations Implementing a warehouse 2 What is a Warehouse? Collection of diverse data subject oriented aimed at executive, decision

More information

Business Intelligence, Data warehousing Concept and artifacts

Business Intelligence, Data warehousing Concept and artifacts Business Intelligence, Data warehousing Concept and artifacts Data Warehousing is the process of constructing and using the data warehouse. The data warehouse is constructed by integrating the data from

More information

Understanding Data Warehousing. [by Alex Kriegel]

Understanding Data Warehousing. [by Alex Kriegel] Understanding Data Warehousing 2008 [by Alex Kriegel] Things to Discuss Who Needs a Data Warehouse? OLTP vs. Data Warehouse Business Intelligence Industrial Landscape Which Data Warehouse: Bill Inmon vs.

More information

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University Bussiness Intelligence and Data Warehouse Schedule Bussiness Intelligence (BI) BI tools Oracle vs. Microsoft Data warehouse History Tools Oracle vs. Others Discussion Business Intelligence (BI) Products

More information

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies Ashish Gahlot, Manoj Yadav Dronacharya college of engineering Farrukhnagar, Gurgaon,Haryana Abstract- Data warehousing, Data Mining,

More information

SQL Server Administrator Introduction - 3 Days Objectives

SQL Server Administrator Introduction - 3 Days Objectives SQL Server Administrator Introduction - 3 Days INTRODUCTION TO MICROSOFT SQL SERVER Exploring the components of SQL Server Identifying SQL Server administration tasks INSTALLING SQL SERVER Identifying

More information

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing Database Applications Advanced Querying Transaction processing Online setting Supports day-to-day operation of business OLAP Data Warehousing Decision support Offline setting Strategic planning (statistics)

More information

SAS BI Course Content; Introduction to DWH / BI Concepts

SAS BI Course Content; Introduction to DWH / BI Concepts SAS BI Course Content; Introduction to DWH / BI Concepts SAS Web Report Studio 4.2 SAS EG 4.2 SAS Information Delivery Portal 4.2 SAS Data Integration Studio 4.2 SAS BI Dashboard 4.2 SAS Management Console

More information

Jet Data Manager 2012 User Guide

Jet Data Manager 2012 User Guide Jet Data Manager 2012 User Guide Welcome This documentation provides descriptions of the concepts and features of the Jet Data Manager and how to use with them. With the Jet Data Manager you can transform

More information

Multidimensional Modeling - Stocks

Multidimensional Modeling - Stocks Bases de Dados e Data Warehouse 06 BDDW 2006/2007 Notice! Author " João Moura Pires ([email protected])! This material can be freely used for personal or academic purposes without any previous authorization

More information

Data Warehousing, OLAP, and Data Mining

Data Warehousing, OLAP, and Data Mining Data Warehousing, OLAP, and Marek Rychly [email protected] Strathmore University, @ilabafrica & Brno University of Technology, Faculty of Information Technology Advanced Databases and Enterprise Systems

More information

Data Warehousing and Data Mining Introduction

Data Warehousing and Data Mining Introduction Data Warehousing and Data Mining Introduction General introduction to DWDM Business intelligence OLTP vs. OLAP Data integration Methodological framework DW definition Acknowledgements: I am indebted to

More information

Data W a Ware r house house and and OLAP II Week 6 1

Data W a Ware r house house and and OLAP II Week 6 1 Data Warehouse and OLAP II Week 6 1 Team Homework Assignment #8 Using a data warehousing tool and a data set, play four OLAP operations (Roll up (drill up), Drill down (roll down), Slice and dice, Pivot

More information

Presented by: Jose Chinchilla, MCITP

Presented by: Jose Chinchilla, MCITP Presented by: Jose Chinchilla, MCITP Jose Chinchilla MCITP: Database Administrator, SQL Server 2008 MCITP: Business Intelligence SQL Server 2008 Customers & Partners Current Positions: President, Agile

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes Final Exam Overview Open books and open notes No laptops and no other mobile devices

More information

SQL Server 2012 Business Intelligence Boot Camp

SQL Server 2012 Business Intelligence Boot Camp SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations

More information

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance Brochure More information from http://www.researchandmarkets.com/reports/2248199/ Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance Description: - This is the first book to provide

More information

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann Trivadis White Paper Comparison of Data Modeling Methods for a Core Data Warehouse Dani Schnider Adriano Martino Maren Eschermann June 2014 Table of Contents 1. Introduction... 3 2. Aspects of Data Warehouse

More information

Speeding ETL Processing in Data Warehouses White Paper

Speeding ETL Processing in Data Warehouses White Paper Speeding ETL Processing in Data Warehouses White Paper 020607dmxwpADM High-Performance Aggregations and Joins for Faster Data Warehouse Processing Data Processing Challenges... 1 Joins and Aggregates are

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Chapter 5 Foundations of Business Intelligence: Databases and Information Management 5.1 Copyright 2011 Pearson Education, Inc. Student Learning Objectives How does a relational database organize data,

More information

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage Moving Large Data at a Blinding Speed for Critical Business Intelligence A competitive advantage Intelligent Data In Real Time How do you detect and stop a Money Laundering transaction just about to take

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimensional

More information

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker

More information

Lection 3-4 WAREHOUSING

Lection 3-4 WAREHOUSING Lection 3-4 DATA WAREHOUSING Learning Objectives Understand d the basic definitions iti and concepts of data warehouses Understand data warehousing architectures Describe the processes used in developing

More information

IST722 Data Warehousing

IST722 Data Warehousing IST722 Data Warehousing Components of the Data Warehouse Michael A. Fudge, Jr. Recall: Inmon s CIF The CIF is a reference architecture Understanding the Diagram The CIF is a reference architecture CIF

More information

SQL SERVER TRAINING CURRICULUM

SQL SERVER TRAINING CURRICULUM SQL SERVER TRAINING CURRICULUM Complete SQL Server 2000/2005 for Developers Management and Administration Overview Creating databases and transaction logs Managing the file system Server and database configuration

More information

Building Cubes and Analyzing Data using Oracle OLAP 11g

Building Cubes and Analyzing Data using Oracle OLAP 11g Building Cubes and Analyzing Data using Oracle OLAP 11g Collaborate '08 Session 219 Chris Claterbos [email protected] Vlamis Software Solutions, Inc. 816-729-1034 http://www.vlamis.com Copyright 2007,

More information

IBM WebSphere DataStage Online training from Yes-M Systems

IBM WebSphere DataStage Online training from Yes-M Systems Yes-M Systems offers the unique opportunity to aspiring fresher s and experienced professionals to get real time experience in ETL Data warehouse tool IBM DataStage. Course Description With this training

More information

COURSE SYLLABUS COURSE TITLE:

COURSE SYLLABUS COURSE TITLE: 1 COURSE SYLLABUS COURSE TITLE: FORMAT: CERTIFICATION EXAMS: 55043AC Microsoft End to End Business Intelligence Boot Camp Instructor-led None This course syllabus should be used to determine whether the

More information

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced

More information

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 Online Analytic Processing OLAP 2 OLAP OLAP: Online Analytic Processing OLAP queries are complex queries that Touch large amounts of data Discover

More information

About the Tutorial. Audience. Prerequisites. Disclaimer & Copyright. ETL Testing

About the Tutorial. Audience. Prerequisites. Disclaimer & Copyright. ETL Testing About the Tutorial An ETL tool extracts the data from all these heterogeneous data sources, transforms the data (like applying calculations, joining fields, keys, removing incorrect data fields, etc.),

More information

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Appliances and DW Architectures John O Brien President and Executive Architect Zukeran Technologies 1 TDWI 1 Agenda What

More information

SQL Server 2005 Features Comparison

SQL Server 2005 Features Comparison Page 1 of 10 Quick Links Home Worldwide Search Microsoft.com for: Go : Home Product Information How to Buy Editions Learning Downloads Support Partners Technologies Solutions Community Previous Versions

More information

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server Extending Hyperion BI with the Oracle BI Server Mark Ostroff Sr. BI Solutions Consultant Agenda Hyperion BI versus Hyperion BI with OBI Server Benefits of using Hyperion BI with the

More information

Chapter 5. Learning Objectives. DW Development and ETL

Chapter 5. Learning Objectives. DW Development and ETL Chapter 5 DW Development and ETL Learning Objectives Explain data integration and the extraction, transformation, and load (ETL) processes Basic DW development methodologies Describe real-time (active)

More information

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex, Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex, Inc. Overview Introduction What is Business Intelligence?

More information

Data Warehouse Logical Modeling and Design (6)

Data Warehouse Logical Modeling and Design (6) Data Warehouse Logical Modeling and Design (6) Bernard ESPINASSE Professeur à Aix-Marseille Université (AMU) Ecole Polytechnique Universitaire de Marseille October 2013 Methodological framework Logical

More information

Monitoring Genebanks using Datamarts based in an Open Source Tool

Monitoring Genebanks using Datamarts based in an Open Source Tool Monitoring Genebanks using Datamarts based in an Open Source Tool April 10 th, 2008 Edwin Rojas Research Informatics Unit (RIU) International Potato Center (CIP) GPG2 Workshop 2008 Datamarts Motivation

More information

DATA WAREHOUSE E KNOWLEDGE DISCOVERY

DATA WAREHOUSE E KNOWLEDGE DISCOVERY DATA WAREHOUSE E KNOWLEDGE DISCOVERY Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano DATA WAREHOUSE (DW) A TECHNIQUE FOR CORRECTLY ASSEMBLING AND MANAGING DATA

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1 Slide 29-1 Chapter 29 Overview of Data Warehousing and OLAP Chapter 29 Outline Purpose of Data Warehousing Introduction, Definitions, and Terminology Comparison with Traditional Databases Characteristics

More information

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777 Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777 Course Outline Module 1: Introduction to Data Warehousing This module provides an introduction to the key components of a data warehousing

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing Class Projects Class projects are going very well! Project presentations: 15 minutes On Wednesday

More information