A Multidimensional Design for the Genesis Data Set


Ray Hylock
Management Sciences Department
University of Iowa
Iowa City, IA USA

ABSTRACT

Relational databases are designed to store and process moving data. There is a current that runs through these systems, continuously inserting, updating, and deleting record after record without ceasing. Modern-day database management systems such as Oracle and Microsoft SQL Server are designed to control this chaotic environment. However, it is the very same data models that maintain the robustness of these systems that hinder speed during data analysis. That is, the goal of database models such as the Entity Relationship (ER) model is to normalize the data: remove redundancy and separate data themes in order to avoid insert, update, and delete anomalies. Data warehouses, on the other hand, do not suffer from these anomalies. Therefore, the goal of the multidimensional models used by data warehouses is to denormalize the data as much as possible in order to increase the speed of analytical queries. This paper is an extension to the work done by Lu et al. The goal of [1] is to create a model to manage large volumes of clinical data for decision support and quality control at the point of patient care. One component of this is a data warehouse. This paper represents the first stage in converting the existing relational design created in [1] to a multidimensional one, suitable for the needs of data analysis. As you will see, the current healthcare-related data warehouse research is geared more towards customer billing and does not allow for multi-valued attributes. In fact, in similar settings, only the top n values (in the case of [2] and [3], n = 10) were selected and the rest were discarded. In order to accurately build a model for decision support in this field, all data points need to be considered.

1 INTRODUCTION

There is a clear distinction between traditional databases (e.g. relational), which are transactional in nature, and data warehouses. Transactional systems support online transaction processing (OLTP), which includes inserts, updates, and deletes, and support querying small subsets of the data held within. Data warehouse systems using online analytical processing (OLAP) tools, on the other hand, are mainly intended for decision support applications and are optimized for retrieval instead of routine transaction processing [11]. OLAP generally involves highly complex queries that use one or more aggregations; these are sometimes called decision support queries [6]. While everyday processing of healthcare information should be done with OLTP databases, analysis of the data stored within needs to be done on a platform intended for intensive, complex, and all-inclusive data computations.

A data warehouse can be characterized as a subject-oriented, integrated, nonvolatile, time-variant collection of data used to support management decision making. Each characteristic is as follows [4]:

Subject oriented: The data warehouse is organized around key subjects such as customers, students, patients, and in our case, visits.

Integrated: All data in the data warehouse conforms to one standard. For example, in one OLTP database, male and female may be stored as "m" and "f" whereas in another as "male" and "female". When records are added to the data warehouse, the domain (or set of values) of each attribute will be checked to make sure all values adhere to the standards imposed by the data warehouse; in this case, if the acceptable values were set to {m, f}, the second system's values will need to be converted from "male" to "m" and "female" to "f". This ensures data integrity and improves query performance.

Nonvolatile: The data is loaded and refreshed in a single process and is not updatable by end users. This means that new data is generally uploaded in batches and users do not have access to edit this data; only that which is stored in the database.

Time variant: The same data can be stored as a time series such that you can analyze the data across time and look for trends and changes.

Since the goal of the Genesis project is to create a model to manage large volumes of clinical data for decision support and quality control at the point of patient care, this work will focus on converting the existing relational database design into a data warehouse supported, multidimensional one.

1.1 THE GENESIS DATA SET

The Genesis data set was generated by the Genesis Medical Center (GMC) in Davenport, Iowa. Approximately 650 registered nurses (RNs) provide acute and skilled inpatient, outpatient, and home care for this community hospital of more than 500 beds. Since 1983, GMC has maintained a computerized nursing information system which stores care plans in the form of North American Nursing Diagnosis Association approved codes. The original data set was given to Lu et al. in the form of two flat files which were later converted into 10 relational entities (tables in a database) [1].

In order to understand the multidimensional design, a little background on what the codes look like and what they mean is required. First, we will cover Nursing Diagnoses, then Interventions (which is almost the same for characteristics, etiologies, and outcomes; by almost, I mean that the code group varies in interventions while it does not in the others).

Sample Nursing Diagnoses code: 01MBA... This code can be broken up into four pieces:
1. Ordering: how important this code is to the patient's treatment plan = 01
2. Code Group: MBA
3. Unique Identifier:
4. Date (YYMMDD):

Sample Interventions code: 01MBV... This code can be broken up into five pieces:
1. Ordering: 01
2. Code Group: MBV (for interventions, MBE-MBZ)
3. Nursing Diagnoses Group: 039 (this code, and any other code with 039 for this value, references the single Nursing Diagnoses code in the sample above)
4. Unique Identifier:
5. Date (YYMMDD):

As you can see, Nursing Diagnoses is the parent node among these five categories (Nursing Diagnoses, Interventions, Outcomes, Characteristics, and Etiologies). For more information about the data set, see [1].
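Since each code is a fixed-layout string, the pieces can be pulled apart positionally. The following is a minimal sketch of that idea in SQL; the table name, column name, and substring offsets are illustrative assumptions, not part of the Genesis schema, and the offsets would need to be adjusted to the actual code layout.

-- Hypothetical staging table holding raw intervention codes such as '01MBV039...'
CREATE TABLE raw_intervention_codes (
  code VARCHAR2(20)
);

-- Decompose each code into its pieces (offsets are assumptions for illustration only):
SELECT SUBSTR(code, 1, 2) AS ordering,            -- importance to the treatment plan
       SUBSTR(code, 3, 3) AS code_group,          -- MBE-MBZ for interventions
       SUBSTR(code, 6, 3) AS nd_group,            -- links back to a Nursing Diagnoses code
       SUBSTR(code, 9)    AS identifier_and_date
FROM   raw_intervention_codes;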

1.2 THE RELATIONAL DATABASE DESIGN

As mentioned above, 10 entity classes were generated from the flat files. These were based on the ER diagram designed in [1]. Since then, several more tables have been added to cover Clinical Classifications Software (CCS)1 codes, which group together individual ICD-9 codes (diseases). Figure 1 is the ER diagram designed for this system before the CCS codes were added. For more information about the relational design, see [1].

Figure 1: ER Diagram for the Genesis data set [1]

1 A reference of all CCS codes and the ICD-9 codes that make them up can be found at us.ahrq.gov/toolssoftware/ccs/appendixasingledx.txt

2 LITERATURE REVIEW

2.1 THE MULTIDIMENSIONAL MODEL

Before we start with the actual constructs and models, it is important to understand the difference between normalization and denormalization. As defined by [11], normalization seeks to separate the logically related attributes into tables to minimize redundancy, and thereby avoid the update anomalies that lead to an extra processing overhead to maintain consistency in the database. Anomalies are errors or inconsistencies that may result when a user attempts to update a table that contains redundant data [5]. There are three different types of anomalies: insert, update, and delete. Basically, if redundancy exists, deleting and updating information may require looping through a table to delete or update all affected attributes/values. For example, say you have a table that stores individual sales along with who made the sale. Each individual transaction will be associated with a single salesperson. However, each salesperson will be listed in this table multiple times since they can sell to other customers. If you want to delete a salesperson or update a name, you will need to loop through the data and perform the desired operation on all of the rows that salesperson is a part of. This example can also be used to discuss insert anomalies. These anomalies occur when there is no place to put a new record because it must be associated with something else (in our example, a sale) first. That is, since salesperson data is stored with each sale, if a salesperson does not have a sale, then they do not exist in the database. Normalization solves this problem by removing salesperson from sales and leaving in their stead a foreign key (see Primary and Foreign Keys below) that references the new salesperson table. This new table will have one record for every salesperson, avoiding the previous anomalies.

Denormalization is the opposite of normalization. Since updates and deletes do not normally occur in a data warehouse and insertions are complete (all of the data necessary to make the information worth inserting is there), separating themes will only hinder performance. Take the previous example. By breaking up the table into two, we are adding time during the query process because of joins. This is counterproductive in an analytic environment. So, denormalization will ultimately bring those two tables back together in order to avoid time-consuming table joins.

The next subsection defines the constructs used in multidimensional models as well as some basic principles such as primary keys, foreign keys, and relationship definitions. The subsection after it covers the three main types of multidimensional models: Star, Snowflake, and Fact Constellation.

MODEL CONSTRUCTS

This portion of the paper will cover some basics of data warehouse terminology and conceptual design characteristics. As a general note, some of the definitions are recursive, meaning that in order to fully understand each topic you will need to have some prior knowledge of another. I've listed the terms in an order that minimizes this effect, but it still exists, so reading this section twice might help clear up any remaining questions.

Data Cubes

A data cube is a way of modeling and viewing information stored in a data warehouse. Although it's called a cube, it is not limited to just 3 dimensions; a data cube is in fact an n-dimensional object. The axes are determined by the selected group of dimensions (see Dimensions below) and the values are measures stored in a fact table (see Fact Tables below). An example 3-dimensional data cube is shown in Figure 2 below.
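To make the salesperson example and the cube idea concrete, here is a minimal sketch in SQL. The table and column names are illustrative assumptions rather than part of any design discussed in this paper; the GROUP BY CUBE clause computes the aggregate for every combination of the listed dimensions, which is essentially the cube described above.

/* Normalized (OLTP) form: salesperson data lives in its own table. */
CREATE TABLE salesperson (
  salesperson_id NUMBER PRIMARY KEY,
  name           VARCHAR2(100)
);
CREATE TABLE sale (
  sale_id        NUMBER PRIMARY KEY,
  salesperson_id NUMBER REFERENCES salesperson(salesperson_id),
  sale_month     VARCHAR2(7),
  amount         NUMBER(10,2)
);

/* Denormalized (warehouse) form: descriptive attributes travel with each fact,
   so analytical queries avoid the join. */
CREATE TABLE sale_fact (
  sale_id          NUMBER PRIMARY KEY,
  salesperson_name VARCHAR2(100),
  sale_month       VARCHAR2(7),
  amount           NUMBER(10,2)
);

/* A two-dimensional "cube" over the denormalized table: one total per salesperson,
   per month, per salesperson-month pair, plus a grand total. */
SELECT salesperson_name, sale_month, SUM(amount) AS total_sales
FROM   sale_fact
GROUP  BY CUBE (salesperson_name, sale_month);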

Primary and Foreign Keys

Figure 2: A sample 3-dimensional data cube

In order to differentiate between records in a table, a primary key is selected. The primary key is an attribute (or set of attributes) that uniquely identifies the tuples within a table. In a dimension, these keys are called surrogate keys. A surrogate key is a non-intelligent or system-generated key (one produced from a sequence). The reason for this is that attribute keys (known as business keys) can change over time and thus introduce potential data integrity issues [5]. If more than one attribute or set of attributes can uniquely identify a record, then you select one to be the primary key and the rest are known as candidate keys. Also, to ensure that every record is identified, a primary key can never be blank (NULL) or have any component (if a composite key, see below) that is NULL. This is generally called entity integrity [11]. For example, each person has a unique social security number which allows the government to distinguish one person from another using a single value.

If a table wants to reference another, we can simply store the primary/surrogate key from that table as an attribute. This is called a foreign key. Unlike a primary key, this attribute does not follow the entity integrity constraints since it is not used to uniquely identify the records. Instead, it is simply used to say that a particular entity in one table is related to an entity in another. For instance, say we have two tables: employees and phone numbers. Each employee has a unique employee ID similar to an SSN, and likewise for each phone number. Since an employee can have multiple phone numbers, the employee ID will be entered multiple times in the phone numbers table. But it is not the employee ID that makes each record unique, it is the phone number ID. This brings about referential integrity, which states that a foreign key in one table needs to correspond to a primary/surrogate key (or candidate key) in the table it references. That is, you cannot have information about an employee that does not exist. Both primary keys and foreign keys can be composite. That is, the key is composed of multiple attributes that together uniquely identify the tuples. However, in the case of data warehouses, only fact tables (discussed below) are allowed to have composite keys.

Dimensions

The textual descriptors of the business are described by its dimension tables (Figure 3). These give meaning to the fact table (discussed below) attributes and are also the primary source of query constraints, groupings, and report labels. For example, a system designed to keep records for store sales would have dimension tables such as store, time, and item. Dimension tables have a single attribute primary key, have fewer than 100 attributes on average, are small in terms of number of tuples (fewer than 1 million) compared to fact tables, and total only a modest number per design [7][8][11].
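As a sketch of surrogate keys and referential integrity using the employee/phone number example above (all names here are illustrative, not from the Genesis schema), the sequences supply the non-intelligent key values and the foreign key ties each phone number to an existing employee:

CREATE SEQUENCE employee_seq START WITH 1 INCREMENT BY 1;
CREATE SEQUENCE phone_seq    START WITH 1 INCREMENT BY 1;

CREATE TABLE employees (
  employee_id NUMBER PRIMARY KEY,   -- surrogate key, populated from employee_seq
  ssn         NUMBER(9,0),          -- business (candidate) key; not used as the primary key
  name        VARCHAR2(100)
);

CREATE TABLE phone_numbers (
  phone_id    NUMBER PRIMARY KEY,                                -- surrogate key from phone_seq
  employee_id NUMBER NOT NULL REFERENCES employees(employee_id), -- foreign key
  phone       VARCHAR2(20)
);

-- Referential integrity: the phone number insert succeeds only because employee 1 exists.
INSERT INTO employees     VALUES (employee_seq.NEXTVAL, 123456789, 'Jane Doe');
INSERT INTO phone_numbers VALUES (phone_seq.NEXTVAL, 1, '555-0100');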

There are several dimension table subtypes. The first is a time dimension. This is a special dimension that stores time variables across different increments of measure. For example, a date can be represented as a specific day, week, month, quarter, and year. The next is called a multi-valued dimension [5][8]. This is generally not acceptable since it models a one-to-many relationship between a single fact and a dimension, but healthcare data has been considered one of the allowable exceptions [8]. For example, for a single visit, a patient can have multiple nursing diagnoses. Another type is a junk dimension [8]. This dimension is used to condense simple dimensions together. For example, say we want to track the answers to a 10-question survey given to customers, each question with 3 possible values. In order to do this, we would have 10 foreign keys in our fact table (discussed below) storing the primary key for each dimension tuple containing the answers. In order to clean this up, we could create a junk table that stores all reported combinations of answers. In our case, we would have at worst 3^10 = 59,049 records in that junk table (as you can see, it is an exponential function, and with a limit of around 1 million records per dimension, this only works in certain cases) and those 10 foreign keys would be reduced to 1. The final type of dimension I will discuss is the minidimension. This is used when, for certain attributes in a dimension, there is a high frequency of analysis or updates, or a vast majority of those fields are NULL. What happens is the dimension is split into two (or more) pieces along those lines, the lesser being the minidimension. A foreign key is added to the fact table (discussed below) for each minidimension and not the dimension from which it split [8].

Certain portions of a dimension can also be normalized into what are known as outriggers (Figure 4). These are dimensions that connect to other dimensions (also known as snowflaking). Reasons for using outriggers include: a different level of granularity (such as county versus an individual zip code), different updating times, and data reduction (if there are only a few values available for each field). For example, say we have a dimension table for patient information. For each patient, we store a birth date. Instead of storing the date in the patient table, we would create an outrigger table that stores the day, day of week, week, month, year, and generation for each patient. We can later use the information in the outrigger table to view patient data based on, for example, month or year of birth. In general, however, outrigger tables should not be used since they increase the complexity of the design, normalize the data further, and make browsing difficult [8].

Figure 3: Sample dimension tables
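A minimal sketch of the birth-date outrigger just described, with illustrative names (the actual Genesis date outrigger generated by Warehouse Builder appears in the appendix): the patient dimension stores only a key into the date outrigger, and queries reach the coarser grains through one extra join.

CREATE TABLE date_outrigger_example (
  date_id     NUMBER PRIMARY KEY,
  day_date    DATE,
  day_name    VARCHAR2(10),
  cal_month   NUMBER(2,0),
  cal_quarter NUMBER(1,0),
  cal_year    NUMBER(4,0)
);

CREATE TABLE patient_dim (
  patient_id   NUMBER PRIMARY KEY,
  gender       CHAR(1),
  birthdate_id NUMBER REFERENCES date_outrigger_example(date_id)  -- snowflaked (outrigger) link
);

-- Browse patients by year of birth through the outrigger:
SELECT d.cal_year, COUNT(*) AS patients
FROM   patient_dim p
JOIN   date_outrigger_example d ON p.birthdate_id = d.date_id
GROUP  BY d.cal_year;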

Figure 4: Sample outrigger tables

Fact Tables

The primary table in a dimensional model is the fact table (Figure 5). The fact table stores, as its columns, the dimension keys (foreign keys) that connect the dimensions to each tuple, along with other attributes. It is these very same foreign keys that compose the primary key for the fact table. If the foreign keys are still not enough to ensure uniqueness, a degenerate dimension may be added. A degenerate dimension is a foreign key to a dimension that does not exist; it is simply an attribute created to aid in ensuring uniqueness. Also, measured or observed variable(s) of interest (such as sums or counts, preferably additive facts) are stored, which represent measurements at the intersections of the dimensions [7][8][11].

Next, we will discuss several types of fact tables. The first is transaction grain. This is the most common of the three types. Here, the fact table has exactly one record for each individual transaction, usually differentiated by a timestamp. For example, an item at a supermarket is sold to a single person at exactly one time. The second type is periodic snapshot grain. This is similar to the transaction grain except that instead of an exact time, it represents a time span; for example, sales by month. Finally, we have the accumulating snapshot. As its name suggests, the fact table accumulates attribute responses. This would be useful in an ordering/shipping environment. That is, a record is created for an order. When that order is shipped, the attribute corresponding to shipped is then updated. This could also be used to track the progress of a product through a process (by hand or RFID tags). This is the only type of fact table that is not static; that is, it is set up to be updated regularly [8].

Up until now, we have talked about fact tables that not only store dimension keys and other attributes, but measures as well. There are special cases in which the fact table is simply needed to signify the convergence of dimensions and thus has no meaningful measures except a count across the tuples. These are called factless fact tables. There are two types. One is the event factless fact table. This type is used to represent an event such as student registration. During a specified period of time, a student registers for a unique set of classes. This is seen as an event because a student does not continuously register for classes. If something were to happen continuously, it would be considered our second type, a coverage factless fact table. This could be applied to church facilities. A church would be interested in knowing when a particular facility (such as the sanctuary) was reserved (e.g. for a wedding or funeral). A single room can be rented multiple times per day and hundreds of times per year [8].
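An illustrative sketch of an event factless fact table for the registration example above (all names are hypothetical): the table holds nothing but foreign keys, and analysis is done by counting rows.

CREATE TABLE student_dim (student_id NUMBER PRIMARY KEY, name      VARCHAR2(100));
CREATE TABLE course_dim  (course_id  NUMBER PRIMARY KEY, title     VARCHAR2(100));
CREATE TABLE term_dim    (term_id    NUMBER PRIMARY KEY, term_name VARCHAR2(20));

CREATE TABLE registration_fact (
  student_id NUMBER REFERENCES student_dim(student_id),
  course_id  NUMBER REFERENCES course_dim(course_id),
  term_id    NUMBER REFERENCES term_dim(term_id),
  PRIMARY KEY (student_id, course_id, term_id)   -- composite key made of foreign keys
);

-- The only "measure" is a count across the tuples:
SELECT term_id, COUNT(*) AS registrations
FROM   registration_fact
GROUP  BY term_id;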

Relationships

Figure 5: Sample fact table

A mapping from one fact to one dimension is known as a one-to-one relationship and is the preferred relational setup for a data warehouse. The reason for this is that when building your data cubes, the sizes of the dimensions are already known (how many attributes there will be on each axis), which allows for ease of viewing (once accustomed to the data) and aggregation; the most important being aggregating the cube. Some cubes will be pre-computed to speed up the query process (I will not cover the selection process in this paper). There is a specific order in which a cube should be aggregated. This order depends on the size of the dimensions. If the dimensions are dynamic, then pre-computing the cube becomes far more difficult because calculating in the wrong direction can lead to an increased use of RAM by orders of magnitude. One-to-many relationships are more complex in nature, requiring an additional table between fact and dimension (see Bridges below). This increases the time necessary to form the data cube as well as the level of system understanding that is needed to traverse such bridges. So, in general, one-to-many relationships are frowned upon, but there are a few exceptions such as healthcare data.

Bridges

In the case of multi-valued dimensions (a one-to-many relationship from fact to dimension), a bridge or helper table is created as an intermediary [5][8]. This bridge is used to store groupings of items associated with a particular fact or facts, as well as additional attributes. One such additional attribute is a weighting factor. The value is a portion of 1 (for 100%) that corresponds to the weight which that particular record has in the overall group. In the case of healthcare billing, a single patient can have any number of diseases, all of which are important, but some might cost more and therefore have a larger percentage. The bridge table has more records than the dimension table because the dimension now only has to store unique items (or whatever it represents), whereas the bridge holds two keys, one to the fact table and one to the dimension table, plus any other unique attributes that identify that particular item in relation to the fact table and that would otherwise increase the length of the dimension table. More details are given in Section 4.3.
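A minimal sketch of such a bridge with a weighting factor, using hypothetical names (the Genesis bridge tables in the appendix follow the same pattern, though without weights): each bridge row ties one member of the dimension to a group referenced by the fact table, and the weights within a group are intended to sum to 1.

CREATE TABLE disease_dim (
  disease_id NUMBER PRIMARY KEY,
  icd9_code  VARCHAR2(10)
);

CREATE TABLE disease_group_bridge (
  disease_group_id NUMBER,                                     -- referenced by the fact table
  disease_id       NUMBER REFERENCES disease_dim(disease_id),  -- member of the group
  weighting_factor NUMBER(4,3),                                -- portion of 1 (100%) carried by this member
  PRIMARY KEY (disease_group_id, disease_id)
);

CREATE TABLE visit_fact (
  visit_id         NUMBER PRIMARY KEY,
  disease_group_id NUMBER,          -- points at a group of rows in the bridge
  charge           NUMBER(10,2)
);

-- Spread each visit's charge across its diseases according to the weights:
SELECT d.icd9_code, SUM(v.charge * b.weighting_factor) AS weighted_charge
FROM   visit_fact v
JOIN   disease_group_bridge b ON v.disease_group_id = b.disease_group_id
JOIN   disease_dim d          ON b.disease_id       = d.disease_id
GROUP  BY d.icd9_code;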

TYPES OF SCHEMAS

The star schema is the most widely used model for data warehousing. It consists of a central fact table surrounded by dimensions that do not have outriggers [5][7]. An example of a star schema design can be seen in Figure 6. As you can see, all four dimensions connect only to the central fact table.

Figure 6: Sample star schema
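A typical query against a star schema (table and column names are illustrative): the fact table is joined directly to each dimension it needs, constrained and grouped by dimension attributes, with no further hops.

CREATE TABLE store_dim  (store_id NUMBER PRIMARY KEY, store_name VARCHAR2(100));
CREATE TABLE time_dim   (time_id  NUMBER PRIMARY KEY, cal_year   NUMBER(4,0));
CREATE TABLE item_dim   (item_id  NUMBER PRIMARY KEY, category   VARCHAR2(30));
CREATE TABLE sales_fact (store_id NUMBER, time_id NUMBER, item_id NUMBER, sales_amount NUMBER(10,2));

-- One join per dimension, straight from the central fact table:
SELECT s.store_name, t.cal_year, SUM(f.sales_amount) AS total_sales
FROM   sales_fact f
JOIN   store_dim s ON f.store_id = s.store_id
JOIN   time_dim  t ON f.time_id  = t.time_id
JOIN   item_dim  i ON f.item_id  = i.item_id
WHERE  i.category = 'PRODUCE'
GROUP  BY s.store_name, t.cal_year;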

The snowflake schema represents a star design with outrigger tables (Figure 7). This is also called snowflaking and, in general, it is frowned upon because it normalizes the dimensions [5][7]. This leads to more tables and thus an increase in the number of joins performed for a query, which ultimately translates into longer query processing times. In some instances, however, snowflaking a design is the best course of action. Later, when we discuss the data warehouse design for Genesis, we will see an example where normalizing tables is necessary. In this example, you can see that Department is connected to College, which holds the information for each individual college. Also, City was added to create an effect much like that of a time dimension; it is used for the purpose of building hierarchies.

Figure 7: Sample snowflake schema
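Compared with the star query sketched above, the same kind of question in a snowflaked design needs an extra hop through the outrigger before the attribute of interest is reached (again, all names are illustrative):

CREATE TABLE college_dim    (college_id NUMBER PRIMARY KEY, college_name VARCHAR2(100));
CREATE TABLE department_dim (dept_id    NUMBER PRIMARY KEY, dept_name    VARCHAR2(100),
                             college_id NUMBER REFERENCES college_dim(college_id));
CREATE TABLE enrollment_fact (student_id NUMBER, dept_id NUMBER REFERENCES department_dim(dept_id));

-- One more join than the star version, just to reach the college attribute:
SELECT c.college_name, COUNT(*) AS enrollments
FROM   enrollment_fact f
JOIN   department_dim d ON f.dept_id    = d.dept_id
JOIN   college_dim    c ON d.college_id = c.college_id
GROUP  BY c.college_name;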

The last type of multidimensional model I will discuss is the fact constellation model (Figure 8). This model combines multiple fact tables into one design [5][7]. In order for this design to make sense, the fact tables need to be connected together, and the joining element is a shared dimension. In the example below, both fact tables connect via the Student table. Also, not all dimensions have to be shared, as with Account, Date, and Semester.

Figure 8: Sample fact constellation schema

2.2 SUGGESTED DESIGNS FOR MULTI-VALUED DIMENSIONS

There is a general consensus throughout the literature [2][8][9][10] that multi-valued dimensions are undesirable. However, where there are rules, there are always exceptions; healthcare data, such as diseases, is one such exception. There are four main ways to deal with multi-valued dimensions as described by Ralph Kimball [9]:
1. Disqualify the dimension because it is multi-valued
2. Choose one value (the "primary") and omit the other values
3. Extend the dimension list to have a fixed number of dimensions (i.e. the top N)
4. Put a helper table in between this fact table and the dimension table

For this project, option 4 will be used, as the needs of the user require all possible combinations. Herein lies the main challenge. From the readings so far, no one has attempted to join this many multi-valued dimensions to a single fact table. In fact, the majority of the research indicates that option 2 above is the best (although [2] and [3] use option 3).

FACT TABLE DESIGNS

Single Unique Attribute Per Record

One option for removing the multi-valued dimensions is to incorporate all collected values into the fact table. Each tuple consists of all necessary degenerate dimensions and only one foreign key. An example is shown below in Table 1.

SEQ(DD)   Intervention Code(FK)   Characteristic Code(FK)   Outcomes(FK)
...       MBV...                  -                         -
...       MBL...                  -                         -
...       -                       MBB...                    -
...       -                       MBB...                    -
...       -                       -                         MBD...
...       -                       -                         MBD...
Table 1

The problem comes in the dimensionality of the cube. That is to say, tuples 1 and 2 have 1 dimension (Intervention Code) and 1 degenerate dimension (SEQ). Tuples 3 and 4 have the same SEQ degenerate dimension as tuples 1 and 2, but a different code dimension (Characteristic Code). This means that tuples 1 and 2 can be compared to one another, but not to 3 and/or 4 because they do not share any common code dimensions. Even if we created a SEQ dimension, the best we could hope for is a one-dimensional count. Although an Intervention Code might be tied to a Characteristic Code, this model does not indicate the way in which the codes are related. That is to say, in a two-dimensional cube (Figure 9) with Characteristic Code on one axis and Intervention Code on the other, we will either know the row or the column, but not both, for each data point. So basically, the dimensions become lookup tables, defeating the purpose of a data warehouse, which is the ability to walk around in the data via dimensions.

Figure 9: we have a specific Intervention Code (row). To which Characteristic Code does it correspond?
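To see the problem concretely, consider a hypothetical fact table laid out like Table 1 (names are illustrative). Because each tuple carries at most one populated code, a query that tries to relate two code dimensions to each other finds no complete pairs:

CREATE TABLE single_attr_fact (
  seq                 NUMBER,         -- degenerate dimension
  intervention_code   VARCHAR2(12),   -- at most one of these three codes is populated per row
  characteristic_code VARCHAR2(12),
  outcome_code        VARCHAR2(12)
);

-- Cross-tabulating interventions against characteristics returns zero rows,
-- since no single tuple ever holds both codes at once.
SELECT intervention_code, characteristic_code, COUNT(*) AS n
FROM   single_attr_fact
WHERE  intervention_code   IS NOT NULL
  AND  characteristic_code IS NOT NULL
GROUP  BY intervention_code, characteristic_code;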

Combinations

An alternative to the design above is to create a fact table with all possible permutations, which is frowned upon due to its exponentially increasing nature as the number of dimensions increases [7]. Going back to Table 1, you will see that for each SEQ there are three dimensions, each with two possible values, for a total of 6 records (only 1 SEQ). Below in Table 2, you will see that there are a total of 8 rows. Although this is the worst case scenario (i.e. every possible combination), you can see the effect of adding dimensions to the fact table. If we were to add another dimension such as Etiologies with two possible values, or a second SEQ number, we would then have a total of 16 rows. So, if the number of values is constant across all dimensions (which is not the case in Table 2), then we can determine the maximum number of rows with the exponential equation: total number of rows = v^d, where v is the number of values and d is the number of dimensions. Of course, having dimensions with the exact same number of values is extremely rare, so in the general case, the total number of rows = v1 * v2 * ... * vd, where d is the number of dimensions and vi is the number of possible values for dimension i. So for Table 2, we have 1*2*2*2 = 8 rows max. If we were to add three more dimensions with 3, 4, and 5 values respectively, then we would have 1*2*2*2*3*4*5 = 480 rows max. Even if only 1/3 of the combinations actually existed, we would still have 160 rows for a single SEQ. In our case, we have 137,857 visits and many dimensions with hundreds if not thousands of unique values. As you can see, the number of tuples in our fact table gets out of hand in a hurry, and this is thus not a viable solution.

SEQ(DD)   Intervention Code(FK)   Characteristic Code(FK)   Outcomes(FK)
...       MBV...                  MBB...                    MBD...
...       MBV...                  MBB...                    MBD...
...       MBV...                  MBB...                    MBD...
...       MBV...                  MBB...                    MBD...
...       MBL...                  MBB...                    MBD...
...       MBL...                  MBB...                    MBD...
...       MBL...                  MBB...                    MBD...
...       MBL...                  MBB...                    MBD...
Table 2

BRIDGE DESIGN

Designing a bridge is quite simple. First, decide what information you want to store in the bridge. Remember that you are trying to remove the attributes of the dimension that take an otherwise unique item, like an intervention code, and force there to be multiple records. That is, we are trying to normalize the dimension by breaking off a piece into a bridge. For example, for an intervention code, you have the code itself (which is unique from all other intervention codes) and then you have the date on which it was entered and the order of importance related to patient treatment. The last two attributes force there to be repetitions of the code itself. Therefore, we will remove date and order from the dimension table and store that information in the bridge (we are basically turning the dimension into a lookup table of intervention codes). Next, we need to add a primary key to the bridge and a foreign key that references the Interventions dimension. Then, in the fact table, replace (or add) the foreign key that references the dimension surrogate key with that of the bridge table. So, we are joining the fact table to the bridge and the bridge to the dimension (Figure 10).
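A sketch of the steps just described, with illustrative names (the actual Genesis DDL appears in the appendix): the code itself stays in the dimension, the visit-specific date and ordering move into the bridge, and the fact table points at the bridge rather than at the dimension.

CREATE TABLE interventions_dim (
  intervention_id   NUMBER PRIMARY KEY,   -- surrogate key
  intervention_code VARCHAR2(10)          -- the code itself, now unique in this table
);

CREATE TABLE interventions_bridge (
  intervention_group_id NUMBER,                                               -- referenced by the fact table
  intervention_id       NUMBER REFERENCES interventions_dim(intervention_id), -- foreign key to the dimension
  entered_date          DATE,                                                 -- visit-specific attributes that
  ordering              NUMBER(2,0),                                          -- were removed from the dimension
  PRIMARY KEY (intervention_group_id, intervention_id, entered_date)
);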

Figure 10: bridge design for interventions

Just to point out, you will have more records in the bridge table than in the dimension table, since all of the unique information is stored in the bridge. To check and make sure you have moved all of the records to where they should be, you can perform a simple SELECT count(DISTINCT <attribute to count>) FROM <table_name>, where <attribute to count> is the unique identifier for the dimension, to get the number of records that should be in the dimension table, and SELECT count(*) FROM <table_name> to get the number of rows that should be in the bridge.

3 REQUIREMENTS

As mentioned earlier, the goal for this project is to create a model to manage large volumes of clinical data for decision support and quality control at the point of patient care. Part of this requires many different data mining operations to be performed on the data set. In the typical relational database setting, the data is normalized as much as possible. For transaction processing, this is the best way to go in order to avoid insert, update, and delete anomalies (Section 2.1). However, this forces a lot of table joins when executing SQL statements, which takes time and can get a bit messy when dealing with multiple tables and aliases. In a data warehouse, data are denormalized as much as possible in order to avoid the massive number of joins sometimes required. The point of a data warehouse is speed in analytical processing. Therefore, the data mining steps (which only read the data) do not require a relational setup and should perform faster in a setting devised for fast processing.

4 DATA WAREHOUSE DESIGN

This design does not incorporate all of the entities shown in the ER diagram (Figure 1). The subset consists of: Visits, Patients, Diseases, Nursing Diagnoses, Interventions, and Outcomes. Several dimensions were also added, which will be discussed in detail along with the others in the following sections.

4.1 FACT TABLE

The center of this model is the Visits fact table (Figure 11). The reason Visits was chosen as the hub for this design is quite simple: going back to the ER diagram in Figure 1, you can see that in order to traverse the diagram, virtually every entity needs to go through the Visits table. The joining attribute for all tables except Patients (Visits stores the patient ID as a foreign key) is the sequence number assigned to a particular visit (SEQ). Normally, the primary key for the fact table is the set of all foreign (dimension) keys. However, in this instance those values do not guarantee uniqueness. Therefore, SEQ was added to ensure entity integrity; SEQ is known as a degenerate dimension (see the Fact Tables construct above). The attributes ending in _group_id are foreign keys to bridge tables, and the ones ending in _ID connect the fact table to dimensions directly. There are four more attributes and four measures remaining (measures bolded and italicized in the figure). The attributes are uniquely assigned to each visit: service, length of stay (los), discharge state (disstate), and age. The list of measures can easily be added to if needed. As of now, the model takes into consideration the total number of patients, average age, average length of stay, and total length of stay. The values change based upon the axes of the cube and the level of granularity.

Figure 11: Fact Table

4.2 DIMENSIONS AND TIME DIMENSION

As mentioned earlier, this model will not include all of the entities shown in Figure 1. Also, some dimensions not on the ER diagram are going to be added. Figure 12 shows the dimensions used in this paper. The first attribute in each dimension is the primary key. The first dimension is Patients. The reason patient information is important is that we can define an axis of a cube by patient features (e.g. gender, age, race, and zipcode), which could give new meaning to the data. Outcomes, Interventions, and Nursing Diagnoses are all based on nursing codes. The root of all of these is Nursing Diagnoses, which is referenced by Outcomes and Interventions. The data stored in Diseases are ICD-9 and CCS codes (which, as mentioned earlier, group together ICD-9 codes). One main difference between the ER diagram and this design is the use of SEQ. Before, SEQ was used to bind almost every table to the Visits table. Now, we have removed the SEQ and inserted a surrogate key, since SEQ is a meaningful attribute (see the discussion of surrogate keys above). Finally, there is the Date dimension, which is a special kind of dimension called a time dimension. This table is generated automatically by Oracle 11g Warehouse Builder and consists of 41 attributes such as day of calendar week, month, year, month of quarter, day, and day name. The table is also auto-populated with information for each day starting from a specified date.

From the ER diagram to this one, Patients did not change; all attributes are accounted for. For Outcomes, Interventions, Diseases, and Nursing Diagnoses, any visit-specific attribute, such as the order of importance for the patient (ordering) and any date values, was removed and placed in the bridge tables (discussed in the next section). Another difference is the way the codes are broken up. For Outcomes and Nursing Diagnoses, we removed the Code Group identifier from the original code since it is the same for all records. For Interventions, however, that value from MBE-MBZ was required to ensure uniqueness. For Diseases, the CCS code was added to the table to add a hierarchy. That is, now we can view the data by individual code or in groups.

Figure 12: Dimension tables

4.3 BRIDGES

Since the dimensions Outcomes, Interventions, Diseases, and Nursing Diagnoses are all multi-valued dimensions, we need to use a bridge table to connect them to the fact table (Figure 13). As mentioned in the Bridges section above, the bridge table attributes consist of a primary key (the first attribute in the diagram) that is referenced by the fact table and a foreign key to the dimension (the second attribute in the diagram); the rest are unique to each fact (visit). This removes code redundancies from the dimensions, basically turning them into lookup tables.

Figure 13: Bridge tables

4.4 FULL DESIGN

Figure 14 shows the full multidimensional design for the Genesis data set. As you can see, Visits is the central hub. Connected to Visits are Patients, Diseases_Group, Nursing_Diagnoses_Group, Interventions_Group, Outcomes_Group, and Date. Also, Nursing_Diagnoses_Group, Interventions_Group, and Outcomes_Group are connected to the Date time dimension because each code is entered on a specific date per visit. This will allow the user to roll up and drill down that axis by date; that is, they can group by day, month, quarter, or year, for example. Also, Nursing_Diagnoses is connected to Interventions and Outcomes. Back in Section 1.1, the codes were broken down into parts and it was shown that Nursing_Diagnoses is the root node. With this relationship, we will be able to set an axis by individual Intervention or Outcome code, or by the Nursing_Diagnoses code group they belong to.

Figure 14: Full multidimensional design

4.5 CONCEPT HIERARCHIES

Figure 15 shows the four concept hierarchies used in this project so far. On the far left is the date hierarchy. There are far too many possible combinations to list, so the one shown is the hierarchy that is most likely to be continually used. What it says is that the year number is the most abstract level from which the data can be seen. As you start down the hierarchy (drill down), the next level of granularity is the quarter number. Then we have the month number and finally, the lowest level (highest level of granularity) is day. That is, the most specific data that can be retrieved from the date is by day.
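As a sketch of what rolling up the Visits fact table along this date hierarchy might look like (the column names below follow the appendix DDL but should be treated as assumptions, since the auto-generated date table is not fully reproduced there), GROUP BY ROLLUP produces subtotals at each level from month up to year, plus a grand total:

SELECT d.cal_year_number,
       d.cal_quarter_number,
       d.cal_month_number,
       COUNT(*)   AS total_visits,
       AVG(v.age) AS avg_age,
       AVG(v.los) AS avg_length_of_stay,
       SUM(v.los) AS total_length_of_stay
FROM   Visits v
JOIN   Date_Outrigger d ON v.admdate_id = d.date_id
GROUP  BY ROLLUP (d.cal_year_number, d.cal_quarter_number, d.cal_month_number);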

To the right of date is the Diseases hierarchy. It consists of two levels: CCS, which is the coarsest level of granularity (the most abstract), and ICD-9, which is the finest level of granularity. For the next two, we can see how the dimensions Outcomes and Interventions are grouped together under Nursing Diagnoses. Again, there are only two levels: the group level at the top, consisting of the codes grouped by Nursing Diagnoses, and the lowest level, consisting of all unique Outcomes and Interventions.

Figure 15: Concept hierarchies

5 FUTURE WORK

There are still many steps that will need to be performed before even the validity of the design can be verified. First, I will need to either alter the data warehouse design or build a temporary one in order to accommodate the Extraction, Transformation, and Loading (ETL) process. This is due to the change in primary and foreign keys between the relational and the multidimensional design. That is, as mentioned before, SEQ was the primary connector between tables in the relational design, but it is only used in the Patients table in this design. Therefore, during the ETL process, those SEQ keys will have to be stored in order to retain the proper relationships. Once the data is loaded, the experiments can begin. Using SQL Developer (which comes with Oracle 11g), one can enter an SQL query and either predict the running time (Explain function) or follow the actual steps and record system information during the process (Autotrace function). As of now, I do not know if SQL Developer can be used to run more advanced OLAP queries. If not, SQL*Plus can be used and the same functions can be run (it requires more coding). If necessary, the data warehouse can also be tested using Oracle's built-in data mining tools. All of these experiments will of course be compared to the current relational database system.

Also, this is only one possible design. In [8], Kimball states that the use of bridge tables might be forgone if there is no hierarchy in the connected dimension. That is, instead of normalizing the dimension and thus creating the bridge, one could leave the dimension alone in order to further increase the speed of the system. As I am not privy to all of the hierarchies that could be implemented, the bridges were added to accommodate any future additions. If, however, there are not any, then another data warehouse will be generated and tested.

6 CONCLUSIONS

In this paper, a multidimensional model for the Genesis data set was proposed. Previous work in the field does not take into consideration the use of multi-valued dimensions beyond the top n values from each category. This work presented a way to model multi-valued dimensions in a healthcare environment. Although this model is specific to the Genesis data set, the ideas presented can be generalized to encompass many different types of data.

7 APPENDIX

7.1 SQL FOR TABLE CREATION

/* outrigger tables */
DROP table Date_Outrigger;
/* dimension tables */
DROP table Interventions;
DROP table Nursing_Diagnoses;
DROP table Diseases;
DROP table Outcomes;
DROP table Patients;
/* bridge tables */
DROP table Interventions_Group;
DROP table Nursing_Diagnoses_Group;
DROP table Diseases_Group;
DROP table Outcomes_Group;
/* fact table */
DROP table Visits;

/* outrigger dimension: this table is auto-generated by Oracle 11g Warehouse Builder */
CREATE TABLE DATE_OUTRIGGER (
  DATE_ID NUMBER NOT NULL,
  DAY_DAY_CODE NUMBER,
  DAY_OF_CAL_WEEK NUMBER,
  DAY_OF_CAL_MONTH NUMBER,
  DAY DATE,
  DAY_DESCRIPTION VARCHAR2(2000),
  DAY_OF_CAL_YEAR NUMBER,
  DAY_START_DATE DATE,
  DAY_OF_CAL_QUARTER NUMBER,
  DAY_END_DATE DATE,
  DAY_ID NUMBER,
  DAY_TIME_SPAN NUMBER,
  JULIAN_DATE NUMBER,
  DAY_NAME VARCHAR2(25),
  CALENDAR_MONTH_NAME VARCHAR2(25),
  MONTH_OF_QUARTER NUMBER,
  CAL_MONTH_NUMBER NUMBER,
  CALENDAR_MONTH_TIME_SPAN NUMBER,
  CALENDAR_MONTH_ID NUMBER,
  CALENDAR_MONTH_DESCRIPTION VARCHAR2(2000),
  CALENDAR_MONTH_START_DATE DATE,
  CALENDAR_MONTH_END_DATE DATE,
  CALENDAR_MONTH_CAL_MONTH_CODE NUMBER,
  MONTH_OF_YEAR NUMBER,
  CALENDAR_QUARTER_END_DATE DATE,
  CALENDAR_QUARTER_START_DATE DATE,
  CAL_QUARTER_NUMBER NUMBER,
  QUARTER_OF_YEAR NUMBER,
  CALENDAR_QUARTER_ID NUMBER,
  CALENDAR_QUARTER_TIME_SPAN NUMBER,
  CALENDAR_QUARTER_NAME VARCHAR2(25),
  CALENDAR_QUART_CAL_QUARTER_CO NUMBER,
  CALENDAR_QUARTER_DESCRIPTION VARCHAR2(2000),
  CALENDAR_YEAR_START_DATE DATE,
  CALENDAR_YEAR_CAL_YEAR_CODE NUMBER,
  CALENDAR_YEAR_DESCRIPTION VARCHAR2(2000),
  CALENDAR_YEAR_NAME VARCHAR2(25),
  CALENDAR_YEAR_END_DATE DATE,
  CALENDAR_YEAR_ID NUMBER,
  CAL_YEAR_NUMBER NUMBER,
  CALENDAR_YEAR_TIME_SPAN NUMBER
);

/* dimensions */
CREATE table Patients (
  pt_id        NUMBER,
  ssn          NUMBER(9,0),
  birthdate_id NUMBER,
  gender       CHAR(1),
  race         CHAR(1),
  zipcode      NUMBER(9,0),
  marstat      CHAR(1),
  relig        VARCHAR2(1),
  constraint patients_pk PRIMARY KEY (pt_id),
  constraint patients_fk_date_outrigger FOREIGN KEY (birthdate_id) REFERENCES Date_Outrigger(date_ID)
);

CREATE table Diseases (
  icd_id   NUMBER,
  Icd      VARCHAR2(10),
  ccs_code VARCHAR2(10),
  constraint diseases_pk PRIMARY KEY (icd_id)
);

CREATE table Nursing_Diagnoses (
  NDCode_ID NUMBER,
  NDCode    NUMBER(3,0),
  constraint nursing_diagnoses_pk PRIMARY KEY (NDCode_ID)
);

CREATE table Interventions (
  NI_ID           NUMBER,
  NICode_group    CHAR(3),
  NICode          NUMBER(3,0),
  NIDefining_code NUMBER(2,0),
  constraint interventions_pk PRIMARY KEY (NI_ID)
);

CREATE table Outcomes (
  NO_ID           NUMBER,
  NOCode          NUMBER(3,0),
  NODefining_code NUMBER(2,0),
  constraint outcomes_pk PRIMARY KEY (NO_ID)
);

/* bridge tables */
CREATE table Diseases_Group (
  diseases_group_id NUMBER,
  icd_id            NUMBER,
  ordering          NUMBER,
  constraint diseases_group_pk PRIMARY KEY (diseases_group_id),
  constraint diseases_group_fk_diseases FOREIGN KEY (icd_id) REFERENCES Diseases (icd_id)
);

CREATE table Nursing_Diagnoses_Group (
  nursing_diagnoses_group_id NUMBER,
  NDCode_ID                  NUMBER(3,0),
  NDDate_ID                  NUMBER,
  ordering                   NUMBER,
  constraint nursing_diagnoses_group_pk PRIMARY KEY (nursing_diagnoses_group_id),
  constraint nursing_diagnoses_group_fk_nursing_diagnoses FOREIGN KEY (NDCode_ID) REFERENCES Nursing_Diagnoses(NDCode_ID),
  constraint nursing_diagnoses_group_fk_date_outrigger FOREIGN KEY (NDDate_ID) REFERENCES Date_Outrigger(date_ID)
);

CREATE table Interventions_Group (
  interventions_group_id NUMBER,
  NI_ID                  NUMBER,
  NIDate_ID              NUMBER,
  ordering               NUMBER,
  constraint interventions_group_pk PRIMARY KEY (interventions_group_id),
  constraint interventions_group_fk_interventions FOREIGN KEY (NI_ID) REFERENCES Interventions(NI_ID),
  constraint interventions_group_fk_date_outrigger FOREIGN KEY (NIDate_ID) REFERENCES Date_Outrigger(date_ID)
);

CREATE table Outcomes_Group (
  outcomes_group_id NUMBER,
  NO_ID             NUMBER,
  NODate_ID         NUMBER,
  ordering          NUMBER,
  constraint outcomes_group_pk PRIMARY KEY (outcomes_group_id),
  constraint outcomes_group_fk_outcomes FOREIGN KEY (NO_ID) REFERENCES Outcomes (NO_ID),
  constraint outcomes_group_fk_date_outrigger FOREIGN KEY (NODate_ID) REFERENCES Date_Outrigger(date_ID)
);

/* fact table */
CREATE table Visits (
  seq                        NUMBER,
  pt_id                      NUMBER,
  diseases_group_id          NUMBER,
  nursing_diagnoses_group_id NUMBER,
  interventions_group_id     NUMBER,
  outcomes_group_id          NUMBER,
  service                    VARCHAR2(10),
  admdate_id                 NUMBER,
  disdate_id                 NUMBER,
  los                        NUMBER,
  disstate                   NUMBER(5,0),
  age                        NUMBER(5,2),
  constraint visits_pk PRIMARY KEY (seq), /* degenerate dimension */
  constraint visits_fk_patients FOREIGN KEY (pt_id) REFERENCES Patients (pt_id),
  constraint visits_fk_diseases_group FOREIGN KEY (diseases_group_id) REFERENCES Diseases_Group(diseases_group_ID),
  constraint visits_fk_nursing_diagnoses_group FOREIGN KEY (nursing_diagnoses_group_id) REFERENCES Nursing_Diagnoses_Group(nursing_diagnoses_group_ID),
  constraint visits_fk_interventions_group FOREIGN KEY (interventions_group_id) REFERENCES Interventions_Group(interventions_group_ID),
  constraint visits_fk_outcomes_group FOREIGN KEY (outcomes_group_id) REFERENCES Outcomes_Group(outcomes_group_ID),
  constraint visits_fk_adm_date_outrigger FOREIGN KEY (admdate_id) REFERENCES Date_Outrigger(date_ID),
  constraint visits_fk_dis_date_outrigger FOREIGN KEY (disdate_id) REFERENCES Date_Outrigger(date_ID)
);

7.2 PL/SQL FOR DIMENSIONS

cwm2_olap_dimension.create_dimension(
                   -- dimension name
  'Date',
  'Dates',
  'Date',
  'Date',
  'Date'           -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'PATIENT_DIM',   -- dimension name
  'Patient',
  'Patients',
  'Patient',
  'Patient',
  'Patient'        -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'DISEASE_DIM',   -- dimension name
  'Disease',
  'Diseases',
  'Disease',
  'Disease',
  'Disease'        -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'NURSING_DIAGNOSES_DIM',   -- dimension name
  'Nursing_Diagnose',
  'Nursing_Diagnoses',
  'Nursing_Diagnose',
  'Nursing_Diagnose',
  'Nursing_Diagnose'         -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'INTERVENTION_DIM',   -- dimension name
  'Intervention',
  'Interventions',
  'Intervention',
  'Intervention',
  'Intervention'        -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'OUTCOME_DIM',   -- dimension name
  'Outcome',
  'Outcomes',
  'Outcome',
  'Outcome',
  'Outcome'        -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'DISEASES_GROUP_DIM',   -- dimension name
  'Diseases_Group',
  'Diseases_Groups',
  'Diseases_Group',
  'Diseases_Group',
  'Diseases_Group'        -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'NURSING_DIAGNOSES_GROUP_DIM',   -- dimension name
  'Nursing_Diagnoses_Group',
  'Nursing_Diagnoses_Groups',
  'Nursing_Diagnoses_Group',
  'Nursing_Diagnoses_Group',
  'Nursing_Diagnoses_Group'        -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'INTERVENTIONS_GROUP_DIM',   -- dimension name
  'Interventions_Group',
  'Interventions_Groups',
  'Interventions_Group',
  'Interventions_Group',
  'Interventions_Group'        -- dimension type
);

cwm2_olap_dimension.create_dimension(
  'OUTCOMES_GROUP_DIM',   -- dimension name
  'Outcomes_Group',
  'Outcomes_Groups',
  'Outcomes_Group',
  'Outcomes_Group',
  'Outcomes_Group'        -- dimension type
);

7.3 PL/SQL FOR HIERARCHIES

cwm2_olap_hierarchy.create_hierarchy (
                          -- owner of dimension to which hierarchy is assigned
                          -- name of dimension to which hierarchy is assigned
  'DATE_HIER',            -- name of hierarchy
  'Date hierarchy',
  'Date hierarchy',
  'Date hierarchy',
  'UNSOLVED LEVEL-BASED'  -- solved code
);

cwm2_olap_hierarchy.create_hierarchy (
                          -- owner of dimension to which hierarchy is assigned
  'DISEASE_DIM',          -- name of dimension to which hierarchy is assigned
  'DISEASE_HIER',         -- name of hierarchy
  'Disease hierarchy',
  'Disease hierarchy',
  'Disease hierarchy',
  'UNSOLVED LEVEL-BASED'  -- solved code
);

7.4 PL/SQL FOR LEVELS

cwm2_olap_level.create_level (
                 -- owner of dimension to which level is assigned
                 -- name of dimension to which level is assigned
  'LVL_YEAR',
  'Year',
  'Years',
  'Years',
  'Years'
);

cwm2_olap_level.create_level (
                 -- owner of dimension to which level is assigned
                 -- name of dimension to which level is assigned
  'LVL_QUARTER',
  'Quarter',
  'Quarters',
  'Quarters',
  'Quarters'
);

cwm2_olap_level.create_level (
                 -- owner of dimension to which level is assigned
                 -- name of dimension to which level is assigned
  'LVL_MONTH',
  'Month',
  'Months',
  'Months',
  'Months'
);

cwm2_olap_level.create_level (
                 -- owner of dimension to which level is assigned
                 -- name of dimension to which level is assigned
  'LVL_WEEK',
  'Week',
  'Weeks',
  'Weeks',
  'Weeks'
);

cwm2_olap_level.create_level (
                 -- owner of dimension to which level is assigned
                 -- name of dimension to which level is assigned
  'LVL_DAY',
  'Day',
  'Days',
  'Days',
  'Days'
);

cwm2_olap_level.create_level (
                 -- owner of dimension to which level is assigned
  'DISEASE_DIM', -- name of dimension to which level is assigned
  'LVL_CCS',
  'CCS Group',
  'CCS Groups',
  'CCS Groups',
  'CCS Groups'
);

cwm2_olap_level.create_level (
                 -- owner of dimension to which level is assigned
  'DISEASE_DIM', -- name of dimension to which level is assigned
  'LVL_ICD',
  'ICD Number',
  'ICD Numbers',
  'ICD Numbers',
  'ICD Numbers'
);

7.5 PL/SQL FOR LEVELS TO HIERARCHIES

cwm2_olap_level.add_level_to_hierarchy (
                 -- owner of dimension
                 -- name of dimension
  'DATE_HIER',   -- name of hierarchy
  'LVL_YEAR',
  null           -- parent level
);

cwm2_olap_level.add_level_to_hierarchy (
                   -- owner of dimension
                   -- name of dimension
  'DATE_HIER',     -- name of hierarchy
  'LVL_QUARTER',
  'LVL_YEAR'       -- parent level
);

cwm2_olap_level.add_level_to_hierarchy (
                   -- owner of dimension
                   -- name of dimension
  'DATE_HIER',     -- name of hierarchy
  'LVL_WEEK',
  'LVL_YEAR'       -- parent level
);

cwm2_olap_level.add_level_to_hierarchy (
                   -- owner of dimension
                   -- name of dimension
  'DATE_HIER',     -- name of hierarchy
  'LVL_MONTH',
  'LVL_QUARTER'    -- parent level
);

cwm2_olap_level.add_level_to_hierarchy (
                   -- owner of dimension
                   -- name of dimension
  'DATE_HIER',     -- name of hierarchy
  'LVL_DAY',
  'LVL_MONTH'      -- parent level
);

cwm2_olap_level.add_level_to_hierarchy (
                   -- owner of dimension
  'DISEASE_DIM',   -- name of dimension
  'DISEASE_HIER',  -- name of hierarchy
  'LVL_CCS',
  null             -- parent level
);

cwm2_olap_level.add_level_to_hierarchy (
                   -- owner of dimension
  'DISEASE_DIM',   -- name of dimension
  'DISEASE_HIER',  -- name of hierarchy
  'LVL_ICD',
  'LVL_CCS'        -- parent level
);


More information

Chapter 7 Multidimensional Data Modeling (MDDM)

Chapter 7 Multidimensional Data Modeling (MDDM) Chapter 7 Multidimensional Data Modeling (MDDM) Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. To assess the capabilities of OLTP and OLAP systems 2.

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong gaocong@cs.aau.dk Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge

More information

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Describe how the problems of managing data resources in a traditional file environment are solved

More information

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach 2006 ISMA Conference 1 Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach Priya Lobo CFPS Satyam Computer Services Ltd. 69, Railway Parallel Road, Kumarapark West, Bangalore 560020,

More information

Week 3 lecture slides

Week 3 lecture slides Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically

More information

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi kishorejaladi@yahoo.com

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi kishorejaladi@yahoo.com Data Warehousing: Data Models and OLAP operations By Kishore Jaladi kishorejaladi@yahoo.com Topics Covered 1. Understanding the term Data Warehousing 2. Three-tier Decision Support Systems 3. Approaches

More information

www.dotnetsparkles.wordpress.com

www.dotnetsparkles.wordpress.com Database Design Considerations Designing a database requires an understanding of both the business functions you want to model and the database concepts and features used to represent those business functions.

More information

DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Introduction Increasingly, organizations are analyzing historical data to identify useful patterns and

More information

ETL Process in Data Warehouse. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

ETL Process in Data Warehouse. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT ETL Process in Data Warehouse G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT Outline ETL Extraction Transformation Loading ETL Overview Extraction Transformation Loading ETL To get data out of

More information

A DATA MODELING PROCESS FOR DECOMPOSING HEALTHCARE PATIENT DATA SETS

A DATA MODELING PROCESS FOR DECOMPOSING HEALTHCARE PATIENT DATA SETS OJNI Online Journal of Nursing Informatics, 13(1). Page 1 of 26 A DATA MODELING PROCESS FOR DECOMPOSING HEALTHCARE PATIENT DATA SETS By Der-Fa Lu, PhD RN 1 W. Nick Street, PhD 2 Faiz Currim, PhD 2 Ray

More information

The Data Warehouse ETL Toolkit

The Data Warehouse ETL Toolkit 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. The Data Warehouse ETL Toolkit Practical Techniques for Extracting,

More information

Databases in Organizations

Databases in Organizations The following is an excerpt from a draft chapter of a new enterprise architecture text book that is currently under development entitled Enterprise Architecture: Principles and Practice by Brian Cameron

More information

Speeding ETL Processing in Data Warehouses White Paper

Speeding ETL Processing in Data Warehouses White Paper Speeding ETL Processing in Data Warehouses White Paper 020607dmxwpADM High-Performance Aggregations and Joins for Faster Data Warehouse Processing Data Processing Challenges... 1 Joins and Aggregates are

More information

University of Gaziantep, Department of Business Administration

University of Gaziantep, Department of Business Administration University of Gaziantep, Department of Business Administration The extensive use of information technology enables organizations to collect huge amounts of data about almost every aspect of their businesses.

More information

IT2305 Database Systems I (Compulsory)

IT2305 Database Systems I (Compulsory) Database Systems I (Compulsory) INTRODUCTION This is one of the 4 modules designed for Semester 2 of Bachelor of Information Technology Degree program. CREDITS: 04 LEARNING OUTCOMES On completion of this

More information

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem: Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

CHAPTER 3. Data Warehouses and OLAP

CHAPTER 3. Data Warehouses and OLAP CHAPTER 3 Data Warehouses and OLAP 3.1 Data Warehouse 3.2 Differences between Operational Systems and Data Warehouses 3.3 A Multidimensional Data Model 3.4Stars, snowflakes and Fact Constellations: 3.5

More information

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days Three Days Prerequisites Students should have at least some experience with any relational database management system. Who Should Attend This course is targeted at technical staff, team leaders and project

More information

Oracle Warehouse Builder 11gR2: Getting Started

Oracle Warehouse Builder 11gR2: Getting Started P U B L I S H I N G professional expertise distilled Oracle Warehouse Builder 11gR2: Getting Started Bob Griesemer Chapter No.3 "Designing the Target Structure" In this package, you will find: A Biography

More information

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria

More information

Data warehousing with PostgreSQL

Data warehousing with PostgreSQL Data warehousing with PostgreSQL Gabriele Bartolini http://www.2ndquadrant.it/ European PostgreSQL Day 2009 6 November, ParisTech Telecom, Paris, France Audience

More information

The Benefits of Data Modeling in Data Warehousing

The Benefits of Data Modeling in Data Warehousing WHITE PAPER: THE BENEFITS OF DATA MODELING IN DATA WAREHOUSING The Benefits of Data Modeling in Data Warehousing NOVEMBER 2008 Table of Contents Executive Summary 1 SECTION 1 2 Introduction 2 SECTION 2

More information

SAP BO Course Details

SAP BO Course Details SAP BO Course Details By Besant Technologies Course Name Category Venue SAP BO SAP Besant Technologies No.24, Nagendra Nagar, Velachery Main Road, Address Velachery, Chennai 600 042 Landmark Opposite to

More information

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex, Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex, Inc. Overview Introduction What is Business Intelligence?

More information

KDOT s Spatially Enabled Data Warehouse. Paul Bayless KDOT Data Warehouse Manager and Bill Schuman GeoDecisions Project Manager

KDOT s Spatially Enabled Data Warehouse. Paul Bayless KDOT Data Warehouse Manager and Bill Schuman GeoDecisions Project Manager KDOT s Spatially Enabled Data Warehouse Paul Bayless KDOT Data Warehouse Manager and Bill Schuman GeoDecisions Project Manager Goals of the Session Describe what a data warehouse is and why it is of value

More information

Module 1: Introduction to Data Warehousing and OLAP

Module 1: Introduction to Data Warehousing and OLAP Raw Data vs. Business Information Module 1: Introduction to Data Warehousing and OLAP Capturing Raw Data Gathering data recorded in everyday operations Deriving Business Information Deriving meaningful

More information

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance Brochure More information from http://www.researchandmarkets.com/reports/2248199/ Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance Description: - This is the first book to provide

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6. Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija.

The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija. The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija.si ABSTRACT Health Care Statistics on a state level is a

More information

IT2304: Database Systems 1 (DBS 1)

IT2304: Database Systems 1 (DBS 1) : Database Systems 1 (DBS 1) (Compulsory) 1. OUTLINE OF SYLLABUS Topic Minimum number of hours Introduction to DBMS 07 Relational Data Model 03 Data manipulation using Relational Algebra 06 Data manipulation

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimensional

More information

Data warehouse design

Data warehouse design DataBase and Data Mining Group of DataBase and Data Mining Group of DataBase and Data Mining Group of Database and data mining group, Data warehouse design DATA WAREHOUSE: DESIGN - 1 Risk factors Database

More information

Implementing a Data Warehouse with Microsoft SQL Server

Implementing a Data Warehouse with Microsoft SQL Server This course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse 2014, implement ETL with SQL Server Integration Services, and

More information

A Critical Review of Data Warehouse

A Critical Review of Data Warehouse Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 95-103 Research India Publications http://www.ripublication.com A Critical Review of Data Warehouse Sachin

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

More information

Business Intelligence

Business Intelligence 8 Business Intelligence Business intelligence has become a buzzword in recent years. The database tools found under the heading of business intelligence include data warehousing, online analytical processing

More information

Dimensional Modeling for Data Warehouse

Dimensional Modeling for Data Warehouse Modeling for Data Warehouse Umashanker Sharma, Anjana Gosain GGS, Indraprastha University, Delhi Abstract Many surveys indicate that a significant percentage of DWs fail to meet business objectives or

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources

More information

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals 1 Properties of a Database 1 The Database Management System (DBMS) 2 Layers of Data Abstraction 3 Physical Data Independence 5 Logical

More information

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 asistithod@gmail.com

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management 6.1 2010 by Prentice Hall LEARNING OBJECTIVES Describe how the problems of managing data resources in a traditional

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART A: Architecture Chapter 1: Motivation and Definitions Motivation Goal: to build an operational general view on a company to support decisions in

More information

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER Page 1 of 8 ABOUT THIS COURSE This 5 day course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL Server

More information

Course 103402 MIS. Foundations of Business Intelligence

Course 103402 MIS. Foundations of Business Intelligence Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:

More information

Implementing a Data Warehouse with Microsoft SQL Server

Implementing a Data Warehouse with Microsoft SQL Server Page 1 of 7 Overview This course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL 2014, implement ETL

More information

Fluency With Information Technology CSE100/IMT100

Fluency With Information Technology CSE100/IMT100 Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999

More information

INTRODUCTION TO BUSINESS INTELLIGENCE What to consider implementing a Data Warehouse and Business Intelligence

INTRODUCTION TO BUSINESS INTELLIGENCE What to consider implementing a Data Warehouse and Business Intelligence INTRODUCTION TO BUSINESS INTELLIGENCE What to consider implementing a Data Warehouse and Business Intelligence Summary: This note gives some overall high-level introduction to Business Intelligence and

More information

Extraction Transformation Loading ETL Get data out of sources and load into the DW

Extraction Transformation Loading ETL Get data out of sources and load into the DW Lection 5 ETL Definition Extraction Transformation Loading ETL Get data out of sources and load into the DW Data is extracted from OLTP database, transformed to match the DW schema and loaded into the

More information

A Design and implementation of a data warehouse for research administration universities

A Design and implementation of a data warehouse for research administration universities A Design and implementation of a data warehouse for research administration universities André Flory 1, Pierre Soupirot 2, and Anne Tchounikine 3 1 CRI : Centre de Ressources Informatiques INSA de Lyon

More information

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES MUHAMMAD KHALEEL (0912125) SZABIST KARACHI CAMPUS Abstract. Data warehouse and online analytical processing (OLAP) both are core component for decision

More information

Data Warehousing Overview

Data Warehousing Overview Data Warehousing Overview This Presentation will leave you with a good understanding of Data Warehousing technologies, from basic relational through ROLAP to MOLAP and Hybrid Analysis. However it is necessary

More information

Kimball Dimensional Modeling Techniques

Kimball Dimensional Modeling Techniques Kimball Dimensional Modeling Techniques Table of Contents Fundamental Concepts... 1 Gather Business Requirements and Data Realities... 1 Collaborative Dimensional Modeling Workshops... 1 Four-Step Dimensional

More information

SQL Server 2012 Business Intelligence Boot Camp

SQL Server 2012 Business Intelligence Boot Camp SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations

More information

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution Warehouse and Business Intelligence : Challenges, Best Practices & the Solution Prepared by datagaps http://www.datagaps.com http://www.youtube.com/datagaps http://www.twitter.com/datagaps Contact contact@datagaps.com

More information

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 31 Introduction to Data Warehousing and OLAP Part 2 Hello and

More information

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc. Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse

More information

A Best Practice Guide to Designing TM1 Cubes

A Best Practice Guide to Designing TM1 Cubes White Paper A Best Practice Guide to Designing TM1 Cubes What You ll Learn in This White Paper: General principles for best practice cube design The importance of the Measures dimension Different approaches

More information

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc. PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions A Technical Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Data Warehouse Modeling.....................................4

More information

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann Trivadis White Paper Comparison of Data Modeling Methods for a Core Data Warehouse Dani Schnider Adriano Martino Maren Eschermann June 2014 Table of Contents 1. Introduction... 3 2. Aspects of Data Warehouse

More information

ETL PROCESS IN DATA WAREHOUSE

ETL PROCESS IN DATA WAREHOUSE ETL PROCESS IN DATA WAREHOUSE OUTLINE ETL : Extraction, Transformation, Loading Capture/Extract Scrub or data cleansing Transform Load and Index ETL OVERVIEW Extraction Transformation Loading ETL ETL is

More information

Introduction to Data Warehousing. Ms Swapnil Shrivastava swapnil@konark.ncst.ernet.in

Introduction to Data Warehousing. Ms Swapnil Shrivastava swapnil@konark.ncst.ernet.in Introduction to Data Warehousing Ms Swapnil Shrivastava swapnil@konark.ncst.ernet.in Necessity is the mother of invention Why Data Warehouse? Scenario 1 ABC Pvt Ltd is a company with branches at Mumbai,

More information

DATA WAREHOUSING - OLAP

DATA WAREHOUSING - OLAP http://www.tutorialspoint.com/dwh/dwh_olap.htm DATA WAREHOUSING - OLAP Copyright tutorialspoint.com Online Analytical Processing Server OLAP is based on the multidimensional data model. It allows managers,

More information

Microsoft Data Warehouse in Depth

Microsoft Data Warehouse in Depth Microsoft Data Warehouse in Depth 1 P a g e Duration What s new Why attend Who should attend Course format and prerequisites 4 days The course materials have been refreshed to align with the second edition

More information

LEARNING SOLUTIONS website milner.com/learning email training@milner.com phone 800 875 5042

LEARNING SOLUTIONS website milner.com/learning email training@milner.com phone 800 875 5042 Course 20467A: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Length: 5 Days Published: December 21, 2012 Language(s): English Audience(s): IT Professionals Overview Level: 300

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Distance Learning and Examining Systems

Distance Learning and Examining Systems Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed

More information

Jet Data Manager 2012 User Guide

Jet Data Manager 2012 User Guide Jet Data Manager 2012 User Guide Welcome This documentation provides descriptions of the concepts and features of the Jet Data Manager and how to use with them. With the Jet Data Manager you can transform

More information

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina Data Warehousing Read chapter 13 of Riguzzi et al Sistemi Informativi Slides derived from those by Hector Garcia-Molina What is a Warehouse? Collection of diverse data subject oriented aimed at executive,

More information

The Benefits of Data Modeling in Business Intelligence

The Benefits of Data Modeling in Business Intelligence WHITE PAPER: THE BENEFITS OF DATA MODELING IN BUSINESS INTELLIGENCE The Benefits of Data Modeling in Business Intelligence DECEMBER 2008 Table of Contents Executive Summary 1 SECTION 1 2 Introduction 2

More information

Data Warehousing Concepts

Data Warehousing Concepts Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis

More information

Data Warehouse design

Data Warehouse design Data Warehouse design Design of Enterprise Systems University of Pavia 11/11/2013-1- Data Warehouse design DATA MODELLING - 2- Data Modelling Important premise Data warehouses typically reside on a RDBMS

More information