A Data Warehouse for Kimberly-Clark's competitor products




Melissa Gabrielle Duke
Computing BSc. 2001/2002

The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference has been made to the work of others. I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.

SUMMARY

Kimberly-Clark is a $14.5 billion global consumer products company, best known for its brands. Nearly a quarter of the world's population, or 1.3 billion people, use the company's products each year [9].

Objectives
The objectives of the project were:
- To design and implement a data warehouse for the analysis of the products of Kimberly-Clark's competitors
- To decide where best to obtain competitor data in the future

Minimum Requirements
The minimum requirements of the project were:
- To obtain a good understanding of Data Warehouses and Online Analytical Processing
- To analyse what information will be required from the data
- To design the tables in SQL Server, and to create a Data Preparation Area and the tables
- To clean the data, input it into the Data Preparation Area, and then copy it to the tables
- To gain access to Analysis Services to create a data cube from the tables
- To use OLAP to extract the relevant data from the cube
- To test the system and analyse the data
- To investigate where to obtain data in the future

Deliverables
The items produced were:
- Data Warehouse Software
- The Project Report, including:
  - A User Guide
  - An investigation into future data sources

CONTENTS
CHAPTER 1 INTRODUCTION
  1.1 The Problem
  1.2 Objectives
  1.3 Approach
  1.4 Personal Objectives
    1.41 Previous Study
    1.42 Timescale
  1.5 The Report
CHAPTER 2 DATA WAREHOUSING
  2.1 Background to Data Warehousing
    2.11 Data Warehousing
    2.12 Data Sources
    2.13 Characteristics
  2.2 Stages in Data Warehousing
    2.21 Designing a Data Warehouse
    2.22 Data Cleaning
    2.23 Data Preparation Area
    2.24 Creating the Data Warehouse Database
    2.25 Fact Tables
    2.26 Dimension Tables
    2.27 Using a Data Warehouse
    2.28 Maintaining a Data Warehouse
    2.29 Data Marts
CHAPTER 3 DATA CUBES AND OLAP
  3.1 OLAP
    3.11 Online Analytical Processing
    3.12 Multidimensional OLAP (MOLAP)
    3.13 Relational OLAP (ROLAP)
    3.14 Hybrid OLAP (HOLAP)
    3.15 Data Representations
  3.2 Data Cubes
    3.21 Data Cubes
    3.22 Data sources
    3.23 Measures
    3.24 Dimensions
    3.25 Partitions
    3.26 Cube roles
CHAPTER 4 THE DATA WAREHOUSE
  4.1 Background
    4.11 Previous system
    4.12 Limitations
  4.2 Solution
  4.3 Data Cleaning
  4.4 Table Design
    4.41 Data Warehouse Tables
    4.42 Data Preparation Area
  4.5 Implementation
CHAPTER 5 THE DATA CUBE
  5.1 Data Source
  5.2 Fact Table
  5.3 Dimensions
  5.4 Cubes
    5.41 Case Data
    5.42 Price Data
    5.43 Sheet Data
    5.44 Measures
    5.45 Calculated Members
  5.5 OLAP
CHAPTER 6 TESTING
  6.1 Data Analysis
    6.11 Washroom
    6.12 Workplace
    6.13 Skincare
  6.2 Testing System
    6.21 Test 1
    6.22 Test 2
    6.23 Test 3
CHAPTER 7 EVALUATION
  7.1 Introduction
  7.2 Fulfilling the Objectives
  7.3 Further Work
  7.4 Conclusions
CHAPTER 8 AN INVESTIGATION INTO FUTURE DATA SOURCES
  8.1 Explanation
  8.2 Results
  8.3 Discussion
  8.4 Conclusions
  8.5 Suggested Future Sources
CHAPTER 9 USER GUIDE
  9.1 Introduction
  9.2 Operation
  9.3 Processing a Cube
  9.4 Maintenance
    9.41 Adding New Data
    9.42 Updating Data
CHAPTER 10 APPENDIX A REFLECTION
CHAPTER 11 APPENDIX B CLEANED DATA
CHAPTER 12 APPENDIX C STANDARD MS EXCEL TEMPLATE
CHAPTER 13 APPENDIX D DATA CUBE
CHAPTER 14 REFERENCES

1 INTRODUCTION

1.1 The Problem
Kimberly-Clark is a competitive organisation, trying to keep the major share it has in certain markets and to increase its share in others. To do this, it wishes to compare data about competitor companies with data about its own company to establish how best to market products against competitors. There is no previous system that allows this to take place; this is a new venture to gain a competitive advantage.

Various data about different companies' products have been collected. At present, these are stored in several spreadsheets, so it is difficult to access the relevant information. It would be much better to store the data in one source. All users have Windows XP PCs (the company has recently upgraded), and these run SQL Server 2000. There is no set deadline, as this will be a new system and does not replace any existing procedure.

1.2 Objectives
The objectives of the project were:
- To design and implement a data warehouse for the analysis of the products of Kimberly-Clark's competitors
- To decide where best to obtain competitor data in the future

1.3 Approach
To do this it is necessary to:
1. Obtain a good understanding of Data Warehouses and Online Analytical Processing
2. Analyse what information will be required from the data
3. Design the tables in SQL Server, and create a Data Preparation Area and the tables
4. Clean the data, input it into the Data Preparation Area, and then copy it to the tables
5. Gain access to Analysis Services to create a data cube from the tables
6. Use OLAP to extract the relevant data from the cube
7. Test the system and analyse the data
8. Investigate where to obtain data in the future

1.4 Personal Objectives

1.41 Previous Study
The project uses knowledge and techniques gained from several modules previously studied as part of my degree programme. These include the stream of database modules:
- DB11 Introduction to Databases (studied in Year 1)
- DB21 Database Principles and Practice (studied in Year 2)
- DB32 Knowledge Management (currently studying in Final Year)
I wish to build on what I have learned from these with reading from other sources and with applied experience, to gain a good theoretical and practical understanding of Data Warehousing and Data Cubes.

1.42 Timescale
The steps listed in 1.3 above fit into a Gantt chart showing the overall timescale of the project, as shown in Figure 1.42 below.

[Figure 1.42: Project Timescale. Gantt chart placing approach steps 1 to 8 between October and April.]

1.5 The Report
The report covers two main areas, Data Warehousing and Data Cubes, in both the research chapters and the development chapters. After the introduction, the next two chapters are concerned with research. The chapter titled Data Warehousing researches the subject of data warehouses in depth. It is followed by the chapter called Data Cubes and OLAP, which explores how cubes are used to accomplish Online Analytical Processing.

The subsequent two chapters are concerned with the development of the system. The chapter The Data Warehouse investigates the background to the system, decides on a suitable solution, and finally describes how the warehouse was developed. The chapter The Data Cube details how the cube was developed. Chapter 7 reflects on the main discoveries and issues encountered throughout the project. Chapter 8 investigates how reliable the current data sources are and which better sources might be used in the future. Finally, Chapter 9 is the User Guide, which is to be distributed to users of the installed system.

2 DATA WAREHOUSING

2.1 Background to Data Warehousing

2.11 Data Warehousing
W. H. Inmon has been credited with first using the term "data warehouse" [4]. In [6] he characterised a data warehouse as a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management's decisions. Data warehouses give access to data for complex analysis and knowledge discovery, and help decision-making. They support an organisation's demand for data and information. Figure 2.1, taken from [16], shows the process of data warehousing.

[Figure 2.1: A generic data warehouse architecture. Data from several sources passes through data integration and data transformation into the data warehouse, which serves several end users.]

2.12 Data Sources
Data warehouses are intended to provide information to decision makers. To do so, data must be gathered from many sources and consolidated into a consistent set. Acquiring data for the warehouse involves the following steps:
- The data must be extracted from multiple, heterogeneous sources.
- The data must be formatted for consistency within the warehouse.
- The data must be cleaned to ensure validity.
- The data must be fitted into the data model of the warehouse.
- The data must be loaded into the warehouse.
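The five acquisition steps listed in 2.12 can be traced through a minimal Python sketch, using the standard-library sqlite3 module as a stand-in for the warehouse database. Both sources, their column names, the brand values and the pounds-to-pence conversion are hypothetical; the sketch only illustrates the order of the steps (extract, format, clean, fit to the model, load), not how this project's data was actually processed.

```python
import csv
import io
import sqlite3

# Hypothetical heterogeneous sources: a CSV extract priced in pounds
# and a list of records priced in pence, with different column names.
SOURCE_A = "brand,price_gbp\nBrand A,1.20\nBrand C,\n"
SOURCE_B = [{"Brand": "brand b", "PricePence": "95"}]


def extract():
    """Step 1: pull raw rows out of each source."""
    return list(csv.DictReader(io.StringIO(SOURCE_A))), SOURCE_B


def format_rows(rows_a, rows_b):
    """Step 2: map both sources onto one consistent format (title-case names, pence)."""
    out = [{"brand": r["brand"].strip().title(),
            "pence": round(float(r["price_gbp"]) * 100) if r["price_gbp"] else None}
           for r in rows_a]
    out += [{"brand": r["Brand"].strip().title(), "pence": int(r["PricePence"])}
            for r in rows_b]
    return out


def clean(rows):
    """Step 3: keep only valid records (here: records that have a price)."""
    return [r for r in rows if r["pence"] is not None]


def fit_to_model(rows):
    """Step 4: split rows into a Brand dimension and facts keyed by BrandId."""
    brand_ids, facts = {}, []
    for r in rows:
        brand_id = brand_ids.setdefault(r["brand"], len(brand_ids) + 1)
        facts.append((brand_id, r["pence"]))
    return brand_ids, facts


def load(conn, brand_ids, facts):
    """Step 5: load the dimension and fact rows into warehouse tables."""
    conn.execute("CREATE TABLE Brand (BrandId INTEGER PRIMARY KEY, Brand TEXT)")
    conn.execute("CREATE TABLE Price (BrandId INTEGER REFERENCES Brand, Pence INTEGER)")
    conn.executemany("INSERT INTO Brand VALUES (?, ?)",
                     [(i, name) for name, i in brand_ids.items()])
    conn.executemany("INSERT INTO Price VALUES (?, ?)", facts)


conn = sqlite3.connect(":memory:")
load(conn, *fit_to_model(clean(format_rows(*extract()))))
result = conn.execute(
    "SELECT b.Brand, p.Pence FROM Price p JOIN Brand b USING (BrandId) ORDER BY b.Brand"
).fetchall()
print(result)  # [('Brand A', 120), ('Brand B', 95)]
```

Brand C is dropped at the cleaning step because it has no price, in the same spirit as the products removed from this project's data for lack of price information.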

2.13 Characteristics
Data warehouses have the following distinctive characteristics, summarised from [2]:
- Multidimensional conceptual view: multidimensional data models are easier to manipulate than single-dimensional ones; operations such as pivoting and rotating are much more straightforward.
- Transparency: the front end the user works with does not disclose whether OLAP is being used.
- Accessibility: the OLAP system should access only the data actually required to perform the indicated analysis and not take the common "kitchen sink" approach which brings in unnecessary input [2].
- Consistent reporting performance: the user notices no change in reporting performance, regardless of the size of the database or the number of dimensions.
- Client-server architecture: it is imperative that the server component of OLAP tools be sufficiently intelligent such that various clients can be attached with minimum effort and integration programming [2].
- Generic dimensionality: every dimension of the data is equivalent in its structure and its operations.
- Dynamic sparse matrix handling: the tool adapts to the particular model to give optimal sparse matrix handling.
- Multi-user support: to be regarded as strategic, OLAP tools must provide concurrent access (retrieval and update), integrity, and security [2].
- Unrestricted cross-dimensional operations: the tool infers the calculations to a certain extent, rather than requiring the user to define them explicitly.
- Intuitive data manipulation: manipulations are available directly, without the need for menus.
- Flexible reporting: the data layout allows dimension levels to be grouped, displaying as few or as many dimensions as required.
- Unlimited dimensions and aggregation levels: the OLAP tool accommodates at least fifteen to twenty dimensions of data.
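The "multidimensional conceptual view" item mentions pivoting and rotating. As a rough illustration in plain Python rather than an OLAP tool, with hypothetical companies and unit figures, the same facts can be laid out with either dimension down the side, and rotating the view amounts to swapping which dimension goes where:

```python
from collections import defaultdict

# Hypothetical facts: (company, product group, units).
FACTS = [
    ("Company 1", "Washroom", 10),
    ("Company 1", "Skincare", 4),
    ("Company 2", "Washroom", 7),
]


def pivot(facts, row_dim, col_dim):
    """Lay the facts out as a 2-D table: row_dim down the side, col_dim across.

    row_dim and col_dim are tuple positions (0 = company, 1 = group);
    the measure is always position 2 and is summed per cell.
    """
    table = defaultdict(dict)
    for rec in facts:
        row, col, value = rec[row_dim], rec[col_dim], rec[2]
        table[row][col] = table[row].get(col, 0) + value
    return dict(table)


by_company = pivot(FACTS, 0, 1)  # companies as rows, groups as columns
rotated = pivot(FACTS, 1, 0)     # rotated: groups as rows, companies as columns
print(by_company)
print(rotated)
```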

2.2 Stages in Data Warehousing
The stages of data warehousing are shown in Figure 2.2 below, from [4].

[Figure 2.2: The overall process of data warehousing. Databases and other data inputs pass through cleaning and reformatting into the data warehouse (data and metadata), which supports OLAP, DSSI, EIS and data mining; updates/new data and back flushing are also shown.]

2.21 Designing a Data Warehouse
The purpose of a data warehouse is to organise large amounts of stable data for ease of analysis and retrieval. Its data must therefore be organised to give rapid access to information for analysis and reporting.

2.22 Data Cleaning
Data cleaning involves checking through the data to ensure that all of it is present, correct and accurate. Any missing data needs to be recovered and entered; if this is not possible, the entire record may need to be deleted. Any incorrect data must be corrected at this stage of the process, before any calculations are carried out. All data must be in a consistent format, agreed on by this stage.

2.23 Data Preparation Area
Data to be used in the data warehouse must be extracted from the data sources, cleansed and formatted for consistency, and transformed into the data warehouse schema. According to [6], the data preparation area is a relational database into which data is extracted from the data sources, transformed into common formats, checked for consistency and referential integrity, and made ready for loading into the data warehouse database. After the initial load of a data warehouse, the data preparation area is used on an ongoing basis to prepare new data for updating the data warehouse. In most data warehouse systems these ongoing operations are performed periodically, often scheduled to minimise the performance impact on the operational source systems.
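The cleaning rules in 2.22 (recover what is missing, otherwise delete the record, and put every field into one agreed format) might look like this as a minimal Python sketch. The record layout, the product codes and the `recovered_prices` lookup are hypothetical.

```python
# Hypothetical raw records: (product code, colour, price per case as text).
RAW = [
    ("P001", "white ", "12.50"),
    ("P002", "Blue", ""),       # price missing
    ("P003", "WHITE", "9.99"),
]


def clean(records, recovered_prices):
    """Recover missing prices where possible, delete records that still
    have gaps, and put the remaining fields into one consistent format."""
    cleaned = []
    for code, colour, price in records:
        price = price or recovered_prices.get(code, "")
        if not price:
            continue  # missing and unrecoverable: drop the whole record
        cleaned.append((code, colour.strip().capitalize(), float(price)))
    return cleaned


print(clean(RAW, {}))                 # P002 is dropped
print(clean(RAW, {"P002": "11.00"}))  # P002 is kept once its price is recovered
```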

2.24 Creating the Data Warehouse Database
The data warehouse database can be created once the data warehouse schema has been designed. Tables need to be created for facts and dimensions, and indexes established on key fields in all tables. The data warehouse database schema is often quite simple:
- A single fact table connected directly to a number of dimension tables forms a star schema.
- This simple configuration may include unnormalised dimension tables; dimension tables may be left unnormalised for efficiency.
- If dimension tables are normalised, this leads to a snowflake schema, in which dimension tables can themselves be connected to other dimension tables.
- In practice it is often most appropriate to normalise some dimension tables and leave others unnormalised. This is sometimes called a starflake schema.

In a star schema, each dimension table has a single-part primary key that links to one part of the multipart key in the fact table. In a snowflake schema, one or more dimension tables are decomposed into multiple tables, with the subordinate dimension tables joined to a primary dimension table instead of to the fact table. In most designs, star schemas are preferable to snowflake schemas because they involve fewer joins for information retrieval and are easier to manage.

2.25 Fact Tables
Fact tables hold the quantitative data that is to be queried. Each data warehouse includes one or more fact tables. A key characteristic of a fact table is that it contains numerical data (facts) that can be summarised to provide information about the history of the organisation's operations.

2.26 Dimension Tables
Dimension tables are smaller tables that hold the descriptive data for each dimension, such as product or company, by which the facts are analysed.

2.27 Using a Data Warehouse
The traditional role of a data warehouse is to collect and organise historical business data so that it can be analysed to help management make business decisions.
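To make the star/snowflake trade-off in 2.24 concrete, the sketch below (Python with sqlite3; the table names, product names and unit figures are hypothetical) stores the same data both ways. The star version answers a per-group total with one join; the snowflake version needs an extra join through the subordinate dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: the fact table joins directly to one denormalised dimension.
CREATE TABLE StarProduct (ProductId INTEGER PRIMARY KEY, Product TEXT, GroupName TEXT);
CREATE TABLE StarSales (ProductId INTEGER REFERENCES StarProduct, Units INTEGER);

-- Snowflake schema: the same dimension normalised into Product -> ProductGroup.
CREATE TABLE ProductGroup (GroupId INTEGER PRIMARY KEY, GroupName TEXT);
CREATE TABLE SnowProduct (ProductId INTEGER PRIMARY KEY, Product TEXT,
                          GroupId INTEGER REFERENCES ProductGroup);
CREATE TABLE SnowSales (ProductId INTEGER REFERENCES SnowProduct, Units INTEGER);

INSERT INTO StarProduct VALUES (1, 'Hand towel', 'Washroom'), (2, 'Wiper', 'Workplace');
INSERT INTO StarSales VALUES (1, 5), (1, 3), (2, 4);
INSERT INTO ProductGroup VALUES (10, 'Washroom'), (20, 'Workplace');
INSERT INTO SnowProduct VALUES (1, 'Hand towel', 10), (2, 'Wiper', 20);
INSERT INTO SnowSales VALUES (1, 5), (1, 3), (2, 4);
""")

# Star: one join from the facts to the dimension.
star = conn.execute("""
    SELECT p.GroupName, SUM(s.Units) FROM StarSales s
    JOIN StarProduct p ON p.ProductId = s.ProductId
    GROUP BY p.GroupName ORDER BY p.GroupName""").fetchall()

# Snowflake: an extra join through the subordinate dimension table.
snowflake = conn.execute("""
    SELECT g.GroupName, SUM(s.Units) FROM SnowSales s
    JOIN SnowProduct p ON p.ProductId = s.ProductId
    JOIN ProductGroup g ON g.GroupId = p.GroupId
    GROUP BY g.GroupName ORDER BY g.GroupName""").fetchall()

print(star)       # [('Washroom', 8), ('Workplace', 4)]
print(snowflake)  # [('Washroom', 8), ('Workplace', 4)]
```

Both queries return the same totals; the snowflake version avoids repeating each group name on every product row, at the cost of one more join.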

2.28 Maintaining a Data Warehouse
Data warehouses collect and organise historical business data so that it can be analysed to help management make business decisions. To achieve this purpose, the data warehouse is created and initially loaded with the existing historical business data. It is then periodically updated with new data from operational data systems. Much of the effort in data warehouse maintenance goes into updating the data in the data warehouse and adjusting data presentation applications to incorporate new data.

2.29 Data Marts
A data mart is a subset of content drawn from a data warehouse [18]. It is normally used to support a particular set of business functions, and may be supplemented with content taken from other warehouse sources. A data mart tends to contain data focused at the department level, or on a specific business area [12].
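A data mart, as described in 2.29, can be sketched as a subset of warehouse content selected for one business area. In this Python/sqlite3 sketch the warehouse table, its columns and its rows are hypothetical; `CREATE TABLE ... AS SELECT` makes a physical copy of the subset, whereas a view would be an alternative that always reflects the current warehouse content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table covering several business areas.
CREATE TABLE Warehouse (BusinessArea TEXT, Item TEXT, Amount INTEGER);
INSERT INTO Warehouse VALUES
    ('Marketing', 'Competitor price survey', 120),
    ('Marketing', 'Competitor sample count', 15),
    ('Logistics', 'Pallets shipped', 300);

-- Data mart: the subset of warehouse content for one business area.
CREATE TABLE MarketingMart AS
    SELECT Item, Amount FROM Warehouse WHERE BusinessArea = 'Marketing';
""")
mart = conn.execute("SELECT Item, Amount FROM MarketingMart ORDER BY Item").fetchall()
print(mart)
```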

3 OLAP AND DATA CUBES

3.1 OLAP

3.11 Online Analytical Processing
According to [4], the term OLAP describes the analysis of complex data from the data warehouse. It can be summarised as "Tell me what happened, and why" [18]. Data is searched for patterns and irregularities, and areas are explored to find their sources. OLAP tools are designed to make this kind of analysis of immense quantities of warehouse data easier than it was in the past. OLAP tools can empower user-analysts to easily perform types of analysis which previously have been avoided because of their perceived complexity [2]. Figure 3.11, taken from [12], shows how OLAP fits into the process of data warehousing.

[Figure 3.11: The Stages of Data Warehousing. Operational data passes through extraction, transformation and loading into the data warehouse, which business users access through OLAP.]

3.12 Multidimensional OLAP (MOLAP)
MOLAP is a storage mode that uses a multidimensional structure to store a partition's facts and aggregations, or a dimension. The data of a partition is stored entirely in the multidimensional structure.

3.13 Relational OLAP (ROLAP)
ROLAP is a storage mode that stores multidimensional structures using tables in a relational database.

3.14 Hybrid OLAP (HOLAP)
HOLAP combines multidimensional data structures and relational database tables to store multidimensional data. In SQL Server, Analysis Services stores the aggregations for a HOLAP partition in a multidimensional structure and stores the facts in a relational database.
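The HOLAP split in 3.14 can be mimicked on a small scale. In this Python sketch (an analogy, not how Analysis Services is implemented), detail facts stay in a relational table in sqlite3, while aggregations are precomputed into an in-memory dictionary that plays the role of the multidimensional structure. Summary queries are answered from the dictionary; queries that need detail go back to the relational facts. The table, columns and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Facts (Company TEXT, GroupName TEXT, Product TEXT, Units INTEGER);
INSERT INTO Facts VALUES
    ('Company 1', 'Washroom', 'Towel', 5),
    ('Company 1', 'Washroom', 'Tissue', 3),
    ('Company 2', 'Workplace', 'Wiper', 4);
""")

# Aggregations precomputed into an in-memory structure keyed by
# (Company, GroupName); the detail rows stay in the relational table.
AGGREGATIONS = {
    (company, group): total
    for company, group, total in conn.execute(
        "SELECT Company, GroupName, SUM(Units) FROM Facts GROUP BY Company, GroupName")
}


def units(company, group, product=None):
    """Summary queries use the aggregations; product-level detail uses the facts."""
    if product is None:
        return AGGREGATIONS.get((company, group), 0)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(Units), 0) FROM Facts "
        "WHERE Company = ? AND GroupName = ? AND Product = ?",
        (company, group, product)).fetchone()
    return total


print(units("Company 1", "Washroom"))            # from the aggregations
print(units("Company 1", "Washroom", "Tissue"))  # from the relational facts
```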

3.15 Data Representations
OLAP uses multidimensional data representations, called cubes, to provide prompt access to data warehouse data. Cubes model the data in the dimension and fact tables of the data warehouse and provide sophisticated query and analysis capabilities to applications. Multidimensional models take advantage of inherent relationships in data to populate multidimensional matrices called data cubes [4]. They are well suited to representing hierarchical views, in what are known as roll-up and drill-down displays. A roll-up display moves up the hierarchy, grouping into bigger units along a dimension. A drill-down display offers the opposite facility, moving down the hierarchy to give a finer-grained view [4].
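Roll-up and drill-down along a single hierarchy can be sketched in a few lines of Python. The Product-to-Group hierarchy follows the tables used later in this report, but the product names and unit figures here are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy: each product belongs to exactly one group.
PRODUCT_GROUP = {"Towel": "Washroom", "Tissue": "Washroom", "Wiper": "Workplace"}
UNITS_BY_PRODUCT = {"Towel": 5, "Tissue": 3, "Wiper": 4}


def roll_up(by_product):
    """Move up the hierarchy: aggregate product totals into group totals."""
    by_group = defaultdict(int)
    for product, units in by_product.items():
        by_group[PRODUCT_GROUP[product]] += units
    return dict(by_group)


def drill_down(group, by_product):
    """Move down the hierarchy: show the finer-grained products inside one group."""
    return {p: u for p, u in by_product.items() if PRODUCT_GROUP[p] == group}


print(roll_up(UNITS_BY_PRODUCT))                 # group level
print(drill_down("Washroom", UNITS_BY_PRODUCT))  # product level within Washroom
```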

3.2 Data Cubes

3.21 Data Cubes
Cubes are the main objects in OLAP. A cube is a set of data that is usually constructed from a subset of a data warehouse and is organised and summarised into a multidimensional structure defined by a set of dimensions and measures. A cube provides a user-friendly mechanism for querying data with quick and uniform response times. End users use client applications to connect to an Analysis server and query the cubes on the server. In most client applications, end users issue a query on a cube by manipulating user interface controls, which determine the contents of the query. This spares end users from writing queries in a query language. Precalculated summary data, called aggregations, provides the mechanism for rapid and uniform response times to queries. Aggregations are created for a cube before end users access it. The results of a query are retrieved from the aggregations, the cube's source data in the data warehouse, a copy of this data on the Analysis server, the client cache, or a combination of these sources. An Analysis server can support many different cubes, such as a cube for sales, a cube for inventory, a cube for customers, and so on.

3.22 Data sources
A cube has a single data source. It can be selected from the data sources in the database or created during cube creation. A cube's dimensions must have the same data source as the cube, but its partitions can have different data sources.

3.23 Measures
A cube's measures are not shared with other cubes. The measures are created when the cube is created. A cube can have up to 128 measures.

3.24 Dimensions
A cube's dimensions are either shared with other cubes in the database or private to the cube. Shared dimensions can be created before or during cube creation; private dimensions are created when the cube is created. Although the term cube suggests three dimensions, in SQL Server Analysis Services a cube can have up to 128 dimensions.
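The role of precalculated aggregations described in 3.21 can be illustrated with a Python sketch that computes a total for every combination of dimensions before any query is asked, so that each query becomes a dictionary lookup instead of a scan of the facts. This is a simplification: the dimension names, fact rows and single SUM measure are hypothetical, and a real Analysis server chooses which aggregations to build rather than always building all 2^n of them.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("Company", "Group")
# Hypothetical fact rows: dimension values plus one measure (units).
FACTS = [
    {"Company": "Company 1", "Group": "Washroom", "Units": 5},
    {"Company": "Company 1", "Group": "Workplace", "Units": 2},
    {"Company": "Company 2", "Group": "Washroom", "Units": 4},
]


def build_aggregations(facts, dims):
    """Precompute SUM(Units) for every subset of dims (2 ** len(dims) group-bys)."""
    aggs = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            totals = defaultdict(int)
            for row in facts:
                totals[tuple(row[d] for d in subset)] += row["Units"]
            aggs[subset] = dict(totals)
    return aggs


AGGS = build_aggregations(FACTS, DIMENSIONS)


def query(**filters):
    """Answer from the precomputed aggregations instead of scanning the facts."""
    subset = tuple(d for d in DIMENSIONS if d in filters)
    key = tuple(filters[d] for d in subset)
    return AGGS[subset].get(key, 0)


print(query())                    # grand total
print(query(Group="Washroom"))    # one dimension fixed
print(query(Company="Company 1"))
```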

3.25 Partitions
A single partition is automatically created for a cube when the cube is created. With Analysis Services for SQL Server 2000 Enterprise Edition, additional partitions can be created in the cube after it has been created. Partitions are storage containers for the data and aggregations of a cube, and every cube includes one or more of them. For a cube with multiple partitions, each partition can be stored in a different physical location and can be based on a different data source. Partitions are not visible to users; the cube appears to be a single object.

3.26 Cube roles
Every cube must have at least one cube role in order to provide access to end users. Cube roles are derived from database roles, which can be created before or after cube creation. Cube roles are created after cube creation.
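The point in 3.25 that partitions are invisible to users can be illustrated with a small Python sketch: the cube's data is split into separate containers (here hypothetically by year, with made-up figures), but the query function answers over all of them, so the caller sees one object. In a real deployment each partition could live on a different server or data source.

```python
# Hypothetical partitions of one cube, split by year.
PARTITIONS = {
    "2001": [("Washroom", 5), ("Workplace", 2)],
    "2002": [("Washroom", 4)],
}


def cube_total(group):
    """Users query the cube as a single object; the partitions stay hidden."""
    return sum(units
               for rows in PARTITIONS.values()
               for g, units in rows
               if g == group)


print(cube_total("Washroom"))
print(cube_total("Workplace"))
```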

4 THE DATA WAREHOUSE

4.1 Background

4.11 Previous system
Kimberly-Clark is a competitive organisation, trying to keep the major share it has in certain markets and to increase its share in others. To do this, it wishes to compare data about competitor companies with data about its own company to establish how best to market products against competitors. There is no previous system that allows this to take place; this is a new venture to gain a competitive advantage.

4.12 Limitations
Various data about different companies' products have been collected. At present, these are stored in several spreadsheets, so it is difficult to access the relevant information. It would be much better to store the data in one source. All users have Windows XP PCs (the company has recently upgraded), and these run SQL Server 2000. There is no set deadline, as this will be a new system and does not replace any existing procedure.

4.2 Solution
The solution is to use a data warehouse to store the data. This was chosen for several reasons:
- The system is to be used for decision support, to assist the marketing department.
- Kimberly-Clark may wish to expand the system to incorporate other databases relating to other areas of the company.
- The data covers a wide range of time, not just the current period.
- Summaries will need to be produced for analysis.
- The data originally comes from a wide range of unrelated sources in different formats.
- There is the potential for a huge volume of data to be stored in the system.
- The data all concerns the same subject area: competitor products for the marketing department.
All these points correspond to the main principles of a data warehouse, so a data warehouse is a suitable solution.

4.3 Data Cleaning
Before anything could be done with the data, it had to be cleaned. This resulted in several products being removed from the fact table due to insufficient data. There was no point, for example, in storing product information when no price information was available, as price is a major factor in the marketing of a product. As a result, some companies are absent from the data: Kappler, Metsa Serla, Orvec (Arco) and Peter Grant. These were therefore not included in the initial data warehouse. They do, however, remain in the investigation of future data sources in Chapter 8, as I expect further product information from these companies to be stored in the future.

4.4 Table Design

4.41 Data Warehouse Tables
The Details table is the fact table. Its information is mostly numerical, and it consists of many columns and a very large number of rows. The other tables are dimension tables. Figure 4.5 shows the defined join paths between the fact and dimension tables. The design is a snowflake schema, because the dimension tables Product and Brand are further split to add the extra-level dimension tables Group and Company.

Fact Table
Details (BrandId, ProductId, ProductCode, Description, Colour, Size, Ply, Units/case, Sheets/unit, Sheets/case, Sqmetre/unit, Sqmetre/case, Sheetwidcm, Sheetlencm, Metre/unit, Metre/case, Pence/sheet, Price/case, Pence/sqmetre, Photo, Sample)

Dimension Tables
Product (ProductId, Product, GroupId)
Group (GroupId, Group)
Brand (BrandId, Brand, CompanyId)
Company (CompanyId, Company)
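The snowflake schema of 4.41 could be expressed in SQL roughly as follows. This is a sketch in Python with sqlite3, not the SQL Server 2000 tables actually built through Enterprise Manager: only a few of the Details columns are included, the data types are assumptions, and the sample rows are hypothetical. `Group` is an SQL keyword and names such as `Price/case` contain a slash, so both must be quoted as identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the REFERENCES clauses
conn.executescript("""
CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY, Company TEXT NOT NULL);
CREATE TABLE "Group" (GroupId INTEGER PRIMARY KEY, "Group" TEXT NOT NULL);
CREATE TABLE Brand (BrandId INTEGER PRIMARY KEY, Brand TEXT NOT NULL,
                    CompanyId INTEGER NOT NULL REFERENCES Company (CompanyId));
CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Product TEXT NOT NULL,
                      GroupId INTEGER NOT NULL REFERENCES "Group" (GroupId));
-- Fact table: only a few of the Details columns; the types are assumptions.
CREATE TABLE Details (
    BrandId      INTEGER NOT NULL REFERENCES Brand (BrandId),
    ProductId    INTEGER NOT NULL REFERENCES Product (ProductId),
    ProductCode  TEXT,
    "Units/case" INTEGER,
    "Price/case" REAL
);

-- Hypothetical sample rows.
INSERT INTO Company VALUES (1, 'Company A');
INSERT INTO "Group" VALUES (1, 'Washroom');
INSERT INTO Brand VALUES (1, 'Brand A', 1);
INSERT INTO Product VALUES (1, 'Hand towel', 1);
INSERT INTO Details VALUES (1, 1, 'X1', 10, 20.0), (1, 1, 'X2', 5, 10.0);
""")

# Walk both branches of the snowflake: Details -> Brand -> Company
# and Details -> Product -> Group.
rows = conn.execute("""
    SELECT c.Company, g."Group", AVG(d."Price/case")
    FROM Details d
    JOIN Brand b ON b.BrandId = d.BrandId
    JOIN Company c ON c.CompanyId = b.CompanyId
    JOIN Product p ON p.ProductId = d.ProductId
    JOIN "Group" g ON g.GroupId = p.GroupId
    GROUP BY c.Company, g."Group"
""").fetchall()
print(rows)  # [('Company A', 'Washroom', 15.0)]
```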

4.42 Data Preparation Area
Tables identical to those in the data warehouse were created, called Details Data Preparation, Group Data Preparation, Product Data Preparation, Brand Data Preparation and Company Data Preparation. This way, when data is to be entered, it can be imported from an MS Excel worksheet into the appropriate preparation table and checked to confirm that it has all been imported correctly, without affecting the actual tables. The data can then be copied to the final table. The same process will be used when adding new data.

Details Data Preparation (BrandId, ProductId, ProductCode, Description, Colour, Size, Ply, Units/case, Sheets/unit, Sheets/case, Sqmetre/unit, Sqmetre/case, Sheetwidcm, Sheetlencm, Metre/unit, Metre/case, Pence/sheet, Price/case, Pence/sqmetre, Photo, Sample)
Product Data Preparation (ProductId, Product, GroupId)
Group Data Preparation (GroupId, Group)
Brand Data Preparation (BrandId, Brand, CompanyId)
Company Data Preparation (CompanyId, Company)

[Figure 4.5: Tables and Joins. The Details fact table joins to Product on ProductId and to Brand on BrandId; Product joins to Group on GroupId; Brand joins to Company on CompanyId.]
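The first half of the preparation-area approach in 4.42 (an identically structured table, filled from a worksheet and inspected without touching the warehouse table) can be sketched in Python with sqlite3. The sketch assumes the worksheet has been saved from MS Excel as CSV; its contents are hypothetical. Note that SQLite's `CREATE TABLE ... AS SELECT` copies the column names but not constraints such as the primary key.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Brand (BrandId INTEGER PRIMARY KEY, Brand TEXT, CompanyId INTEGER)")
# Preparation table with the same columns as the warehouse table, but no rows.
conn.execute('CREATE TABLE "Brand Data Preparation" AS SELECT * FROM Brand WHERE 0')

# Stand-in for a worksheet saved from MS Excel as CSV (hypothetical contents).
WORKSHEET_CSV = "BrandId,Brand,CompanyId\n1,Brand A,1\n2,Brand B,2\n"
rows = [(int(r["BrandId"]), r["Brand"], int(r["CompanyId"]))
        for r in csv.DictReader(io.StringIO(WORKSHEET_CSV))]
conn.executemany('INSERT INTO "Brand Data Preparation" VALUES (?, ?, ?)', rows)

# Inspect the import; the warehouse table itself is still untouched.
staged = conn.execute('SELECT * FROM "Brand Data Preparation" ORDER BY BrandId').fetchall()
warehouse_count = conn.execute("SELECT COUNT(*) FROM Brand").fetchone()[0]
print(staged, warehouse_count)
```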

[Figure: dimension hierarchies. Product rolls up to Group, and Brand rolls up to Company.]

Roll-up moves up a hierarchy, grouping into larger units along that dimension: given the average price for each Product type, roll-up then averages to Group level.

4.5 Implementation
SQL Server 2000 Enterprise Manager was used to create the data warehouse. Each table was created by entering the column names, data types, data lengths and whether nulls were allowed. Figure 4.51 demonstrates this technique for the Details fact table.

[Figure 4.51: Creating the Fact Table]
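The roll-up of average prices from Product to Group level, described at the start of this page, has one subtlety worth showing: averaging the product averages is not the same as averaging all the underlying prices when products have different numbers of rows. This Python sketch uses hypothetical prices to show both results; which one is wanted is a design decision for the cube.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical price rows: (group, product, price per case).
ROWS = [
    ("Washroom", "Hand towel", 10.0),
    ("Washroom", "Hand towel", 14.0),
    ("Washroom", "Toilet tissue", 6.0),
]


def average_by(rows, key):
    """Average the price column, grouping rows by key(row)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key(row)].append(row[2])
    return {k: mean(v) for k, v in buckets.items()}


def roll_up_avg(product_avg):
    """Roll up: average the product-level averages within each group."""
    by_group = defaultdict(list)
    for (group, _product), avg in product_avg.items():
        by_group[group].append(avg)
    return {g: mean(v) for g, v in by_group.items()}


product_avg = average_by(ROWS, key=lambda r: (r[0], r[1]))
rolled_up = roll_up_avg(product_avg)          # average of product averages
direct = average_by(ROWS, key=lambda r: r[0])  # average of all rows in the group
print(product_avg, rolled_up, direct)
```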

This produced the list of tables shown in Figure 4.52. The highlighted tables form the data warehouse; the other tables form the data preparation area.

Figure 4.52 Tables

The data was entered into the previously created data preparation dimension tables, as shown in Figure 4.53 for the table Brand Data Preparation. This was done carefully, ensuring that each brand name was entered only once and with the corresponding CompanyId. The data for the Details Data Preparation fact table was imported from the MS Excel spreadsheet that contained all the cleaned data, as shown in Figures 4.54 and 4.55. Once the data for all tables had been entered and checked thoroughly, it was copied to the data warehouse tables. The data preparation tables were then emptied, ready for new data to be added in the future.
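Checking that each brand was entered only once can be automated. Because the name Unbranded legitimately appears under several companies (see Figure 5.35), this sketch treats the Brand and CompanyId pair, not the name alone, as what must be unique. It assumes sqlite3 and a hypothetical staging table Brand_Prep standing in for Brand Data Preparation; the rows are invented.

```python
import sqlite3


def duplicate_brands(conn: sqlite3.Connection):
    """Return (Brand, CompanyId, count) for pairs entered more than once."""
    return conn.execute(
        """
        SELECT Brand, CompanyId, COUNT(*)
        FROM Brand_Prep
        GROUP BY Brand, CompanyId
        HAVING COUNT(*) > 1
        """
    ).fetchall()


# Hypothetical staging data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Brand_Prep (BrandId INTEGER, Brand TEXT, CompanyId INTEGER)")
conn.executemany(
    "INSERT INTO Brand_Prep VALUES (?, ?, ?)",
    [
        (1, "Unbranded", 1),
        (2, "Unbranded", 2),  # same name, different company: allowed
        (3, "Brand A", 1),
        (4, "Brand A", 1),    # same name, same company: flagged
    ],
)
problems = duplicate_brands(conn)
```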

Figure 4.53 Entering Dimension Table Data

Figure 4.54 Importing Fact Table Data

Figure 4.55 Importing Fact Table Data

A database diagram was produced to create the table joins. This diagram, shown in Figure 4.56, shows all the tables, their column names and how they are joined.

Figure 4.56 Database Diagram

The Data Cube

5 THE DATA CUBE

5.1 Data Source

The data source is the location of the tables used in the data cube. In this case, the tables are stored in an SQL Server database, ctymgd, on the server CSPCZ46. The data source was located as demonstrated in Figure 5.1.

Figure 5.1 Data Source

5.2 Fact Table

The fact table, Details, was chosen from those available in the database, as in Figure 5.2.

Figure 5.2 The Fact Table

5.3 Dimensions

There are two dimensions in each cube, Company and Group, shown in Figures 5.35 and 5.36. Because all the cubes require the same dimensions, these are created once and shared by all cubes. A snowflake schema was used to minimise repetition of data, i.e. Companies and Groups are not repeated more than necessary. This was selected as shown in Figure 5.31.

Figure 5.31 Schema

The snowflake schema allows more than one level of dimension table. The dimension tables were selected for each dimension as in Figure 5.32.

Figure 5.32 Dimension Tables

Then the joins were checked, as in Figure 5.33.

Figure 5.33 Table Joins

Finally, the levels were specified to give the structure of each dimension, as in Figure 5.34.

Figure 5.34 Dimension Levels

Company: Brands
Chicopee: J-Cloth, KO-TON, Lavette
Du Pont: Advantage, Easitex, Masterwype, Practik, Protech, Sontara, Unbranded, Wypclean, Wypeasy
Fort James: Kittensoft, Le Roll, Lotus, Nouvelle, Reflex, Serie 3000, Unbranded
Kimberly-Clark: Hostess, Kimcel, Kimtex, Kimwipes, Kleenex, Kleenguard, Scott, Unbranded, Workhorse, Wypall
Pendigo: Unbranded
Perini: Unbranded
SCA: Spray, Tork, Unbranded
Sentinel Labs Ltd.: Unbranded
Shiloh: Primeguard
Terinex: Cloudsoft, Duvet

Figure 5.35 The Company Dimension

Group: Product types
Washroom: Hand Towels, Paper Roll Towels, Folded Toilet Tissue, Jumbo Toilet Tissue, Small Roll Toilet Tissue, Facial Tissues, Dispensers
Workplace: 1 Ply Wiper, 2 Ply Wiper, 3 Ply Wiper, 4 Ply Wiper, 5 Ply Wiper, 6 Ply Wiper, 7 Ply Wiper, 8 Ply Wiper, Nonwoven Wiper, Workwear
Skincare: Soap

Figure 5.36 The Group Dimension

Figure 5.37 Company Hierarchy

Figure 5.38 Group Hierarchy

5.4 Cubes

All cubes can be explored by Group, Product, Company and Brand, or any combination of these. All values are averages, taken over the selected depth of each dimension. This makes it easy to compare each Company against Kimberly-Clark, as well as to drill down to specific brands.

5.41 Case Data

This cube allows all information regarding product cases to be compared. It shows the average units in a case and the average sheets in a case.
Average Units per case
Average Sheets per case

5.42 Price Data

This cube gives all relevant information regarding product prices. It shows the average price per case, in pounds sterling; the average pence per sheet for items sold in sheets; and the average pence per square metre.
Average Price per case
Average pence per sheet
Average pence per square metre

5.43 Sheet Data

This cube gives all information regarding sheets. It is only relevant to products that are sold in sheets; all other products will be blank. It shows the average sheets per unit as well as the average sheet width and length, in cm.
Average sheets per unit
Average Sheet width cm
Average Sheet length cm

5.44 Measures

Measures are the fields from the fact table that are displayed, or that are used to calculate the data that is displayed. They can be visible or hidden. Number of Products is calculated from the text field Description, so that records are counted rather than numeric values summed. Units per case and Sheets per case are taken from the fields Units/case and Sheets/case, as shown in Figure 5.44. The measures Price per case, pence per sheet, pence per square metre, sheets per unit, Sheet width cm and Sheet length cm are all defined in the same way.

Figure 5.44 Measure

5.45 Calculated Members

Calculated members are fields that are calculated from the measures. As shown in Figure 5.45, Average Units per case is calculated by dividing the total Units per case by the total Number of Products. All the Average values are calculated in the same way.

Figure 5.45 Calculated Member
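The split between measures and calculated members can be illustrated in plain Python. This is only an analogy for what Analysis Manager does (in Analysis Services the calculated member is an MDX expression over the measures), and the rows and descriptions below are hypothetical. Number of Products counts non-empty Description values, and the calculated member divides the summed measure by that count.

```python
# Hypothetical Details rows; only the fields used here are included.
rows = [
    {"Description": "Roll towel", "Units/case": 6},
    {"Description": "Folded towel", "Units/case": 15},
    {"Description": "Centrefeed roll", "Units/case": 6},
]


def units_per_case_measure(rows):
    # Measure: the Units/case field summed over the selected rows
    return sum(r["Units/case"] for r in rows if r["Units/case"] is not None)


def number_of_products(rows):
    # Measure: a count of non-empty Description values, not a sum of figures
    return sum(1 for r in rows if r["Description"])


def average_units_per_case(rows):
    # Calculated member: [Units per case] / [Number of Products]
    n = number_of_products(rows)
    return units_per_case_measure(rows) / n if n else None
```

For the three sample rows this gives (6 + 15 + 6) / 3 = 9.0 units per case.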

5.5 OLAP

At present, MOLAP has been chosen as the storage mode. The facts are stored entirely in a multidimensional structure to be explored by the user. This choice is shown in Figure 5.51.

Figure 5.51 Data Storage

Next, the aggregation options are chosen before the cube can be processed, as in Figure 5.52.

Figure 5.52 Aggregation

Finally, the cube is processed so that its data is up to date, as shown in Figure 5.54. When the user explores the data, it will be as it was the last time the cube was processed: the data is stored in the multidimensional structure rather than being recalculated each time. Whenever the data in the data warehouse is updated, the cube must be reprocessed for the data in the data cube to be updated.

Figure 5.54 Cube Processing

Figure 5.53 gives a view of the tree used for navigation when creating and editing the cubes in Analysis Manager.

Figure 5.53 Tree
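The consequence of MOLAP storage, that browsing reads a stored result rather than the warehouse, can be shown with a toy model. ToyCube is hypothetical and stores only a total and a count per company; real cube processing in Analysis Services is far more involved.

```python
class ToyCube:
    """Hypothetical model of a MOLAP cube: it answers from a stored snapshot."""

    def __init__(self, warehouse_rows):
        self.source = warehouse_rows  # the "data warehouse": (company, price) rows
        self.aggregates = {}          # filled in only by process()

    def process(self):
        """Recompute the total and count for every company and store them."""
        totals, counts = {}, {}
        for company, price in self.source:
            totals[company] = totals.get(company, 0.0) + price
            counts[company] = counts.get(company, 0) + 1
        self.aggregates = {c: (totals[c], counts[c]) for c in totals}

    def average_price(self, company):
        """Read the stored aggregates; the warehouse is not consulted here."""
        total, count = self.aggregates[company]
        return total / count


warehouse = [("Company A", 10.0), ("Company A", 20.0)]
cube = ToyCube(warehouse)
cube.process()
warehouse.append(("Company A", 60.0))    # new data reaches the warehouse...
stale = cube.average_price("Company A")  # ...but the cube still reports 15.0
cube.process()                           # reprocess, as in Section 9.3
fresh = cube.average_price("Company A")  # now 30.0
```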

Testing

6 TESTING

6.1 Data Analysis

The data was analysed to give a clearer idea of the type of data stored and in which areas. This aids the design of the testing stage, ensuring that all areas of data are explored in sufficient detail and tested in a relevant manner. (KC = Kimberly-Clark; TT = Toilet Tissue.)

6.11 Washroom
Companies: KC, SCA, Fort James, Kruger, Pendigo, Terinex
Product types: Hand Towels, Paper Roll Towels, Folded TT, Jumbo TT, Small Roll TT, Facial Tissues

6.12 Workplace
Companies: KC, SCA, Fort James, Kruger, Chicopee, Du Pont, Shiloh
Product types: 1 Ply Wiper, 2 Ply Wiper, 3 Ply Wiper, 4-8 Ply Wiper, Nonwoven Wiper, Workwear

6.13 Skincare
Companies: KC, SCA, Fort James
Product types: Soap

6.2 Testing System

6.21 Test 1

Check that all data is present and correct. This was checked during data cleaning, but should be tested again to ensure that all values from the data warehouse are present in the data cube. For each cube, all Products and all Brands should be checked to confirm that the data is present where it should be and in the correct form.

6.22 Test 2

Check that the designed cubes have a useful purpose.

Case Data: The Case cube allows all information regarding product cases to be compared. It shows the average number of units in a case and the average number of sheets in a case.

Price Data: The Price cube gives all relevant information regarding product prices. It shows the average price per case, in pounds sterling; the average pence per sheet for items sold in sheets; and the average pence per square metre.

Sheet Data: The Sheet cube gives all information regarding sheets. It is only relevant to products that are sold in sheets; all other products will be blank. It shows the average number of sheets per unit as well as the average sheet width and length, in cm.

6.23 Test 3

Check that the visible measures and calculated members are suitable for the purpose.

Case Data
Average number of units in a case
Calculated by: Units per case / Number of Products
This demonstrates how many units are sold at a time.
Average number of sheets in a case
Calculated by: Sheets per case / Number of Products
This gives an idea of how many sheets are in each case.

Price Data
Average price per case
Calculated by: Price per case / Number of Products
This value allows the user to compare the prices of cases, which is how the product will be sold.
Average pence per sheet
Calculated by: Pence per sheet / Number of Products
This allows prices to be compared by individual sheet, which may give clues as to how tiny profit margins are made. Note that this comparison depends on sheet size.
Average pence per square metre
Calculated by: Pence per square metre / Number of Products
This shows prices by square metre, regardless of sheet size.

Sheet Data
Average number of sheets per unit
Calculated by: Sheets per unit / Number of Products
This demonstrates how generous each company is in the number of sheets in each roll or pack.
Average sheet width, in cm
Calculated by: Sheet width / Number of Products
This shows how wide or narrow each roll or wiper is.
Average sheet length, in cm
Calculated by: Sheet length / Number of Products
This value shows how long each sheet or roll is. It depends very much on whether the roll is perforated, as some products are sold as one long roll. For example, the Kimberly-Clark brand Wypall produces long single-piece rolls in the Workplace group, which are not visible until the Companies are broken down into Brands. These can easily be compared with brands selling similar products, such as SCA's Tork, which also produces long single-piece rolls. On average, the Tork rolls are slightly longer but definitely narrower than Wypall.

Evaluation

7 EVALUATION

7.1 Introduction

The initial set of data was taken from three separate worksheets in a spreadsheet. The cleaned data is shown in APPENDIX B Cleaned Data. There was not a very large amount of data to begin with, which is unusual for a data warehouse. This was because the system was being tested, and only a small amount of data had been collected at that time. However, there is the potential to add huge amounts of data concerning many different competitor companies. The data is to be collected using the sources suggested in the investigation into future data sources (Chapter 8), and will be added at regular intervals as new products are introduced to the data sources.

7.2 Fulfilling the Objectives

1. Obtain a good understanding of Data Warehouses and Online Analytical Processing
I found it was not easy to find available information concerning data warehousing, as this subject is still relatively new. There are several books on order in the university library which are not yet available. The information has been collected from a number of sources, both online and in books, and summarised clearly and logically in Chapters 2 and 3.

2. To analyse what information will be required from the data
This task was left primarily to me. Kimberly-Clark had no clear ideas of what was required from the data, and allowed me to analyse this myself. I decided on the key areas to be looked into; at present these are Case data, Price data and Sheet data. These can easily be added to, developed or altered at a later date.

3. To design the tables in SQL Server, create a Data Preparation Area and the tables
This was straightforward, using the knowledge already gained from university modules and work experience.

4. To clean the data, input it into the Data Preparation area, and then copy it to the tables
The data took time to clean, as there was much to go through.
As some products had little data available, several had to be removed entirely. This led to a few companies not being included, but these can easily be added in the future.

5. To gain access to the Analysis Services to create a data cube from the tables
This was a challenge, as at the university Analysis Services can only be accessed via one server. Fortunately, I was allowed access to this server to complete my data cube.

6. To use OLAP to extract the relevant data from the cube
Once the cubes had been designed, they were processed. MOLAP (Multidimensional OLAP) was chosen for now, to allow the data to be browsed in the multidimensional structure. This can easily be changed through the storage options. ROLAP would leave the data in the relational tables and also store the aggregations in tables in the database. HOLAP is a hybrid: the aggregations are stored in the multidimensional structure while the detail data stays in the relational tables, from which relevant data could be extracted for publishing on the company intranet.

7. To test the system and analyse the data
All data was studied. Everything seemed to be present and in the correct format. This is covered in Chapter 6.

8. To investigate where to obtain data in the future
This is discussed in detail in Chapter 8. It was difficult to reach a firm conclusion at present; however, sources other than the company websites should also be considered.

7.3 Further Work

A front end could be added to the system to make it easier to use for the Sales and Marketing departments, who do not have knowledge of SQL Server. This could be done using ASP pages on the company intranet, as with other company systems. Under HOLAP, the detail data remains in tables in the relational database, and these tables could easily be linked to the ASP pages to display results as required, while the aggregations are still stored in a multidimensional structure to be explored if desired.

More cubes could be added using the current dimensions, exploring other areas of the data which may become relevant. Finally, this data warehouse could become one data mart within a larger data warehouse, with other data marts showing information from other areas of the company.
7.4 Conclusions

I feel I have learned a great deal from this project. I now feel very confident using SQL Server to create data warehouses. I have also grasped the concepts of OLAP and data cubes, which I found to be a great challenge. It is very difficult to understand these concepts by reading textbooks and theory on the subject; it is not until practical examples are carried out that an understanding is gained.

Investigation

8 AN INVESTIGATION INTO FUTURE DATA SOURCES

8.1 Explanation

The competitor data already stored will go out of date and will need to be replaced periodically, so a future source of information will be required. For each of the competitor companies, the website was examined to find out whether it could provide a relevant data source in the future.

8.2 Results

Chicopee (www.chixtowels.com)
The information is all clearly laid out in a table. The data can be viewed under several categories: Wiping solutions, Dusting solutions and Proprietary solutions.

Du Pont (www.dupont.com)
The product information only seems to be available for products sold in the United States.

Fort James (www.fortjames.com, now www.gp.com)
Now Georgia-Pacific. General information about all Wipers, Napkins etc. is available. Soap has a table of colours and sizes available.

Kappler (www.kappler.com)
A product catalogue is available.

Kruger (www.kruger.com)
The site just gives a general overview of how the company works. There is no specific product information.

Metsa Serla (www.metsaserla.com, www.metsatissue.com)
The site links to the Metsa Tissue site, which gives an overview of how the company runs, with processes and sales figures. There is no specific product information.

Orvec (www.orvec.com)
The site gives an overview of why the products are good. There is no useful information available.

Pendigo (www.pendigo.com)
The information on Hand Towels would be available; however, the link does not seem to be working. The information on other products is clear and easy to access.

Peter Grant (www.hygiene-supreme.co.uk)
The Peter Grant site is linked from this site. The site gives an overview of what products are available. There are no specific colours and dimensions.

SCA (Svenska Cellulosa Aktiebolaget) (www.sca.com, www.scahygiene.com)
There is an e-catalogue available for all Tork and Tena products, giving all the details required.

Shiloh (www.shiloh.co.uk)
A detailed specification of each product range is available, but not all information is available.

Terinex (www.terinex.com)
The website does not seem to be working.

Compiling the data
Collecting field data. Sources: sales force, engineering staff, distribution channels, suppliers, advertising agencies, personnel hired from competitors, professional meetings, trade associations, market research firms, reverse engineering, security analysts, etc.
Collecting published data. Sources: articles, newspapers in competitors' locations, want ads, government documents, speeches by management, analysts' reports, filings to government and regulatory agencies, patent records, court records, etc.
Options: clipping services for information about competitors; interviewing individuals who come into contact with competitors; forms for reporting competitors' key events to a central clearing house; required regular situation reports on competitors by selected management.

Cataloguing the data
Options: files on competitors; competitor library and assigned librarian or competitor analysis coordinator; abstracting of sources; computer cataloguing of sources and abstracts.

Digestive analysis
Options: ranking data by the reliability of the source; summaries of the data; digests of competitors' annual reports; quarterly comparative financial analyses of key competitors; relative product line analysis; estimation of competitors' cost curves and relative costs; pro-forma financial statements on competitors under different scenarios about the economy, prices and competitive conditions.

Communication to strategist
Options: regular compilation of clippings to key managers; regular competitor newsletter or situation reports; in-depth, perpetually updated reports on competitors; competitor briefings in the planning process.

Figure 8.1 Porter model: competitor analysis for strategy formulation

Porter (1980) emphasised the importance of knowledge about competitors' positions and possible strategies. He advocated a competitor information system (Figure 8.1) that would enable information to be collected, consolidated and made available.

8.3 Discussion

The SCA website would be suitable as a source for all the information required for the data warehouse in future. It provides a clear catalogue from which the data could easily be taken.

The Fort James website would not be adequate as an information source. The only suitable data available concerns soap, which is not of use at this time.

The Kruger and Metsa Serla websites would be of no use for finding information, as neither gives any product information.

The Pendigo website may be of use as a data source. If the website is fixed so that the Hand Towel links work properly, then sufficient information will probably be available.

The Peter Grant website does not really give sufficient information to use. Colours and sizes might be extracted, although this would not be straightforward and may take some time.

At present, the Terinex website does not appear to be working. No images appear on the home page, and the links cause error messages to be displayed. Therefore, at least for now, it is not possible to use this site as a source.

The Chicopee website is quite a good source of information. All the data is laid out in a table which can be searched under three different categories, making data retrieval quicker and easier.

The Du Pont website only seems to cater for products which are sold in the United States. There are different European links, but these only seem to give contact details for employees in different European countries. This site will therefore also be of no use for this purpose.

The Shiloh website gives selected product information, although it does not provide details of all products. It may be of use, but it must be remembered that not all products are covered.

The Kappler website provides a product catalogue, so it should give fairly easy access to the relevant information.

As the Orvec website does not hold any useful information, it will be of no use for future data.
8.4 Conclusions

Adequate access to data (3): SCA, Chicopee, Kappler
Some of the required information (3): Fort James, Peter Grant, Shiloh
No relevant data (4): Kruger, Metsa Serla, Du Pont, Orvec
Data cannot be accessed (2): Pendigo, Terinex

Since only half the websites (six out of twelve) provide access to any appropriate information, they do not seem a very reliable source of future data. However, as some sites provide all the relevant information laid out clearly, these sites may be used for those specific competitor companies. Since all the information is publicly available on the internet, there is no legal reason why Kimberly-Clark cannot use it.

8.5 Suggested Future Sources

The model by Porter should be studied further to suggest future data sources. As an outsider to the company, I do not know whether these data sources are available to it, and only have easy access to the websites. As the results and discussion above show, three to six of these websites will be usable as data sources. However, I feel that as time progresses, the other companies will improve their websites, creating e-catalogues and allowing customers easy access to product information. At present, other sources should be sought to increase the information available for storage in the data warehouse and analysis in the data cube.

User Guide

9 USER GUIDE

9.1 Introduction

The data warehouse tables are stored in SQL Server and can be accessed via SQL Server Enterprise Manager. Here, the highlighted tables store the data, and the non-highlighted user tables are for adding new data.

9.2 Operation

The data cube is accessed via Analysis Services Analysis Manager. There are three cubes:
Case Data: shows information concerning product cases.
Price Data: shows information concerning product prices.
Sheet Data: shows information concerning product sheets, where products are sold in sheets.

To view the data, click on the name of the chosen cube in the tree on the left-hand side. Then double-click on the Data tab on the right-hand side. This will show the relevant data by the two dimensions, Company and Group.

Dimensions

The Company dimension can be expanded to show each individual Brand.

The Group dimension will initially show data for all Groups as totals. This can be expanded to each individual Group using the pull-down menu, as shown, and then into individual Product types.

Example

If the cube Sheet Data is selected, there is a great deal of variation in the sheet length field. Expanding the Company Kimberly-Clark shows that the Brand Wypall has an average sheet length far larger than the other brands. This is because Wypall produces rolls with one long wiper, rather than with perforations or individual wipers. This can easily be compared with a similar company such as SCA: when SCA is expanded, it is clear that Tork produces similar products. Tork produces longer rolls; however, these are narrower than the Wypall rolls. This is a helpful example of what can be discovered and compared in the data, and is illustrated in the image below.
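The drill-down in this example amounts to grouping the same rows at a finer level. The sketch below uses invented company and brand names and made-up lengths, not the project's data, purely to show how one unperforated roll brand can pull a company average up, and why the cause only becomes visible at Brand level.

```python
from statistics import mean

# Invented rows: (company, brand, sheet_length_cm). Not the project's data.
ROWS = [
    ("Company X", "Brand P", 35.0),
    ("Company X", "Brand Q", 40.0),
    ("Company X", "Brand R", 5000.0),  # one long, unperforated roll
    ("Company Y", "Brand S", 38.0),
]


def average_length(rows, level, company=None):
    """Average sheet length per Company, or per Brand within one Company."""
    groups = {}
    for comp, brand, length in rows:
        if level == "Company":
            groups.setdefault(comp, []).append(length)
        elif level == "Brand" and comp == company:
            groups.setdefault(brand, []).append(length)
    return {key: mean(values) for key, values in groups.items()}


by_company = average_length(ROWS, "Company")           # Company X is inflated
by_brand = average_length(ROWS, "Brand", "Company X")  # Brand R explains why
```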

9.3 Processing a Cube

Whenever data is updated in the data warehouse, or new data is added, the cube will need to be reprocessed. This has to be carried out through Analysis Manager. Right-click the cube to be processed, select the option to Refresh the data, and continue. The cube will be processed, displaying messages similar to those shown below.

9.4 Maintenance

9.41 Adding new data

Before a new product can be added, it must be checked that it corresponds to an existing Product type (i.e. the ProductId exists) and Brand (i.e. the BrandId exists). A number of products can be added at once, as long as they are in an MS Excel spreadsheet using the standard template in Appendix C.

This task must be carried out in SQL Server Enterprise Manager. In the correct database, in the right-hand pane, right-click on Tables and select the Import Data option under All Tasks. The Import Wizard should begin, as below. Select the data source as the relevant version of Microsoft Excel, and browse the files to locate the correct spreadsheet. Then click the Next button. Next, choose the destination; this should already be filled in with the correct database, but check that it is. Then click the Next button again.

Check the option to copy tables and views from the source database, and click the Next button again. Next, select the source worksheet which contains the data to be imported, and select the Details Data Preparation table as the destination. You can choose to preview the data to check that all data is shown in the correct columns. Then proceed by selecting Next and finally Finish.

The data will be imported into the table Details Data Preparation, which will look similar to the diagram below when opened. Check that all data is present, in the correct columns, and does not need editing. Highlight all rows in this table and select the option to Copy. Then open the table Details, highlight the empty bottom row and select the option to Paste. The new products are now stored in the data warehouse. New data will not be included in the data cube until the cube is reprocessed, as explained in section 9.3 of this chapter.
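Before pasting, the requirement from section 9.41 that every ProductId and BrandId already exist can be checked with a query rather than by eye. The sketch below assumes sqlite3 and hypothetical table names (Details_Prep for Details Data Preparation) with invented rows; a LEFT JOIN leaves the dimension columns NULL wherever no matching row exists, so those staged rows are the ones to fix.

```python
import sqlite3


def orphan_rows(conn: sqlite3.Connection):
    """Staged Details rows whose ProductId or BrandId has no dimension row."""
    sql = """
        SELECT d.ProductCode
        FROM Details_Prep AS d
        LEFT JOIN Product AS p ON p.ProductId = d.ProductId
        LEFT JOIN Brand   AS b ON b.BrandId   = d.BrandId
        WHERE p.ProductId IS NULL OR b.BrandId IS NULL
        ORDER BY d.ProductCode
    """
    return [row[0] for row in conn.execute(sql)]


# Hypothetical data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Product TEXT);
    CREATE TABLE Brand (BrandId INTEGER PRIMARY KEY, Brand TEXT);
    CREATE TABLE Details_Prep (BrandId INTEGER, ProductId INTEGER, ProductCode TEXT);
    INSERT INTO Product VALUES (1, 'Hand Towels');
    INSERT INTO Brand VALUES (1, 'Brand A');
    INSERT INTO Details_Prep VALUES (1, 1, 'X-001');   -- both Ids exist
    INSERT INTO Details_Prep VALUES (1, 99, 'X-002');  -- ProductId 99 is missing
""")
to_fix = orphan_rows(conn)
```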

9.42 Updating data

To edit dimension table data, just open the table and edit it. To delete a record, highlight the row and select Delete.

To add a new record, first open the correct data warehouse table, such as Product below. Check the last Id in the table, which here is 18. Next, close this table and open the corresponding Data Preparation table, here Product Data Preparation. Enter the next number after the last Id, here 19. Enter the other data, checking that any other Ids correspond to the correct data; here, check that GroupId 2 is the correct Group for Disposable Wipers. Then check that the data is correct. Copy the row or rows and paste them to the end of the correct data warehouse table. Then empty the Data Preparation tables, ready for any new data.
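The "last Id plus one" step can be computed instead of read off the screen. This is a sketch assuming sqlite3 and the table and column names shown, with invented rows; in SQL Server itself an IDENTITY column would be the usual way to have the database assign new Ids automatically, although in this project the Ids are assigned by hand.

```python
import sqlite3


def next_id(conn: sqlite3.Connection, table: str, id_column: str) -> int:
    """Return MAX(id) + 1 for a dimension table, or 1 if the table is empty."""
    # Names are interpolated into the SQL, so pass only fixed, trusted names.
    (current,) = conn.execute(f'SELECT MAX("{id_column}") FROM "{table}"').fetchone()
    return 1 if current is None else current + 1


# Hypothetical rows: the last ProductId is 18, so the next one is 19.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Product TEXT, GroupId INTEGER)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "Hand Towels", 1), (18, "Soap", 3)],
)
new_product_id = next_id(conn, "Product", "ProductId")
```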

Appendices

10 APPENDIX A REFLECTION

Introduction

Overall, I am very pleased with the progress made during the project. I was unsure about carrying out such a large project, as I had only experienced much smaller-scale practical work before.

Good Aspects

I feel I researched the subject areas in great detail. I had no prior knowledge of data warehousing or Online Analytical Processing, so everything I read was completely new to me. I learned through a combination of reading around the subject and experimenting practically with the SQL Server tools.

Poor Aspects

I planned my time to some extent; however, I feel time planning was the downfall of the project. This project tended to take a back seat to other final-year coursework, as the completion date always seemed a long way off. However, thanks to experimentation and extensive research, the creation of the data warehouse and the data cube took substantially less time than expected.

Communication with Kimberly-Clark was not always as easy as it should have been. I did not always receive replies to e-mails promptly, and, due to university holidays, I did not always receive messages from Kimberly-Clark immediately. However, I was left to work very independently, doing the analysis and making decisions myself, so this was not crucial.

11 APPENDIX B CLEANED DATA

In the printed report this contains the Cleaned Data from an MS Excel Spreadsheet.

12 APPENDIX C STANDARD MS EXCEL TEMPLATE

The column headings should be as follows:
BrandId, ProductId, ProductCode, Description, Colour, Size, Ply, Units/case, Sheets/unit, Sheets/case, Sqmetre/sheet, Sqmetre/unit, Sqmetre/case, Sheetwid, Sheetlen, Pence/sht, Price/case, Pence/sq.m, Photo, Sample

13 APPENDIX D DATA CUBE

In the printed report this contains the Data Cube from an MS Excel Spreadsheet.

References

14 REFERENCES

[1] Chicopee, (2001), Chicopee Price List, URL: http://www.chixtowels.com/website/index.html, [December 2001].
[2] Codd, E, Codd, S, Salley, C, (1993), Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate, URL: http://www.essbase.com/main.asp?webpagekey=147, [April 2002].
[3] DuPont, (2001), DuPont, United Kingdom, URL: http://www.dupont.com/corp/overview/worldwide/country_unitedkingdom.html, [December 2001].
[4] Elmasri, R, Navathe, S, (1997), Fundamentals of Database Systems, Third Edition, Addison-Wesley.
[5] Fort James, (2001), Georgia-Pacific Product Catalog, URL: http://www.gp.com/products/index.html, [December 2001].
[6] Inmon, W, (1992), Building the Data Warehouse, Wiley.
[7] Inmon, W, Imhoff, C, Sousa, R, (1998), Corporate Information Factory, Wiley.
[8] Kappler, (2001), Kappler, URL: http://www.kappler.com, [December 2001].
[9] Kimberly-Clark, (2002), Kimberly-Clark, URL: http://www.kimberly-clark.com, [April 2002].
[10] Kruger, (2001), Kruger Business to Business, URL: http://www.kruger.com, [December 2001].
[11] Metsa Serla, (2001), Metsa Tissue, URL: http://www.metsatissue.com/frameindex.html, [December 2001].
[12] Microsoft, (2002), SQL Server Books Online, School of Computing, University of Leeds.
[13] Microsoft, (2002), MSDN Library, URL: http://www.msdn.microsoft.com/library/, [April 2002].
[15] Orvec, (2001), Orvec, URL: http://www.orvec.com, [December 2001].
[16] Pendigo, (2001), The Pendigo Product Catalogue, URL: http://www.pendigo.com, [December 2001].
[17] Peter Grant, (2001), Hygiene Supreme, Peter Grant Paper, URL: http://www.hygiene-supreme.co.uk/peter_grant/peter_grant.html, [December 2001].
[18] Poe, V, Klauer, P, Brobst, S, (1998), Building A Data Warehouse for Decision Support, Second Edition, Prentice Hall.
[19] Roberts, S, (2002), DB32: Knowledge Management, Course Notes, School of Computing, University of Leeds.
[20] Simon, A, Shaffer, S, (2001), Data Warehousing and Business Intelligence for E-Commerce, Morgan Kaufmann Publishers.
[21] Shiloh, (2001), Shiloh Products, URL: http://www.shiloh.co.uk/opener.html, [December 2001].
[22] Svenska Cellulosa Aktiebolaget, (2001), Tork, URL: http://www.scahygiene.com/articles/entrance.asp, [December 2001].
[23] Terinex, (2001), Terinex, International Ltd., URL: http://www.terinex.com/en-prod-serv.shtml, [December 2001].