Business Intelligence
1. Introduction
September 2013
The content of the first lecture
- Introduction to data warehousing and business intelligence
- Star join
Data hierarchy
- Strategic data
- Operational planning data
- Operational data
The gap between existing data, information, and knowledge about the business
- There is more and more data and information, but not enough time to analyze it
- Enterprises are overloaded with data, yet lack the information needed to make decisions
- It is necessary to establish processes that collect data and transform it into information or knowledge
The gap between the data collected and used (symbolic image)
[Figure: the amount of data over time, 1960 to 2010. Curves: available data, analyzed data, data used in decision making. Labeled gaps: knowledge gap, execution gap.]
The relational model cannot provide the rich analytical capabilities that modern businesses require
Edgar F. Codd (1994): "Attempting to force one technology or tool to satisfy a particular need for which another tool is more effective and efficient is like attempting to drive a screw into a wall with a hammer when a screwdriver is at hand: the screw may eventually enter the wall but at what cost?"
The basic idea
[Diagram: database and data warehouse]
The basic idea: time-space trade-off
- In computer science, a time-space trade-off is a situation in which memory consumption can be reduced at the expense of slower program execution, and vice versa.
- In a data warehouse: higher memory/disk consumption in exchange for faster query times
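The trade-off can be illustrated with a minimal sketch. The sales records and function names below are hypothetical and not part of the lecture; the point is only that a precomputed summary costs extra memory but answers repeated queries without rescanning the detailed rows.

```python
from collections import defaultdict

# Hypothetical detailed records: (year, month, amount).
sales = [
    (2012, 1, 100.0),
    (2012, 1, 50.0),
    (2012, 2, 75.0),
    (2013, 1, 20.0),
]


def monthly_total_scan(rows, year, month):
    """No extra memory: every query scans all detailed rows."""
    return sum(amount for y, m, amount in rows if (y, m) == (year, month))


def build_monthly_totals(rows):
    """Spend extra memory once on a summary keyed by (year, month)."""
    totals = defaultdict(float)
    for y, m, amount in rows:
        totals[(y, m)] += amount
    return dict(totals)


def monthly_total_lookup(totals, year, month):
    """Answer the same question with a single dictionary lookup."""
    return totals.get((year, month), 0.0)


monthly_totals = build_monthly_totals(sales)
print(monthly_total_scan(sales, 2012, 1))
print(monthly_total_lookup(monthly_totals, 2012, 1))
```

Both calls return the same total; the second one trades the memory held by monthly_totals for a faster answer.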
Once...
- 1956: 5 MB IBM hard disk, over 1 t
- 1980: 1 GB IBM 3380, about 250 kg, $40,000
A data warehouse is
- A copy of transaction data specifically structured for query and analysis. (R. Kimball)
- A single, complete and consistent source of data obtained from a variety of sources and made available to end users in a way that they can understand and use in a business context. (B. Devlin)
- A data warehouse is a subject oriented, integrated, non volatile, time variant collection of data designed to support management's decision support needs. (B. Inmon)
ZPR FER Zagreb, Business Intelligence 2013/2014
... subject oriented...
- Information systems are organized around applications, e.g. an insurance company's IS:
  - Car insurance
  - Health insurance
  - Life insurance
  - etc.
- In a DW, data is organized around major "subjects":
  - Customer
  - Insurance policy
  - etc.
- Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process
... integrated...
- Data is collected from multiple heterogeneous sources (relational databases, flat files, Internet, ...)
- Data is cleaned and normalized (formats, nomenclatures, units of measurement, data types, ...)
- Example: gender recorded as "m, ž", "0, 1", "m, f", or "male, female" in different sources is unified to a single encoding (m, f)
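The gender example can be sketched as a small cleaning step. The source-system names and the meaning assigned to 0 and 1 are assumptions made for illustration; the slide only lists the encodings.

```python
# Hypothetical source systems and their gender encodings.
# Which of 0/1 means "m" is an assumption; the slide does not say.
GENDER_MAPS = {
    "hr_system": {"m": "m", "ž": "f"},
    "billing": {"0": "m", "1": "f"},
    "web_shop": {"m": "m", "f": "f"},
    "crm": {"male": "m", "female": "f"},
}


def normalize_gender(source, raw_value):
    """Map a source-specific gender code to the warehouse encoding (m, f)."""
    mapping = GENDER_MAPS[source]
    key = str(raw_value).strip().lower()
    if key not in mapping:
        # Unknown codes are rejected rather than guessed.
        raise ValueError(f"unknown gender code {raw_value!r} from {source}")
    return mapping[key]


print(normalize_gender("hr_system", "Ž"))
print(normalize_gender("billing", 0))
print(normalize_gender("crm", " Female "))
```

Keeping one mapping per source makes the cleaning rules explicit and easy to extend when a new source is integrated.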
... non volatile...
- The data in the data warehouse are (conditionally speaking) non volatile
- What to do with the errors (changes) in a transactional system?
[Diagram: the transactional system is subject to INS, UPD, DEL, and SEL operations; the DW is filled by (incremental) loading and queried with SEL]
... time variant...
- All records have a timestamp
- Each unit of data in the data warehouse is correct (true) from a certain point in time
- Some records carry the timestamp of the transaction; others are assigned a timestamp when they are loaded into the warehouse ("a series of layers")
- Data are almost always analyzed in the context of time:
  - Current month's sales
  - Quarterly sales compared to the previous year's quarterly sales
  - etc.
- Sales "of all time" are mostly pointless
Differences between transactional systems and data warehouses (1)
Transactional system:
- Holds current data
- Detailed data
- Volatile data
- High transaction frequency
- Foreseeable usage patterns
- Oriented toward daily operations and management of the business system
DW:
- Holds current and historical data
- Detailed and aggregated data
- (Conditionally) non-volatile data
- Medium to low transaction frequency
- Unforeseeable usage patterns
- Oriented toward data analysis
Differences between transactional systems and data warehouses (2)
Transactional system:
- Supports daily, operational decisions
- Supports a large number of operational users
- Availability is extremely important
- Emphasis on data storage
DW:
- Supports strategic decisions
- Serves a small number of users, typically decision makers
- Availability is less important
- Emphasis on information acquisition
A significant difference between transactional systems and data warehouses is the granularity of the data
Business Intelligence (1)
- The typical organization analyzes only "10%" of the data collected
- BI is a way to take advantage of the "remaining 90%"
- Data is a strategic asset
- Converting data into information
- Managing business information
- BI is not a product, it is a concept
- "Business" should be taken in a broader sense; for example, education is a business
- The aim is to improve the business
Business Intelligence (2)
"Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance." (Gartner)
The definition implies:
- BI is more than just tools: without the appropriate procedures and people, tools themselves are of little worth
- The value of BI is realized in profitable business decisions: if knowledge is not used in that sense, the procedures themselves are of little worth
Business Intelligence (3)
- BI includes a wide range of applications and technologies for gathering, storing, analyzing, and sharing information in order to make better business decisions
- BI applications include:
  - decision support systems
  - querying and reporting
  - OLAP
  - statistical analysis
  - forecasting
  - data mining
Introduction to dimensional modeling - Star Join
Business Intelligence
In a DW, data is stored in the dimensional model
- The dimensional model presents data in a simple, intuitive format that allows for efficient querying
- "Make everything as simple as possible, but not simpler" (A. Einstein)
- The dimensional model is not normalized
- Two models:
  - Star Join
  - Snowflake
Star model
[Diagram: a central fact table, holding dimension keys and measures, surrounded by dimension tables]
Star model
- Dimension tables and the fact table are always in a 1:N relationship
- Dimension tables (the 1 side):
  - ddate (iddate, day, month, year, ...)
  - dstudent (idstudent, firstname, lastname, ...)
  - dteacher (idteacher, firstname, lastname, ...)
  - dcourse (idcourse, coursename, ECTS, ...)
- Fact table (the N side):
  - fexam (iddate, idstudent, idteacher, idcourse, grade, haspassed (0,1))
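The exam star schema can be tried out with a minimal sketch using Python's built-in sqlite3 module. Table and column names follow the slide; the column types, the sample rows, and the query itself are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ddate    (iddate INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE dstudent (idstudent INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE dteacher (idteacher INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE dcourse  (idcourse INTEGER PRIMARY KEY, coursename TEXT, ects INTEGER);

-- Fact table: one dimension key per dimension, plus the measures.
CREATE TABLE fexam (
    iddate    INTEGER REFERENCES ddate(iddate),
    idstudent INTEGER REFERENCES dstudent(idstudent),
    idteacher INTEGER REFERENCES dteacher(idteacher),
    idcourse  INTEGER REFERENCES dcourse(idcourse),
    grade     INTEGER,
    haspassed INTEGER CHECK (haspassed IN (0, 1))
);

-- Made-up sample data.
INSERT INTO ddate    VALUES (1, 15, 6, 2013), (2, 1, 7, 2013);
INSERT INTO dstudent VALUES (1, 'Ana', 'Horvat'), (2, 'Ivan', 'Novak');
INSERT INTO dteacher VALUES (1, 'Marko', 'Kovac');
INSERT INTO dcourse  VALUES (1, 'Databases', 6), (2, 'Business Intelligence', 5);
INSERT INTO fexam    VALUES (1, 1, 1, 1, 5, 1), (1, 2, 1, 1, 1, 0),
                            (2, 2, 1, 1, 3, 1), (2, 1, 1, 2, 4, 1);
""")

# Star join: the fact table is joined to the dimensions it needs,
# then the measures are aggregated per course and year.
rows = conn.execute("""
    SELECT c.coursename, d.year, COUNT(*) AS exams, SUM(f.haspassed) AS passed
    FROM fexam AS f
    JOIN dcourse AS c ON c.idcourse = f.idcourse
    JOIN ddate   AS d ON d.iddate = f.iddate
    GROUP BY c.coursename, d.year
    ORDER BY c.coursename, d.year
""").fetchall()

for row in rows:
    print(row)
```

Every join goes from the fact table to exactly one dimension, which is what keeps queries against a star schema simple to write.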
Fact table
- A fact table corresponds to a process being monitored in the data warehouse
- A DW can have N fact tables
- Consists of two sets of numerical attributes:
  - Dimension table keys
  - Measures
- Normalized (or nearly normalized: it sometimes has derived attributes, such as price and pricevat)
- Often (almost) identical to the corresponding table in a relational database (but with substituted keys)
- A large number of records (10^5, 10^6, 10^7, ...)
- One row (tuple) does not take up much space (normalized table, numerical attributes)
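The "substituted keys" point can be sketched as follows. All identifiers, lookup tables, and the pass threshold (grade of 2 or higher) are hypothetical and chosen only for illustration; the idea is that a source row keeps its measures while its business identifiers are replaced by integer dimension keys.

```python
# Hypothetical rows from a transactional exam table.
source_exams = [
    {"student_no": "S-1001", "course_code": "BI", "grade": 5},
    {"student_no": "S-1002", "course_code": "BI", "grade": 1},
]

# Hypothetical lookups from source identifiers to dimension table keys.
student_keys = {"S-1001": 1, "S-1002": 2}
course_keys = {"BI": 10}


def to_fact_row(src):
    """Build a compact, all-numeric fact row from one source row."""
    grade = src["grade"]
    # Assumption: a grade of 2 or higher counts as a pass.
    haspassed = 1 if grade >= 2 else 0
    return (
        student_keys[src["student_no"]],
        course_keys[src["course_code"]],
        grade,
        haspassed,
    )


fact_rows = [to_fact_row(row) for row in source_exams]
print(fact_rows)
```

Because each fact row holds only small integers, even millions of rows stay compact.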