Why Business Intelligence

Ferruccio Ferrando, z IT Specialist, Techline Italy
March 2011
1.1 The origins

During the economic boom of the 1950s, when demand and production were both very high, the only concern of companies was to put the product in front of as many customers as possible: mass marketing was born. The 1970s were a decade marked by strong social conflict and economic crisis; demand began to shrink, and companies came to understand that, in order to sell, they had to address their customers more directly. This is when people started to speak of direct marketing and targeting. During the following decade the market began to saturate, and attention moved progressively from the product to the customer, with a greater focus on differentiation. By the end of the 1980s the watchword was quality: companies competed on it and tried to give each customer a name and a face. The 1990s saw the birth of database marketing, built on computerized repositories that store customers' personal data and purchase histories. Companies understood that, to appeal to increasingly sophisticated and demanding consumers, they needed to acquire and process information and to adopt the technology for doing so. The Industrial Society had finally given way to the Information Society, in which new approaches to marketing and operations aim to create a relationship with each customer in order to meet or anticipate their needs.

The term Business Intelligence was popularized in 1989 by an analyst of Gartner Group to denote a class of tools and applications that address corporate information problems, reporting in particular. Today BI covers all the disciplines that support decision making: feeding the data warehouse, publishing information (on the Internet or other media), data mining applications, and front-end analysis, previously called InfoCentre. BI tools also include Decision Support, the category of software products designed to support management activities, together with a subset of simpler tools aimed directly at managers who have a thorough knowledge of computers.
We can say that BI is the analysis of organizational phenomena through inspection of the data held in the information system, in order to derive indicators that support strategic decisions. The organization under analysis can be a customer or the company itself.
1.2 The limits of the relational model

Each organization has an immense asset in terms of data, often only partially exploited. From this wealth of data it is necessary to draw, flexibly and quickly, information that is useful to management. Organizing relational data in normal form is suitable for transactional applications (OLTP) but not for analytical processing (OLAP): user queries can lead to a cascade of JOINs between tables, and performance may become unacceptable when the analysis involves a huge amount of data.

The differences between these two kinds of activity are:

With OLTP (On Line Transaction Processing):
- transactions are predefined and short
- transactions read and modify a few records
- data are detailed, current and recent
- data reside on a single database

With OLAP (On Line Analytical Processing):
- queries are complex and ad hoc
- processes read a huge number of records
- data are aggregate and historical
- data can come from multiple databases
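The contrast can be made concrete with a small sketch based on Python's built-in sqlite3 module. The schema, table names and sample rows are illustrative assumptions, not taken from any real system. The OLTP-style statement touches one record by key, while the OLAP-style question has to join every normalized table before it can aggregate.

```python
import sqlite3

# Hypothetical normalized schema (all names and rows are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city       (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE customer   (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    CREATE TABLE product    (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE orders     (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    CREATE TABLE order_line (order_id INTEGER, product_id INTEGER, quantity INTEGER);

    INSERT INTO city VALUES (1, 'Milan', 'North'), (2, 'Naples', 'South');
    INSERT INTO customer VALUES (1, 'Rossi', 1), (2, 'Bianchi', 2);
    INSERT INTO product VALUES (1, 'Shoes', 'Clothing'), (2, 'Radio', 'Electronics');
    INSERT INTO orders VALUES (1, 1, '2010-01-15'), (2, 2, '2010-02-03'), (3, 1, '2010-02-20');
    INSERT INTO order_line VALUES (1, 1, 2), (2, 2, 1), (3, 1, 3), (3, 2, 4);
""")

# OLTP style: a short, predefined transaction that modifies one record.
with conn:
    conn.execute(
        "UPDATE order_line SET quantity = 5 WHERE order_id = 1 AND product_id = 1"
    )

# OLAP style: an aggregate question that needs a cascade of JOINs
# and reads every order line.
olap_rows = conn.execute("""
    SELECT ci.region, p.category, SUM(ol.quantity) AS total_quantity
    FROM order_line ol
    JOIN orders   o  ON o.id  = ol.order_id
    JOIN customer cu ON cu.id = o.customer_id
    JOIN city     ci ON ci.id = cu.city_id
    JOIN product  p  ON p.id  = ol.product_id
    GROUP BY ci.region, p.category
    ORDER BY ci.region, p.category
""").fetchall()

for region, category, total in olap_rows:
    print(region, category, total)
```

With five tiny tables the four JOINs cost nothing. The point of the sketch is only the shape of the query, which stays the same when the order lines number in the millions.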
1.3 The new model

The complexity of management information, that is, the reports required by managers and executives, comes from three key features:
- Timing: the information refers to a consolidated historical period
- Aggregation: management interprets the evolution of a phenomenon not through elementary events but through summary data such as averages, trends and histograms
- Multidimensionality: data are examined from a series of viewpoints

In Business Intelligence terminology, data are called "Facts" and the viewpoints are called "Dimensions". This new data model introduces the concept of the n-cube, or n-dimensional cube: an n-dimensional matrix made up of cells in which aggregate data are placed at different levels of detail. This is a schema:

[Figure: n-dimensional cube]
- Dimensions: in the metaphor of the n-dimensional cube, the dimensions are the axes (X, Y, Z, ... corresponding to Size, Color, Month, ...)
- Facts: they are always numeric and represent measurements of the phenomenon to be explored, expressed through appropriate metrics. Facts are numbers notionally placed inside the cells of the n-cube, and are the result of aggregating the measure of the phenomenon at the cell's coordinates.

In the figure, the highlighted cell contains the value 30, which represents the aggregate quantity of product sold at the coordinates Size = 42, Color = Green, Month = Feb.

A set of tools for managing this kind of data must have the following features:
- Ability to integrate heterogeneous data sources
- Definition of ad hoc queries by the end user
- Multidimensional interaction with the indicators
- Optimized performance
- Security management
- Distribution of the required information, via e-mail or Web
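The highlighted cell described above (Size = 42, Color = Green, Month = Feb, value 30) can be sketched in a few lines of Python. The elementary sales events are invented for illustration and chosen so that they add up to 30. Only the coordinates and the aggregate value come from the text.

```python
from collections import defaultdict

# Hypothetical elementary sales events: (size, color, month, quantity).
# The rows are invented so that one cell reproduces the value 30 from the text.
sales_events = [
    (42, "Green", "Feb", 12),
    (42, "Green", "Feb", 18),
    (42, "Red",   "Feb", 7),
    (40, "Green", "Jan", 5),
]

# The cube maps coordinates (one value per dimension) to an aggregated fact.
cube = defaultdict(int)
for size, color, month, quantity in sales_events:
    cube[(size, color, month)] += quantity

# Reading a cell means supplying a value for every dimension.
highlighted = cube[(42, "Green", "Feb")]
print(highlighted)  # 30
```

A dictionary keyed by coordinate tuples stores only the cells that actually contain data. That is a convenient choice for a sketch, not a statement about how BI products store cubes.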
We can show a reference architecture in which the core of the system is the Data Warehouse:

[Figure: reference architecture centered on the Data Warehouse]

Metadata describe the structure of the DW and contain:
- A description of tables and fields in the warehouse, including data types and the range of acceptable values.
- A similar description of tables and fields in the source databases, with a mapping of fields from the source to the warehouse.
- A description of how the data has been transformed, including formulae, formatting, currency conversion, and time aggregation.
- Any other information that is needed to support and manage the operation of the data warehouse.
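As a rough illustration of the metadata items listed above, the following sketch describes one warehouse field: its data type, its range of acceptable values, the source field it is mapped from, and the transformation applied. The class, the table and field names, and the exchange rate are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FieldMetadata:
    """Metadata for one warehouse field (all names here are hypothetical)."""
    warehouse_table: str
    warehouse_field: str
    data_type: type
    valid_range: Optional[Tuple[float, float]]  # acceptable values, if bounded
    source_table: str
    source_field: str
    transformation: str                         # human-readable description
    transform: Callable[[object], object]       # the transformation itself

    def load(self, source_value):
        """Transform a source value and check it against the declared metadata."""
        value = self.data_type(self.transform(source_value))
        if self.valid_range is not None:
            low, high = self.valid_range
            if not low <= value <= high:
                raise ValueError(
                    f"{self.warehouse_field}: {value} outside {self.valid_range}"
                )
        return value


# Assumed fixed exchange rate, for illustration only.
USD_TO_EUR = 0.5

amount_eur = FieldMetadata(
    warehouse_table="fact_sales", warehouse_field="amount_eur",
    data_type=float, valid_range=(0.0, 1_000_000.0),
    source_table="orders", source_field="amount_usd",
    transformation="currency conversion USD -> EUR at a fixed illustrative rate",
    transform=lambda usd: usd * USD_TO_EUR,
)

print(amount_eur.load(200))  # 100.0
```

Keeping the mapping and the transformation next to the field description lets the load step both apply the conversion and reject values outside the declared range.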
Alongside the DW we have to introduce the concept of the Data Mart: a database that has the same characteristics as a data warehouse, but is usually smaller and focused on the data of one division or one workgroup within an enterprise. The Marketing DB is an example.

In the data warehousing field, we often hear discussions about whether an organization's philosophy falls into Bill Inmon's camp or into Ralph Kimball's camp. The difference between the two is:
- Bill Inmon's paradigm: the data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse.
- Ralph Kimball's paradigm: the data warehouse is the conglomerate of all data marts within the enterprise.

Each model has its advantages (+) and disadvantages (-):
- Inmon's model: + integration, data coherence; - it is a heavy process
- Kimball's model: + greater autonomy, flexibility; - lack of alignment between Data Marts

Data Mining consists of techniques for automatically searching for patterns in large data archives, using computational techniques derived from statistics and pattern recognition.

DSS, Decision Support Systems, are a class of information systems that support decision-making.

EIS, Executive Information Systems, are systems that facilitate and support the needs of senior managers in decision-making by providing easy access to internal and external information relevant for achieving the strategic objectives of the company. They are often regarded as a specialized form of DSS.
1.4 Multidimensional Model

We have already seen that the multidimensionality of the new model derives from the points of view from which the data are examined. A multidimensional cube is centered on a fact of interest for decision making; it represents a set of events, described quantitatively by numerical measures. For example, consider the sales of certain products. The dimensions along which sales are analyzed are product, time and customer:

[Figure: cube with dimensions product, time, customer]

But the dimensions of analysis may be more than three. For example, sales could also be analyzed by the agent who conducted the negotiation; in this case we have a hypercube:

[Figure: hypercube with dimensions product, time, customer, agent]
To access the data of a sale you need to specify its coordinates, that is, the values for the dimensions of analysis. For example, a sale can be referenced as: date May 16, 2010, article 'car', customer 'John BB'. In this way only a portion of the data held in the hypercube is selected. If you specify a precise value for each of the dimensions, a single cell, that is a single fact, is found in the hypercube; in this case it identifies one sale.

1.4.1 Operations on Multidimensional Data

The following operations are allowed on this new kind of data:
- ROLL UP: aggregates data at a higher level. It is the dual operator of drill down, since it moves up along one or more dimensions. Example: from the analysis of a particular product you can move to the analysis of a full range of products.
- DRILL DOWN: disaggregates the data, introducing greater detail. It is the operator that lets you go into the detail of one or more dimensions. Example: you can move from an analysis of sales by county to a more detailed one by city.
- DRILL ACROSS: combines the data associated with several facts
- SLICE & DICE: selects and projects the n-cube onto a plane
- PIVOT: reorients the cube (introduces, removes, moves dimensions)
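Three of the operations listed above (roll up, drill down and pivot) can be sketched on a cube stored as a Python dictionary keyed by coordinates. The facts, the dimension names and the city-to-region hierarchy are assumptions made for the example. The helper functions are illustrative and do not correspond to any particular BI product.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, city, month) -> quantity sold.
facts = {
    ("Sedan", "Milan",  "Jan"): 3,
    ("Sedan", "Turin",  "Jan"): 2,
    ("Coupe", "Milan",  "Feb"): 1,
    ("Sedan", "Naples", "Feb"): 4,
}
DIMENSIONS = ("product", "city", "month")

# Assumed hierarchy for the city dimension: city -> region.
CITY_TO_REGION = {"Milan": "North", "Turin": "North", "Naples": "South"}


def roll_up(cube, dim_index, mapping):
    """Aggregate along one dimension by replacing each value with its parent."""
    result = defaultdict(int)
    for coords, value in cube.items():
        parent = list(coords)
        parent[dim_index] = mapping[coords[dim_index]]
        result[tuple(parent)] += value
    return dict(result)


def drill_down(detail_cube, dim_index, mapping, parent_value):
    """Return the detailed cells that sit under one parent value."""
    return {c: v for c, v in detail_cube.items()
            if mapping[c[dim_index]] == parent_value}


def pivot(cube, order):
    """Reorient the cube by permuting the order of its dimensions."""
    return {tuple(c[i] for i in order): v for c, v in cube.items()}


city_axis = DIMENSIONS.index("city")
by_region = roll_up(facts, city_axis, CITY_TO_REGION)
north_detail = drill_down(facts, city_axis, CITY_TO_REGION, "North")
month_first = pivot(facts, (2, 0, 1))

print(by_region[("Sedan", "North", "Jan")])  # 5
```

Drill down works here only because the detailed cells are still available. The rolled-up cube alone no longer contains the city-level values, which is why the two operators are described as duals rather than as inverses computed from the aggregate.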
[Figure: Roll Up - from Month to Quarter]

[Figure: Roll Up - from Region to State]
[Figure: Slice and Dice - selection by Category = electronic; Profit > 80; Year = 1997]
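The selection named in the caption above can be expressed as a filter over fact rows. In this sketch only the three conditions (Category = electronic, Profit > 80, Year = 1997) come from the text. The rows and the helper function are hypothetical.

```python
# Hypothetical fact rows; only the three filter conditions come from the text.
rows = [
    {"category": "electronic", "product": "TV",    "year": 1997, "profit": 120},
    {"category": "electronic", "product": "Radio", "year": 1997, "profit": 60},
    {"category": "electronic", "product": "TV",    "year": 1996, "profit": 150},
    {"category": "furniture",  "product": "Desk",  "year": 1997, "profit": 95},
]


def slice_and_dice(facts, fixed, predicates=()):
    """Keep rows matching every fixed dimension value and every predicate."""
    return [
        row for row in facts
        if all(row[dim] == value for dim, value in fixed.items())
        and all(pred(row) for pred in predicates)
    ]


selection = slice_and_dice(
    rows,
    fixed={"category": "electronic", "year": 1997},
    predicates=[lambda row: row["profit"] > 80],
)
print([row["product"] for row in selection])  # ['TV']
```

Fixing a dimension to a single value corresponds to the slice, while the predicate on the measure further restricts the cells that remain.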