DATA WAREHOUSE E KNOWLEDGE DISCOVERY
|
|
|
- Solomon Robbins
- 10 years ago
- Views:
Transcription
1 DATA WAREHOUSE E KNOWLEDGE DISCOVERY Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano
2 DATA WAREHOUSE (DW) A TECHNIQUE FOR CORRECTLY ASSEMBLING AND MANAGING DATA COMING FROM DIFFERENT SOURCES TO OBTAIN A DETAILED VIEW OF AN ECONOMIC SYSTEM IT IS AN INTEGRATED PERMANENT TIME VARIANT TOPIC ORIENTED DATA COLLECTION TO SUPPORT MANAGERIAL DECISIONS IT IS THE SEPARATION ELEMENT BETWEEN OLTP AND DSS WORKLOADS Fabio A. Schreiber data warehouse 1
3 ON LINE TRANSACTION PROCESSING (OLTP) TYPICAL OF ELECTRONIC DATA PROCESSING (EDP) APPLICATIONS TRANSACTIONS MUST POSSESS ACID PROPERTIES STRUCTURED AND REPETITIVE SHORT AND ISOLATED DATA MUST BE DETAILED AND UP-TO-DATE ACCESS TO DATA IS MAINLY MADE THROUGH THE PRIMARY KEY DATABASE SIZE VARIES BETWEEN 10 2 MBYTE AND 10 GBYTE TRANSACTION THROUGHPUT IS THE MAIN PERFORMANCE METRICS Fabio A. Schreiber data warehouse 2
4 ON LINE ANALYTICAL PROCESSING (OLAP) TYPICAL OF DECISION SUPPORT SYSTEMS WORKLOAD IS CONSTITUTED OF VERY COMPLEX QUERIES ACCESSING MORE THAN 10 6 RECORDS DATA ARE OF HISTORICAL TYPE, AGGREGATED FROM DIFFERENT SOURCES WAREHOUSE DIMENSIONS GO BEYOND A TBYTE QUERY THROUGHPUT AND RESPONSE TIME ARE THE MAIN PERFORMANCE INDEXES Fabio A. Schreiber data warehouse 3
5 DATA MODELS FOR OLAP THEY MUST SUPPORT SOPHISTICATED ANALYSES AND COMPUTATIONS OVER DIFFERENT HIERARCHICAL DIMENSIONS THE MOST SUITABLE LOGICAL DATA MODEL IS A MULTIDIMENSIONAL STRUCTURE - THE DATA CUBE THE DIMENSIONS OF THE CUBE ARE THE ATTRIBUTES ON WHICH SEARCHES ARE MADE (KEYS) EACH DIMENSION IN TURN CAN BE A HIERARCHY DATE {DAY-MONTH-QUARTER-YEAR} PRODUCT {NAME - TYPE - CATHEGORY} (LAND ROVER - CROSS-COUNTRY - MOTOR VEHICLE) THE CELLS OF THE CUBE CONTAIN THE METRICAL VALUES PERTAINING TO DIMENSIONAL VALUES Fabio A. Schreiber data warehouse 4
6 LOGICAL DATA MODELS FOR OLAP INSURANCE COMPANY EXAMPLE DIMENSIONS METRICAL VALUES POLICY NUMBER, TOTAL AMOUNT YEAR < >80 LIFE-HEALTH THIRD-PARTY FIRE-THEFT RC-CAR TYPE AGE Fabio A. Schreiber data warehouse 5
7 OLAP OPERATIONS ROLL-UP INCREASES DATA AGGREGATION LEVEL DRILL-DOWN INCREASES DATA DETAIL LEVEL SLICE-AND-DICE SELECTS AND PROJECTS IN ORDER TO REDUCE DATA DIMENSION PIVOTING SELECTS TWO DIMENSIONS TO AGGREGATE METRICAL DATA AROUND RANKING SORTS DATA FOLLOWING PREDEFINED CRITERIA TRADITIONAL OPERATIONS (SELECTION, COMPUTED ATTRIBUTES, ECC.) Fabio A. Schreiber data warehouse 6
8 OLAP OPERATIONS DRILL-DOWN TYPE TYPE AGE ROLL-UP AGE YEAR MONTH Fabio A. Schreiber data warehouse 7
9 OLAP OPERATIONS YEAR PIVOTING YEAR AGE TYPE TYPE AGE AGE TYPE Fabio A. Schreiber data warehouse 8 YEAR
10 OLAP OPERATIONS YEAR SLICE AND DICE YEAR AGE TYPE AGE TYPE Fabio A. Schreiber data warehouse 9
11 DW ARCHITECTURE DATABASE 1 DATABASE 2 DATABASE LEGACY SPARSE FILES WAREHOUSE CONSTRUCTION COMPANY WAREHOUSE REPLICATION AND PROPAGATION DATA MART DATA MART DATA MART KNOWLEDGE DISCOVERY AND DATA MINING 1 +1 = 3 INFORMATION ACCESS AND MANAGEMENT? Fabio A. Schreiber data warehouse 10
12 WAREHOUSE CONSTRUCTION DATA COME FROM DIFFERENT AND DIRTY SOURCES LEGACY NON DOCUMENTED SYSTEMS PRODUCTION SYSTEMS WITHOUT ANY INTERNAL INTEGRITY CHECK EXTERNAL SOURCES WITH DOUBTFUL QUALITY FEATURES IT IS MANDATORY TO RESTORE DATA QUALITY IN ORDER TO COMMIT RELIABLE DECISIONS TO THEM Fabio A. Schreiber data warehouse 11
13 WAREHOUSE CONSTRUCTION TOOLS FOR DATA QUALITY FOR MIGRATION TRASFORM AND REFORMAT DATA FROM DIFFERENT SOURCES FOR SCRUBBING USE DOMAIN KNOWLEDGE TO SCRUB AND HOMOGENIZE FOR AUDITING DISCOVER RULES AND RELATIONS AMONG DATA AND VERIFY THEIR VALIDITY TOOLS FOR DATA LOADING VERIFY REFERENTIAL INTEGRITY VIOLATIONS SORT, AGGREGATE, BUILD DERIVED DATA BUILD INDEXES AND OTHER ACCESS PATHS REPRESENT A HEAVY BATCH WORKLOAD NEED TO ORGANIZE LOADING OPERATIONS IN A PARALLEL OR IN AN INCREMENTAL WAY Fabio A. Schreiber data warehouse 12
14 WAREHOUSE CONSTRUCTION INCREMENTAL LOADING PERFORMED DURING UPDATING; ONLY NEW OR MODIFIED TUPLES ARE LOADED CONFLICTS WITH NORMAL WORK REQUIRES SHORT TRANSACTIONS (<1000 RECORD) NEEDS COORDINATION TO GUARANTEE INDEXES AND DERIVED DATA CONSISTENCY UPDATING PERIODICALLY APPLIED FOLLOWING APPLICATIONS NEEDS DUPLICATION SERVER USAGE DATA DISPATCHING: FOR EACH CHANGE OF THE SOURCE TABLE, TRIGGERS ARE USED TO UPDATE A SNAPSHOT LOG TABLE WHICH IS PROPAGATED THEREAFTER TRANSACTION DISPATCHING: THE STANDARD LOG IS MONITORED AND THE CHANGES ON THE REPLICATED TABLES ARE TRANSFERRED TO THE DUPLICATION SERVER Fabio A. Schreiber data warehouse 13
15 INPUT DATA ANALYSIS DW DESIGN METHODOLOGY SELECTION OF RELEVANT INFORMATION SOURCES TRANSLATION INTO A REFERENCE CONCEPTUAL MODEL (E-R) INFORMATION SOURCES ANALYSIS IDENTIFICATION OF FACTS, MEASURES, DIMENSIONS INTEGRATION IN A GLOBAL CONCEPTUAL SCHEMA DATA WAREHOUSE DESIGN CONCEPTUAL AGGREGATED DATA, HISTORICAL DATA, INTRODUCTION LOGICAL DATA MART DESIGN (MULTIDIMENSIONAL DB) IDENTIFICATION OF FACTS AND DIMENSIONS E-R SCHEMA RESTRUCTURING DIMENSIONAL GRAPH DERIVATION TRANSLATION INTO THE LOGICAL MODEL Fabio A. Schreiber data warehouse 14
16 FREQUENT FAILURE CAUSES IGNORE DATA QUALITY ACCURACY COMPLETENESS CONSISTENCY TIMELINESS AVAILABILITY NOT STORING NECESSARY DATA IGNORE DATA IN EXTERNAL SOURCES IGNORE SOFT DATA (e.g. subjective judgements) Fabio A. Schreiber data warehouse 15
17 MULTIDIMENSIONAL OLAP SERVER (MOLAP) DIRECTLY IMPLEMENTS THE CUBE MODEL MULTIDIMENSIONAL MATRIX STRUCTURES OPTIMAL FOR DENSE STRUCTURES SEARCH BY ADDRESS BECOMES AN ALGEBRAIC COMPUTATION HIGH AND CONSTANT QUERY PROCESSING PERFORMANCE SPECIALIZED ACCESS METHODS AGGREGATION AND COMPILATION ARE MADE IN ADVANCE LIMITED SCALABILITY DUE TO PREPROCESSING MORE INGENUITY REQUIRED TO THE DATA ADMINISTRATOR Fabio A. Schreiber data warehouse 16
18 RELATIONAL OLAP SERVER (ROLAP) USES A STANDARD RDBMS TO IMPLEMENT THE MULTIDIMENSIONAL STRUCTURE BY MEANS OF THE GROUP_BY OPERATION THE SCHEMA HAS A STAR OR A SNOWFLAKE SHAPE (IT NORMALIZES HIERARCHIES) CENTRAL FACT TABLE TUPLES ARE CONSTITUTED OF POINTERS (EXTERNAL KEYS) TO THE DIMENSION TABLES AND OF THE VALUES FOR THE DESCRIBED COORDINATES f=(k 1,..., k n, v 1, v m ) DIMENSION TABLES STORE TUPLES WITH ATTRIBUTES PERTAINING TO EACH DIMENSION d 1 =(k 1, a 1,, a n ) FACTS CONSTELLATION MANY FACT TABLES SHARE DIMENSION TABLES HAVING THE SAME STRUCTURE Fabio A. Schreiber data warehouse 17
19 STAR SCHEMA CONSTELLATION COMPONENT COMPONENT#.. PROJECT PROJECT# COMPONENT#.. FACT TABLE INVOICE# DATEK TOTAL INVOICE INVOICE# DIMENSION TABLE... SNOWFLAKE PROJECT# MANAGER MANAGER#... MANAGER# DATEK QUANTITY COST FACT TABLE DATE DATEK DATE MONTH MONTH MONTH YEAR YEAR YEAR STAR Fabio A. Schreiber data warehouse 18
20 MAIN PROBLEMS IN DW DESIGN LOGICAL STRUCTURES DESIGN IN ORDER TO OPTIMIZE QUERIES NEED TO MINIMIZE JOINS DENORMALIZATION AND DATA REPLICATION TABLES DIMENSIONS REDUCTION HORIZONTAL PARTITIONING VERTICAL PARTITIONING BY ROWS BREAKING (USEFUL FOR DRILL-DOWN) Fabio A. Schreiber data warehouse 19
21 MAIN PROBLEMS IN DW DESIGN PHYSICAL STRUCTURES DESIGN CHOICE OF INDEXES CHOICE OF MATERIALIZED VIEWS VIEW AND METADATA MAINTENANCE REPLICATION MANAGEMENT HOW AND WHEN TO UPDATE CONSISTENCY MANAGEMENT APPLICATIONS IMPLEMENTATION Fabio A. Schreiber data warehouse 20
22 KNOWLEDGE HIERARCHY VARIABLES INVOICES SALES TREND MARKET RULES STRATEGIC DECISIONS ELEMENTS (VOLUME) DATA INFORMATION KNOWLEDGE WISDOM VALUE ADDED STATISTICAL PROCESSING KNOWLEDGE DISCOVERY PROCEDURES EXPERIENCE Fabio A. Schreiber data warehouse 21
23 KNOWLEDGE DISCOVERY AND DATA MINING KNOWLEDGE DISCOVERY IN DATABASES AND DATA WAREHOUSES TO IDENTIFY THE MOST SIGNIFICANT INFORMATION TO SHOW IT TO THE USER IN THE MOST CONVENIENT WAY DATA MINING ALGORITHM APPLICATION TO RAW DATA IN ORDER TO EXTRACT KNOWLEDGE (RELATIONS, PATHS, ) PREDICTIVE AIM (SIGNAL ANALYSIS, VOICE RECOGNITION, ECC.) DESCRIPTIVE AIM (DECISION SUPPORT SYSTEMS) Fabio A. Schreiber data warehouse 22
24 KNOWLEDGE DISCOVERY PROCESS (1) EVEN IF SPECIALIZED TOOLS ARE AVAILABLE IT REQUIRES A COMPETENCE IN USED TECHNIQUES A VERY GOOD APPLICATION DOMAIN KNOWLEDGE SEQUENTIAL STEPS SELECTION CHOICE OF THE SAMPLE DATA THE ANALYSIS SHALL BE FOCUSED ON PREPROCESSING DATA SAMPLING IN ORDER TO REDUCE THEIR VOLUME DATA SCRUBBING FOR ERRORS AND OMISSIONS Fabio A. Schreiber data warehouse 23
25 KNOWLEDGE DISCOVERY PROCESS (2) TRANSFORMATION DATA TYPES HOMOGENEIZATION AND/OR CONVERSION DATA MINING CHOICE OF THE METHOD/ALGORITHM INTERPRETATION AND EVALUATION RETRIEVED INFORMATION FILTERING POSSIBLE REFINING BY PREVIOUS STEPS REPETITION SEARCH RESULTS VISUAL PRESENTATION (GRAPHICAL OR LOGICAL) Fabio A. Schreiber data warehouse 24
26 KNOWLEDGE DISCOVERY PROCESS (3) INTERPRETATION KNOWLEDGE DATA MINING TRANSFORMATION PRE-PROCESSING PRE- PROCESSED DATA TRANSFORMED DATA CORRELATIONS AND PATHS SELECTION TARGET DATA RAW DATA da: G. Piatesky-Shapiro 1996 Fabio A. Schreiber data warehouse 25
27 DATA MINING APPLICATIONS CORRESPONDENCE AND RETAIL SALES WHICH SPECIAL OFFER SHOULD BE MADE HOW TO LOCATE GOODS ON THE SHELVES MARKETING SALES FORECASTS PRODUCTS BUYING PATHS BANKS LOANS CONTROL CREDIT CARDS USAGE (AND MISUSE) TELECOMMUNICATIONS SPECIAL FARES Fabio A. Schreiber data warehouse 26
28 DATA MINING APPLICATIONS ASTRONOMY AND ASTROPHYSICS STARS AND GALAXIES CLASSIFICATION CHEMICAL AND PHARMACEUTIC RESEARCH NEW COMPOUNDS DISCOVERY INTERRELATIONS AMONG CHEMICAL COMPOUNDS MOLECULAR BIOLOGY PATTERNS IN GENETICAL DATA AND IN MOLECULAR STRUCTURES REMOTE SURVEY AND METEOROLOGY SATELLITE DATA ANALYSIS ECONOMIC AND DEMOGRAPHIC STATISTICS CENSUS DATA ANALYSIS Fabio A. Schreiber data warehouse 27
29 DATA MINING ALGORITHMS STRUCTURE MODEL REPRESENTATION FORMALISMS TO REPRESENT AND DESCRIBE POSSIBLE PATHS MODEL EVALUATION STATISTICAL OR LOGICAL ESTIMATE OF THE CORRESPONDENCE OF A PATH TO THE SEARCH CRITERIA SEARCH METHOD OF PARAMETERS SEARCH OF THE PARAMETERS WHICH OPTIMIZE THE EVALUATION CRITERIA, THE OBSERVATIONS SET AND THE MODEL REPRESENTATION BEING GIVEN OF MODEL THE PARAMETERS ARE APPLIED TO MODELS BELONGING TO THE SAME FAMILY, DIFFERENTIATED BY THE REPRESENTATION TYPE, FOR QUALITY EVALUATION Fabio A. Schreiber data warehouse 28
30 WHAT KIND OF INFORMATION DO WE GET? ASSOCIATIONS SET OF RULES SPECIFYING THE JOINT OCCURRENCE OF TWO (OR MORE) ELEMENTS SEQUENCES POSSIBILITY OF STATING TEMPORAL SEQUENCES OF EVENTS CLASSIFICATIONS GROUPING OF ELEMENTS INTO CLASSES FOLLOWING A GIVEN MODEL CLUSTERS GROUPING OF ELEMENTS INTO CLASSES WHICH HAVE NOT BEEN DEFINED A-PRIORI TRENDS DISCOVERY OF PECULIAR TEMPORAL PATHS HAVING A FORECASTING VALUE Fabio A. Schreiber data warehouse 29
31 DATA MINING TECHNIQUES AND MODELS NEURAL NETWORKS VERY POWERFUL TOOL LEARNING CAPABILITY OPAQUE BEHAVIOUR INDUCTION DECISION TREES (HIERARCHY OF if - then DECISIONS) RULE INDUCTION (NON HIERARCHICAL SETS DERIVED FROM DECISION TREES) SELF EXPLAINING Fabio A. Schreiber data warehouse 30
32 DATA MINING TECHNIQUES AND MODELS RULES DISCOVERY ASSOCIATION RULES (SIMULTANEOUS EVENTS X Y) SEQUENTIAL ASSOCIATIONS DATA VISUALIZATION DATA ARE PREPARED AND GRAPHICALLY PRESENTED IN ORDER TO EVIDENTIATE POSSIBLE IRREGULARITIES OR STRANGE PATTERNS Fabio A. Schreiber data warehouse 31
33 DATA MINING TECHNIQUES TECHNIQUES INFORMATION ASSOCIATIONS SEQUENCES CLASSIFICATIONS CLUSTERS TRENDS NEURAL NETWORKS INDUCTION RULES DISCOVERY DATA VISUALIZATION Fabio A. Schreiber data warehouse 32
34 THE MARKET BASKET THE BEST-KNOWN MODEL ON WHICH DATA MINING TECHNIQUES ARE APPLIED MAINLY, BUT NOT EXCLUSIVELY, USED FOR RETAIL SALE PROBLEMS THE GOAL IS TO DISCOVER RECURRENT PATTERNS IN DATA (ASSOCIATION RULES) Fabio A. Schreiber data warehouse 33
35 THE MARKET BASKET I = {i 1,..., i k } B = {b 1,..., b n } b i I SET OF k ELEMENTS (ITEM) SET OF n SUBSETS (BASKET) OF I I GOODS IN A SUPERMARKET WORDS IN A DICTIONARY B A CUSTOMER S PURCHASE A DOCUMENT IN A CORPUS ASSOCIATION RULE i 1 i 2 i 1 AND i 2 SHOW TOGETHER IN AT LEAST s% OF THE n BASKET (SUPPORT) OF ALL THE BASKETS CONTAINING i 1 AT LEAST c% CONTAIN ALSO i (CONFIDENCE) 2 Fabio A. Schreiber data warehouse 34
36 EXAMPLES OF RULES RULES HAVING DIET COKE AS CONSEQUENT HELP IN DEFINING THE STRATEGIES TO INCREASE THE SALES OF A SPECIFIC PRODUCT RULES HAVING GRISSINI AMONG THEIR ANTECEDENTS HELP IN UNDERSTANDING THE IMPACT OF CEASING A SPECIFIC PRODUCT SALE RULES HAVING WURSTEL AMONG THEIR ANTECEDENTS AND MUSTARD AS CONSEQUENT EVIDENTIATE THE PRODUCT COUPLINGS IN THE ANTECEDENT INDUCING THE SALE OF THE CONSEQUENT Fabio A. Schreiber data warehouse 35
37 EXAMPLES OF RULES RULES CONNECTING ELEMENTS IN SHELF A TO ELEMENTS IN SHELF B HELP TO AN EFFECTIVE ALLOCATION OF PRODUCTS ON THE SHELVES THE BEST k RULES HAVING GRISSINI IN THE CONSEQUENT IN TERMS OF CONFIDENCE IN TERMS OF SUPPORT Fabio A. Schreiber data warehouse 36
38 ANY PROBLEM? c COFFEE IS IN THE BASKET c NO COFFEE IN THE BASKET t TEA IS IN THE BASKET t NO TEA IN THE BASKET t t Σcolumns c c Σ rows t c IS TRUE??? s = 20% c = P[t c] / P[t] =20/25= 80% PERHAPS, BUT... THOSE WHO BUY COFFEE ANYHOW REACH 90%!!! WARNING! A CORRELATION EXISTS BETWEEN TEA AND COFFEE r = P[t c] / (P[t] x P[c] ) = 0.89 Fabio A. Schreiber data warehouse 37
39 DATA MINING IN A RELATIONAL ENVIRONMENT SQL EXTENSION BY INTEGRATING DATA MINING OPERATORS INTO THE LANGUAGE INTEGRATION OF AN SQL SERVER WITH A DATA MINING ENGINE MINE RULE FOR ASSOCIATION RULES MINE CLASSIFICATION FOR CLASSIFICATION PROBLEMS MINE INTERVAL FOR DISCRETIZING CONTINUOUS ATTRIBUTES Fabio A. Schreiber data warehouse 38
40 MINE RULE: EXAMPLE MINE RULE SimpleAssociation AS SELECT DISTINCT 1..n item AS BODY, 1..1 item AS HEAD, SUPPORT, CONFIDENCE FROM Purchase GROUP BY transaction EXTRACTING RULES WITH SUPPORT: 0.1, CONFIDENCE: 0.2 TR CLIENT ITEM DATE PRICE QUANT. 1 Rossi Ski-trousers 17/ Rossi Climbing-boots 17/ Bianchi Sport shirt 18/ Bianchi Brown shoes 18/ Bianchi Rossi Jacket Jacket 18/12 18/ Bianchi Sport shirt 19/ Bianchi Jacket 19/ Fabio A. Schreiber data warehouse 39
41 MINE RULE: EXAMPLE BODY HEAD SUPPORT CONFIDENCE Ski-trousers Climbing-boots Sport shirt Sport shirt Brown shoes Brown shoes Jacket Jacket Climbing-boots Ski-trousers Brown shoes Jacket Sport shirt Jacket Sport shirt Brown shoes Sport shirt, Brown shoes Jacket Sport shirt, Jacket Brown shoes Brown shoes, Jacket Sport shirt Fabio A. Schreiber data warehouse 40
42 CLASSIFICATION PROBLEM EACH ELEMENT (RECORD) OF A DATA SET IS ASSOCIATED WITH A DISTINCTIVE FEATURE CALLED CLASS, ON THE BASIS OF PREVIOUS EXPERIENCES AND OBSERVATIONS PHASE 1: TRAINING A MODEL IS BUILT ON THE KNOWN SET (TRAINING SET) IN SUCH A WAY AS EACH CLASS IS A PARTITION OF THE SET PHASE 2: APPLICATION THE MODEL IS USED TO CLASSIFY NEW DATA Fabio A. Schreiber data warehouse 41
43 CLASSIFICATION PROBLEM : AN EXAMPLE PHASE 1: TRAINING MINE CLASSIFICATION CarInsuranceRules AS SELECT DISTINCT RULES ID, *, CLASS FROM CarInsurance CLASSIFY BY Risk PHASE 2: APPLICATION MINE CLASSIFICATION TEST ClassifiedApplicants AS SELECT DISTINCT *, CLASS FROM Applicants USING CLASSIFICATION FROM CarInsuranceRules AS RULES Fabio A. Schreiber data warehouse 42
44 CLASSIFICATION PROBLEM : AN EXAMPLE AGE CAR TYPE RISK 17 sports high 43 family low 68 family low 32 truck low 23 family high 18 family high 20 family high 45 sports high 50 truck low 64 truck high 46 family low 40 family low 1. IF Age 23 THEN Risk IS High; 2. IF CarType = sports THEN Risk IS High; 3. IF CarType IN {family, truck} AND Age > 23 THEN Risk IS Low; 4. DEFAULT Risk IS Low AGE CAR TYPE 22 family 60 family 35 sports Fabio A. Schreiber data warehouse 43 AGE CAR TYPE 22 family 60 family 35 sports MINE CLASSIFICATION TEST CLASS high low high
45 DISCRETIZATION PROBLEM ONE MUST TRANSFORM A NUMERIC ATTRIBUTE IN A CATHEGORIAL ONE BY DIVIDING THE NUMERIC DOMAIN INTO A SET OF INTERVALS, EACH OF WHICH IS LABELLED WITH A CLASS LABEL, CONSTITUTING THE CATHEGORIAL DOMAIN DISCRETIZATION METHODS NON SUPERVISIONED ONLY THE VALUES DISTRIBUTION OF THE ATTRIBUTE IS CONSIDERED SUPERVISIONED AN EFFORT IS MADE TO KEEP AS MUCH INFORMATION AS POSSIBLE ABOUT THE CLASSES OF A TUPLE ASSOCIATED ATTRIBUTE Fabio A. Schreiber data warehouse 44
46 DISCRETIZATION PROBLEM SIMPLE DISCRETIZATION SAME WIDTH INTERVALS THE NUMERIC DOMAIN IS DIVIDED INTO A GIVEN NUMBER OF EQUAL WIDTH INTERVALS SAME FREQUENCY INTERVALS INTERVALS ARE NARROWER WHERE VALUES ARE DENSE AND WIDER WHERE VALUES ARE SPARSE MINE INTERVAL SameWidthInterval AS SELECT DISTINCT ID, LOWER, UPPER GENERATING DiscreteCarInsurance FROM CarInsurance DISCRETIZE Age BY WIDTH USING 3 INTERVALS Fabio A. Schreiber data warehouse 45
47 DISCRETIZATION PROBLEM : AN EXAMPLE ID LOWER UPPER AGE CAR TYPE RISK 17 sports high 43 family low 68 family low 32 truck low 23 family high 18 family high 20 family high 45 sports high 50 truck low 64 truck high 46 family low 40 family low AGE CAR TYPE RISK 1 sports high truck low 2 family low truck low family low Fabio A. Schreiber data warehouse 46
OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1
Slide 29-1 Chapter 29 Overview of Data Warehousing and OLAP Chapter 29 Outline Purpose of Data Warehousing Introduction, Definitions, and Terminology Comparison with Traditional Databases Characteristics
1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing
1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 2. What is a Data warehouse a. A database application
DATA WAREHOUSING AND OLAP TECHNOLOGY
DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are
Fluency With Information Technology CSE100/IMT100
Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999
OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA
OLAP and OLTP AMIT KUMAR BINDAL Associate Professor Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data,
Week 3 lecture slides
Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically
OLAP. Business Intelligence OLAP definition & application Multidimensional data representation
OLAP Business Intelligence OLAP definition & application Multidimensional data representation 1 Business Intelligence Accompanying the growth in data warehousing is an ever-increasing demand by users for
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
Data Warehouse: Introduction
Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT
BUILDING BLOCKS OF DATAWAREHOUSE G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT 1 Data Warehouse Subject Oriented Organized around major subjects, such as customer, product, sales. Focusing on
14. Data Warehousing & Data Mining
14. Data Warehousing & Data Mining Data Warehousing Concepts Decision support is key for companies wanting to turn their organizational data into an information asset Data Warehouse "A subject-oriented,
Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina
Data Warehousing Read chapter 13 of Riguzzi et al Sistemi Informativi Slides derived from those by Hector Garcia-Molina What is a Warehouse? Collection of diverse data subject oriented aimed at executive,
Chapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH
OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 Online Analytic Processing OLAP 2 OLAP OLAP: Online Analytic Processing OLAP queries are complex queries that Touch large amounts of data Discover
Week 13: Data Warehousing. Warehousing
1 Week 13: Data Warehousing Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots of buzzwords, hype slice & dice, rollup,
European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project
European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project Janet Delve, University of Portsmouth Kuldar Aas, National Archives of Estonia Rainer Schmidt, Austrian Institute
B.Sc (Computer Science) Database Management Systems UNIT-V
1 B.Sc (Computer Science) Database Management Systems UNIT-V Business Intelligence? Business intelligence is a term used to describe a comprehensive cohesive and integrated set of tools and process used
1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining
1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining techniques are most likely to be successful, and Identify
LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES
LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES MUHAMMAD KHALEEL (0912125) SZABIST KARACHI CAMPUS Abstract. Data warehouse and online analytical processing (OLAP) both are core component for decision
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Part 22. Data Warehousing
Part 22 Data Warehousing The Decision Support System (DSS) Tools to assist decision-making Used at all levels in the organization Sometimes focused on a single area Sometimes focused on a single problem
PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.
PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions A Technical Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Data Warehouse Modeling.....................................4
When to consider OLAP?
When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: [email protected] Abstract: Do you need an OLAP
Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days
or 2008 Five Days Prerequisites Students should have experience with any relational database management system as well as experience with data warehouses and star schemas. It would be helpful if students
Building Data Cubes and Mining Them. Jelena Jovanovic Email: [email protected]
Building Data Cubes and Mining Them Jelena Jovanovic Email: [email protected] KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
SQL Server Analysis Services Complete Practical & Real-time Training
A Unit of Sequelgate Innovative Technologies Pvt. Ltd. ISO Certified Training Institute Microsoft Certified Partner SQL Server Analysis Services Complete Practical & Real-time Training Mode: Practical,
DATA WAREHOUSING - OLAP
http://www.tutorialspoint.com/dwh/dwh_olap.htm DATA WAREHOUSING - OLAP Copyright tutorialspoint.com Online Analytical Processing Server OLAP is based on the multidimensional data model. It allows managers,
Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?
Outline Data Warehousing What is a data warehouse? Why a warehouse? Models & operations Implementing a warehouse 2 What is a Warehouse? Collection of diverse data subject oriented aimed at executive, decision
CS2032 Data warehousing and Data Mining Unit II Page 1
UNIT II BUSINESS ANALYSIS Reporting Query tools and Applications The data warehouse is accessed using an end-user query and reporting tool from Business Objects. Business Objects provides several tools
DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS
DATA WAREHOUSE CONCEPTS A fundamental concept of a data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
New Approach of Computing Data Cubes in Data Warehousing
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of
CHAPTER 4 Data Warehouse Architecture
CHAPTER 4 Data Warehouse Architecture 4.1 Data Warehouse Architecture 4.2 Three-tier data warehouse architecture 4.3 Types of OLAP servers: ROLAP versus MOLAP versus HOLAP 4.4 Further development of Data
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing Class Projects Class projects are going very well! Project presentations: 15 minutes On Wednesday
2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000
2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000 Introduction This course provides students with the knowledge and skills necessary to design, implement, and deploy OLAP
Data Warehousing, OLAP, and Data Mining
Data Warehousing, OLAP, and Marek Rychly [email protected] Strathmore University, @ilabafrica & Brno University of Technology, Faculty of Information Technology Advanced Databases and Enterprise Systems
Ramesh Bhashyam Teradata Fellow Teradata Corporation [email protected]
Challenges of Handling Big Data Ramesh Bhashyam Teradata Fellow Teradata Corporation [email protected] Trend Too much information is a storage issue, certainly, but too much information is also
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes Final Exam Overview Open books and open notes No laptops and no other mobile devices
Overview of Data Warehousing and OLAP
Overview of Data Warehousing and OLAP Chapter 28 March 24, 2008 ADBS: DW 1 Chapter Outline What is a data warehouse (DW) Conceptual structure of DW Why separate DW Data modeling for DW Online Analytical
Distance Learning and Examining Systems
Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed
Foundations of Business Intelligence: Databases and Information Management
Chapter 5 Foundations of Business Intelligence: Databases and Information Management 5.1 Copyright 2011 Pearson Education, Inc. Student Learning Objectives How does a relational database organize data,
Data W a Ware r house house and and OLAP II Week 6 1
Data Warehouse and OLAP II Week 6 1 Team Homework Assignment #8 Using a data warehousing tool and a data set, play four OLAP operations (Roll up (drill up), Drill down (roll down), Slice and dice, Pivot
SAS BI Course Content; Introduction to DWH / BI Concepts
SAS BI Course Content; Introduction to DWH / BI Concepts SAS Web Report Studio 4.2 SAS EG 4.2 SAS Information Delivery Portal 4.2 SAS Data Integration Studio 4.2 SAS BI Dashboard 4.2 SAS Management Console
Framework for Data warehouse architectural components
Framework for Data warehouse architectural components Author: Jim Wendt Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 04/08/11 Email: [email protected] Abstract:
BUILDING OLAP TOOLS OVER LARGE DATABASES
BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,
Basics of Dimensional Modeling
Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimensional
Business Intelligence, Analytics & Reporting: Glossary of Terms
Business Intelligence, Analytics & Reporting: Glossary of Terms A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ad-hoc analytics Ad-hoc analytics is the process by which a user can create a new report
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management
CHAPTER 5: BUSINESS ANALYTICS
Chapter 5: Business Analytics CHAPTER 5: BUSINESS ANALYTICS Objectives The objectives are: Describe Business Analytics. Explain the terminology associated with Business Analytics. Describe the data warehouse
SQL Server Administrator Introduction - 3 Days Objectives
SQL Server Administrator Introduction - 3 Days INTRODUCTION TO MICROSOFT SQL SERVER Exploring the components of SQL Server Identifying SQL Server administration tasks INSTALLING SQL SERVER Identifying
Data Mining as Part of Knowledge Discovery in Databases (KDD)
Mining as Part of Knowledge Discovery in bases (KDD) Presented by Naci Akkøk as part of INF4180/3180, Advanced base Systems, fall 2003 (based on slightly modified foils of Dr. Denise Ecklund from 6 November
5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2
Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on
Data Warehousing Systems: Foundations and Architectures
Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository
IBM WebSphere DataStage Online training from Yes-M Systems
Yes-M Systems offers the unique opportunity to aspiring fresher s and experienced professionals to get real time experience in ETL Data warehouse tool IBM DataStage. Course Description With this training
Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer
Data Warehousing Overview, Terminology, and Research Issues 1 Heterogeneous Database Integration Integration System World Wide Web Digital Libraries Scientific Databases Personal Databases Collects and
Data Warehousing and OLAP
1 Data Warehousing and OLAP Hector Garcia-Molina Stanford University Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots
Designing a Dimensional Model
Designing a Dimensional Model Erik Veerman Atlanta MDF member SQL Server MVP, Microsoft MCT Mentor, Solid Quality Learning Definitions Data Warehousing A subject-oriented, integrated, time-variant, and
CHAPTER 4: BUSINESS ANALYTICS
Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the
MDM and Data Warehousing Complement Each Other
Master Management MDM and Warehousing Complement Each Other Greater business value from both 2011 IBM Corporation Executive Summary Master Management (MDM) and Warehousing (DW) complement each other There
BUSINESS ANALYTICS AND DATA VISUALIZATION. ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ
1 BUSINESS ANALYTICS AND DATA VISUALIZATION ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ 2 การท าความด น น ยากและเห นผลช า แต ก จ าเป นต องท า เพราะหาไม ความช วซ งท าได ง ายจะเข ามาแทนท และจะพอกพ นข
Data Warehouse Design
Data Warehouse Design Modern Principles and Methodologies Matteo Golfarelli Stefano Rizzi Translated by Claudio Pagliarani Mc Grauu Hill New York Chicago San Francisco Lisbon London Madrid Mexico City
Data Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2011 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
Web Log Data Sparsity Analysis and Performance Evaluation for OLAP
Web Log Data Sparsity Analysis and Performance Evaluation for OLAP Ji-Hyun Kim, Hwan-Seung Yong Department of Computer Science and Engineering Ewha Womans University 11-1 Daehyun-dong, Seodaemun-gu, Seoul,
Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A
Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical
Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing
Data Warehousing Outline Overview of data warehousing Dimensional Modeling Online Analytical Processing From OLTP to the Data Warehouse Traditionally, database systems stored data relevant to current business
Adobe Insight, powered by Omniture
Adobe Insight, powered by Omniture Accelerating government intelligence to the speed of thought 1 Challenges that analysts face 2 Analysis tools and functionality 3 Adobe Insight 4 Summary Never before
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
Data Warehousing and Data Mining
Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong [email protected] Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge
Data Warehousing. Paper 133-25
Paper 133-25 The Power of Hybrid OLAP in a Multidimensional World Ann Weinberger, SAS Institute Inc., Cary, NC Matthias Ender, SAS Institute Inc., Cary, NC ABSTRACT Version 8 of the SAS System brings powerful
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Introduction to Data Warehousing. Ms Swapnil Shrivastava [email protected]
Introduction to Data Warehousing Ms Swapnil Shrivastava [email protected] Necessity is the mother of invention Why Data Warehouse? Scenario 1 ABC Pvt Ltd is a company with branches at Mumbai,
SQL SERVER TRAINING CURRICULUM
SQL SERVER TRAINING CURRICULUM Complete SQL Server 2000/2005 for Developers Management and Administration Overview Creating databases and transaction logs Managing the file system Server and database configuration
ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process
ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced
Data Warehousing and OLAP Technology for Knowledge Discovery
542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories
IST722 Data Warehousing
IST722 Data Warehousing Components of the Data Warehouse Michael A. Fudge, Jr. Recall: Inmon s CIF The CIF is a reference architecture Understanding the Diagram The CIF is a reference architecture CIF
Data Warehousing Concepts
Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis
Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing
Database Applications Advanced Querying Transaction processing Online setting Supports day-to-day operation of business OLAP Data Warehousing Decision support Offline setting Strategic planning (statistics)
Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer
Alejandro Vaisman Esteban Zimanyi Data Warehouse Systems Design and Implementation ^ Springer Contents Part I Fundamental Concepts 1 Introduction 3 1.1 A Historical Overview of Data Warehousing 4 1.2 Spatial
TIM 50 - Business Information Systems
TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz March 1, 2015 The Database Approach to Data Management Database: Collection of related files containing records on people, places, or things.
CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved
CHAPTER SIX DATA Business Intelligence 2011 The McGraw-Hill Companies, All Rights Reserved 2 CHAPTER OVERVIEW SECTION 6.1 Data, Information, Databases The Business Benefits of High-Quality Information
CS54100: Database Systems
CS54100: Database Systems Date Warehousing: Current, Future? 20 April 2012 Prof. Chris Clifton Data Warehousing: Goals OLAP vs OLTP On Line Analytical Processing (vs. Transaction) Optimize for read, not
Data Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
Data warehouse Architectures and processes
Database and data mining group, Data warehouse Architectures and processes DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 1 Database and data mining group, Data warehouse architectures Separation between
IT and CRM A basic CRM model Data source & gathering system Database system Data warehouse Information delivery system Information users
1 IT and CRM A basic CRM model Data source & gathering Database Data warehouse Information delivery Information users 2 IT and CRM Markets have always recognized the importance of gathering detailed data
Data Warehouse design
Data Warehouse design Design of Enterprise Systems University of Pavia 21/11/2013-1- Data Warehouse design DATA PRESENTATION - 2- BI Reporting Success Factors BI platform success factors include: Performance
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
A Design and implementation of a data warehouse for research administration universities
A Design and implementation of a data warehouse for research administration universities André Flory 1, Pierre Soupirot 2, and Anne Tchounikine 3 1 CRI : Centre de Ressources Informatiques INSA de Lyon
Data Warehousing and Data Mining. A.A. 04-05 Datawarehousing & Datamining 1
Data Warehousing and Data Mining A.A. 04-05 Datawarehousing & Datamining 1 Outline 1. Introduction and Terminology 2. Data Warehousing 3. Data Mining Association rules Sequential patterns Classification
Foundations of Business Intelligence: Databases and Information Management
Chapter 5 Foundations of Business Intelligence: Databases and Information Management 5.1 See Markers-ORDER-DB Logically Related Tables Relational Approach: Physically Related Tables: The Relationship Screen
