Contents

PREFACE
FOREWORD
LIST OF CONTRIBUTORS

Chapter 1
INFORMATION FILTERS SUPPLYING DATA WAREHOUSES WITH BENCHMARKING INFORMATION
Witold Abramowicz,
    Data Warehouses
    The HyperSDI System
    User Profiles in the HyperSDI System
    Building Data Warehouse Profiles
    Techniques for Improving Profiles
    Implementation Notes

Chapter 2
PARALLEL MINING OF ASSOCIATION RULES
David Cheung, Sau Dan Lee
    Parallel Mining of Association Rules
    Pruning Techniques and The FPM Algorithm
    Metrics for Data Skewness and Workload Balance
    Partitioning of the Database
    Experimental Evaluation of the Partitioning Algorithms
    Discussions

Chapter 3
UNSUPERVISED FEATURE RANKING AND SELECTION
Manoranjan Dash, Huan Liu, Jun Yao
    Basic Concepts and Possible Approaches
    An Entropy Measure for Continuous and Nominal Data Types
    Algorithm to Find Important Variables
    Experimental Studies
    Clustering Using SUD
    Discussion and Conclusion

Chapter 4
APPROACHES TO CONCEPT BASED EXPLORATION OF INFORMATION RESOURCES
Hele-Mai Haav, Jørgen Fischer Nilsson
    Conceptual Taxonomies
    Ontology Driven Concept Retrieval
    Search based on formal concept analysis
    Conclusion
    Acknowledgements

Chapter 5
HYBRID METHODOLOGY OF KNOWLEDGE DISCOVERY FOR BUSINESS INFORMATION
Hippe
    Present Status of Data Mining
    Experiments with Mining Regularities from Data
    Discussion
    Acknowledgements

Chapter 6
FUZZY LINGUISTIC SUMMARIES OF DATABASES FOR AN EFFICIENT BUSINESS DATA ANALYSIS AND DECISION SUPPORT
Janusz Kacprzyk, Ronald R. Yager and
    Idea of Linguistic Summaries Using Fuzzy Logic with Linguistic Quantifiers
    On Other Validity Criteria
    Derivation of Linguistic Summaries via a Fuzzy Logic Based Database Querying Interface
    Implementation for a Sales Database at a Computer Retailer
    Concluding Remarks

Chapter 7
INTEGRATING DATA SOURCES USING A STANDARDIZED GLOBAL DICTIONARY
Ramon Lawrence and Ken Barker
    Data Semantics and the Integration Problem
    Previous work
    The Integration Architecture
    The Global Dictionary
    The Relational Integration Model
    Special Cases of Integration
    Applications to the WWW
    Future Work and

Chapter 8
MAINTENANCE OF DISCOVERED ASSOCIATION RULES
Sau Dan Lee, David Cheung
    Problem Description
    The FUP Algorithm for the Insertion Only Case
    The FUP Algorithm for the Deletions Only Case
    The FUP2 Algorithm for the General Case
    Performance Studies
    Discussions
    Notes

Chapter 9
MULTIDIMENSIONAL BUSINESS PROCESS ANALYSIS WITH THE PROCESS WAREHOUSE
Beate List, Josef Schiefer, A Min Tjoa, Gerald Quirchmayr
    Related Work
    Goals of the Data Warehouse Approach
    Data Source
    Basic Process Warehouse Components Representing Business Process Analysis Requirements
    Data Model and Analysis Capabilities
    Conclusion and Further Research

Chapter 10
AMALGAMATION OF STATISTICS AND DATA MINING TECHNIQUES: EXPLORATIONS IN CUSTOMER LIFETIME VALUE MODELING
D. R. Mani, James Drew, Andrew Betz and Piew Datta
    Statistics and Data Mining Techniques: A Characterization
    Lifetime Value (LTV) Modeling
    Customer Data for LTV Tenure Prediction
    Classical Statistical Approaches to Survival Analysis
    Neural Networks for Survival Analysis
    From Data Models to Business Insight
    Conclusion: The Amalgamation of Statistical and Data Mining Techniques

Chapter 11
ROBUST BUSINESS INTELLIGENCE SOLUTIONS
Jan Mrazek
    Business Intelligence Architecture
    Data Transformation
    Data Modelling
    Integration Of Data Mining
    Conclusion

Chapter 12
THE ROLE OF GRANULAR INFORMATION IN KNOWLEDGE DISCOVERY IN DATABASES
Witold Pedrycz
    Granulation of information
    The development of data-justifiable information granules
    Building associations in databases
    From associations to rules in databases
    The construction of rules in data mining
    Properties of rules induced by associations
    Detailed computations of the consistency of rules and its analysis
    Acknowledgment

Chapter 13
DEALING WITH DIMENSIONS IN DATA WAREHOUSING
Jaroslav Pokorny
    DW Modelling with Tables
    Dimensions
    Constellations
    Dimension Hierarchies with ISA-hierarchies

Chapter 14
ENHANCING THE KDD PROCESS IN THE RELATIONAL DATABASE MINING FRAMEWORK BY QUANTITATIVE EVALUATION OF ASSOCIATION RULES
Giuseppe Psaila
    The Relational Database Mining Framework
    The Evaluate Rule Operator
    Enhancing the Knowledge Discovery Process
    and Future Work
    Notes

Chapter 15
SPEEDING UP HYPOTHESIS DEVELOPMENT
Jörg A. Schlösser, Peter C. Lockemann, Matthias Gimbel
    Information Model
    The Execution Architecture of CITRUS
    Searching the Information Directory
    Documentation of the Process History
    Linking the Information Model with the Relational Model
    Generation of SQL Queries
    Automatic Materialization of Intermediate Results
    Experimental Results
    Utilizing Past Experience
    Related Work
    Concluding Remarks

Chapter 16
SEQUENCE MINING IN DYNAMIC AND INTERACTIVE ENVIRONMENTS
Srinivasan Parthasarathy, Mohammed J. Zaki, Mitsunori Ogihara, Sandhya Dwarkadas
    Problem Formulation
    The SPADE Algorithm
    Incremental Mining Algorithm
    Interactive Sequence Mining
    Experimental Evaluation
    Related Work
    Acknowledgements

Chapter 17
INVESTIGATION OF ARTIFICIAL NEURAL NETWORKS FOR CLASSIFYING LEVELS OF FINANCIAL DISTRESS OF FIRMS: THE CASE OF AN UNBALANCED TRAINING SAMPLE
Jozef Zurada, Benjamin P. Foster, Terry J. Ward
    Motivation and Literature Review
    Logit Regression, Neural Network, and Principal Component Analysis Fundamentals
    Research Methodology
    Discussion of the Results and Future Research Directions
    Appendix - Neural Network Toolbox

INDEX