KNOWLEDGE DISCOVERY FOR SUPPLY CHAIN MANAGEMENT SYSTEMS: A SCHEMA COMPOSITION APPROACH

Shi-Ming Huang and Tsuei-Chun Hu*
Department of Accounting and Information Technology
*Department of Information Management
National Chung Cheng University, Taiwan
Email: smhuang@mis.ccu.edu.tw

ABSTRACT

Decision support and global knowledge discovery in SCM systems have become important issues, since SCM has become a common way of achieving competitive advantage. This research proposes a distributed data mining mechanism that uses relationship metadata to integrate the data sources contributed by the partners of a supply chain, thus addressing the excessive volume and variety of data in current supply chains. With prior relevance detection, the mining process becomes considerably more effective. Moreover, since the data provided by any single partner is too incomplete for traditional data mining, we utilize a peculiarity-oriented data mining process to generate rules across organizations and thereby gain further competitive advantage. The important rules can therefore be mined in our architecture. The results produced by the experimental system show that it identifies relevant resources effectively under user-defined requirements.

Keywords: Distributed Data Mining, Granule Computing, Peculiarity Oriented Data Mining, Contingency Table, Supply Chain Management.

INTRODUCTION

With the growing popularity of the Internet and e-commerce, individual companies no longer compete as solitary entities, but rather as parts of supply chains. The concept of supply chain management (SCM) is to integrate data and processes among partners efficiently so as to achieve higher customer satisfaction at lower cost. SCM is increasingly important for enterprises and is considered a necessary IT investment: a firm with better SCM capability tends to keep or gain competitive advantage (6). Figure 1 illustrates the supply chain process.
The information flow in SCM serves as the bridge between the various phases of a supply chain, allowing supply partners to coordinate their actions and increase inventory visibility (5). Information sharing in SCM is another important topic: the degree of information sharing and the availability of information at each level of the supply chain directly influence SCM activities, and operational performance in particular is driven by information. Some studies reveal that information sharing provides significant cost savings and inventory reduction (10). On the other hand, decision support and decision technologies have become increasingly important in SCM (11). They enable companies to engage in smart business (s-business), the next stage in the evolution of business beyond the supply chain (3). Currently, three technologies are widely adopted as decision tools: (i) data warehousing; (ii) OLAP (On-Line Analytical Processing); and (iii) data mining. However, information and decision support in SCM face two serious problems:

Resource variety: the sharable information is dispersed among the different members of the supply chain.

Information overloading: according to the study of (7), as shown in Figure 2, the number of information sources available to technical and managerial executives is growing rapidly.

Figure 1 The Supply Chain Process
Figure 2 Growth in information sources available to corporate decision-makers (7)

To resolve these problems, some studies have suggested the distributed data mining approach. However, current distributed data mining algorithms focus on homogeneous data that are horizontally or vertically partitioned into multiple parts, which is unsuitable for the heterogeneous data of a supply chain environment. In this study, we propose a distributed data mining mechanism with semantic composition, which can produce decision rules and identify the factors that influence the efficiency of the supply chain. The objectives of this paper are as follows:

1. To investigate a mechanism for building the relationships between different data sources;
2. To investigate a mechanism for retrieving the data for mining;
3. To investigate a mining mechanism for generating the decision rules.

With our approach, distributed knowledge residing in various resources and extremely large datasets can be properly discovered and integrated, and the quality of that knowledge is thereby assured. In the next section we introduce some related work. In Section 3 we discuss our distributed data mining mechanism. Section 4 presents an implementation of the DDM (Distributed Data Mining) mechanism in Java and a real case study for feasibility analysis.

Volume V, No 2, 2004 523 Issues in Information Systems
The final section contains the conclusion.

RELATED WORKS

Decision Tools on SCM

Data mining refers to the process of sifting through a large amount of corporate data (1) to uncover nuggets of information that serve as decision support in enterprises. The operation of a supply chain has several characteristics: (i) the relationships between the data of different companies are weak; (ii) the data derived from each company, such as sales and forecast information, are fragmentary or predictive; and (iii) the data sources are heterogeneous, meaning that their characteristics differ. These characteristics have become more and more important in the selection of decision tools for the supply chain. Decision-makers within the supply chain have to choose a suitable decision model in order to achieve competitive advantage. However, SCM decision models have the following drawbacks:

Simplicity: they focus on a single period and a single echelon, and suffer from computational difficulty.

Dependency: because the driving forces behind supply chain linkages differ, the members of the supply chain have to figure out which model is needed.

Incompleteness of recommendation: reinventing traditional analytical tools will not answer many managerial issues.

Distributed Data Mining

Briefly, data mining, also known as knowledge discovery in databases, refers to extracting or mining knowledge from large amounts of data (8), and has been recognized as a new area of database research. It has been employed to extract useful information for decision making in many areas, including decision support, market strategy, and fraud detection. The traditional method of data mining utilizes centralized data, such as a data warehouse; nevertheless, this is fundamentally improper for most distributed and ubiquitous data mining applications. A new architecture, distributed data mining (DDM), has been proposed to solve this problem. Many DDM algorithms have been developed, such as Count Distribution (2), Data Distribution (2), and Fast Distributed Mining (4), all designed for relational databases. Owing to the diversity of the data, these algorithms cannot handle heterogeneous data. (13) proposed a new approach, peculiarity-oriented mining, to address heterogeneous data; it focuses on the exploration of peculiarity rules across different data sources. Peculiarity rules are atypical regularities hidden in domains such as scientific, statistical, and transaction databases. They are difficult to discover with standard association rule mining because of its large-support requirement.
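As a minimal illustration of why standard support-based association mining overlooks such peculiarity rules, consider the following sketch; the transaction data and the minimum-support threshold are hypothetical:

```python
# Hypothetical market-basket data: nine routine baskets and one rare,
# but potentially significant, co-occurrence.
transactions = [
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "butter"},
    {"bread", "milk"}, {"milk", "butter"}, {"bread", "milk"},
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk"},
    {"caviar", "champagne"},
]

def support(itemset, db):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

MIN_SUP = 0.2  # a typical minimum-support threshold

print(support({"bread", "milk"}, transactions))        # 0.7 -> retained
print(support({"caviar", "champagne"}, transactions))  # 0.1 -> pruned by MIN_SUP
```

Any rule built on the caviar/champagne pattern is discarded before rule generation even begins, which is exactly the gap that peculiarity-oriented mining targets.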
MECHANISM FOR DATA MINING ON SCM

Overview

Our proposed mechanism for distributed data mining in a supply chain environment is depicted in Figure 3. The mechanism constructs the relationships between the tables of the multi-database held by different members of the supply chain, and then generates decision rules with background knowledge. There are two basic process modules, corresponding to two phases: Relationship Composition (RC) and Rule Discovery (RD).

Phase 1: Relationship Composition (RC)

The rules generated by a mining algorithm alone carry little background knowledge. To explore the background knowledge, the mechanism connects all the tables in a graph structure that represents their relationships. There are three steps in implementing RC in this model.

Step 1.1: Granule Definition

The objective of granule definition is to transform precise data into granules, each representing a specific domain. Each attribute is divided into different granules, in one of two user-defined styles:

Qualitative style: transform a quantitative value into a qualitative value.

Generalization style: transform a particular value into a generalization.

Figure 3 An overview of our mechanism

Step 1.2: Relation Detection

The concept of Entity Relationship Diagrams (ERD) is adopted to detect the relationships between tables dispersed across multiple databases. Primary and foreign keys are utilized to connect the tables; a key link alone, however, does not mean that the attributes of two tables are mutually relevant. This module uses two steps to determine which attributes are relevant:

Step 1.2.1: Local Detection. Its objective is to detect the relevance between two fields and calculate the degree of their relation according to the ERD. We adopt Pawlak's rough sets, one of the granular computing techniques, to find the relevant fields in the tables.

Step 1.2.2: Global Detection. The relationship between a company and its suppliers or customers depends on interactive messages (IM), such as purchase orders. Because the degree of normalization differs between companies, the relations among tables dispersed across different companies can be divided into the following cases:

One-to-One Association: all attributes come from one table at a site and correspond to all or part of the attributes of one table at the other site.

One-to-Many Association: all attributes come from one table at a site but are divided among multiple tables at the other site.

Many-to-Many Association: the attributes come from multiple tables at a site and are arranged into multiple tables at the other site.

Step 1.3: Relation Construction

After relation detection, all relevant fields between tables are recognized. The relationships are stored in the Relationship Metadata (shown in Figure 4) for rule discovery.
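Steps 1.1 and 1.2.1 can be sketched as follows. The attribute names, cut-points, and sample rows are illustrative assumptions, and the rough-set relevance test is reduced to Pawlak's dependency degree, the simplest measure in that framework:

```python
# Step 1.1, qualitative style: map a quantitative value onto a labelled granule.
def qualitative_granule(value, cuts):
    for upper, label in cuts:
        if value <= upper:
            return label
    return cuts[-1][1]

# Hypothetical cut-points for an order-amount attribute.
AMOUNT_CUTS = [(100, "low"), (1000, "medium"), (float("inf"), "high")]

# Step 1.2.1, local detection: Pawlak's dependency degree gamma(cond -> dec),
# the fraction of rows whose condition granule uniquely determines the
# decision value; a high gamma marks the two fields as relevant.
def dependency_degree(rows, cond_attrs, dec_attr):
    blocks = {}
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        blocks.setdefault(key, []).append(row[dec_attr])
    positive = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return positive / len(rows)

rows = [
    {"amount": qualitative_granule(a, AMOUNT_CUTS), "late_delivery": late}
    for a, late in [(50, "no"), (80, "no"), (500, "no"), (700, "yes"), (5000, "yes")]
]
gamma = dependency_degree(rows, ["amount"], "late_delivery")  # 0.6 here
```

In this toy data the "medium" granule mixes both decision values, so only three of the five rows fall in the positive region, giving a dependency degree of 0.6.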
Phase 2: Rule Discovery (RD)

Rule discovery is responsible for discovering the peculiar rules that reveal further meaning. Here we refine the peculiarity-oriented multidatabase mining proposed by (13). In this section we illustrate each step of the module.
Figure 4 The schema of relationship metadata (9), comprising four tables: BELONG (TABLE, DATABASE, INSTANCE), ELEMENT (ELENAME), EQUAL (SYNNO, MAPNO), and MAP (MAPNO, OPERATOR, CONT)

Step 2.1: Peculiar Data Discovery

There are many ways of finding peculiar data, which have a very low frequency of appearance and may lead the company to take emergency measures. We utilize an attribute-oriented method, different from traditional statistical methods: in this step we calculate the peculiarity factor of each value, derive a threshold, and pick up the data that exceed the threshold.

Step 2.2: Exploration of Background Knowledge

Two foundations, the relationship metadata and the peculiar data, support the exploration of background knowledge in this step: the mechanism investigates the relevant fields recorded in the metadata and picks up the related peculiar data.

Step 2.3: Rule Generation

After the exploration of background knowledge, interesting information is discovered according to the relationship metadata. To generate and interpret the rules, we adopt the detailed analysis of probability-related measures associated with a rule given by (12). The characteristics of a rule φ → ψ can be represented by a contingency table (shown in Figure 5), and the peculiarity rules are the subset of rules with a high change of support (CS), where

CS(φ → ψ) = m(φ ∧ ψ)/m(φ) − m(ψ)/m(U).

The higher the value of CS, the more peculiar the rule.

Figure 5 Contingency Table

EVALUATION OF DDM MECHANISM

Our approach improves decision quality, for example by providing rules derived from different organizations, since the effectiveness of the rules is improved by using Pawlak's rough sets to determine the relations between the data and by peculiarity mining over the fragmented data. In this paper we evaluate our approach in two respects; a feasibility analysis is first performed with a prototype system and a real case study.
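The computations in Steps 2.1 and 2.3 can be sketched as follows. The peculiarity-factor formulation PF(x_i) = Σ_j |x_i − x_j|^α with α = 0.5 and a mean-plus-β-standard-deviations threshold follows the attribute-oriented method of (13); the sample sales figures and the β setting are illustrative assumptions:

```python
from statistics import mean, stdev

# Step 2.1: peculiarity factor PF(x_i) = sum_j |x_i - x_j| ** alpha.
def peculiarity_factors(values, alpha=0.5):
    return [sum(abs(x - y) ** alpha for y in values) for x in values]

# Select the values whose PF exceeds mean(PF) + beta * stdev(PF).
def peculiar_values(values, alpha=0.5, beta=1.0):
    pf = peculiarity_factors(values, alpha)
    threshold = mean(pf) + beta * stdev(pf)
    return [x for x, f in zip(values, pf) if f > threshold]

# Step 2.3: change of support CS(phi -> psi) = m(phi and psi)/m(phi) - m(psi)/m(U).
def change_of_support(m_phi_psi, m_phi, m_psi, m_universe):
    return m_phi_psi / m_phi - m_psi / m_universe

daily_sales = [102, 98, 105, 99, 101, 103, 97, 100, 410]
print(peculiar_values(daily_sales))  # [410]: the anomalous day stands out
```

A candidate rule connecting such a peculiar value to a field in a partner's table is then kept only if its change of support is high; for example, change_of_support(30, 40, 50, 200) evaluates to 0.75 − 0.25 = 0.5.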
System Implementation

We have developed a prototype system for a feasibility study. The system interface is a web-based application developed in the Java programming language. Sample screenshots
are shown in Figures 6 and 7.

Figure 6 The Granule Definition
Figure 7 The Rule Generator

Case Study: Scenario and Results with the DDM Mechanism

Taiwan Uncle Sam's Apparel Company is a wholesaler and retailer of apparel goods. In order to reduce stock, its decision makers want to understand the relationships between supply and sales; that is, a decision tree of association rules is necessary for decision support. For the case study we adopt the transaction log, export, and import data from the company's databases, CRM and Quixote. Based on this case, we use the relationship metadata to store the relevant fields for subsequently generating business rules. Figures 8 and 9 show the resulting relationship metadata and business rules.

Figure 8 The Relationship Metadata of the real case
Figure 9 The rules of the DDM Mechanism

CONCLUSION

This paper proposes a distributed data mining mechanism to resolve problems such as dispersed, heterogeneous, and fragmented data, since traditional mining methodology cannot effectively resolve information variety and overloading. The results of an experimental implementation of the system show that the proposed structure effectively generates useful rules for decision making.
The mechanism for distributed data mining in a supply chain environment has been introduced. To increase the efficiency of mining from multiple data sources, we detect the relevance of the data sources using Pawlak's rough sets. Furthermore, our approach stores the relationships in relationship metadata during the rule generation process and can filter out useless rules. With our approach, distributed knowledge residing in various resources and extremely large data sets can be properly discovered and integrated, and the quality of that knowledge is thereby assured.

ACKNOWLEDGEMENT

The National Science Council, Taiwan, under Grant No. NSC92-2213-E-194-033, has supported the work presented in this paper. We greatly appreciate their financial support and encouragement.

REFERENCES

1. R. Agrawal, T. Imielinski, and A. Swami (1993). Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Engineering, 5(6), 912-925.
2. R. Agrawal and J. C. Shafer (1996). Parallel Mining of Association Rules, IEEE Transactions on Knowledge and Data Engineering, 8(6), 962-969.
3. P. S. Bender (2000). Debunking 5 Supply Chain Myths, Supply Chain Management Review, 4(1), 52-58.
4. D. W. Cheung, J. Han, V. T. Ng, A. W. Fu, and Y. Fu (1996). A Fast Distributed Algorithm for Mining Association Rules, In International Conference on Parallel and Distributed Information Systems, 31-42.
5. S. Chopra and P. Meindl (2000). Supply Chain Management: Strategy, Planning and Operation, Prentice Hall, 1st edition.
6. R. W. Dik, H. v. Lewinski, J. D. Whitaker, and J. D. Brooks (2003). A Global Study of Supply Chain Leadership and Its Impact on Business Performance, Accenture Institute, www.accenture.com.
7. F. T. Edum-Fotwe, A. Thorpe, and R. McCaffer (2001). Information Procurement Practices of Key Actors in Construction Supply Chain, European Journal of Purchasing & Supply Management, 7(3), 155-164.
8. J. Han and M. Kamber (2001).
Data Mining: Concepts and Techniques, Morgan Kaufmann.
9. S.-M. Huang, I. Kwan, D. C. Yen, and H.-Y. Hsueh (2000). Developing an XML Gateway for Business-to-Business Commerce, In Proceedings of the International Conference on Web Information Systems.
10. H. L. Lee, K. C. So, and C. S. Tang (2000). The Value of Information Sharing in a Two-Level Supply Chain, Management Science, 46(5), 626-643.
11. H.-J. Sebastian, T. Grunert, and M. E. Nissen (2002). Introduction to the Minitrack: Decision Technologies for Supply Chain Management, In Proceedings of the 35th Hawaii International Conference on System Sciences, 867-868.
12. Y. Y. Yao and N. Zhong (1999). An Analysis of Quantitative Measures Associated with Rules, In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 479-488.
13. N. Zhong, Y. Y. Yao, and M. Ohshima (2003). Peculiarity Oriented Multidatabase Mining, IEEE Transactions on Knowledge and Data Engineering, 15(4), 952-960.