International Journal of Soft Computing Applications
ISSN: 1453-2277, Issue 4 (2009), pp. 35-40
EuroJournals Publishing, Inc. 2009
http://www.eurojournals.com/ijsca.htm

An Approach for Facilating Knowledge Data Warehouse

Alaa H. AL-Hamami
Amman Arab University for Graduate Studies
Graduate College for Computing Studies
Zip Code: 11953, P.O.B. 2234, Amman, Jordan
E-mail: alaa_hamami@yahoo.com

Soukaena Hassan Hashem
University of Technology, Computer Science Dept., Baghdad, Iraq
E-mail: soukaena_hassan@yahoo.com

Abstract

The main promise of Business Intelligence (BI), and of other knowledge-based technologies, is to provide the enterprise with the knowledge necessary to compete in the global economy. From a technical point of view, the DWH is basically a large reservoir of integrated data. The DWH does not itself provide the intelligence or the knowledge sought by users; the burden of data analysis, and of extracting information and knowledge from that analysis, lies upon the analyst. The DWH merely provides the right environment for the analyst to achieve such goals. In recent times the use of data mining has increased and become common, especially with data warehouses, after having previously been difficult; so, in addition to the results of the warehouse applications SQL and OLAP, there are also the results of DM. The main problems addressed in this research are the creation and sharing of information/knowledge from the data warehouse and, as a derived problem, the growing population of DWH users. We propose an alternative method for capturing and distributing the information and/or knowledge obtained from the DWH, which we call the knowledge warehouse (KWH). In this research the following suggestions are made: group all the results obtained from the warehouse, then store and organize these results in a newly suggested database suitable for saving these different results.
The database will be stored on a newly suggested server added to the traditional warehouse architecture, so that the warehouse infrastructure supports the new database. The suggested database is a knowledge base which stores all the results of SQL, OLAP and DM. The purpose of this suggestion is that, instead of extracting results from the warehouse databases with an extraction tool (Data Mining, SQL, or OLAP), the system first searches the stored results of previous analyses to check whether the desired analysis has already been extracted and stored. If so, the result of the analysis is displayed directly, since it is already in a form suitable for presentation against the user's request. This saves time and gives a fast and accurate result. If no stored result is convenient for the user's request, the system uses an extraction tool to meet the user's requirements.
Keywords: SQL, OLAP, DM, knowledge base, and Data warehouse.

1. Introduction

A data warehouse means different things to different people. Some definitions are limited to data; others refer to people, processes, software, tools, and data. One of the global definitions is that the data warehouse is a collection of integrated, subject-oriented databases designed to support the Decision-Support Functions (DSF), where each unit of data is relevant to some moment in time. Based on this definition, a data warehouse can be viewed as an organization's repository of data, set up to support strategic decision-making. The function of the data warehouse is to store the historical data of an organization in an integrated manner that reflects the various facets of the organization and its business. The data in a warehouse are never updated but are used only to respond to queries from end users, who are generally decision-makers. Typically, data warehouses are huge, storing billions of records. In many instances, an organization may have several local or departmental data warehouses, often called data marts. A data mart is a data warehouse that has been designed to meet the needs of a specific group of users. It may be large or small, depending on the subject areas. The DWH is a relatively new concept/technology that came about in response to a major business need: the analysis of extremely large volumes of historical, disparate data in an efficient manner to help answer difficult business questions like "What segment of customers will respond favourably to a certain marketing campaign?" or "Which credit card customer segment will most probably default on payments for more than three months?". Existing technology before the DWH lacked the ability to integrate the disparate data or to efficiently extract accurate answers to such questions.
Most information systems at that time were designed to produce pre-defined reports containing shallow knowledge.

2. The Proposed System

To explain the proposed system in all its details, there is a need to discuss some important issues.

2.1. Warehouse Operations

There are three basic applications used with the warehouse:

1. Data mining (DM): represents one of the major applications of data warehousing, since the sole function of a data warehouse is to provide information to end users for decision support. Unlike other query tools and application systems, the data-mining process gives an end user the capacity to extract hidden, nontrivial information.

2. Structured Query Language (SQL): a standard relational database language that is good for queries which impose some kind of constraints on the data in the database in order to extract an answer. In contrast, data-mining methods are good for queries that are exploratory in nature, trying to extract hidden, not so obvious information. SQL is useful when we know exactly what we are looking for and can describe it formally; we use data-mining methods when we know only vaguely what we are looking for. Therefore, these two classes of data-warehousing applications are complementary.

3. On-Line Analytical Processing (OLAP): OLAP tools and methods have become very popular in recent years, as they let users analyze data in a warehouse by providing multiple views of the data, supported by advanced graphical representations. In these views, different dimensions of the data correspond to different business characteristics. OLAP tools make it very easy to look at dimensional data from any angle or to slice-and-dice it. Although OLAP tools, like data-mining tools, provide answers that are derived from data, the similarity between them ends there. The derivation of answers from data in OLAP is analogous to calculations in a spreadsheet; because
they use simple and given-in-advance calculations, OLAP tools do not learn from data, nor do they create new knowledge. They are usually special-purpose visualization tools that can help end users draw their own conclusions and decisions based on graphically condensed data. OLAP tools are very useful for the data-mining process; they can be a part of it, but they are not a substitute for it.

2.2. The Design and Infrastructure of the Proposed System

Having explained the three basic warehouse operations and seen how results are extracted from the warehouse by them, it is noticeable that those results differ in structure. This research suggests the following steps:

First step: this step explains the architecture of the proposed system, see Figure 1. The architecture is composed of the following components:

Figure 1: The proposed system architecture

1. KWH-Manager: this component represents the interface between the user and the data warehouse. The user presents a request through one of the warehouse operations and then waits for the results. The request may be a query for SQL, a request to analyze a specific probability with OLAP, or a request to predict a novel pattern of specific knowledge from the data stored in the warehouse using data-mining techniques.

2. KWH-base: a proposed base which contains the result files of previous user requests for SQL, OLAP and DM. This knowledge warehouse base has four attributes, see Figure 2. The first is the type of the file operation result (SQL, OLAP or DM), the second is the name of the result file, the third is the path of the result file (the storing location on the proposed server), and the fourth is metadata. The metadata presents the basic keywords and subject of the results for SQL, the subject of analysis for OLAP, or the subject of the extracted novel pattern for DM.
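The four attributes just described can be sketched as a single relational table. The column names and the use of SQLite here are illustrative assumptions, not the paper's implementation; the row is the sample entry from Figure 2.

```python
# A minimal sketch of the KWH-base as a relational table with its four
# attributes; SQLite and the column names are illustrative assumptions.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE kwh_base (
        type_of_file  TEXT,  -- SQL, OLAP or DM
        name_of_file  TEXT,  -- name of the result file
        path_of_file  TEXT,  -- storing location on the proposed server
        metadata      TEXT   -- keywords / subject of the stored result
    )""")

# Sample entry from Figure 2.
con.execute("INSERT INTO kwh_base VALUES (?, ?, ?, ?)",
            ("SQL", "customer avg", r"c:\my document", "video store, customer"))

# Look up a previously stored result by operation type and keyword.
row = con.execute(
    "SELECT name_of_file, path_of_file FROM kwh_base "
    "WHERE type_of_file = 'SQL' AND metadata LIKE '%customer%'").fetchone()
```

The metadata attribute is deliberately a flat keyword string, matching the paper's description of it as the basic keywords and subject of the stored result.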
This knowledge base works with a local search engine, which takes the user's request and the type of results (SQL, OLAP, or DM) from the KWH-interface and then searches the KWH-base. If this engine finds the desired result file, it loads it and displays its content to the user through the KWH-interface. If the desired file is not found, it presents the request to the warehouse system to extract the results in the traditional manner, then takes the results, stores them, and records their information and metadata in the KWH-base.
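The lookup performed by the local search engine can be sketched as follows. All names (`KWHRecord`, `search_kwh_base`) are hypothetical, and the keyword-overlap matching rule is an assumption, since the paper does not specify how metadata is compared with the query keywords.

```python
# Sketch of the local search engine: match the query's critical keywords
# against the metadata attribute of each stored result record.
from dataclasses import dataclass

@dataclass
class KWHRecord:
    op_type: str    # "SQL", "OLAP" or "DM"
    file_name: str  # name of the result file
    file_path: str  # storing location on the proposed server
    metadata: set   # critical keywords describing the stored result

def search_kwh_base(kwh_base, op_type, keywords):
    """Return the record whose metadata best overlaps the keywords, or None."""
    best, best_overlap = None, 0
    for rec in kwh_base:
        if rec.op_type != op_type:
            continue
        overlap = len(rec.metadata & set(keywords))
        if overlap > best_overlap:
            best, best_overlap = rec, overlap
    return best

kwh_base = [
    KWHRecord("SQL", "customer avg", r"c:\my document",
              {"video store", "customer"}),
]

hit = search_kwh_base(kwh_base, "SQL", ["customer", "video store"])   # found
miss = search_kwh_base(kwh_base, "OLAP", ["salary"])                  # not stored
```

When `search_kwh_base` returns `None`, the system falls through to the traditional extraction path and the new result is appended to `kwh_base`.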
Figure 2: The attributes of the KWH-base

  Type of file | Name of file | Path of file   | Metadata of Content
  SQL          | customer avg | c:\my document | video store, customer

Second step: the two components (KWH-Manager and KWH-base) and the search engine must now be stored and implemented in the architecture. Figure 3 presents the general traditional warehouse architecture, while Figure 4 shows the general proposed warehouse architecture. Figure 4 contains the newly added component, the KWH-Server. This server accommodates all the result files, the KWH-base and the KWH-Manager.

Figure 3: Warehouse architecture

Figure 4: The proposed architecture with the KWH-Server

3. The Implementation

The implementation of the proposed system begins with the main interface, which represents the KWH-Manager, see Figure 5.

Figure 5: KWH-Manager
This interface accepts the request from the user and then analyzes the query to extract the critical keywords. It submits these keywords as input to the local search engine, which searches the KWH-base (see Figure 2) by comparing them with the keywords recorded in the metadata attribute to find similarities. If no similar query or analysis is found in the KWH-base, the local search engine notifies the KWH-Manager that there are no suitable results for the submitted query and that the system must extract the results from the warehouse databases. Figure 6 shows the process of taking the critical keywords extracted from the submitted query to determine which warehouse operation should be used to extract the desired result. The critical keywords are supplied to the related procedure from a small database, see Figure 6.

Figure 6: Small database

  Operation type | Related keywords
  SQL            | all records, records with attribute a has value b
  OLAP           | compare, analyze
  DM             | prediction, classify

This small database has two attributes: the first is called operation type and the second related keywords. Using this database, the system determines which warehouse operation must be applied to extract the results. For example, if the critical keywords of the submitted query are salary, loans, customers and relation, the procedure takes these keywords and searches the small database. If the keyword "relation" is found in the second attribute, it takes the operation type OLAP from the first attribute, which lies in the same row. The proposed system then extracts the result from the warehouse using the OLAP technique and saves the result in a file on the proposed server. It also saves the file name, file path, metadata (the critical keywords obtained from the analysis) and the warehouse operation type in the proposed KWH-base. Finally, the obtained results are displayed.
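The routing of critical keywords to a warehouse operation via the small database can be sketched as follows. The keyword lists are illustrative: the entries come from Figure 6, and "relation" is added to the OLAP row only because the paper's worked example maps it there.

```python
# Sketch of routing critical keywords to a warehouse operation type
# using the small database of Figure 6 (keyword lists are illustrative).
SMALL_DB = {
    "SQL":  ["all records", "records with attribute"],
    "OLAP": ["compare", "analyze", "relation"],
    "DM":   ["prediction", "classify"],
}

def route_operation(critical_keywords):
    """Return the operation type whose related keywords match the query."""
    for op_type, related in SMALL_DB.items():
        for kw in critical_keywords:
            # Substring match in either direction, so "all records" also
            # matches a query keyword like "records".
            if any(kw in r or r in kw for r in related):
                return op_type
    return None  # no match: the KWH-Manager must ask the user

op = route_operation(["salary", "loans", "customers", "relation"])  # paper's example
```

In the paper's example the keyword "relation" selects the OLAP row, so the system extracts the result with the OLAP technique and then records the file name, path, metadata and operation type in the KWH-base.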
If the request has already been stored in the KWH-base, the system takes the query for analysis and extracts the critical keywords. The critical keywords are then submitted to the local search engine, which searches the KWH-base by comparing them with the metadata of all lines. If the local search engine finds a line whose metadata is convenient for the critical keywords, the engine takes the operation type, name and path of the file and displays its contents. The display process (visualization) depends on the type of operation. The user gains all the desired results much more quickly, since the results are retrieved rather than extracted.

4. Conclusions

1. The traditional manner of the warehouse is to deal with each customer query by submitting the query and extracting the knowledge from the data according to the operation type determined by the customer. This takes considerable time and is a less automatic operation.

2. The proposed system aims to make the warehouse work fully automatically, by allowing the user to write the query without determining the operation that is suitable for it.

3. The proposed system aims to reduce the processing time spent as much as possible. This is done by retrieving results obtained previously, if they exist in the KWH-base.

4. The KWH-Manager represents the basic step in the proposed system, since it receives the query and then sends it for analysis, which produces the critical keywords of the query for the local search engine.
5. The KWH-base represents the core of the system, since it is the proposed database that stores the previous results. The local engine searches it to check whether previously extracted results exist, so that they can be displayed immediately without any extraction process.

6. The efficiency of the proposed system comes from building the KWH-base with four attributes: the metadata attribute, which is used for searching by comparing it with the critical keywords; the file name and path, which determine the file location on the proposed server; and the last attribute, which refers to the operation type.