Project Proposal (Draft)

Text Mining for Business Intelligence

By Abhinut Srimasorn (5322793399)

Advisor: Dr. Thanaruk Theeramunkong

School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University

Semester 1, Academic Year 2013

August 1st, 2013
Table of Contents

1 Introduction
2 Background
3 Objectives
4 Outputs and Expected Benefits
4.1 Outputs
4.2 Benefits
Statement of Contribution

By submitting this document, all students in the group agree that their contribution in the project so far, including the preparation of this document, is as follows:

Name: Abhinut Srimasorn, ID: 5322793399, Contribution: 100%
Senior Project 2013 TM for BI

1. Introduction

My topic, "Text Mining for Business Intelligence", is concerned with how to make business decisions better, faster, and more accurately. The study of Business Intelligence (BI) and Text Mining (TM) is therefore the most valuable for this purpose.

As an example, let us assume that we are Coca-Cola (Thailand). In the past year, the cola market in Thailand has experienced a big change: SermSuk Plc, which formerly made Pepsi in Thailand, is launching a new brand named est, because Pepsi wants to run its own business in Thailand. Coca-Cola, on the other hand, noticed this move late, so it lost the opportunity to find ways to increase its revenue. If Coca-Cola had noticed this change quickly enough, it might have gained an advantage in the cola market.

How could Coca-Cola have noticed this? Simply by using text mining as a tool in its BI system. Coca-Cola has some market information from Pepsi and SermSuk, for example annual reports, but this information comes in documents that are long and take time to analyze. By using text mining techniques together with a BI system, the company could overcome this problem and have a chance to benefit from the situation.

The importance of text mining for BI is that it reduces analysis time and gives accurate knowledge for making business decisions. In business, the faster a decision can be made, the more benefit is gained; at the same time, the more accurate the knowledge, the lower the risk.

What I aim to achieve with the topic "Text Mining for Business Intelligence" is to analyze both BI systems and TM techniques, find the overlapping features that can work together, and then improve them and build them into a new BI tool for making better business decisions.

Section 2 is the background of the proposal. Section 3 lists the objectives of the project.
Section 4 expands on the project's outputs and benefits, which are separated into short-term and long-term benefits. The remaining sections will appear in the final proposal.

School of ICT, SIIT
2. Background

Business intelligence systems driven by data warehouses excel at telling us what happened and when, but they are not very good at answering why. For example, we can easily discover that a product's sales margins decreased by 10% in the last quarter without knowing the reasons why. The answers to these "why" questions are buried in text documents such as marketing campaigns, contracts, and government reports. To extend BI analysis into these kinds of documents, text mining must be considered.

Text mining is the study and practice of extracting information from text using the principles of computational linguistics. A very simple data structure in text mining is the feature vector, or weighted list of words. It lists the most important words in a text along with a measure of their relative importance. To build it, text mining systems perform several operations. First, commonly used words (e.g., "the", "and", "other") are removed. Second, words are replaced by their roots. For example, "eaten" and "eating" are mapped to "eat". This provides the means to measure how often a particular concept appears in a text without having to worry about minor variations.

3. Objectives

The aim of this project is to analyze both BI systems and TM techniques, find the overlapping features that can work together, and then improve them and build them into a new BI tool for making better business decisions. In order to achieve this aim, there are 7 objectives:

1. Text mining algorithm development.
2. Extraction of knowledge from technical documentation.
3. Application of algorithms to real textual data related to the Business Intelligence domain.
4. Business Intelligence analysis.
5. Business Intelligence tools development.
6. Analysis of business performance management analytic processes.
7. Decision engineering framework development.
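To make the feature vector described in Section 2 more concrete, the following is a minimal Python sketch, not the tool this project will deliver. It follows the two operations named there (removing common words, then replacing words by their roots) and then weights each root. The stop-word list, the naive suffix-stripping rules, and the use of relative frequency as the "measure of relative importance" are all simplifying assumptions made for illustration; the names `STOP_WORDS`, `SUFFIXES`, `stem`, and `feature_vector` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; real systems use far larger ones.
STOP_WORDS = {"the", "and", "other", "a", "an", "of", "to", "in", "is", "are", "was", "by"}

# Assumption: naive suffix stripping stands in for a proper stemming algorithm.
SUFFIXES = ("ing", "en", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def feature_vector(text):
    """Return a dict mapping each word root to its relative frequency."""
    # Lowercase the text and keep only alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 1: drop common words. Step 2: map the remaining words to their roots.
    roots = [stem(token) for token in tokens if token not in STOP_WORDS]
    counts = Counter(roots)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Assumption: weight = share of all remaining words in the text.
    return {root: count / total for root, count in counts.items()}


if __name__ == "__main__":
    sample = "The sales team was eating while the other team had eaten"
    ranked = sorted(feature_vector(sample).items(), key=lambda item: -item[1])
    for root, weight in ranked:
        print(f"{root}: {weight:.2f}")
```

In this sketch "eaten" and "eating" both collapse to the root "eat", so the concept is counted twice rather than as two unrelated words. A weighting that also considers how rare a word is across many documents could replace the simple relative frequency used here.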
4. Outputs and Expected Benefits

In this section, I describe what will be developed and delivered.

4.1 Outputs

The output of this project will be a BI tool that uses text mining techniques for business purposes. The main features of the tool are as follows:

Structure unstructured documents.
Analyze business documents faster than using a BI system alone.
Collect the necessary information.
Provide a framework for making business decisions.

4.2 Benefits

In the short term, this tool can be useful for organisations that want to make business decisions faster and more accurately. The main customer segment for this tool will be chief officers such as the CEO, CBO, and CIO.

In the long term, further development of Text Mining for Business Intelligence will generate more tools for a variety of industries, such as hotel management and hospital management.