Nagarjuna College of Information Technology
(Bachelor in Information Management)
TRIBHUVAN UNIVERSITY

Project Report on
World's Successful Data Mining and Data Warehousing Projects

Submitted By:
Submitted To:
Submission Date:

Data Mining

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. It is becoming an increasingly important tool for transforming data into information, and it is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

Data Warehouse

A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. In the narrow sense, the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system, and many references to data warehousing use this broader context. An expanded definition of data warehousing therefore includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata. Data warehousing arises from an organization's need for reliable, consolidated, unique and integrated analysis and reporting of its data at different levels of aggregation.

Data Mining Challenges

To be successful, data mining requires the right team, the right methodology, the right architecture, and the right technology.

1. The Right Team

Data mining projects must be a collaborative effort driven by business experts, developed by analytic modelers and supported by IT. Internal skill sets may be developed over time, which may mean initially hiring data mining consultants to build the data mining capability, with the ultimate objective of transferring knowledge to the in-house team. To ensure a successful data mining outcome, the team needs three classes of experts: business domain experts, information technology support, and analytic modelers/data miners.

2. The Right Methodology

Data mining is an ongoing process that must be maintained and changed as business drivers change. The key to a successful project is to base it on a proven methodology, that is, one that has already delivered successful models and uncovered millions of dollars in revenue and cost savings for customers.

3. The Right Architecture

Several data mining architectures are commonly used today. They include the distributed independent data mart, the data warehouse with dependent data marts, and the centralized data warehouse and mining architecture. In data mining, the two primary architectures are the process architecture and the system architecture; both should be clearly defined.

4. The Right Technology

The right technology begins with the right foundation: the database. Effective data mining depends on a comprehensive and robust data warehouse, not a summarized data mart, because it is difficult to predict which specific attributes will contribute to a data mining model. Some companies try to do data warehousing with a database that was designed for OLTP, the operational processing of high-speed transactions. The operations performed in databases optimized for OLTP (adding, deleting and modifying records, and other row-level update functions) are quite different from those needed to analyze large volumes of historical data, and therefore require different database capabilities.
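The contrast between row-level OLTP operations and analysis over historical data can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module only for convenience; the sales table, its columns and all values are hypothetical and are not taken from any project described in this report.

```python
import sqlite3

# Hypothetical sales table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
rows = [
    (1, "north", 2000, 120.0),
    (2, "north", 2001, 150.0),
    (3, "south", 2000, 80.0),
    (4, "south", 2001, 95.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# OLTP-style work: touch one row at a time, addressed by its key.
conn.execute("UPDATE sales SET amount = 155.0 WHERE id = 2")
conn.execute("DELETE FROM sales WHERE id = 3")
conn.execute("INSERT INTO sales VALUES (5, 'south', 2002, 110.0)")

# Analytic-style work: scan the whole history and aggregate across many rows.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
)
print(totals_by_region)
conn.close()
```

The first group of statements each reads or writes a single record, while the last query has to read every row. On large historical tables that difference in access pattern is what calls for different database capabilities.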

Data Mining Project at Baosteel

This section presents one successful data mining project, carried out at Baoshan Iron and Steel Co. Ltd. (Baosteel), and describes what the company did to make it succeed.

1. Introduction

Many problems in the operation of the metallurgical industry still need to be solved, such as integrated quality control and supply chain management. Because these problems are multivariable and nonlinear, it is difficult to reach an enterprise-level optimum with traditional local optimization methods. At Baosteel, the data distributed across all parts of the plants is organized into a data warehouse. Data mining is carried out on that warehouse, and the knowledge acquired from the data is applied to practical control and management systems to improve on earlier practice.

2. Data mining methodology

The data mining methodology can be regarded as the meta-knowledge of data mining: it shows the direction from data to knowledge. In general, the workflow of data mining can be divided into three steps: data preparation, data mining (in the narrow sense), and result interpretation, as shown in Figure 1. First, data preparation provides data mining with appropriate data. Next, data mining uses a set of algorithms to extract patterns or models from the data. Finally, field experts interpret the patterns or models, converting them into knowledge that guides daily work.

Figure 1: The general workflow of data mining

For the metallurgical process field, a data mining methodology named SEMMAO is adopted, as shown in Figure 2. It is divided into six steps: sampling (S), exploring (E), modifying (M), modeling (M), assessing (A) and optimizing (O), and extracts knowledge from data step by step. The SEMMAO methodology is derived from data mining practice at Baosteel and has proved effective there.

Figure 2: SEMMAO methodology

The data source for data mining is a data warehouse (at enterprise level) or a data mart (at business division or department level). Data mining should be based on a data warehouse rather than on a traditional database management system (DBMS) because the two have different orientations. A DBMS is usually used to create operational databases and on-line transaction processing (OLTP) systems. In contrast, statistical analysis, data mining and on-line analytical processing (OLAP) require a non-normalized (denormalized) data structure. The data warehouse is therefore born from the reorganization of the database.

The sampling step selects samples from a large sample set according to a specific rule, using either random or nonrandom sampling. The goal of sampling is to reduce the amount of data for the following steps and to improve the distribution of the data.

The exploring step performs visual exploration of the data. It helps the analyst become acquainted with the distribution of the data and provides useful hints for the following steps.

The modifying step adjusts unsatisfactory data to meet the requirements of the modeling algorithms. There are many modifying methods, such as missing data processing, outlier processing, contradiction processing, data standardization and variable transformation.

The modeling step extracts knowledge from data with a mathematical model. All models fall into two categories: supervised and unsupervised. In supervised mode, the target variables have given values. In unsupervised mode, the target variables are absent, so the data samples are divided into several clusters using only the information in the input variables; the clusters can also be used for classification.

The assessing step reports the results of modeling, the error analysis and the assessment of the models. Once proved acceptable, models can be considered a form of knowledge and used later for forecasting and optimizing.

The optimizing step uses the acquired knowledge to solve practical problems. It answers questions such as "how should the values of the input variables be set to meet the goals for the target variables?"

After these steps, the knowledge derived from practical data is applied in the production process, which in turn generates new data. This forms a cycle that continuously improves production capability.

3. Data mining software tools

There are many commercial data mining software packages. Two tools are introduced here: Practical Miner (PM for short) and SAS Enterprise Miner (SAS/EM for short). Both have proved useful in practical applications at Baosteel.

Practical Miner is a simple and practical data mining tool. Like an automatic camera, it completes all the work with a single push. It was developed by a group at the Baosteel Research Institute according to the SEMMAO methodology. PM is built on the base SAS platform; SAS was selected as the development and runtime environment because the developers regarded it as the leading statistical software and because it is popular in many application areas. PM is powerful, covering the whole data mining process from data preprocessing to data presentation. It also offers a user-friendly interface, and with its Chinese help system users can handle it easily whether or not they are familiar with data mining technology.

For data mining professionals, however, Baosteel chose SAS/EM. At the time of the project, the latest release of SAS/EM was 4.2. It adopts object-oriented visual programming technology and contains most data mining algorithms. As a powerful data mining package, SAS/EM places stricter requirements on its users, who need extensive knowledge of statistics.
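To make the six SEMMAO steps from section 2 more concrete, the following is a minimal sketch in plain Python. It is not Practical Miner or SAS/EM code. The synthetic data (a noisy line y = 2x + 1), the 5 % missing-value rate, the choice to drop incomplete records and the least-squares line are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic "warehouse" records (input x, target y); some inputs are missing.
# The relation y = 2x + 1 plus noise is invented for illustration.
warehouse = []
for _ in range(1000):
    x = random.uniform(0.0, 10.0)
    y = 2.0 * x + 1.0 + random.gauss(0.0, 0.1)
    warehouse.append((None if random.random() < 0.05 else x, y))

# S: sampling - draw a random subset to reduce the data volume.
sample = random.sample(warehouse, 200)

# E: exploring - summarize the observed inputs (a stand-in for visual exploration).
observed = [x for x, _ in sample if x is not None]
summary = {"n": len(observed), "min": min(observed), "max": max(observed),
           "mean": statistics.mean(observed)}

# M: modifying - missing data processing; here incomplete records are dropped.
clean = [(x, y) for x, y in sample if x is not None]

# M: modeling - supervised model: least-squares line y = a*x + b.
split = int(0.75 * len(clean))
train, holdout = clean[:split], clean[split:]
mean_x = statistics.mean(x for x, _ in train)
mean_y = statistics.mean(y for _, y in train)
a = (sum((x - mean_x) * (y - mean_y) for x, y in train)
     / sum((x - mean_x) ** 2 for x, _ in train))
b = mean_y - a * mean_x

# A: assessing - root-mean-square error on records not used for fitting.
rmse = math.sqrt(statistics.mean((a * x + b - y) ** 2 for x, y in holdout))

# O: optimizing - choose the input setting that should hit a target output.
target_y = 15.0
x_setting = (target_y - b) / a

print(summary["n"], round(a, 2), round(b, 2), round(rmse, 3), round(x_setting, 2))
```

In a real project each step would be far richer (plots in the exploring step, several candidate models, constraints in the optimizing step), but the order of the steps and the hand-over from one to the next follow the description in section 2.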

4. Some applications in Baosteel

Baosteel has accumulated a large volume of production data since it began production in 1985. As a leader in the steel industry, Baosteel has carried out research and application work on data warehousing and data mining in step with the latest international developments. After several years of effort, an enterprise data warehouse was constructed, and a series of data mining research projects and applications have been built on it.

The most widespread data mining applications at Baosteel focus on quality control. The first case was a ship plate quality analysis project, in which key variables were identified to improve product quality. It helped the ship plate obtain certification from international ship classification organizations such as LR, BV, RINA and DnV. Afterwards, the Baosteel Manufacturing Management Department applied data mining to the quality control of the hot rolling mill and the cold rolling mill, with profit exceeding 30 million RMB in 2001. Baosteel was awarded the top National Quality Control Award in 2001.

There are also other successful data mining cases in manufacturing management. The most profitable data mining project at Baosteel is the optimization of iron ore mixing. The proportions of the different iron ores were optimized, reducing production cost while assuring quality and bringing Baosteel an annual profit of 60 million RMB. Data mining was also applied to the analysis of the rolling plan, with the aim of improving the hit rate of contracts. In addition, work was done to optimize the inventory structure in order to cut inventory cost and balance resources.

Data mining is applied to production process control as well. For example, a rolling stress prediction model for the hot rolling process was built by data mining.

Furthermore, data mining has had an effect on enterprise marketing and sales management. On the one hand, Baosteel ships weekly to some important customers based on data mining of shipment periods, which speeds up supply chain response and improves customer service quality. On the other hand, a customer-oriented supply chain management application is under construction, whose benchmark values will be extracted from the data warehouse by data mining.
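The iron ore mixing case above is, in essence, a constrained optimization: find the ore proportions that minimize cost while the blend still meets a quality requirement. The sketch below shows that idea with a simple grid search. The three ores, their costs and iron contents, and the 60 % minimum are hypothetical values chosen for illustration; they are not Baosteel data, and the source does not say which optimization method Baosteel used.

```python
from itertools import product

# Hypothetical ores: cost per tonne and iron content in percent. Not Baosteel data.
ores = {
    "ore_a": {"cost": 62.0, "fe": 65.0},
    "ore_b": {"cost": 45.0, "fe": 58.0},
    "ore_c": {"cost": 30.0, "fe": 50.0},
}
MIN_FE = 60.0  # assumed quality constraint on the blend (percent iron)
STEP = 5       # search proportions in steps of 5 percentage points

names = list(ores)
best = None
for share_a, share_b in product(range(0, 101, STEP), repeat=2):
    share_c = 100 - share_a - share_b
    if share_c < 0:
        continue  # proportions must add up to 100 %
    shares = dict(zip(names, (share_a, share_b, share_c)))
    fe = sum(shares[n] * ores[n]["fe"] for n in names) / 100
    cost = sum(shares[n] * ores[n]["cost"] for n in names) / 100
    # Keep the cheapest blend that still satisfies the quality constraint.
    if fe >= MIN_FE and (best is None or cost < best["cost"]):
        best = {"shares": shares, "fe": fe, "cost": cost}

print(best)
```

A grid search is only practical for a handful of ores. With many ores and several quality constraints, a linear programming solver would be the more usual choice.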

5. Conclusion

The authors of the Baosteel study discuss the data mining methodology and software tools used in the manufacturing management of the metallurgical industry, and introduce some practical applications at Baosteel. As participants in the field for years, they share the following experience:

a. Data mining can bring real profits to conventional industrial enterprises. By acquiring hidden knowledge through data mining, an enterprise can raise its level of informatization and convert potential productivity into realized productivity.

b. Data mining is driven by application. The methodology and software tools must be selected to solve practical problems. Application projects succeed when data mining professionals and end users cooperate seamlessly.

c. The knowledge discovered by data mining must be applied to real-world problems. That is the ultimate goal of informatization.

6. Other successful data mining projects

Researchers at Texas A&M University, College Station, TX, used data mining techniques to investigate the success of Open Source Software (OSS) projects. They wanted to find the best approach to model formulation, validation and testing. For the analysis they used three predictive modeling techniques together: Logistic Regression (LR), Decision Trees (DT) and Neural Networks (NN). The findings from these techniques were then used for model formulation, validation and testing, and the researchers report better results than in their previous research projects.

According to the preliminary findings of this research, projects created before 2003 were less likely to succeed than more recent projects. One reason may be that the OSS movement is becoming more popular and that newer projects offer more promise to developers and users than older projects. This would also imply that OSS teams are improving their project management process over time.

Other findings relate individual project attributes to success:

- The number of downloads is positively related to success: projects with more downloads are more likely to succeed.
- The number of bugs reported has a positive relationship to success, because a higher number of reported bugs implies that the software is being used.
- The number of open bugs indicates that the project team is unable to fix them, so it has a negative impact on success.
- Team size has a positive impact on success: the bigger the team, the higher the probability that the project succeeds.
- OSS projects may or may not use a project manager; the use of project management methods has a positive impact on success.
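As an illustration of how a technique such as logistic regression yields findings of this kind, the sketch below fits a logistic regression to synthetic data and inspects the signs of the learned weights. It is not the researchers' model or data. The two features, the effect sizes used to generate the labels and the training settings are all assumptions, and the decision tree and neural network models are not shown.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic projects: both features are standardized scores invented here.
# Labels are drawn from an assumed model in which downloads help and open
# (unfixed) bugs hurt, mirroring the direction of the reported findings.
data = []
for _ in range(500):
    downloads = random.gauss(0.0, 1.0)
    open_bugs = random.gauss(0.0, 1.0)
    p_success = sigmoid(1.5 * downloads - 1.0 * open_bugs)
    data.append((downloads, open_bugs, 1 if random.random() < p_success else 0))

# Fit logistic regression by batch gradient descent on the log-loss.
w_downloads, w_open_bugs, bias = 0.0, 0.0, 0.0
rate = 0.5
n = len(data)
for _ in range(300):
    grad_d = grad_o = grad_b = 0.0
    for d, o, label in data:
        err = sigmoid(w_downloads * d + w_open_bugs * o + bias) - label
        grad_d += err * d
        grad_o += err * o
        grad_b += err
    w_downloads -= rate * grad_d / n
    w_open_bugs -= rate * grad_o / n
    bias -= rate * grad_b / n

# A positive weight means the feature raises the predicted odds of success,
# a negative weight means it lowers them.
print(round(w_downloads, 2), round(w_open_bugs, 2), round(bias, 2))
```

Reading the sign of each fitted weight is how statements such as "downloads are positively related to success" are typically derived from a logistic regression model.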