A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

MR. HEMANT KUMAR*; DR. SARMISTHA SARMA**

*Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology and Management, Institutional Area, Janakpuri, New Delhi 110058.
**Assistant Professor, Department of Management, Institute of Innovation in Technology and Management, Institutional Area, Janakpuri, New Delhi 110058.

ABSTRACT

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining (also known as Knowledge Discovery in Databases, KDD) has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data." Data mining is an artificial-intelligence-powered tool that can discover useful information within a database, which can then be used to improve actions. Put simply, data mining is filtering through large amounts of raw data for useful information that gives businesses a competitive edge. This paper discusses the data mining process, the categorization of data mining methods, data mining in marketing, its limitations, and its future scope.

KEYWORDS: Data mining, Knowledge Discovery in Databases (KDD), Artificial Intelligence (AI), Artificial Neural Networks.

1.0 INTRODUCTION

Large amounts of information are routinely collected in business, government departments, and research and development organizations. This information is typically stored in large data warehouses or databases, which can reach terabytes in size, i.e., more than 1,000,000,000,000 bytes of data. Within these masses of data lies hidden information of strategic importance. But when there are so many trees, how do you draw meaningful conclusions about the forest? Data mining is an attempt to answer this question.
Data mining helps in finding relevant knowledge and pulling out important information that allows us to make strategic decisions. Data mining, also known as Knowledge Discovery in Databases, can be technically defined as the automated mining of hidden information from large databases for predictive analysis. It relies on mathematical algorithms and analytical skills to derive the desired results from a huge collection of data. The current state-of-the-art analysis of databases is done by highly skilled analysts (typically statisticians) using sophisticated tools (e.g. SPSS, SAS, S-Plus). In essence, these analysts are manual data miners. In contrast, data mining software promises to automate the analysis, allowing business users to develop a more accurate and sophisticated understanding of their data. In recent years, business intelligence systems have played pivotal roles in helping
organizations to fine-tune business goals such as improving customer retention, market penetration, profitability, and efficiency. In most cases, these insights are driven by analyses of historical data.

Data mining's contribution to marketing is substantial, as it provides marketers with useful and accurate trends in their customers' purchasing behavior. Based on these trends, marketers can direct their marketing efforts at their customers with more precision. With its help, a software company can advertise a new product to consumers who have a long history of software purchases, and marketers can also predict which products their customers may be interested in buying. Through this prediction, marketers can surprise their customers and make the customer's shopping experience a pleasant one.

Data mining does not replace traditional statistical techniques. Rather, it is an extension of statistical methods that is in part the result of a major change in the statistics community. Until recently, the development of most statistical techniques was based on elegant theory and analytical methods that worked quite well on the modest amounts of data being analyzed. The increased power of computers and their lower cost, coupled with the need to analyze enormous data sets with millions of rows, have allowed the development of new techniques based on a brute-force exploration of possible solutions.

Data mining takes advantage of advances in the fields of artificial intelligence (AI) and statistics. It is the application of AI and statistical techniques to common business problems in a way that makes these techniques available to the skilled knowledge worker as well as the trained statistics professional. Both disciplines have been working on problems of pattern recognition and classification.
2.0 PROCESS OF DATA MINING

The process of data mining is divided into the following steps:

a) Selection
b) Initial Exploration (i.e. Data Preparation)
   - Data Cleaning: inconsistent data, missing values, noisy data
   - Data Integration
   - Data Reduction
c) Model building or Pattern identification with validation/verification
d) Deployment

3.0 CATEGORIZATION OF DATA MINING METHODS

The goal of data mining is to produce new knowledge that the user can act upon. It does this by building a model of the real world based on data collected from a variety of sources, which may include corporate transactions, customer histories and demographic information, process control data, and relevant external databases. The result of the model building is a description of patterns
and relationships in the data that can be confidently used for prediction. The ultimate goal of data mining is prediction. The technique used in data mining is called modeling. Modeling is simply the act of building a model in one situation where the answer is known and then applying it to another situation where the answer is not known.

FIGURE 1: CATEGORIZATION OF DATA METHODS

4.0 DATA MINING IN MARKETING

In recent decades, the development of information and communications technologies has injected new vitality into enterprise marketing. For example, barcode technology and the emergence of online stores have greatly enhanced enterprise efficiency, and as a result company managers are beginning to face enormous volumes of data. However, data volume and business profits are not directly proportional, and the human brain cannot handle so much data unaided. Meanwhile, data mining technology has become mature in theory. Technology-oriented applications give enterprise decision makers a new perspective on the market. These advanced technologies let enterprises obtain resources from different channels and use effective tools to translate data into business opportunities.

5.0 APPLICATION OF DATA MINING IN MARKETING

Data mining technology is widely applied in marketing. Such applications are referred to as a boundary science, because they bring together a variety of scientific theories. The two basic disciplines are Information Technology and Marketing. Another very important foundation is Statistics. In addition, the field draws on psychology and sociology. Its appeal lies precisely in the wide range of disciplines involved. Generally speaking, through the collection,
processing, and disposal of large amounts of information on consumer behavior, marketers identify the interests, consumption habits, preferences, and demands of specific consumer groups or individuals, infer the likely next consumption behavior of the corresponding group or individual, and on that basis sell products to the identified consumer groups through content-specific marketing. This is the basic idea.

As automation becomes common in all industry operation processes, enterprises accumulate a lot of operational data. These data are not collected for the purpose of analysis; they come from commercial operation. Likewise, the aim of analyzing them is not to study the data for its own sake, but to give business decision makers genuinely valuable information in order to make profits. Commercial information comes from the market through various channels. For example, when a purchase is made by credit card, we can collect the customer's consumption data, such as the time, place, goods or services of interest, the price the customer is willing to pay, and the level of spending the customer can accept; when a customer buys a brand of cosmetics or fills in a membership form, we can collect purchase trends and frequency. In addition, enterprises can buy a variety of customer information from consulting firms.

It should be emphasized that data mining is application-oriented. There are several typical applications in banking, insurance, traffic systems, retail, and similar commercial fields. Generally speaking, the problems that can be solved by data mining technologies include market analysis tasks such as Database Marketing, Customer Segmentation and Classification, Profile Analysis, and Cross-selling. They are also used for Churn Analysis, Credit Scoring, and Fraud Detection. The figure below shows the relation between applications and data mining techniques.

RELATION BETWEEN APPLICATION AND DATA MINING TECHNIQUES
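
As an illustration of the cross-selling application named above, the following minimal Python sketch counts which pairs of items are bought together and reports simple association rules with their support and confidence. It is only a sketch under stated assumptions: the transaction data, the function name `pair_rules`, and the `min_support` threshold are hypothetical and are not taken from the paper, which does not prescribe a particular algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket (illustrative data only).
transactions = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"soap", "toothpaste"},
    {"shampoo", "soap"},
    {"conditioner", "shampoo", "toothpaste"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        # Confidence of "a -> b": share of baskets with a that also contain b.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules first; ties broken by support, then by item names.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

A marketer could read a high-confidence rule as a candidate cross-selling offer: customers who buy the antecedent item are frequently also buyers of the consequent item. Real deployments would typically use a dedicated frequent-itemset algorithm on far larger data.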
The basic process of data mining in marketing is as follows:

PRINCIPLE OF DATA MINING APPLICATION IN MARKETING

A. PREPARE PRIMITIVE DATA
The primitive data include individual characteristics (such as age, gender, hobbies, background, profession, address, postcode, and income), previous purchase experience, and relationships among customers. Preprocessing the primitive data is very important for selecting potential customers.

B. ESTABLISH A CERTAIN MODEL
The model may draw on many traditional data mining technologies and on technologies from other related subjects. However, the problem these technologies must solve is finding the best, or at least an acceptable, marketing plan with limited data sources, limited time, and limited expense. These three limits are fundamental to the modeling algorithm.

C. ANALYZE THE MODEL AND USE THE RESULT FOR DECISION MAKING PURPOSE
Finally, test data are applied to the model to obtain each pattern or parameter. Ultimately, the model is used to select customers and decide on the marketing plan.

Today, businesses hold huge collections of data in a variety of formats, including operational data, sales reports, customer data, inventory lists, forecast data, etc. In
order to effectively manage and grow the business, all of the data gathered requires effective management and analysis. Data mining allows a business to collect data from a variety of sources, analyze the data using software, load the information into a database, store the information, and provide analyzed data in a useful format such as a report, table, or graph. As it relates to business analysis and business forecasting, the information analyzed is classified to determine important patterns and relationships. The idea is to identify relationships, patterns, and correlations in a large database from many different angles. These kinds of software and techniques give a business easy access to a much simpler process, which makes it more profitable.

Data mining allows a company to use the information to remain competitive in a highly competitive business world. For instance, a company may be collecting a large volume of data on consumers' buying behavior from various regions of the country. The software can compile the mined data, categorize it, and analyze it to reveal a host of useful information that a marketer can use for marketing strategies. The outcome of the process should be an effective business analysis that allows a company to fully understand the information in order to make accurate business decisions that contribute to the success of the business.

An example of a very effective use of data mining is acquiring a large amount of retail chain store data and analyzing it for market research. Data mining software allows for statistical analysis, data processing, and categorization, all of which help achieve accurate results. It is mostly used by businesses with a strong emphasis on consumer information such as shopping habits, financial analysis, marketing assessments, etc.
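
The compile, categorize, and analyze steps described above for regional buying data can be sketched in a few lines of Python. This is a hedged illustration, not the software the paper refers to: the purchase records and the function names `summarize_by_region` and `top_category` are hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase records: (region, product_category, amount_spent).
purchases = [
    ("North", "groceries", 120.0),
    ("North", "electronics", 300.0),
    ("North", "groceries", 80.0),
    ("South", "clothing", 150.0),
    ("South", "groceries", 60.0),
    ("South", "clothing", 90.0),
]


def summarize_by_region(records):
    """Compile and categorize: total spend per product category within each region."""
    totals = defaultdict(lambda: defaultdict(float))
    for region, category, amount in records:
        totals[region][category] += amount
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {region: dict(categories) for region, categories in totals.items()}


def top_category(summary):
    """Analyze: the category with the highest total spend in each region."""
    return {
        region: max(categories, key=categories.get)
        for region, categories in summary.items()
    }


summary = summarize_by_region(purchases)
leaders = top_category(summary)
for region, category in leaders.items():
    print(f"{region}: top category is {category} ({summary[region][category]:.2f})")
```

A summary of this kind is the sort of output a marketer might use to tailor regional campaigns, though production systems would add validation, time windows, and statistical tests before drawing conclusions.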
Data mining allows a business to determine key factors such as demographics, product positioning, competition, pricing, customer satisfaction, sales, and business expenditures. As a result, the business is able to streamline its operations, develop effective marketing plans, and generate more sales. The overall impact is an increase in revenue and profitability. For retailers, this process allows them to use sales transactions to develop targeted marketing campaigns based on their customers' shopping habits.

Today, mining applications and software are available for systems of all sizes and on all platforms. The more information that has to be gathered and processed, the bigger the database required. Likewise, the type of software a business will use depends on how complicated the data mining project is. The more multifaceted the queries and the more queries performed, the more powerful the system needed.

6.0 LIMITATIONS OF DATA MINING

Data mining systems depend on databases to supply the raw data for input. This raises problems because databases tend to be dynamic, incomplete, noisy, and large. Other problems arise from the adequacy and relevance of the information stored.

A. LIMITED INFORMATION: A database is often not designed for data mining. Nowadays, databases are of different types and store data with complex and diverse data types. It is very difficult to expect a single data mining system to obtain excellent mining results efficiently and effectively on all kinds of data and sources; specialized mining algorithms and techniques are required.
B. SECURITY AND SOCIETY: Security is an important problem associated with any data collection used for decision making. Organizations collect large amounts of personal data for customer profiling and to learn about user behavior. Such collections contain a large amount of sensitive personal information about individuals and organizations, and collecting or disclosing confidential personal information without authorization can be illegal. Data mining techniques provide new implicit knowledge about groups or individuals, which can breach privacy policies if illegally disclosed. In this way, unauthorized access to personal information will threaten security if it is used without control.

C. NOISE AND MISSING VALUES: Data mining algorithms often assume that the stored data is noise free, which in most cases is a strong assumption. Most data sets contain exceptions and invalid or incomplete information, which complicate the data analysis process. The presence of noisy data reduces the accuracy of data mining results. Hence cleaning and transforming the data becomes necessary, which is a time-consuming process.

D. UNCERTAINTY: Here uncertainty means the severity of the error and the degree of noise in the data, since data precision is an important consideration in a discovery system.

E. SIZE, UPDATES AND IRRELEVANT FIELDS: Databases tend to be large and dynamic, in the sense that their contents are always changing as information is added, modified, or removed. This creates a problem for data mining: how can it be assured that the discovered rules are up to date and consistent with the current information? Moreover, the learning system has to be time sensitive, because some data values vary over time and the discovery system is affected by the timeliness of the data.

7.0 CONCLUSION

Data mining is a logical process that can help in the best possible utilization of available resources.
It can support an organization's decision making in marketing and its various aspects, as well as decisions on supply chain management and logistics management. A sound knowledge of statistical packages and of the various statistical techniques is a must. However, industry experts in general and academicians are concerned about the implications of data mining for the privacy of data.

8.0 REFERENCES

1. Robert Groth, Data Mining: A Hands-on Approach for Business Professionals.
2. Margaret H. Dunham, Data Mining: Introductory and Advanced Topics.
3. Kurt Thearling, An Introduction to Data Mining.
4. Asuncion Mochon, David Quintana, Yago Saez and Pedro Isasi, Soft Computing Techniques Applied to Finance.
5. Kurt Thearling, From Data Mining to Database Marketing.
6. Rajanish Dass, Indian Institute of Management Ahmedabad, Data Mining in Banking and Finance.
7. W. Frawley, G. Piatetsky-Shapiro and C. Matheus, Knowledge Discovery in Databases: An Overview, AI Magazine, Fall 1992.
8. Alex Berson, Stephen Smith, and Kurt Thearling, Data Mining Techniques.