Knowledge Creation Opportunities in the Data Mining Process




M. Kathryn Brohman
Queen's School of Business, Queen's University
kbrohman@business.queensu.ca

Abstract

Nonaka's modes of knowledge were used to ground an exploratory study of knowledge creation opportunities in the data mining process. A two-phased research study, including 49 interviews with data analysts and decision makers, was completed. Results support the idea that multiple knowledge creation opportunities exist throughout the data mining process. Prior research has defined a data warehouse as a support system for transforming explicit knowledge into new explicit knowledge by merging, categorizing, reclassifying, and synthesizing data [1]. By expanding the research scope from technology to the process, evidence was found that data analysts convert tacit knowledge to new tacit knowledge through social interaction, tacit knowledge to explicit knowledge through evaluation of data mining results, and explicit knowledge to new tacit knowledge through learning from deployment of a decision. This paper explains how one organization implemented a knowledge-oriented data mining process; results of the implementation are presented.

1. Introduction

Researchers have determined that today's information management technologies (i.e., data warehouses) are more complex than traditional databases and that effective usage of these technologies can generate extensive benefits for an organization [12, 16, 27]. One determinant of success or failure is an organization's ability to convert data into knowledge. The process of converting large sets of data to knowledge by use of engineering mathematical patterns (i.e., analysis) is commonly defined as "data mining" [11, 17]. Data mining and knowledge management are the two primary activities that make up the business intelligence market, a market predicted to be worth $150 billion by 2006 [3].
To date, researchers have examined data mining and knowledge management as separate fields of study. This paper attempts to integrate these fields to explore the extent to which knowledge creation provides a new perspective on data mining. There is general agreement that data mining is an explorative and iterative process; an extensive literature review uncovered several models that describe the data mining process [10, 20, 22]. Saarenvirta (1998) described six steps: business requirements analysis, data requirements analysis, data mining opportunity identification, data mining project implementation, business application, and business results analysis. Later, Shearer (2000) introduced the Cross-Industry Standard Process for Data Mining (CRISP-DM) model; this model describes actions taken by a data analyst to develop a predictive model. CRISP-DM has six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The Business Intelligence Value Chain (BIVC) (see Figure 1) is a data mining process model that specifically defines the insight generation process in data mining [6]. This model was developed and validated through interviews conducted with 15 data analysts and decision makers from 6 large organizations. Nineteen codes were created based on stages and factors in the BIVC. A total of 162 interview pages were analyzed using Miles and Huberman's (1994) qualitative data analysis technique. The analysis and results of the interviews are described thoroughly in Brohman, Parent, Pearce, and Wade (2001). The BIVC describes actions taken by both decision makers and data analysts to support effective decision making through the development of new ideas and insights. The value chain concept implies that all of the stages and relationships in the model add value to the process.

0-7695-2507-5/06/$20.00 (C) 2006 IEEE

[Figure 1: The business intelligence value chain and modes of knowledge creation. Data mining stages: initiation, analysis, evaluation, deployment. Modes of knowledge creation: socialization, combination, externalization, internalization.]

Four primary stages define all data mining process models: initiation, analysis, evaluation, and deployment [6, 10, 22]. Initiation is the translation of the business problem into a data analysis task. The analysis stage consists of preparing and cleaning the data, analyzing the data, and producing a report to summarize results. In evaluation, the results are assessed for accuracy and relevancy and interpreted in relation to the business problem. Finally, deployment is the act of using the results to make a decision and monitoring the deployment to learn from its success or failure. Before attempting to identify knowledge creation opportunities in the data mining process, it is important to define the difference between data, information, and knowledge. Knowledge is authenticated information that is created from processing raw data [13, 18]. Nonaka (1994) defined two dimensions of knowledge in organizations: tacit and explicit. Tacit knowledge comprises both cognitive and technical elements. Cognitive elements refer to an individual's mental model consisting of mental maps, beliefs, paradigms, and viewpoints. The technical component consists of concrete know-how and skills that apply to a specific context. An experienced bank manager who believes he inherently knows what type of customer will default on loan payments illustrates tacit knowledge.
Some decision makers base decisions on tacit knowledge; this is often referred to as "relying on your gut." However, other decision makers may choose to seek explicit knowledge before making a decision. Explicit knowledge is articulated, codified, and communicated in symbolic form and/or language. In a review of knowledge management systems, Alavi and Leidner (2001) defined a data warehouse and data mining tools as technologies that support knowledge creation. They argue that these technologies enable the merging, categorizing, reclassifying, and synthesizing of explicit knowledge into new explicit knowledge. Knowledge creation is the result of cognitive processing triggered by the inflow of new stimuli [9, 24]. The process of knowledge creation involves a continual interplay between the tacit and explicit dimensions of knowledge [18]. Table 1 defines Nonaka's four modes of knowledge creation: socialization, combination, externalization, and internalization. The last row in the table lists information systems that support each mode of knowledge creation [1]. Computer-mediated communications enable socialization by providing a forum for constructing and sharing beliefs, for confirming consensual interpretation, and for allowing expression of new ideas [2, 4]. Collaboration and coordination systems facilitate externalization by enabling individuals to communicate, articulate, and codify tacit knowledge by sharing documents and sharing applications (e.g., software for team software development). Intranets enable internalization by providing exposure to greater amounts of on-line organizational information. Like Alavi and Leidner (2001), this research identifies data warehouse and data mining technology as support systems for the combination mode of knowledge creation. However, the study looks beyond the technology to explore the idea that other knowledge creation opportunities exist throughout the data mining process.

Table 1: Modes of knowledge creation and supporting technologies

| Mode | Socialization | Combination | Externalization | Internalization |
|---|---|---|---|---|
| Interplay relationship | Tacit to Tacit | Explicit to Explicit | Tacit to Explicit | Explicit to Tacit |
| Definition | Conversion of tacit knowledge to new tacit knowledge through social interactions and shared experience among organizational members. | Creation of new explicit knowledge by merging, categorizing, reclassifying, and synthesizing existing explicit knowledge. | Creation of new explicit knowledge from tacit knowledge by supporting beliefs, paradigms, and viewpoints with codified evidence. | Creation of new tacit knowledge from explicit knowledge by understanding and learning from reports or discussion. |
| Information system | Computer-mediated communication | Data warehousing, data mining, document repositories, and software agents | Collaboration and coordination (e.g., GSS) | Intranets |

To examine knowledge creation opportunities in data warehousing, interviews were conducted with 49 data analysts and decision makers in seven organizations. Results indicated that knowledge creation opportunities in data mining extend beyond the combination mode. This research aims to achieve two main objectives: (1) extend the BIVC to describe knowledge creation opportunities in the data mining process, and (2) identify data mining tasks (i.e., trigger types) and outcomes that explain variation in the data mining process.

2. Research Methodology

A two-phased research approach was adopted; in both phases the unit of analysis was the data mining task. The purpose of the first phase was to extend the BIVC to identify knowledge creation opportunities in data mining. A series of 23 semi-structured interviews was conducted with data warehouse users in seven organizations. An interview guide (Appendix 1) was used to focus the interview on the research questions [14, 29].
As Phase 1 interviews were exploratory in nature and knowledge creation opportunities were not yet identified, the questions did not address knowledge creation specifically. Questions were phrased in the context of decision support, as this is a more common and broader definition of data warehouse success [11, 19, 27]. Interviewers were conscious of the risk that knowledge-oriented questions might force interviewees into a knowledge creation mindset. Possible answers to the questions were not predetermined, and the interviewer used discretion in asking questions to encourage participants to be open in their responses. All seven sites were large organizations that had implemented a data warehouse or data mart infrastructure. Participating organizations represented a cross-industry sample: two from retail, three from finance, and two from telecommunications. These industries were chosen as they represented those most advanced in data mining usage [12, 19]. The second phase was an in-depth case study in a large financial organization in Canada. At the time of the research, the organization had 14,000 employees and annual revenue of $2 billion. The objective of Phase 2 was to demonstrate the relevance and accuracy of the extended BIVC model and identify triggers and outcomes of the data mining process. The case study began 12 months after completion of Phase 1 and included a series of 16 semi-structured interviews with eight data analysts and eight decision makers. As the goal of this research phase was to describe and demonstrate knowledge creation opportunities, the interview guide (Appendix 2) focused more specifically on knowledge creation in the data mining process. Decision makers and data analysts were asked to explain specific data mining tasks and to highlight events during the process where new ideas were generated. This group of 16 individuals represented the "doers" in the process.
To capture the perspective of those who manage the process, ten decision support and marketing managers participated in a focus group to identify macro-level triggers and outcomes specific to the case site.

3. Phase 1: Analysis and Results

Multiple participants were interviewed in each of the seven participating organizations: 13 data analysts and 10 decision makers. Mean data warehouse experience was 5 years, with a range from 1.5 years to 12.5 years. Variation in company and data warehouse experience illustrated that multiple perspectives were gathered in the exploratory study. Participant demographics also varied across industries; finance participants were more experienced data warehouse users than participants from retail and telecommunications. For this reason, a financial organization was chosen as the case study site for Phase 2 of the research project.

Miles and Huberman's [15] qualitative data analysis technique was used to analyze interview data; a coding scheme was derived based on Nonaka's (1994) knowledge dimensions and data mining stages [6, 10, 22]. Two individuals were employed to code a subset of the interview transcripts. Each coder was given the coding scheme as well as a copy of the interview guide. Interviews from three companies, one from each industry, were analyzed first. For each case, both coders developed a high-level conceptual model of where knowledge was created throughout the data mining process as well as the types of users involved. Within-case models were compared first, then models across organizations, to develop a comprehensive cross-case model [8]. Consistent with Miles and Huberman's recommendations, discrepancies were discussed between coders and minor refinements of definitions were made. Additional constructs were identified to develop the final conceptual model and coding scheme. The primary researcher adopted the revised coding scheme and analyzed the interviews from the four remaining organizations; results were used to generalize constructs across organizations. Results supported the hypothesis that combination is not the only knowledge creation opportunity in the data mining process; the roles of the other knowledge modes in the data mining process are illustrated in Figure 1 and described in detail below.

3.1. Socialization

A participant explained that decision makers initiate the data mining process by defining one or more ideas related to the business problem at hand. Other research on the data mining process has identified the manager as the one responsible for bringing domain knowledge to the data mining process. Managers acquire knowledge about causal relationships through their experience [10]. Once presented with the business problem, data analysts interact with decision makers to form the data mining task.
Data analysts may identify new factors based on their knowledge of patterns in the data; alternatively, they may identify possible biases and selection effects that could limit the generalization of patterns to cases not recorded in the database. Through socialization in the early stages of the data mining process, participants explained that decision makers and data analysts share experiences and perspectives that may lead to the generation of new knowledge. A specific example of socialization in the data mining process was found at one of the financial organizations. A manager was informed that his bank had higher default figures for personal loans than the industry average. Based on his experience, the manager identified possible factors that may have caused the problem and asked a data miner to analyze the influence of these factors on loan default behavior. When presented with the task, the data miner informed the manager that she knew of other factors that may have caused the problem, based on her knowledge of variable relationships uncovered in previous analyses. After an in-depth discussion of each factor, the manager explained that he developed a broader scope of the problem and that both he and the data miner gained new knowledge that led to more effective analysis of the problem.

3.2. Combination

In the combination phase, the data miner transforms explicit knowledge in the data warehouse into new explicit knowledge by merging, categorizing, reclassifying, and synthesizing data related to the task at hand. First, the research analyst uses queries and visualization to explore the data. Data preparation covers all activities completed to construct the final data set [22]. Data analysis includes the application of mathematical and statistical techniques to conduct simple analysis, intermediate analysis, or complex analysis. The role of the data warehouse in transforming explicit knowledge into new explicit knowledge was commonly discussed by research participants.
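The combination steps described above (merging, categorizing, and synthesizing explicit data) can be illustrated with a minimal Python sketch. The records, field names, and the age cut-off below are hypothetical and chosen only for illustration; they are not taken from the study or from any participating organization.

```python
from collections import defaultdict

# Hypothetical explicit knowledge held in two warehouse tables.
customers = [
    {"customer_id": 1, "age": 23},
    {"customer_id": 2, "age": 41},
    {"customer_id": 3, "age": 29},
    {"customer_id": 4, "age": 57},
]
loans = [
    {"customer_id": 1, "defaulted": True},
    {"customer_id": 2, "defaulted": False},
    {"customer_id": 3, "defaulted": False},
    {"customer_id": 4, "defaulted": False},
]


def age_band(age):
    """Categorize a raw age into an illustrative band (assumed cut-off: 30)."""
    return "under 30" if age < 30 else "30 and over"


def default_rate_by_band(customers, loans):
    """Merge loans with customers, categorize by age band, and summarize."""
    # Merge: look up each loan's customer by customer_id.
    ages = {c["customer_id"]: c["age"] for c in customers}
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for loan in loans:
        band = age_band(ages[loan["customer_id"]])  # categorize / reclassify
        totals[band] += 1
        defaults[band] += int(loan["defaulted"])
    # Synthesize: one new summary figure per band.
    return {band: defaults[band] / totals[band] for band in totals}


rates = default_rate_by_band(customers, loans)
print(rates)
```

The output is new explicit knowledge (a default rate per age band) derived entirely from existing explicit records, which is what distinguishes the combination mode from the other three modes.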
Data analysts spoke of merging and categorizing data to find new patterns. However, an unexpected result from this study was the risk that uncovering patterns in data may not always lead to accurate explicit knowledge about the relationship between factors. A data analyst from the retail industry explained, "Sometimes we find patterns in data because they have been put there by the organization. For example, once I found that students were particularly interested in a specific product and made a recommendation that the marketing effort for this product be directed at the student demographic. When I presented this to the manager, I found out that the correlation between factors was due to success of a prior campaign that offered a product discount to students."

3.3. Externalization

Once analysis is complete, the data analyst summarizes findings in a report and makes recommendations related to the business problem. This report is given to the manager for evaluation; the manager searches the report for codified evidence to support the beliefs, paradigms, and viewpoints he applied to develop the business problem and/or translate the business problem into the data mining task. When asked how new insights were uncovered, one data analyst from a telecommunications organization explained: "My manager wanted me to investigate some variables he expected may be related to loyalty to our services. I had my doubts, but I did what he asked. One factor, the type of service, was highly significant to loyalty; this surprised me and I didn't understand its relevance until my manager explained his thinking. The other factors were not significant. We worked together to make sure I didn't miss anything in the analysis. In the end I believe he gained new insight; I know I learned from the experience."

3.4. Internalization

Internalization opportunities exist in the deployment and follow-up stages of the data mining process. Deployment is the action the manager takes, most commonly a decision made, as a result of the analysis and interpretation completed. For example, a car manufacturer may make a decision about a design change to improve the safety of a vehicle after analysis and interpretation of fatal accident data [10]. Results may also be deployed in the form of a new campaign to target new customers, retain existing customers, or create new products or services. The opportunity for internalization exists in the manager's effort to monitor the success of the deployment and learn what went right, what went wrong, and why. Dedicating time and energy to the follow-up stage of the data mining process enables decision makers to generate new beliefs, paradigms, and viewpoints based on the success of the decision or campaign. One manager from the retail industry explained how new tacit knowledge was formed from the results of a sales campaign: "We sell a lot of seasonal products and ever since I've been here, we have placed all these products in the seasonal areas of our stores. I believe the primary motivation to do so was for in-store efficiency and convenience, but no one really knew why this decision was made. So I decided to challenge it. I asked one of my analysts to do some basket analysis to determine what products sell with seasonal products; I wanted to see if a change in store design would improve our seasonal merchandise sales. So, we completed the analysis and decided to distribute some of our seasonal goods within non-seasonal areas of the stores; we moved them closer to products that commonly sold together. After a year-long trial with the new store design, we determined that seasonal sales of these products decreased. I have some ideas that may explain why this happened."

Without further examination, it is not clear whether the manager in this situation explained the result based on pre-existing beliefs and viewpoints, or created new beliefs and viewpoints from what he learned during the experience. Internalization only occurs when new beliefs and new viewpoints are developed as a result of what the manager learned from the deployment.

4. Phase 2: Analysis and Results

The relevance of the extended BIVC model was demonstrated in a single case study at a large financial institution. Sixteen data analysts and decision makers were asked about knowledge creation opportunities in the data mining process. All participants made reference to the combination and externalization modes of knowledge creation in the data mining process. Five of the eight decision makers (62%) interviewed made reference to learning from deployment of decisions and campaigns. All eight data analysts talked about gaining new insight during socialization; however, only two of the eight decision makers (25%) made reference to this mode of knowledge creation. One interpretation of this result is that data analysts have more to gain in terms of domain knowledge in the socialization stage; decision makers are either already aware of knowledge in the database or find the knowledge gained less relevant at this stage in the data mining process. The idea that data analysts alone do not have the necessary business knowledge to evaluate and interpret data mining results has been presented in the literature [20]. Interview transcripts were also used to identify a preliminary list of triggers and outcomes that explain variation in the data mining process.
Results from the analysis were presented to ten marketing and decision support managers in a workshop; managers discussed the preliminary list, modifications were made to address concerns, and a final list of triggers and outcomes was developed (see Table 2).

Table 2: Data mining triggers and related outcomes

Trigger types are ordered by degree of complication.

| Trigger Type | Trigger Definition | Analysis Model | Outcome: Ratio of Insight (I) and Efficiency (E) |
|---|---|---|---|
| Data Exploration | Pure data discovery; the motivation is to search and discover relationships, patterns, and trends in data. No variables are predefined. | Inductive | 90% I : 10% E |
| In-Depth Explanation | Generate data-driven insight by researching loosely defined hypotheses using analytical and statistical tools. Some variables are predefined. | Deductive | 70% I : 30% E |
| Basic Explanation | Generate support for business logic by testing clearly defined hypotheses of data relationships using analytical and statistical tools. All variables are predefined. | Deductive | 30% I : 70% E |
| Visualization | Data summarization and presentation to display trends among the data elements. | Deductive | 10% I : 90% E |

Triggers are task types, each of which prompts a different data mining process with regard to the knowledge creation opportunities in the process. Other research has proposed the idea that the data mining process can vary based on the type of business problem being analyzed and the type of outcome being sought [17]. Four triggers were identified and defined; each trigger was associated with an underlying analysis model: inductive or deductive [23]. Inductive analysis operates on a set of related model instances that represent historical situations familiar to the decision maker and/or "what if" cases. Deductive analysis applies paradigm- or model-specific knowledge to a single instance of the model. The underlying analysis model is defined by the business problem assigned by the manager. If the manager has an idea about the factors that may explain the phenomenon being examined, they may ask the data analyst to explore the specific factors in their mental model. This type of business problem would encourage a deductive task approach.
The specific type of deductive trigger is defined by the clarity and definitive nature of the factors assigned as well as the outcome goal. If the factors are loosely defined and the outcome goal is primarily insight, an in-depth explanation task would trigger the data mining process. If the factors are clearly defined and the outcome goal is efficiency, a visualization task would trigger the data mining process. Outcome goals define the business need for the data mining effort. Researchers and practitioners have defined a wide range of data mining outcomes including usage, perceived net benefits, and decision making performance [26-28]. Fewer studies have defined data warehousing success as the creation of knowledge to support better quality decision making. In the Brohman et al. (2000) study, one organization described how it used the data warehouse to create knowledge related to customer purchase patterns and designed a new store layout that provided a competitive advantage. The concept of knowledge creation was also pertinent at First American Corporation, which used its data warehouse to identify the 20 percent of customers who were most profitable [7]. Participants in this study defined outcomes as a trade-off between decision efficiency and new insight. For example, if the business need is to identify the number of customers who responded to a campaign in order to make a decision about whether or not to extend the campaign, a manager will be most interested in enhancing the efficiency of the data mining process. On the other hand, if the business need is to support strategic development (e.g., new product development, campaign development), the organization will be most interested in new insight and willing to sacrifice efficiency in order to give the data miner time to thoroughly explore the data. The goal of each task type is some combination of efficiency and new insight; combinations are unique to each trigger.
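The relationship between triggers, analysis models, and outcome ratios can be sketched as a small lookup in Python. The trigger names, analysis models, and insight/efficiency ratios follow Table 2; the `variables_predefined` and `summary_only` inputs and the `classify` function are hypothetical conveniences introduced here for illustration, not part of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    name: str
    analysis_model: str
    insight_pct: int      # share of the outcome goal devoted to new insight
    efficiency_pct: int   # share of the outcome goal devoted to efficiency


# Values taken from Table 2.
DATA_EXPLORATION = Trigger("Data Exploration", "Inductive", 90, 10)
IN_DEPTH = Trigger("In-Depth Explanation", "Deductive", 70, 30)
BASIC = Trigger("Basic Explanation", "Deductive", 30, 70)
VISUALIZATION = Trigger("Visualization", "Deductive", 10, 90)

# Assumed encoding: how many variables the decision maker predefines.
BY_VARIABLES = {"none": DATA_EXPLORATION, "some": IN_DEPTH, "all": BASIC}


def classify(variables_predefined, summary_only=False):
    """Return the trigger for a task (hypothetical helper).

    summary_only marks tasks that only summarize and present known data,
    which Table 2 labels as visualization.
    """
    if summary_only:
        return VISUALIZATION
    try:
        return BY_VARIABLES[variables_predefined]
    except KeyError:
        raise ValueError("variables_predefined must be 'none', 'some', or 'all'")


task = classify("some")
print(task.name, task.analysis_model, f"{task.insight_pct}% I : {task.efficiency_pct}% E")
```

Agreeing on the trigger at the start of a task gives the decision maker and the data analyst a shared expectation of how much time to trade for insight.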
Participants in the management focus group provided examples of each trigger and expected outcome. Results are presented in Table 3.

Table 3: Examples of trigger types

Trigger Type: Example

Data Exploration: Analyzing an infinite number of data attributes (i.e., pure data mining) to uncover relationships and patterns in data related to loan management. Manager needs to give analyst lots of time to explore data as the goal is new insight.

In-Depth Explanation: Analysis of multiple variables (not predefined by business logic) that may have a significant statistical relationship with loan-default behavior. Decision makers need to give the analyst ample time to complete the analysis; however, a deadline is likely as decision makers are hoping for new insight to support a strategic business decision.

Basic Explanation: Grounded in business logic that age and education influence loan default behavior, this analysis would involve a statistical test of significance between age, education, and loan default. Efficiency is key here and decision makers only expect new insight to be generated by confirming, or disconfirming, the logic they present.

Visualization: Determine the % of positive responses, % of negative responses, and the % of non-responses to a particular campaign and display the results in a pie chart. Fast turnaround is often required as decision makers are looking for ways to present what they know, not generate new insight.

5. Discussion and implications

Two important implications result from the identification of trigger types and outcomes. First, if decision makers and data analysts identify the task as a specific trigger type early in the data mining process, a common goal (or outcome) will be inferred from the beginning. For example, if a manager assigns a data exploration task, the data miner will know the manager is looking for insight in lieu of efficiency. They will know not to push the manager to predefine variables as it is the data analyst's responsibility to search and discover relationships.
However, if the manager defines the same task as an in-depth explanation, the data analyst will be more sensitive to a deadline. She will push the manager to predefine some variables based on the manager's domain knowledge. A common goal, defined by the trigger type and outcome, is expected to help decision makers and data analysts work together more effectively.

The second implication is an extension to the data warehouse user typology. The literature has previously differentiated between five types of data warehouse users: technical users, data warehouse end-users, model-based users, strategic decision-makers, and research analysts [6, 12, 21]. This research extends this typology by differentiating types of research analysts. In fact, the organization that participated in Phase 2 used trigger types to develop a new incentive plan and maturity model (see Figure 2) for their data analysts. The maturity model is based on the level of data analyst competence. Other researchers have identified user competence as an important factor in data warehousing success [25]. Practitioners have claimed that development of user competence is the most prominent challenge they face in successful data warehouse implementation [21]. The maturity model (see Figure 2) has two paths: data exploration and campaign execution. In the case site, analysts on the data exploration path worked with decision makers across the organization. Analysts on the campaign execution path worked primarily with marketing decision makers. There are nine roles in the maturity model; each role is defined by a required skill level (from 1, limited experience, to 4, very experienced) in terms of data extraction skills, problem solving skills, and business skills. Data analysts mature through the roles as they develop skills. For example, a data miner with some data extraction skills and limited problem solving and business skills may start in visualization.
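The role definitions can be illustrated with a minimal sketch. It assumes that the skill triples shown in Figure 2 are ordered as (data extraction : problem solving : business), that the data exploration path runs from Visualization up to Exploration in the order listed, and that an analyst qualifies for a role only when all three required levels are met. The names `Skills`, `EXPLORATION_PATH`, and `highest_role` are hypothetical and are not part of the case site's model.

```python
from typing import NamedTuple, Optional


class Skills(NamedTuple):
    """Skill levels on the 1 (limited) to 4 (very experienced) scale."""
    data_extraction: int
    problem_solving: int
    business: int


# Data exploration path, entry level first. Required levels are taken from
# Figure 2; the ordering of roles along the path is an assumption.
EXPLORATION_PATH = [
    ("Visualization", Skills(2, 1, 1)),
    ("Basic Explanation", Skills(2, 1, 2)),
    ("In-Depth Explanation", Skills(3, 2, 3)),
    ("Exploration", Skills(4, 3, 4)),
]


def meets(analyst: Skills, required: Skills) -> bool:
    """True when the analyst reaches every required level (assumed rule)."""
    return all(have >= need for have, need in zip(analyst, required))


def highest_role(analyst: Skills) -> Optional[str]:
    """Return the most senior exploration-path role the analyst qualifies for."""
    best = None
    for name, required in EXPLORATION_PATH:
        if meets(analyst, required):
            best = name
    return best


if __name__ == "__main__":
    print(highest_role(Skills(2, 1, 1)))  # Visualization
    print(highest_role(Skills(3, 2, 2)))  # Basic Explanation
```

The sketch covers only the four exploration-path roles with whole-number levels; the campaign execution roles use levels such as 2+ whose meaning is not defined in the text, so they are left out.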
Visualization is simple analysis of percentages and averages; decision makers who assign visualizations are looking for effective ways to present and interpret information. Once an analyst improves their business knowledge (estimated to take 12 months), they may be promoted to basic explanation. Basic explanation analysis is grounded in pre-defined business logic. Decision makers are looking to test existing business logic to support efficient decision making. As basic explanation tasks are more complex in terms of database skills and business comprehension, it is expected that analysts at this maturity level will have ample opportunity to improve on all competence dimensions. Once they have advanced to experienced data extraction and business skills, and some problem solving skills (estimated 12 months), they may be promoted to in-depth explanation, where they are required to analyze multiple variables that are not predefined by inherent business logic; decision makers expect new insight as an outcome of this trigger. Alternatively, if a data analyst is not interested in following the creative road to data exploration, they may decide to pursue a more regimented career in decision support management. Maturity paths to management require a broader range of skills than data extraction, problem solving, and business skills, and are therefore illustrated by dotted lines in the maturity model. Although campaign execution triggers are identified in the maturity model, the interviews for Phase 2 focused on the data exploration path only. Marketing executives in
the focus group developed campaign execution triggers based on the data exploration triggers presented. The validity of these triggers was not examined in this study.

Figure 2: Research analyst maturity model. Roles and required skill levels shown in the figure: Visualization (2:1:1), Basic Explanation (2:1:2), In-Depth Explanation (3:2:3), Exploration (4:3:4), Campaign Execution (2:1+:1), Campaign Development (3:2+:2+), Modeling (4:2:3), Specialist (3:3:4), and DSS Management. Estimated transition times between roles range from 6 to 24 months.

6. Conclusion and directions for future research

Following the management focus group, the case site implemented the maturity model in the decision support group. The organization identified the following benefits one year after implementation. Most impressive was the decrease in data analyst turnover from an average of 30% over 1998-2001 to 15% in 2002, a percentage well below the industry average turnover rate of 53.8% in 2002. Equally impressive was the financial benefit gained from knowledge-oriented data mining initiatives. For example, one analyst worked closely with the Risk Management department for a five-month period. This analyst was given time to explore the data and was encouraged to derive insight related to the problem (i.e., he was assigned a data exploration trigger). The patterns he uncovered in the data yielded over five million dollars in savings for the bank. An analyst/manager team was given similar leeway to explore a customer loyalty problem; their analysis resulted in innovative customer treatment approaches that increased customer retention. Overall, implementation of the maturity model improved employee satisfaction and retention, increased data miner competence development and productivity, and better aligned decision support staff to business needs.

Results from this study indicate that knowledge management in data mining is more complex than simply managing the transformation of explicit knowledge in the data warehouse to new explicit knowledge (i.e., combination). Data analysts gain knowledge early in the process through interactions with decision makers in forming the business problem and data mining task. Both data analysts and decision makers gain knowledge by using data mining results to support beliefs, paradigms, and viewpoints with codified evidence. Finally, decision makers develop new tacit knowledge by understanding and learning from the deployment of decisions and campaigns. The implication of a new knowledge-oriented perspective on data mining is that decision makers and data analysts may eventually share more common knowledge and work more effectively together as a result. Common knowledge will enable better communication as well as a mutual respect for what each individual is contributing to the data mining process. Evidence of this was found in the case organization; improved employee satisfaction and retention were defined as benefits of implementing a more knowledge-oriented data mining process. Previous grounded theory research in data warehousing suggested that management of social relationships between data analysts and marketers during the data mining process contributed to better decision making [5]. In support of the BIVC model, participants in this research also recognized the importance of social relationships in the data mining process. Managers in the focus group hoped the maturity model would be used as a tool by decision makers in choosing appropriate analysts to work on their business problems. They explained that decision makers commonly worked with the same analyst for all tasks; choice of working partner was based primarily on existing relationship and convenience. Using the maturity model as a tool, managers planned to encourage decision makers to choose their partner based on the task type. For example, if a manager assigned an exploration task to an exploration analyst in the maturity model, the manager would be sure the analyst had the appropriate competence to effectively explore data. Assigning an exploration trigger would also clearly communicate that the task is inductive in nature and that the goal is insight generation. Managers also hoped that the maturity model would initiate development of a knowledge-based culture. One may transition to a knowledge-based culture by developing and incorporating knowledge vocabulary into problem statements, job descriptions, and job titles. It is also important that decision makers give analysts time and incentive to generate new ideas and insights. From a management perspective, if the goal for a data analysis team is to generate new ideas and insights, as was the case with the customer retention team, managers need to start at the socialization stage. They should encourage analysts and decision makers to listen to and learn from each other throughout the entire data mining process to enhance the knowledge creation process.

There are several limitations to the study that warrant mention. First, the study examined multiple organizations to develop the extended BIVC model but demonstration of the relevance and implementation of the model was limited to a single case site.
Future research will attempt to validate the conceptual model across multiple organizations. Researchers plan to work with decision makers and data analysts to develop logs to capture knowledge created at each stage in the data mining process. A second limitation is the implicit assumption that there is a trade-off between insight and efficiency in data warehouse performance. As this study was focused on knowledge generation, decision efficiency and effectiveness were secondary definitions of success. Future research will attempt to study the knowledge-oriented process with a more universal definition of data warehouse success. To generalize results, a multi-site survey will also be employed to test the role of trigger types on multiple dimensions of data warehouse performance.

7. References

[1] Alavi, M. and Leidner, D.E., "Review: Knowledge Management and Knowledge Management Systems: Conceptual Foundations and Research Issues". MIS Quarterly, 25:1, 2001, p.107-136.
[2] Baird, L., Henderson, J., and Watts, S., "Learning from action: An analysis of the Center for Army Lessons Learned". Human Resource Management, 36:4, 1997, p.385-395.
[3] Betts, M., "The future of business intelligence". Computerworld, 2003.
[4] Boland, R.J., Tenkasi, R.J., and Te-eni, D., "Designing Information Technology for Distributed Cognition". Organization Science, 4:3, 1994, p.463-474.
[5] Brohman, M.K. and Boudreau, M.C. The Dance: Getting Managers and Miners on the Floor Together. in Administrative Sciences Association of Canada (ASAC). 2004. Quebec City.
[6] Brohman, M.K., Parent, M., Pearce, M.R., and Wade, M. The Business Intelligence Value Chain. in Proceedings from the Thirty-third Hawaii International Conference on System Sciences (HICSS-33). 2000. Maui, HI.
[7] Cooper, B., Watson, H.J., Wixom, B.H., and Goodhue, D.L., "Data warehousing supports corporate strategy at First American Corporation". MIS Quarterly, 24:4, 2000.
[8] Eisenhardt, K., "Building theory from case study research". Academy of Management Review, 14:4, 1989, p.532-550.
[9] Fahey, L. and Prusak, L., "The eleven deadliest sins of knowledge management". California Management Review, 40:3, 1998, p.265-268.
[10] Feelders, A., Daniels, H., and Holsheimer, M., "Methodological and Practical Aspects of Data Mining". Information & Management, 37:2000, p.271-281.
[11] Gray, P. and Watson, H.J., Decision Support in the Data Warehouse, Prentice Hall, Upper Saddle River, 1998.
[12] Haley, B., Implementing the Decision Support Infrastructure: Key Success Factors in Data Warehousing. 1998, University of Georgia: Athens, GA.
[13] Huber, G.P., "Organizational learning: The contributing processes and the literatures". Organization Science, 2:1, 1991, p.88-115.
[14] Merton, R.K., Fiske, M., and Kendall, P.L., The Focused Interview, Free Press, New York, 1956.
[15] Miles, M.B. and Huberman, A.M., Qualitative data analysis: an expanded source book, Sage, Thousand Oaks, CA, 1994.
[16] Morain, S.K. and Norris, D.M., "A Study of Perceived Post-Implementation Benefits of Data Warehousing". Journal of Data Warehousing, 6:1, 2001, p.53-63.
[17] Nemati, H.R. and Barko, C.D., "Issues in Organizational Data Mining: A Survey of Current Practices". Journal of Data Warehousing, 6:1, 2001, p.25-36.
[18] Nonaka, I., "A dynamic theory of organizational knowledge creation". Organization Science, 5:1, 1994, p.14-37.
[19] Park, Y., "Strategic uses of data warehouses: An organization's suitability for data warehousing". Journal of Data Warehousing, 2:1, 1997, p.24-33.
[20] Saarenvirta, G., Data Mining to Improve Profitability, in CMA Magazine. 1998. p. 8-12.
[21] Sakaguchi, T. and Frolick, M.N., "A Review of the Data Warehousing Literature". Journal of Data Warehousing, 2:1, 1997, p.34-54.
[22] Shearer, C., "The CRISP-DM Model: The New Blueprint for Data Mining". Journal of Data Warehousing, 5:4, 2000, p.13-22.
[23] Steiger, D.M., "Enhancing user understanding in a decision support system: A theoretical basis and framework". Journal of Management Information Systems, 15:2, 1998, p.199-220.
[24] Tuomi, I., "Data is more than knowledge: Implications of the reversed knowledge hierarchy for knowledge management and organizational memory". Journal of Management Information Systems, 16:3, 1999, p.103-117.
[25] Watson, H.J., Annino, D.A., Avery, K.L., and Gerard, J.G., "Perspectives on Data Warehousing". Journal of Data Warehousing, 5:3, 2000, p.2-7.
[26] Wetherbe, J.C., "Executive Information Requirements: Getting It Right". MIS Quarterly, 15:1, 1991, p.51-66.
[27] Wixom, B.H. and Watson, H.J., "An Empirical Investigation of the Factors Affecting Data Warehousing Success". MIS Quarterly, 25:1, 2001, p.17-41.
[28] Wybo, M.D. and Goodhue, D.L., "Using Interdependence as a Predictor of Data Standards: Theoretical and Measurement Issues". Information & Management, 29:6, 1995, p.317-330.
[29] Yin, R.K., Case study research: Design and methods, Sage Publications, Beverly Hills, 1988.

8. Appendices

Appendix 1: Phase One Interview Guide

1. Please explain how the use of data and database technology has evolved in your organization over the last 10 years.
2. Where do you see the data mining process evolving over the next 5 years?
3. How are data mining applications used?
4. How do you decide which application(s) to use for a specific task?
5. Can you describe a real example of how you used the data warehouse over the last 6 months?
6. How do you use the data warehouse? Do other individuals within the organization use it differently?
7. How does data mining impact the organization?
8. Do different data mining episodes have different impacts on the organization?
9. How does data mining relate to decision making?

Appendix 2: Phase Two Interview Guide

1. Can you provide an in-depth description of the data mining process?
2. When does the process involve working with others? In these data mining situations, who do you mostly interact with?
3. What factors influence the degree of interaction you have with others during the data mining process?
4. Can you provide some examples of specific questions that triggered the data mining process?
5. How does the organization benefit from data mining results?
6. What knowledge does a user need to have in order to be an effective data miner?
7. Have you ever needed to seek knowledge from others to complete a data mining task?
8. Do you gain additional knowledge throughout the data mining process? If so, where do you gain knowledge throughout the process?
9. Do all data mining tasks generate the same degree of knowledge? If not, what factors differentiate the degree of knowledge created?