How To Create A Text Mining Model For An Auto Analyst
Text Mining Warranty and Call Center Data: Early Warning for Product Quality Awareness

John Wallace, Business Researchers, Inc., San Francisco, CA
Tracy Cermack, American Honda Motor Co., Inc., Torrance, CA

ABSTRACT

An early warning system was implemented that leverages SAS/Text Miner and SAS/QC. While analysts already used comment fields to further understand automotive problems under investigation, the early warning system was tasked with monitoring incoming free-form text data in a systematic fashion. Clustering models serve to group similar warranty claims together. Once the automotive analyst has reviewed the cluster definitions, all new data about that particular auto model is scored into the best cluster. Cluster sizes and variances are monitored weekly, and deviations from expected values are flagged for human review.

INTRODUCTION

This paper discusses the process of creating text-based clustering models and monitoring the clusters for change over time. Using textual data to define a unit of analysis offers a different view into data that has already received significant attention.

TERMINOLOGY

Some of the terms used within this paper are:

Document: One record of text generated from a phone call or warranty claim.
Document Collection: All of the documents that belong to a particular database, auto model and model year.
Modelset: The document collection used to build a clustering model.
Scoreset: The newly collected documents that will be scored and added to the document collection.

DATA SOURCES

The text field of several different databases is collected for analysis. Each of the sources differs in the vocabulary used and the types of issues discussed.

1. Warranty: when dealers complete warranty service claims, a comment field is available to further describe the problem.
2. Customer Relations: the call center logs parts of conversations and written communications with customers.
3. Techline: calls from dealer service technicians to specialized mechanics create more text data.

There is no requirement on the text's format in each database; any combination of characters is allowed.

WORKFLOW

The Early Warning Process begins by collecting the data for a modelset (see Figure 1). Modelsets included all of the records (a minimum of three months) in the database about a specific car. The statistical models were separated by the three types of data listed above, as well as by car model and model year. Once the clustering model has completed the process described below, the model is put into production and begins to score new data. On a weekly basis, new data is scored and assigned to the best cluster. The week's assignments are aggregated and analyzed using SAS/QC. Any departure from the expected aggregate values is flagged, and an alert is included in the weekly alert.
[Figure 1. Early Warning Process Flow: Data Collection, Text Mining, Model Review, Score, Data Collection, Statistical Process Control, Alerts/Reporting]

CLUSTERING PROCESS

Figure 2 shows the flow of documents through the text mining process. The step Term by Document Matrix is where words are converted to numbers (see Getting Started with Text Miner for more detail). The size and sparseness of the term by document matrix are driven by the size and complexity of the document collection.

[Figure 2. Text Miner Process Flow: Parsing, Stemming, Synonyms, StopList, Term by Document Matrix, Singular Value Decomposition, Clustering]

The inputs to the clustering algorithm are the outputs from the Singular Value Decomposition: the SVD document vectors. The number of vectors needed to best approximate that matrix typically ranges from 10 to 100. In Text Miner, the parameter resolution is applied in order to decide how many of the vectors to use for clustering. In this project, up to 70 dimensions were passed to the clustering algorithm as input. A space with so many dimensions is very empty; the observations are pushed to the edges of the projected space. In essence, the clustering algorithm is identifying intersections on the edges where there are groups of observations. Figure 3 depicts what this space may look like. Clusters of varying sizes and varying distances from one another inhabit the edges of the input space.
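To make the Term by Document Matrix step concrete, the following is a minimal Python sketch, not Text Miner's implementation. The synonym and stop lists are hypothetical, stemming is folded into the synonym dictionary for brevity, and the function names `parse` and `term_by_document_matrix` are assumptions introduced for illustration. The Singular Value Decomposition that follows this step in Figure 2 is not shown.

```python
from collections import Counter

# Hypothetical lists for illustration only; real lists are far larger.
SYNONYMS = {
    "a/c": "air_conditioner",
    "ac": "air_conditioner",
    "leaking": "leak",
    "leaks": "leak",
}
STOP_LIST = {"the", "is", "customer", "states"}


def parse(document):
    """Lower-case, split on whitespace, map synonyms to parents, drop stop words."""
    terms = []
    for token in document.lower().split():
        token = token.strip(".,;:!?")
        term = SYNONYMS.get(token, token)
        if term and term not in STOP_LIST:
            terms.append(term)
    return terms


def term_by_document_matrix(documents):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(parse(doc)) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


docs = [
    "Customer states the A/C is leaking",
    "AC leaks water",
    "Headliner sagging",
]
terms, matrix = term_by_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(term, row)
```

Even in this toy collection most cells are zero, which hints at why the matrix becomes sparse as the document collection grows.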
[Figure 3. Clusters on the Outer Edges of the Input Space]

Some experimentation during early development, with both K-means (PROC FASTCLUS) and Expectation-Maximization (PROC EMCLUS) clustering, led to the selection of Expectation-Maximization clustering for all models. The decision was based on the quality of the models as defined in the Application section below. Once the statistical modeler has built an initial model, the modeler presents the clusters to the domain expert: the auto analyst. Lastly, after the auto analyst makes any changes, the final clustering model becomes the baseline knowledge about all of the free-form text for that car.

APPLICATION

The most general objective of clustering is to form groups of similar records. The objective of grouping documents together in a meaningful fashion can only be met when the auto analyst finds the clusters useful. In this case, the usefulness of a cluster is subjective but focuses on the homogeneity of the documents within it. The domain expert assesses the homogeneity of a cluster by how well that cluster describes a particular engineering problem. For example, if every document in a cluster refers to the headliner, it is more homogeneous than a hypothetical cluster with documents about the bumper and the headliner. Since a document collection may have hundreds of different topics discussed in a one-year period, tens or hundreds of clusters may be necessary in order to achieve homogeneity among documents in a cluster.

It was recognized early on that the best initial clustering model would not meet the auto analyst's expectations. In fact, it was determined that a series of models would be required to satisfy their needs. It was considered a remarkable success if the analyst was pleased with the clusters containing 75% of the documents in the modelset. Based on the auto analyst's domain expertise, clustering models were tuned for homogeneity using two techniques: subclustering and merging clusters.
Subclustering can be defined as using the documents assigned to a cluster in a first model as the modelset for a subsequent model. The subcluster model had a much smaller term by document matrix, and therefore allowed documents that were once viewed as similar to be grouped separately. In the case of a catch-all cluster, subclustering was necessary because the parent cluster lacked homogeneity, as in the bumper and headliner example above. In other cases, a cluster that was clearly about leaks was broken into different types: oil, water, etc. Lastly, if the model identified a cluster about cars with a dead battery and another cluster about cars being jump started, it was easy to merge the two clusters together outside of Text Miner.
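The two tuning techniques can be sketched in a few lines of Python. This only illustrates the bookkeeping, assuming cluster assignments are available as one label per document; the labels and the helper names `subcluster_modelset` and `merge_clusters` are hypothetical, and the clustering itself (done in Text Miner in the project) is not shown.

```python
def subcluster_modelset(documents, assignments, parent_cluster):
    """Documents assigned to parent_cluster become the modelset for a follow-up model."""
    return [doc for doc, cluster in zip(documents, assignments)
            if cluster == parent_cluster]


def merge_clusters(assignments, merge_map):
    """Relabel clusters after modeling; clusters absent from merge_map keep their label."""
    return [merge_map.get(cluster, cluster) for cluster in assignments]


documents = ["battery dead", "jump started car", "oil leak", "water leak at sunroof"]
assignments = ["dead_battery", "jump_start", "leaks", "leaks"]

# The "leaks" cluster is re-modeled on its own, smaller document collection.
leak_modelset = subcluster_modelset(documents, assignments, "leaks")

# The jump-start cluster is folded into the dead-battery cluster.
merged = merge_clusters(assignments, {"jump_start": "dead_battery"})
print(leak_modelset)
print(merged)
```

Because merging is only a relabeling of existing assignments, it can happen outside the modeling tool, whereas subclustering requires building a new model on the reduced modelset.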
TEXT MINING UTILITIES

At first glance, text mining seems to contradict the adage that data mining is 80% data preparation and 20% modeling. The raw text itself is the input to the algorithms. However, two other important inputs end up taking considerable time to develop: the synonym list and the stop list. Just as in data mining the fitting of lines to the data is arguably less important than understanding the problem and creating useful input variables, in text mining creating the synonym and stop lists is more important than the selection of clustering algorithms. These lists are the heart (or brains) of any Text Miner model. Due to the number of abbreviations and auto-specific terms, this project's synonym list has grown to over 30,000 entries.

During the course of developing the first models, we opted to build some utilities to extend the functionality of Text Miner using Base SAS and SAS Macro. Several of these extensions were related to synonym and stop list creation. A partial list of the utilities developed follows:

1. Data Preprocessor. Particularly dirty data may require the removal of unwanted characters using rules other than those employed by Text Miner. The portion of the macro shown below addresses the conditional removal of a slash and removes incomplete records. The slash is used to create the word A/C but may also be used to join two separate words, as in repaired/installed. The conditional removal is necessary, as A/C cannot become A C. If repaired/installed is left alone, Text Miner treats it as a new third term that means neither repaired nor installed. The algorithm in this macro works in this order:

   a. Count up to the first three words in the string.
   b. If there are fewer than three, delete the record.
   c. Look for a slash at the beginning and end of the string and remove it.
   d. From the beginning of the string, identify the first occurrence of a slash.
   e. Count the number of characters in front of the slash.
   f. If three or more characters, reconstruct the string with a space in place of the slash.
   g. If the string preceding the slash is shorter (e.g., A/C), skip and continue searching in the rest of the string.

    %macro RemoveUnwantedCharacters(DsIn=, TextField=);
      /* Code removed here */
      data &DsIn (drop=location location2 count i);
        set &DsIn;
        /* Steps a-b: count up to three words; delete shorter records */
        wordcount=0;
        do i=1 to 3;
          if scan(&textfield, i, " ") ne "" then wordcount+1;
        end;
        if wordcount <=2 then delete;
        /* Step c: remove a trailing slash, then a leading slash */
        if substr(reverse(&textfield),1,1)="/" then
          &TextField=reverse(substr(reverse(&TextField),2));
        if substr(&textfield,1,1)="/" then &TextField=substr(&TextField,2);
        /* Steps d-g: walk the slashes from left to right */
        count=1;
        location=index(&textfield,"/");
        if location > 0 then do until (location2=0);
          /* A word of three or more characters precedes the slash */
          if length(scan(reverse(scan(&textfield, count, "/")),1," ")) > 2 then do;
            &TextField=trim(substr(&TextField, 1, location-1))!!" "!!trim(substr(&textfield, location+1));
          end;
          else do;
            count+1;
          end;
          location2=index(substr(&textfield,location+1),"/");
          location=location+location2;
        end;
      run;
    %mend;
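For readers without SAS, steps a through g can be approximated by the Python sketch below. It is a re-expression of the algorithm as described in the text, not a translation of the full macro (part of which was removed from the paper); the function name `remove_unwanted_slashes` is an assumption, and returning `None` stands in for deleting the record.

```python
def remove_unwanted_slashes(text):
    """Return cleaned text, or None when the record has fewer than three words."""
    # Steps a-b: records with fewer than three words are dropped.
    if len(text.split()) < 3:
        return None
    # Step c: remove a slash at the end and at the beginning of the string.
    text = text.strip()
    if text.endswith("/"):
        text = text[:-1]
    if text.startswith("/"):
        text = text[1:]
    # Steps d-g: visit each slash from left to right.
    chars = list(text)
    for pos, ch in enumerate(chars):
        if ch != "/":
            continue
        # Measure the word immediately in front of this slash.
        start = pos
        while start > 0 and chars[start - 1] not in (" ", "/"):
            start -= 1
        if pos - start >= 3:
            chars[pos] = " "  # e.g. repaired/installed -> repaired installed
        # Shorter prefixes such as A/C are left untouched.
    return "".join(chars)


print(remove_unwanted_slashes("replaced A/C compressor repaired/installed"))
print(remove_unwanted_slashes("A/C inop"))
print(remove_unwanted_slashes("/noise from rear/"))
```

The three-character threshold is the rule stated in step f. It would also split a term such as and/or, so the rule is a heuristic tuned to this data rather than a general tokenizer.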
2. Dataset Extractor. This routine extracts term lists from Text Miner into Excel for human review. On the way into Excel, these lists were then processed by other SAS macros not detailed here.
3. History Recorder. This step maintains a master term list so that only new words (i.e., words that have not been seen before) in new documents are reviewed.
4. Text Helper. The human decision process of adding terms to the synonym and/or stop list is facilitated with Excel-based (Visual Basic) macros.
5. Synonym Integrity Checker. A macro analyzes the integrity of the synonym list in order to remove inconsistencies (e.g., a term that has multiple parents) that can have an unpredictable impact on Text Miner. Any exceptions are presented to the operator for correction, and corrections are integrated into the synonym and stop lists. These checks are required often because the synonym and stop lists are updated each time a new model is built.
6. Nuisance Record Suppressor. Adding terms to the stop list removes those terms from the cluster definitions. However, sometimes it is meaningful to remove the whole document. Whatever types of records were suppressed in the modeling stage also need to be suppressed during the scoring process.
7. Model Reporter. Having the top terms for every cluster and subcluster in a file became very useful when reviewing models with the domain experts.

SCORING

The main facility for scoring provided in Text Miner on SAS 8.2 is scoring from the Enterprise Miner interface. However, all of the necessary score code to score new data in batch is available in the score node. Some of the modeling decisions, such as subclustering and suppressing observations, called for careful creation of Enterprise Miner diagrams and several enhancements to the Text Miner score code. After some experimentation, there was success in gathering the score code from a diagram with multiple models and scoring data outside of the EM interface.
Simplifying the process of deploying the score code was critical since nearly 100 clustering models would be put into production.

MONITORING

Building clustering models was a means to an end: to monitor incoming text data. The monitoring process was tasked with finding potential problems and alerting auto analysts about them. The initial set of monitoring processes and reports is described below.

CHANGE-IN-SIZE ALERTS

The change-in-size alert monitors cluster sizes on a weekly basis and signals abnormal growth by utilizing the p-charting capability of SAS/QC's PROC SHEWHART. In this case, the proportion of total records in a specific cluster took the place of the traditional proportion nonconforming, and the week's total number of records became the varying sample total. PROC SHEWHART calculates the appropriate control limits (3-sigma) from the data, depending on the variance of the cluster's proportion and the total sample size. The procedure also saves calculated limits and reads them back in during subsequent weeks. Analysts are alerted about any cluster for which one or more weeks saw the cluster's proportion of total records above the upper control limit. Figure 3 contains some sample output. The name of the cluster has been changed to bumper to match the hypothetical example above. In addition, if five or more periods in a row are trending in the same direction, an alert is also generated.

Although this is not the classic application of a p-chart, its capabilities have proven solid. One concern early on was that the proportion being charted was a percent of total, not a simple proportion conforming. The zero-sum nature of a percent of total may have been problematic, but the large number of clusters (>100) softened the effect that one cluster's growth has in reducing a constant cluster's proportion of the total.

NEW WORDS ALERTS

The new words alert is just that: the words in the week's data that have not previously appeared in the document collection.
This report was designed to mitigate a property of scoring with a clustering model based on textual data: only the words used to build the model can be used to relate new data back to the model. This is analogous to not being able to add a new parameter to a regression model without re-estimating the regression. The new word processing checks the synonym list and suppresses alerts where just another synonym of a known parent appeared.
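The logic behind the change-in-size and new words alerts can be sketched in Python as follows. This is an illustration under stated assumptions, not the PROC SHEWHART implementation: the limits use the textbook p-chart formula, p-bar plus or minus 3 times sqrt(p-bar(1 - p-bar)/n), the baseline counts are invented, and the function names and sample terms are hypothetical. The trend rule (five or more periods in the same direction) is not shown.

```python
import math


def p_chart_limits(p_bar, sample_total, sigmas=3.0):
    """Lower and upper control limits for a proportion with a varying sample total."""
    spread = sigmas * math.sqrt(p_bar * (1.0 - p_bar) / sample_total)
    return max(0.0, p_bar - spread), min(1.0, p_bar + spread)


def new_word_alerts(week_terms, known_terms, synonyms):
    """Terms not yet in the collection, skipping synonyms of already-known parents."""
    new_terms = set(week_terms) - set(known_terms)
    return sorted(t for t in new_terms if synonyms.get(t) not in known_terms)


# Invented baseline: weekly record counts for one cluster and for all clusters.
baseline_cluster = [10, 12, 9, 11]
baseline_total = [1000, 1100, 950, 1050]
p_bar = sum(baseline_cluster) / sum(baseline_total)

# Score a new week: 30 of 1000 records fell into the cluster.
lcl, ucl = p_chart_limits(p_bar, 1000)
size_alert = (30 / 1000) > ucl

known = {"bumper", "headliner", "air_conditioner"}
synonym_list = {"a/c": "air_conditioner"}
word_alerts = new_word_alerts(["bumper", "a/c", "sunroof"], known, synonym_list)
print(size_alert, word_alerts)
```

Because the limits depend on each week's sample total, a busy week gets tighter limits than a quiet one, which is why the week's total number of records serves as the varying sample size.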
Figure 3.

CHANGE IN SHAPE ALERTS

This alert was designed early in the project to mitigate non-detection of a change in size when clusters have more than one concept (e.g., bumpers and headliners). In theory, a simultaneous reduction in the number of claims for bumpers and an increase in claims for headliners would cancel one another. Although watching the size of the cluster over time would not detect the increase in headliner claims, it was felt that the distribution of types of documents within the cluster had changed. The report required the calculation of the cumulative Root Mean Square Standard Deviation (RMSSTD) of each cluster each week. By reviewing some empirical plots of RMSSTD from the first scored data, it was determined that either large increases or large decreases in variance signaled a change in the data, meriting the attention of the respective auto analyst.

Representing the change in the data that RMSSTD identified presented a new challenge. Figure 4 is a two-dimensional representation of what the change in shape might look like. The outside of the sphere represents the cluster boundary and the masses inside represent the location of documents. Although a change in the distribution of terms within a cluster caused the changing variance, trying to count and track the terms over time to explain the change was not attempted. Instead, frequency lists of the part numbers for the records were produced and a comparison was made between the current and previous periods. If the only change was the language used to describe a problem, an auto analyst could deem the alert a false positive.

The decision discussed above, to make more homogeneous clusters by increasing the number of clusters, diminished the need for monitoring the cluster variance. However, change in variance may still prove useful for particularly large clusters or clusters where homogeneity is in question.
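As an illustration of the statistic behind the change-in-shape alert, the Python sketch below computes RMSSTD for one cluster as the square root of the pooled per-dimension variance of its document vectors. The two-dimensional vectors are invented stand-ins for SVD document vectors, and the function name `rmsstd` is an assumption. The paper does not state an alert threshold, so none is shown.

```python
import math


def rmsstd(vectors):
    """Root mean square standard deviation of a cluster's document vectors."""
    n = len(vectors)
    if n < 2:
        return 0.0  # variance is undefined for a single document
    dims = len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dims)]
    squared = sum((v[j] - means[j]) ** 2 for v in vectors for j in range(dims))
    # Pool the sample variances (n - 1 divisor) across all dimensions.
    return math.sqrt(squared / (dims * (n - 1)))


# Invented cumulative document vectors for one cluster in two consecutive weeks.
week_5 = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
week_6 = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]

print(rmsstd(week_5), rmsstd(week_6))
```

Both sets have four documents, so a size-based alert would see nothing, while the spread statistic doubles. This is the kind of change the alert was meant to surface.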
[Figure 4. Panels: Week 5, Week 6]

CONCLUSION

The initial results of this methodology of deploying SAS/Text Miner and SAS/QC have been very positive. Systematic analysis has been enabled for text data that was too large to attempt to read. Working directly with the domain experts has increased both the usefulness of the models and their acceptance within the company. The process continues to evolve. Further automation has reduced the time required to bring a new clustering model into production, and new ideas on increasing the homogeneity of clusters will be tested.

REFERENCES

SAS Institute Inc., SAS/STAT User's Guide, Version 8, Volume 1, Cary, NC: SAS Institute Inc.
SAS Institute Inc., Getting Started with SAS Text Miner, Cary, NC: SAS Institute Inc.

ACKNOWLEDGMENTS

The authors would like to recognize SAS Professional Services, the prime contractor on this project. A special thanks also goes to those who reviewed this paper: Benjamin Scott (UC Berkeley/Business Researchers), Scott Carl (Tricision, Inc.) and Philip Corrin (Business Researchers).

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

John Wallace, Principal
Business Researchers, Inc.
74 Mallorca Way
San Francisco, CA
[email protected]

Tracy Cermack
American Honda Motor Co., Inc.
Torrance Blvd., 500-2S-1B
Torrance, CA
[email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK
How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon ABSTRACT Effective business development strategies often begin with market segmentation,
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
Paper 3508-2015. Downtime of a truck = Truck repair end date - Truck repair start date
Paper 3508-2015 Using Text from Repair Tickets of a Truck Manufacturing Company to Predict Factors that Contribute to Truck Downtime Ayush Priyadarshi and Dr. Goutam Chakraborty, Oklahoma State University
W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
2015 Workshops for Professors
SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market
IBM SPSS Data Preparation 22
IBM SPSS Data Preparation 22 Note Before using this information and the product it supports, read the information in Notices on page 33. Product Information This edition applies to version 22, release
What's New in SAS Data Management
Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Dynamic Decision-Making Web Services Using SAS Stored Processes and SAS Business Rules Manager
Paper SAS1787-2015 Dynamic Decision-Making Web Services Using SAS Stored Processes and SAS Business Rules Manager Chris Upton and Lori Small, SAS Institute Inc. ABSTRACT With the latest release of SAS
How To Develop Software
Software Engineering Prof. N.L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture-4 Overview of Phases (Part - II) We studied the problem definition phase, with which
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
Text Analytics Illustrated with a Simple Data Set
CSC 594 Text Mining More on SAS Enterprise Miner Text Analytics Illustrated with a Simple Data Set This demonstration illustrates some text analytic results using a simple data set that is designed to
Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC
Paper 073-29 Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC ABSTRACT Version 9 of SAS software has added functions which can efficiently
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Leveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
Internet/Intranet, the Web & SAS. II006 Building a Web Based EIS for Data Analysis Ed Confer, KGC Programming Solutions, Potomac Falls, VA
II006 Building a Web Based EIS for Data Analysis Ed Confer, KGC Programming Solutions, Potomac Falls, VA Abstract Web based reporting has enhanced the ability of management to interface with data in a
Paper 064-2014. Robert Bonham, Gregory A. Smith, SAS Institute Inc., Cary NC
Paper 064-2014 Log entries, Events, Performance Measures, and SLAs: Understanding and Managing your SAS Deployment by Leveraging the SAS Environment Manager Data Mart ABSTRACT Robert Bonham, Gregory A.
COC131 Data Mining - Clustering
COC131 Data Mining - Clustering Martin D. Sykora [email protected] Tutorial 05, Friday 20th March 2009 1. Fire up Weka (Waikako Environment for Knowledge Analysis) software, launch the explorer window
White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.
White Paper Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. Contents Data Management: Why It s So Essential... 1 The Basics of Data Preparation... 1 1: Simplify Access
Why is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
Strengthening Diverse Retail Business Processes with Forecasting: Practical Application of Forecasting Across the Retail Enterprise
Paper SAS1833-2015 Strengthening Diverse Retail Business Processes with Forecasting: Practical Application of Forecasting Across the Retail Enterprise Alex Chien, Beth Cubbage, Wanda Shive, SAS Institute
Data Mining Techniques
TIMELY. PRACTICAL. RELIABLE. Data Mining Techniques For Marketing, Sales, and Customer Relationship Management Third Edition Gordon S. Linoff Michael J. A. Berry Chapter 21 n Listen Carefully to What Your
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation
Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Abstract This paper discusses methods of joining SAS data sets. The different methods and the reasons for choosing a particular
Alex Vidras, David Tysinger. Merkle Inc.
Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
ORACLE ENTERPRISE DATA QUALITY PRODUCT FAMILY
ORACLE ENTERPRISE DATA QUALITY PRODUCT FAMILY The Oracle Enterprise Data Quality family of products helps organizations achieve maximum value from their business critical applications by delivering fit
This software agent helps industry professionals review compliance case investigations, find resolutions, and improve decision making.
Lost in a sea of data? Facing an external audit? Or just wondering how you re going meet the challenges of the next regulatory law? When you need fast, dependable support and company-specific solutions
KnowledgeSEEKER Marketing Edition
KnowledgeSEEKER Marketing Edition Predictive Analytics for Marketing The Easiest to Use Marketing Analytics Tool KnowledgeSEEKER Marketing Edition is a predictive analytics tool designed for marketers
C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER
INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
InfiniteInsight 6.5 sp4
End User Documentation Document Version: 1.0 2013-11-19 CUSTOMER InfiniteInsight 6.5 sp4 Toolkit User Guide Table of Contents Table of Contents About this Document 3 Common Steps 4 Selecting a Data Set...
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING
CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. Is there valuable
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
1 Choosing the right data mining techniques for the job (8 minutes,
CS490D Spring 2004 Final Solutions, May 3, 2004 Prof. Chris Clifton Time will be tight. If you spend more than the recommended time on any question, go on to the next one. If you can t answer it in the
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Charlie Berger, MS Eng, MBA Sr. Director Product Management, Data Mining and Advanced Analytics [email protected] www.twitter.com/charliedatamine
Customer Analytics. Turn Big Data into Big Value
Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data
Analyzing the Server Log
87 CHAPTER 7 Analyzing the Server Log Audience 87 Introduction 87 Starting the Server Log 88 Using the Server Log Analysis Tools 88 Customizing the Programs 89 Executing the Driver Program 89 About the
Make Better Decisions with Optimization
ABSTRACT Paper SAS1785-2015 Make Better Decisions with Optimization David R. Duling, SAS Institute Inc. Automated decision making systems are now found everywhere, from your bank to your government to
Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA
PROC FACTOR: How to Interpret the Output of a Real-World Example Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA ABSTRACT THE METHOD This paper summarizes a real-world example of a factor
Normalized EditChecks Automated Tracking (N.E.A.T.) A SAS solution to improve clinical data cleaning
Normalized EditChecks Automated Tracking (N.E.A.T.) A SAS solution to improve clinical data cleaning Frank Fan, Clinovo, Sunnyvale, CA Ale Gicqueau, Clinovo, Sunnyvale, CA WUSS 2010 annual conference November
Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry
Paper 1808-2014 Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry Kittipong Trongsawad and Jongsawas Chongwatpol NIDA Business School, National Institute
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
Introduction to SAS Business Intelligence/Enterprise Guide Alex Dmitrienko, Ph.D., Eli Lilly and Company, Indianapolis, IN
Paper TS600 Introduction to SAS Business Intelligence/Enterprise Guide Alex Dmitrienko, Ph.D., Eli Lilly and Company, Indianapolis, IN ABSTRACT This paper provides an overview of new SAS Business Intelligence
Chapter 3: Data Mining Driven Learning Apprentice System for Medical Billing Compliance
Chapter 3: Data Mining Driven Learning Apprentice System for Medical Billing Compliance 3.1 Introduction This research has been conducted at back office of a medical billing company situated in a custom
Easily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
Short-Term Forecasting in Retail Energy Markets
Itron White Paper Energy Forecasting Short-Term Forecasting in Retail Energy Markets Frank A. Monforte, Ph.D Director, Itron Forecasting 2006, Itron Inc. All rights reserved. 1 Introduction 4 Forecasting
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
ANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
SAS In-Database Processing
Technical Paper SAS In-Database Processing A Roadmap for Deeper Technical Integration with Database Management Systems Table of Contents Abstract... 1 Introduction... 1 Business Process Opportunities...
ON-BOARDING WITH BPM. Human Resources Business Process Management Solutions WHITE PAPER.
WHITE PAPER ON-BOARDING WITH BPM Human Resources Business Process Management Solutions BonitaSoft democratizes business process management (BPM) by bringing
Building a Data Quality Scorecard for Operational Data Governance
Building a Data Quality Scorecard for Operational Data Governance A White Paper by David Loshin WHITE PAPER Table of Contents Introduction.... 1 Establishing Business Objectives.... 1 Business Drivers...
Taking EPM to new levels with Oracle Hyperion Data Relationship Management WHITEPAPER
Taking EPM to new levels with Oracle Hyperion Data Relationship Management WHITEPAPER This document contains Confidential, Proprietary, and Trade Secret Information ("Confidential Information") of TopDown
Instructions for Analyzing Data from CAHPS Surveys:
Instructions for Analyzing Data from CAHPS Surveys: Using the CAHPS Analysis Program Version 3.6 The CAHPS Analysis Program...1 Computing Requirements...1 Pre-Analysis Decisions...2 What Does the CAHPS
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that
Modeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
Performing a data mining tool evaluation
Performing a data mining tool evaluation Start with a framework for your evaluation Data mining helps you make better decisions that lead to significant and concrete results, such as increased revenue
Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA
Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA ABSTRACT With multiple programmers contributing to a batch
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Paper PO 015. Figure 1. PoweReward concept
Paper PO 05 Constructing Baseline of Customer's Hourly Electric Usage in SAS Yuqing Xiao, Bob Bolen, Diane Cunningham, Jiaying Xu, Atlanta, GA ABSTRACT PowerRewards is a pilot program offered by the Georgia
How To Use Statgraphics Centurion Xvii (Version 17) On A Computer Or A Computer (For Free)
Statgraphics Centurion XVII (currently in beta test) is a major upgrade to Statpoint's flagship data analysis and visualization product. It contains 32 new statistical procedures and significant upgrades
Harnessing the power of advanced analytics with IBM Netezza
IBM Software Information Management White Paper Harnessing the power of advanced analytics with IBM Netezza How an appliance approach simplifies the use of advanced analytics Harnessing the power of advanced
A MULTIVARIATE OUTLIER DETECTION METHOD
A MULTIVARIATE OUTLIER DETECTION METHOD P. Filzmoser Department of Statistics and Probability Theory Vienna, AUSTRIA Abstract A method for the detection of multivariate
Robust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
SAS Does Data Science: How to Succeed in a Data Science Competition
Paper SAS2520-2015 SAS Does Data Science: How to Succeed in a Data Science Competition Patrick Hall, SAS Institute Inc. ABSTRACT First introduced in 2013, the Cloudera Data Science Challenge is a rigorous
Migrating to vCloud Automation Center 6.1
Migrating to vCloud Automation Center 6.1 vCloud Automation Center 6.1 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a
Counting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC
Paper CC 14 Counting the Ways to Count in SAS Imelda C. Go, South Carolina Department of Education, Columbia, SC ABSTRACT This paper first takes the reader through a progression of ways to count in SAS.
SAS Enterprise Guide in Pharmaceutical Applications: Automated Analysis and Reporting Alex Dmitrienko, Ph.D., Eli Lilly and Company, Indianapolis, IN
Paper PH200 SAS Enterprise Guide in Pharmaceutical Applications: Automated Analysis and Reporting Alex Dmitrienko, Ph.D., Eli Lilly and Company, Indianapolis, IN ABSTRACT SAS Enterprise Guide is a member
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Introduction. Background
Predictive Operational Analytics (POA): Customized Solutions for Improving Efficiency and Productivity for Manufacturers using a Predictive Analytics Approach Introduction Preserving assets and improving
9.2 User's Guide SAS/STAT. Introduction. (Book Excerpt) SAS Documentation
SAS/STAT Introduction (Book Excerpt) 9.2 User's Guide SAS Documentation This document is an individual chapter from SAS/STAT 9.2 User's Guide. The correct bibliographic citation for the complete manual
SAS BI Dashboard 4.3. User's Guide. SAS Documentation
SAS BI Dashboard 4.3 User's Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2010. SAS BI Dashboard 4.3: User's Guide. Cary, NC: SAS Institute
WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data
Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data CAS Predictive Modeling Seminar Louise Francis Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com
A Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
Marketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
15.7 Analytics and Data Mining
15.7 Analytics and Data Mining Section 1.5 noted that advances in computing processing during the past 40 years have
Top 10 Reasons to Automate your IT Run Books
Top 10 Reasons to Automate your IT Run Books DS12 Top 10 Reasons to Automate Your IT Run Books Run Book Automation is an emerging technology space that is being adopted by many of the largest, most sophisticated
Automate Data Integration Processes for Pharmaceutical Data Warehouse
Paper AD01 Automate Data Integration Processes for Pharmaceutical Data Warehouse Sandy Lei, Johnson & Johnson Pharmaceutical Research and Development, L.L.C, Titusville, NJ Kwang-Shi Shu, Johnson & Johnson
Intelligent Log Analyzer. André Restivo
Intelligent Log Analyzer André Restivo 9th January 2003 Abstract Server Administrators often have to analyze server logs to find if something is wrong with their machines.
Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole
Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
The GeoMedia Fusion Validate Geometry command provides the GUI for detecting geometric anomalies on a single feature.
The GeoMedia Fusion Validate Geometry command provides the GUI for detecting geometric anomalies on a single feature. Below is a discussion of the Standard Advanced Validate Geometry types. Empty Geometry
Crime Pattern Analysis
Crime Pattern Analysis Megaputer Case Study in Text Mining Vijay Kollepara Sergei Ananyan www.megaputer.com Megaputer Intelligence 120 West Seventh Street, Suite 310 Bloomington, IN 47404 USA +1 812-330-01
Oracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper September 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into a workflow using the SQL Query node Denny Wong Oracle Data Mining
Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems
Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,
