AUTOMATED PROVISIONING OF CAMPAIGNS USING DATA MINING TECHNIQUES
|
|
|
- Alaina Mitchell
- 10 years ago
- Views:
Transcription
1 AUTOMATED PROVISIONING OF CAMPAIGNS USING DATA MINING TECHNIQUES Saravanan M Ericsson R&D Ericsson India Global Services Pvt. Ltd Chennai, Tamil Nadu, India [email protected] Deepika S M Department of Information Technology Thiagarajar College of Engineering Madurai, Tamil Nadu, India [email protected] Abstract Today s telecom industry needs to use customer information for carrying out analysis, design and execution of new campaigns. The present and the traditional campaign generation process has identified few broad segments of customers based on some generalized information and their interaction with such segments. Automation in the process make it easy for the marketers to monetize data, increase customer life time value and to fill up gaps in existing systems. Using data mining techniques in telecom industry helps operators to tune interactive relationships to each of the segment needs. In this paper, we analyze the customer CDRs (Call Detail Records) details to expedite feature extraction on their behavioral aspects. Suitable data mining algorithms have been employed on the extracted features to understand the customer behavior and the likelihood that a user will accept the campaign. Usually the number of campaign takers are very less when compared to other large collection. This leads to the problem of class imbalance. Different combinations of techniques have been used to overcome it. Thus we find out the importance of the customer features for the different types of campaigns and the contributing features for each type of campaign. The customers who exhibit certain features have been selected for specific type of campaign. We have evaluated the use of different data mining classification techniques and the results obtained demonstrate that the Cost Stacking method performs comparatively better than other individual classification methods related to customer selection for specific campaigns. I. INTRODUCTION Telecom Industry provides the preliminary means of communication and a number of related services to the customers. One of the important procedures is the way they communicate the services by generating campaigns to customers. Campaign Management process can be described as a business driver which uses marketing tactics to achieve a particular business goal. The business goal is to obtain a higher response rate from the customers for the campaigns. The process of marketing campaign optimization takes a set of offers to customers and determines the offers that should be provided to a particular set of customers at a specific time. Automation in the campaign management process allows marketers to create, test, execute, and report on multiple campaigns direct from the marketing desktop, without any need for complex scripting and with minimal human involvement [1]. It also helps to overcome the existing gaps such as paying less attention to existing customers, absence of a method to select the appropriate group of customers for launching campaigns and measurement of the response of the customers for previous campaigns. Automated process facilitates the building of new campaigns depending upon the features, which reduces the time to a large extent. Since automated campaigns selects a group of customers depending upon their previous behaviors for launching, the problems of losing focus on existing customers and proper measurement of response are avoided. Automation process uses data mining techniques to find the specific group of customers to launch campaigns [2]. Data mining in telecom domain is generally used to analyze the large databases and provide meaningful results. It plays a significant role in recognizing and tracking patterns within the data. The advantage of using data mining techniques is that it even extracts information from the databases which may not be known as existed [3]. From our initial experimental studies, we found that some of the relevant data mining classification algorithms such as Cost classification [4], OneR rule [5], Bayes Net [6], J48 Decision Tree, Logistic regression [7] and Combined Regression and Ranking method (CRR) [8] performs well compared to the other classification algorithms like SVM, gradient boosting, etc. These algorithms were employed in this study. Classification algorithms generally faces class imbalance problem when worked with skewed data samples. Class imbalance occurs when the number of instances representing a particular class is very less when compared to the remaining instances (other class). The classification algorithms, when trained with class imbalance data, generate poor results for the corresponding test set. The proposed method analyses the customer behavior from the CDRs and extracts relevant features for different types of campaigns and thus selects the set of customers to launch a new campaign. It analyses the behavior from their previous responses to campaigns, importance to the existing customers is assured and their responses are measured. In addition to the simple classification algorithms, voting and cost sensitive stacking methods using different combinations of those algorithms with different values for the cost matrix were trained on the training set and evaluated on the test set. Precision, Recall, F-Measure, Kappa statistic and Accuracy [9] are the different measures considered to evaluate the applied data mining algorithms. Also, we have applied
2 CRR to generate a better recall value. The remainder of this paper is organized as follows. In Section 2, we give an overview of the areas related to this study. Section 3 explains the experimentation of various data mining algorithms in our approach. We provide an outline of our proposed system in Section 4. Section 5 deals with the various Evaluation measures. We discuss the results in Section 6. Section 7 discusses the importance of this application and the concluding remarks are given in Section 8. II. RELATED WORK A very few of the existing techniques uses the method of customer segmentation for campaign management. Intelligence value based customer segmentation method [10] investigates the customer behavior using a Recency, Frequency and Monetary (RFM) model and then uses a customer Life Time Value (LTV) model to evaluate the proposed segmented customers. This method also proposes to use Genetic Algorithms (GA) to select appropriate customers for each individual campaign. Predicting mobile phone churners in the telecom industry uses a similar method of analyzing the features from the customer CDRs. To tackle the problem of churn prediction, the telecom operator needs to understand the behavior of customers, and classify the churn and non-churn customers, so that the necessary decisions will be taken before the probable churner switch to a competitor. One of the solutions adapted for this problem is building up an adaptive and dynamic data-mining model in order to efficiently understand the system behavior and allow time to make the right decisions [11]. Our problem is similar to that of churn prediction problem where there is a need to tackle the imbalance in model usage. The churn prediction problem attached to telecom domain is thoroughly studied with different machine learning techniques such as Bayesian networks, association rules, decision trees and neural networks [11]. Evaluation in the customer segmentation method for campaign management uses a customer Life Time Value (LTV). This method takes the customer acquisition, customer development and customer retention stages into account [10]. Hybrid classification methods have been used for evaluation in churn prediction for mobile telecom data [12]. Since the number of churners is very less when compared to the non-churners, the existing classification models suffers from the problem of imbalance. Similar problem is faced in our method as the number of campaign takers for each and every unique type of campaign is very less when compared to the other group of customers. To solve this problem, a hybrid framework which combines the results of two or more classifiers which boosts the performance of the models is used [12]. In this paper, we have used cost sensitive stacking method to combine the results of two basic classifier results. Cost sensitive stacking has been used for audio Tag annotation and Retrieval [13]. The co-occurrence of tags is considered to model the audio tagging problem as a multi-label classification problem. Cost sensitive multi-label classification is used to boost the performance of individual classifiers. Cost sensitive stacking combines the outputs of multiple independent classifiers for multi-label classification. [13]. CRR [8] uses stochastic gradient descent that makes the algorithm easy to implement, and efficient for use on large-scale data sets. The number of customers to be handled is huge in telecom domain; manual work takes a long processing time for campaign generation. Also to help operators run effective marketing campaigns by leveraging subscriber information and external data to build target lists and then to execute them through multiple channels, automated provisioning of campaign process becomes essential. Automation also helps the service providers to respond quickly to changing customer behavioral trends for the immediate generation of advertisements [1]. III. USAGE OF DATA MINING TECHNIQUES The proposed approach aims to identify the actual customers who would respond to the campaigns based on their previous behavior. The behavior of the customers can be obtained by analyzing the various attributes from the CDR files. The attributes chosen for analysis includes the preliminary and derived attributes, aggregated on a weekly basis. In the process of extraction of features and understanding the method of implementation, automation in campaign management is studied through various related techniques which are discussed here. A. Feature Selection Feature Selection is a technique which is used to reduce the number of features before applying a data mining algorithm. Irrelevant features may have negative effects on a prediction task. It is also used for enhancing generalization capability, speeding up learning process, and improving the model interpretability. Feature selection has been applied in fields such as multimedia database search, image classification and biometric recognition [14]. The usage of feature selection techniques can be generalized into three main categories [15]: embedded approaches where feature selection is part of the classification algorithm, (e.g. decision tree), filter approaches where features are selected before the classification algorithm is used and wrapper approaches where the classification algorithm is used as a black box to find the best subset of attributes. Filtering methods assumes that the feature selection processes are independent from the classification step. Wrappers usually provide better results, the price being higher computational complexity. Certain classification algorithms use embedded approach of feature selection by finding the information gain of the selected attributes [16]. In our method, embedded approach and Wrappers has been used as the feature selection techniques. Depending upon the value of the features selected for every particular type of campaign, classification algorithms are applied to predict the response of the customers for the campaigns.
3 B. Classification Algorithms Prediction of the response of the customers to campaigns requires employing a supervised learning paradigm. In a supervised learning approach, we have training data D containing N samples. Each training instance X is a vector of d attributes (i.e) X= {x 1,x 2,...x d } The attributes can be numeric or nominal in nature. Classification algorithms are used to build a model using the training set and apply the built model on the test data set to predict the class labels. Some of the classification algorithms which suited to our dataset are described in this section. 1) OneR Classification: OneR [5], which is a "One Rule" classification, is a simple, yet accurate, classification algorithm that generates one rule for each predictor in the data, and then selects the rule with the smallest total error as its "one rule". To create a rule for a predictor, a frequency table is constructed for each predictor against the target. The total error calculated from the frequency tables is the measure of each predictor contribution. A low total error indicates a higher contribution to the predictability of the model. To create a rule for an attribute, the most frequent class for each attribute value must be determined. The most frequent class is simply the class that appears most often for that attribute value. A rule is simply a set of attribute values bound to their majority class. For certain type of campaigns, one particular attribute may act as an influencing attribute. For such models, OneR classifier performs well and provides good results. In our analysis, experiments are conducted for each and every attribute and comparison between them. Moreover the use of OneR method resulted in good evaluation measures for a particular type of attribute for a campaign. 2) Logistic Regression Method Regression analysis is a technique to predict the value of a dependent variable by fitting a function with least error on the training data. Logistic Regression fits an S-shaped curve to the data [7]. This method estimates the average value of the target variable when the independent variables are held fixed. This model is a non-linear transformation of a linear regression model. It is also referred as Logit function. It makes use of one or more predictor variables that may be either numerical or categorical. A Logistic function always takes on the values between zero and one. Binary logistic regression refers to the instance in which the criterion can take only two possible outcomes. Our approach aims to predict whether the customer would belong to the specific class in which he can take up the campaign or fall in to other class. Since it is a binary classification [11], binary logistic regression can be considered as an appropriate method for building the predictor model. 3) BayesNet Classification A Bayesian network is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph(dag). To use a Bayesian network as a classifier, one simply calculates argmaxy P (y x) using the distribution P (U) represented by the Bayesian network. Thus P (y x ) = P (U ) / P (x ) = u U p (u pa(u )) (1) where pa(u) is the set of parents of u in network structure. From the observation of many studies, we found that evaluation of Bayesian network algorithm implementation usually results in high recall [17], hence it is chosen as one of the classifiers to be used in our approach. 4) J48 Decision tree Method J48 Decision tree [18] is used to classify the given instances by constructing a decision tree using the training set data. J48 implements Quinlan s C4.5 algorithm for generating a pruned or expand C4.5 decision tree. The decision trees generated by J48 can be used for classification. J48 builds decision trees from a set of labeled training data using the concept of information entropy. J48 examines the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data and constructs the tree taking each attribute one by one. J48 can handle both continuous and discrete attributes, and also data with missing attribute values. Further it provides an option for pruning trees after creation. J48 Decision tree algorithm performs feature selection as a part of the classification procedure. As this algorithm includes each attribute by analyzing its information gain, the attributes considered in this method forms a better feature set for classification. This algorithm is used in our method for performing the initial process of feature selection. 5) Combined Regression and Ranking(CRR) CRR is a method that optimizes regression and ranking objectives simultaneously [8]. This method is applied to range of large-scale tasks, including click prediction for online advertisements. This combination guards against learning degenerate models that perform well on one set of metrics but poorly on another. Since we face a similar problem in our analysis, we chose to use this method. Another importance of this method is that it applied to the case of rare events or skewed sample distributions to improve regression performance due to the addition of informative ranking constraints. Since we are also facing imbalance problem in our sample distribution, CRR is employed as one of the method in this study. 6) Voting Voting classifier is used for combining classifiers using un-weighted average of probability estimates (classification) or numeric predictions (regression) [12]. The algorithm takes an inducer and a training set as input and runs the inducer multiple times by changing the distribution of training set instances. The generated classifiers are then combined to create a final classifier that is used to classify the test set. This system also selects a classifier from the set of classifiers by
4 minimizing error on the training data. Voting method is a suitable method for the datasets suffering from the problem of class imbalance. Thus it is chosen to be experimented in our approach. 7) Cost Stacking Cost Classification [13] extends the usual classification methods by coupling a cost vector c i to each training sample (x i, y i ). It introduces a methodology for extending regular classification algorithms to cost-sensitive ones with any cost. The j th component c ij denotes the cost to be paid when the label y ij is misclassified. For a two class classification, the following matrix denotes the cost value: Actual Negative Actual Positive Predict Negative 0 C2 Predict Positive C1 0 The entries [1, 2] (C1) and [2, 1] (C2) denotes the cost for misclassified labels. Thus this method tries its best to reduce the misclassifications. Cost sensitive stacking with a cost value associated with it denotes the cost of the misclassification on the actual positive column, but which is predicted negative. In our case, it denotes the cost value on the customers who actually belong to the class of accepting the campaigns but predicted to be the other. For example, cost 20 refers to the cost given to customers who were misclassified as those belonging to the class of not accepting the campaigns. The higher, the value given to this cost, the classifier will try its best to reduce this number of misclassifications. Stacking is a method of combining the outputs of multiple independent classifiers for classification. Stacking uses the outputs of all classifiers, f 1 (x), f 2 (x),..f K (x) as features to form a new feature set z = (z 1, z 2,...,z K ). Then, the new feature set together with the true label is used to learn the parameters w ij of the stacking classifiers: H i (z) = K j=1 w ij z j, (2) where the weight w ij will be positive if tag j is positively correlated to tag i; otherwise, w ij will be negative. The stacking classifiers can recover misclassified tags by using the correlation information captured in the weight w ij. This method boosts the results by reducing the misclassifications and so it is chosen as one of the boosting model for our experiments. III. SYSTEM OVERVIEW Fig 1 shows the block diagram representation of the proposed system. The customer details, obtained from the CDR (Call Detail Record) files are pre processed to remove the noise and outliers. Some of the attributes which are not relevant for campaign management process are also removed during pre-processing. Generally CDR data consists of a number of related attributes. These attributes are considered as the simple or primary attributes. Some of the primary attributes Fig 1. Block Diagram of the System considered in our work are related to revenue direction technology used by the customer, plan, annual revenue etc. Some of the attributes are generated from the simple attributes which are known as Derived Attributes. These derived attributes gives a clear representation of the user behavior. Some of the derived attributes are average duration of the calls for each customer, total cost spent by the customer etc. Average duration of the calls is derived from the primary attributes like duration of the calls and the total number of calls. Similarly total cost spent by the customer is obtained by summing up the cost of all the calls made by the customer. The derived attributes along with the simple attributes together constitutes the input file for preliminary analysis. Those CDRs are aggregated for each customer and also on a weekly basis. Feature extraction is the process of retrieving the useful attributes (features) from the input. Such features play an important role in contributing to the behavior of the customer in accepting the campaigns. Using J48 decision tree classifier as an embedded method of feature extraction gives a feature set of attributes that can be used for prediction. OneR rule classifier acts as a wrapper method of feature extraction to find out the single contributing feature for the campaign. Other classification algorithms are used to predict the customer behavior in responding to the campaigns. Among the seven single classifiers and hybrid approaches used in this method, finally the classifier which performs well (i.e. which gives high precision, recall, F-measure etc) is selected and it is used to predict the behavior of the customers. The customers who were predicted to be the takers of the campaign are selected for launching new relevant campaigns. A variety of campaigns exists in telecom domain. Each and every campaign will have its own contributing features. Some familiar types are bonus campaign, discount campaign etc. For each type of campaign, a separate model is framed and the process is performed individually. For the evaluation of our study, we split the entire data available into training test which constitute the user behavior for a particular period and test set constituting the user behavior for a consecutive different period of time. The class label of the training and test set denotes the response of the customer to that particular campaign. Model is built by the classification algorithms using the training set and it is evaluated over the test set. Also a validation set consisting of the
5 training and test set together is constructed and a tenfold cross validation is done on the set and results are analyzed to find out the usage of different classifiers and its predictions. IV. EVALUATION MEASURES The training and test data contains the details of those taking up the campaigns (class 0) and others (class 1). The work of a classifier is to construct a model by learning the training set and then predict whether a customer in the test set will belong to class 0 or class 1. Quality of the predictions is measured using the following matrix parameters. A. Precision Precision is the degree to which repeated measurements under unchanged conditions show the same results. In other words, Precision is the fraction of retrieved instances that are relevant [9]. Precision = tp tp + fp (3) B. Recall Recall is the fraction of the documents that are relevant to the query that are successfully retrieved [9]. Recall = tp tp + fn (4) C. F-Measure F-Measure is the weighted harmonic mean of Precision and Recall [9]. F-Measure = 2 * Precision * Recall Precision + Recall (5) D. Accuracy Accuracy is the degree of closeness of measurements of a quantity to that quantity's actual (true) value. Accuracy = Predicted class 0 Predicted class 1 Actual class 0 True Positives (tp) False Negatives (fn) Actual class 1 False Positives (fp) True Negatives (tn) tp + tn tp + fp + tn + fn (6) E. Kappa Statistic Kappa Statistic can be defined as a measure of the degree of non-random agreement between observers or measurements of the same categorical variable and it is commonly used for the purpose of finding the agreement between the observers [19]. One of the uses of kappa is quantifying the actual levels of agreement. Kappa's calculation uses a term called the proportion of chance (or expected) agreement. This is interpreted as the proportion of times, the raters would agree by chance alone. F. Area Under the Curve (AUC) It is a measure to find the goodness of a classification algorithm by plotting a certain curve called the ROC curve and measuring the area under this curve [9]. G. Cross Validation Cross-validation [20] is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the testing set). In k- fold cross-validation, the original sample is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k 1 sub-samples are used as training data. The cross-validation process is then repeated k times (the folds), and the k results from the folds are averaged (or otherwise combined) to produce a single estimation. V. EXPERIMENTAL RESULTS Various CDR files containing the spent details, refill details and customer details for six months were combined together using customer mobile number as the primary key. This input CDR file contained approximately 1 million records. From this CDR file, training data is generated by taking the data for a period of 15 days and test data is generated for a period of 15 days consecutively. From these sets, aggregation was made for combining the details of each customer (using mobile number),and also on a weekly basis. After this aggregation, the training set contained 9463 records with 430 campaign acceptors. The test set contained 5581 records with 257 campaign acceptors were selected for running different experiments. Also a validation set combining the training and test set was generated. This CDR file with 39 attributes was pre processed to remove the missing values, outliers, error values and some of the attributes which was not necessary for our analysis. From the pre processed CDRs, 27 primary attributes were selected. Out of the 27 attributes, 17 attributes were considered as such without any changes. 7 attributes were aggregated on a weekly basis for each unique customer. The remaining 3 attributes were used for generating 8 derived attributes. Thus our final data file consists of 32 attributes, 17 primary and 15 (7 + 8) derived and aggregated attributes. There is no restriction for the number of attribute usage. Fig 2. Comparison of classification algorithms on a training set From another CDR which contained details about the campaigns such as campaign name, launch date and the customers who accepted the campaigns were retrieved. The available details also include the type of
6 campaign and the date of launch. From this data, information about every unique campaign was retrieved for our analysis. In this study, a discount campaign is considered for further analysis. Fig 3. Comparison of classification algorithms on a test set A. Evaluation of Single Classifiers The classification algorithms are numbered as follows for easy representation: 1. OneR, 2. Logistic Regression, 3. Bayes Net, 4. J48 Tree induction, 5.Voting, 6. Cost Stacking and 7. CRR. The evaluation measure (F-Measure) on the training set using 6 different classification algorithms is represented in Fig 2 and those on the test set are represented in Fig 3. There is comparatively a similar performance between classification algorithms on both training and test sets. Only Classification models 5 and 6 shows some improved performance on test set. It is clear that both models work on combining the results of different classifiers. Also AUC values generated on the test set, given in Fig 4 clearly illustrates the same that methods 5 and 6 outperforms the other single classifiers on the campaigning task. Table 1: Evaluation Combined Regression and Ranking Threshold Precision Recall F-measure AUC The evaluation measures on the validation set using 6 different classification algorithms on performing a cross-validation are tabulated in Table 2. The results demonstrate that the Precision, Accuracy and F- measure obtained on the cross validation sets are higher than that of the test set. This shows that the period of time considered between the training set and test set accounts for the changes in the features considered. Once the training set and test set are considered together, the features obtained are accurate and it results in satisfactory F-measure. As this is more suitable for a binary classification, Logistic Regression fits the function with least error and gives a high F-measure than other classifiers in this case. Table 2: Evaluation measures on performing a cross validation Method Precision Recall F-Measure Accuracy (%) Kappa Statistic Table 3. Evaluation of Hybrid Models on Test Set Fig 4. Comparison of classification algorithms AUC Combinati Method on of performing Methods well 1 and 2 Cost Stacking(Co st 20) 3 and 4 Cost Stacking(Co st 10) 1 and 3 Voting- Simple Averaging Precision Recall F- Meas ure Accura cy(%) Kappa Statisti c The results illustrated that a combination of the better performing individual classifiers provides better F-measure and AUC. Their combination was experimented by stacking them (Cost Stacking) and through Voting. Moreover, Evaluation measures obtained on the test set are low when compared with that of the training set. It shows that the features obtained on the training set suits only up to 50% of the features on the test set. The results of the method CRR is given in Table 1, which clearly shows the vast improvement in recall value compare to other methods for different threshold levels, but precision value is very low. Our purpose is to get better F- measure for the campaign task without compromising the retrieval of relevant customers. 2 and 3 Cost Stacking (Cost 7) 2 and 4 Cost Stacking (Cost 15) B. Evaluation of Hybrid Models Evaluation of the Hybrid approach is done by combining two of the single classifiers at a time and finding out the method (Voting, Cost Stacking for various Costs) which gives good results. The results are tabulated in Table 3. The usage of hybrid model of combining OneR and logistic regression results better F-measure compare to other combinations.
7 VI. DISCUSSION It is observed from Table 3 that the single classifiers which performed well (in terms of F- Measure) individually (Logistic and OneR) provided good F-measure when they were combined together using Cost Stacking. The combination of the classifiers OneR and logistic regression, performing well shows that this campaign has one feature as an influencing attribute and here logistic classifier acts as a booster to improve the results. Usage of Cost 20 in Cost Stacking proves that a higher cost minimizes the misclassifications on the specified class. Also the results obtained from CRR proves that combination of ranking with regression approach gives better results in terms of understanding suitable customers for campaigning in our model. So the methods can be implemented based on the requirements of the model. In addition to performing multiple experiments, we have developed a GUI (Graphical User Interface) for automated campaign process which starts performing well for all the tasks from campaign generation to campaign tracking. A sample screenshot of the GUI is shown in Fig 5. Data mining techniques were useful in selection of features from the CDRs and thus to find out the proper target of customers to launch different types of campaigns. The automated provisioning of campaign management process became effective and economical in this process. Figure 5. GUI for Automated Campaign Management Process VII. CONCLUSION Predicting the behavior of the customers relating to their campaign acceptance has been done successfully by extracting the important features. The experimental results are verified for a particular type of campaign. Similar method can be used for the other types and the groups of customers for which the campaign should be launched can be easily determined. The above result shows that an accuracy rate of % and an F-measure of 40% can be reached in our method. Moreover, by the method of automotive provisioning, our approach becomes more innovative and effective in finding out the group of customers for launching the campaigns promptly. ACKNOWLEDGMENT This work was supported by Ericsson R&D, Chennai. We would like to thank my team and Research Manager, R& D Head, for moral support and encouragement to complete this work. Also we propose our sincere thanks to other colleagues Venkatesh and Vikas Verma for their excellent support in figure out specific models for our research activities. REFERENCES [1] [2] whitepaper_marketing_automation.pdf [3] Michael J. A. Berry and Gordon S. Linoff, Data Mining Techniques For Marketing, Sales, and Customer Relationship Management, Second Edition, Wiley Eastern, 2004 [4] Charles Elkan, The Foundations of Cost Learning, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), pp , [5] Buddhinath G, Derry D. A Simple Enhancement to One Rule Classification. Department of Computer Science & Software Engineering. University of Melbourne, Australia. [6] Jie Cheng and Russell Greiner, Learning Bayesian Belief Network Classifiers: Algorithms and System, Proceedings of 14 th Biennial conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence, Springer- Verlag, London, UK, 2001 [7] Agresti A, "Building and applying logistic regression models". An Introduction to Categorical Data Analysis. Hoboken, New Jersey: Wiley. p. 138, 2007 [8] Schulley, D, Combined Regression and Ranking, KDD 10, July 25-28, 2010, Washington, DC, USA. [9] David MW Powers, Evaluation: From Precision, Recall and F- Factor to ROC, Informedness, Markedness & Correlation, Technical Report SIE , extended version of paper presented in HC Science SummerFest, Dec [10] Chu Chai Henry Chan, Intelligent value- based customer segmentation method for campaign management: A case study of automobile retailer, Elsevier, pp ,2008. [11] Tarik Rashid, Classification of Churn and non-churn customers for Telecommunication Companies, International Journal of Biometrics and Bioinformatics (IJBB), Volume 3, Issue 5,pp ,2008. [12] Yeshwanth V, Vimal Raj A, and Saravanan, M. Evolutionary Churn Prediction in Mobile Networks using Hybrid Learning, in Proceedings of 24 th International Florida Artificial Intelligence Research Society Conference (FLAIRS-24), May 18-20, 2011, Palm Beach, Florida, USA. [13] Hung-Yi Lo, Ju-Chiang Wang, Hsin-Min Wang and Shou-De Lin, Cost- Stacking for Audio Tag Annotation an Retrieval, Proceeding of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp , [14] Sandro Saitta, Introduction to feature selection(part 1), Data Mining Research, Sep [15] Eugene Tuv and Alexander Borisov, Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination, Journal of Machine Learning Research 10, pp ,2009. [16] Yvan Saey, Inaki Inza and Pedro Larranaga A review of feature selection techniques in bioinformatics, Bioinformatics Advance Access, [17] Edward O.Cannon, Ata Amini, Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds, Springer, Mar [18] Ross Quinlan, C4.5: Programs for Machine Learning,, Morgan Kaufmann Publishers, San Mateo, CA, [19] Anthony J.Viera and Joannae M.Garrett, Understanding Interobserver Agreement:The Kappa Statistic, Research Series, vol.37, no.5, May [20] -validation.pdf
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Using Random Forest to Learn Imbalanced Data
Using Random Forest to Learn Imbalanced Data Chao Chen, [email protected] Department of Statistics,UC Berkeley Andy Liaw, andy [email protected] Biometrics Research,Merck Research Labs Leo Breiman,
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Selecting Data Mining Model for Web Advertising in Virtual Communities
Selecting Data Mining for Web Advertising in Virtual Communities Jerzy Surma Faculty of Business Administration Warsaw School of Economics Warsaw, Poland e-mail: [email protected] Mariusz Łapczyński
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Classification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo [email protected],[email protected]
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Roulette Sampling for Cost-Sensitive Learning
Roulette Sampling for Cost-Sensitive Learning Victor S. Sheng and Charles X. Ling Department of Computer Science, University of Western Ontario, London, Ontario, Canada N6A 5B7 {ssheng,cling}@csd.uwo.ca
Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction
Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Huanjing Wang Western Kentucky University [email protected] Taghi M. Khoshgoftaar
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification
Mining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES
International Journal of Scientific and Research Publications, Volume 4, Issue 4, April 2014 1 CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES DR. M.BALASUBRAMANIAN *, M.SELVARANI
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago [email protected] Keywords:
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Evaluation & Validation: Credibility: Evaluating what has been learned
Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
Mining Life Insurance Data for Customer Attrition Analysis
Mining Life Insurance Data for Customer Attrition Analysis T. L. Oshini Goonetilleke Informatics Institute of Technology/Department of Computing, Colombo, Sri Lanka Email: [email protected] H. A. Caldera
Pattern-Aided Regression Modelling and Prediction Model Analysis
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Fall 2015 Pattern-Aided Regression Modelling and Prediction Model Analysis Naresh Avva Follow this and
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators
Churn Prediction. Vladislav Lazarov. Marius Capota. [email protected]. [email protected]
Churn Prediction Vladislav Lazarov Technische Universität München [email protected] Marius Capota Technische Universität München [email protected] ABSTRACT The rapid growth of the market
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Predictive Modeling for Collections of Accounts Receivable Sai Zeng IBM T.J. Watson Research Center Hawthorne, NY, 10523. Abstract
Paper Submission for ACM SIGKDD Workshop on Domain Driven Data Mining (DDDM2007) Predictive Modeling for Collections of Accounts Receivable Sai Zeng [email protected] Prem Melville Yorktown Heights, NY,
Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms Ian Davidson 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl country code 1st
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee
UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee 1. Introduction There are two main approaches for companies to promote their products / services: through mass
DATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde
Implementation of Data Mining Techniques to Perform Market Analysis
Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
SVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
A Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi [email protected] Satakshi Rana [email protected] Aswin Rajkumar [email protected] Sai Kaushik Ponnekanti [email protected] Vinit Parakh
Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers
Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers Manjit Kaur Department of Computer Science Punjabi University Patiala, India [email protected] Dr. Kawaljeet Singh University
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients Celia C. Bojarczuk 1, Heitor S. Lopes 2 and Alex A. Freitas 3 1 Departamento
ROC Curve, Lift Chart and Calibration Plot
Metodološki zvezki, Vol. 3, No. 1, 26, 89-18 ROC Curve, Lift Chart and Calibration Plot Miha Vuk 1, Tomaž Curk 2 Abstract This paper presents ROC curve, lift chart and calibration plot, three well known
Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms
Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms Johan Perols Assistant Professor University of San Diego, San Diego, CA 92110 [email protected] April
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Advanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance Jesús M. Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga, and José I. Martín Dept. of Computer
A Hybrid Data Mining Model to Improve Customer Response Modeling in Direct Marketing
A Hybrid Data Mining Model to Improve Customer Response Modeling in Direct Marketing Maryam Daneshmandi [email protected] School of Information Technology Shiraz Electronics University Shiraz, Iran
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring
714 Evaluation of Feature election Methods for Predictive Modeling Using Neural Networks in Credits coring Raghavendra B. K. Dr. M.G.R. Educational and Research Institute, Chennai-95 Email: [email protected]
Random Forest Based Imbalanced Data Cleaning and Classification
Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem
Decision Support Systems
Decision Support Systems 50 (2011) 602 613 Contents lists available at ScienceDirect Decision Support Systems journal homepage: www.elsevier.com/locate/dss Data mining for credit card fraud: A comparative
Projektgruppe. Categorization of text documents via classification
Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Customer Relationship Management using Adaptive Resonance Theory
Customer Relationship Management using Adaptive Resonance Theory Manjari Anand M.Tech.Scholar Zubair Khan Associate Professor Ravi S. Shukla Associate Professor ABSTRACT CRM is a kind of implemented model
Learning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
Towards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial
Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario [email protected]
Data Mining in CRM & Direct Marketing Jun Du The University of Western Ontario [email protected] Outline Why CRM & Marketing Goals in CRM & Marketing Models and Methodologies Case Study: Response Model Case
Introducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
Machine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Enhancing Quality of Data using Data Mining Method
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad
