Introduction to Data Mining


1 Introduction to Data Mining Part 5: Prediction Spring 2015 Ming Li Department of Computer Science and Technology Nanjing University

2 Prediction Predictive modeling can be thought of as learning a mapping from an input instance x to a label y. Classification predicts categorical labels; regression predicts numerical labels; ranking predicts ordinal labels.

3 Two-step process of prediction (I) Step 1: Construct a model based on a training set. The set of tuples used for model construction is called the training set; the set can also be called a sample (and a single tuple can also be called a sample). A tuple is usually called an example (usually with the label) or an instance (usually without the label). The attribute to be predicted is called the label.
Training data (label: Tenured):
Name  Rank            Years  Tenured
Mike  Assistant Prof  3      No
Mary  Assistant Prof  7      Yes
Bill  Professor       2      Yes
Jim   Associate Prof  7      Yes
Dave  Assistant Prof  6      No
Anne  Associate Prof  3      No
A learning algorithm produces a prediction model, e.g., IF rank = professor OR years > 6 THEN tenured = yes.

4 Two-step process of prediction (II) Step 2: Use the model to predict unseen instances. Before using the model, we can estimate its accuracy on a test set, which is different from the training set: the desired output of each test instance is compared with the output actually produced by the model. For classification, accuracy is usually measured by the percentage of test instances correctly classified by the model; for regression, it is usually measured by the mean squared error.
Test data:
Name     Rank            Years  Tenured
Tom      Assistant Prof  2      No
Merlisa  Associate Prof  7      No
George   Professor       5      Yes
Joseph   Assistant Prof  7      Yes
Unseen data: (Jeff, Professor, 7) — Tenured? The model predicts Yes.

5 Turing Award 2011 PAC (Probably Approximately Correct) learnability: there exists a sample size m, polynomial in 1/ε and 1/δ, such that with probability at least 1 − δ the learned hypothesis has error at most ε. L. G. Valiant. A theory of the learnable. Communications of the ACM, 1984, 27(11). Leslie Valiant (Harvard Univ.), Turing Award 2011.

6 Supervised vs. unsupervised learning Supervised learning - the training data are accompanied by labels indicating the desired outputs of the observations - the property of interest is then predicted for unseen data - usually: classification, regression. Unsupervised learning - the labels of training data are unknown - given a set of observations, the goal is to discover inherent properties, such as the existence of classes or clusters, in the data - usually: clustering, density estimation.

7 How to evaluate prediction algorithms? Generalization: the ability of the model to correctly predict unseen instances. Speed: the computational cost involved in generating and using the model (training time cost vs. test time cost; usually a larger training time cost but a smaller test time cost). Robustness: the ability of the model to deal with noise or missing values. Scalability: the ability of the model to deal with huge volumes of data. Comprehensibility: the level of interpretability of the model.

8 How to evaluate the generalization? Well-known evaluation measures: Classification - Accuracy, Overall cost, Precision, Recall, F1, AUC (area under the ROC curve). Regression - MSE (mean squared error). Ranking - Ranking loss, MAP (mean average precision), NDCG.

9 How to measure? In most cases we don't have a separate test set at all, so we have to use the provided data for measurement. Two widely used methods: Hold-out: randomly partition the data into two disjoint sets, one for training and one for test (usually 2/3 vs. 1/3, or 75% vs. 25%); repeat the process many times to obtain a good estimate. Cross validation: randomly partition the data into k disjoint, equal-sized sets; each set is used for test once while the others are used for training, so training/testing is conducted k times, e.g., 10-fold cross validation, leave-one-out, etc.
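A minimal sketch of hold-out estimation and k-fold cross validation as described above; the `train` and `predict` functions are hypothetical placeholders standing in for any learning algorithm, and `data` is assumed to be a list of (x, y) pairs.

```python
import random

def accuracy(model, test, predict):
    correct = sum(1 for x, y in test if predict(model, x) == y)
    return correct / len(test)

def holdout(data, train, predict, test_fraction=1/3, repeats=10):
    scores = []
    for _ in range(repeats):
        shuffled = random.sample(data, len(data))       # random partition
        cut = int(len(data) * test_fraction)
        test, training = shuffled[:cut], shuffled[cut:]
        scores.append(accuracy(train(training), test, predict))
    return sum(scores) / len(scores)                    # average over repeats

def k_fold_cv(data, train, predict, k=10):
    shuffled = random.sample(data, len(data))
    folds = [shuffled[i::k] for i in range(k)]          # k disjoint, roughly equal-sized sets
    scores = []
    for i in range(k):
        test = folds[i]
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(accuracy(train(training), test, predict))
    return sum(scores) / len(scores)
```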

10 How good can a prediction model be? Perfect prediction is what we expect. However, for some problems no perfect prediction can be achieved if no knowledge other than the data is used: the classes are not separable, so mistakes are inevitable in this case.

11 How good can a predictive model be? Bayes decision and Bayes error. Bayes error: no classifier can achieve a lower expected error rate on unseen data; it is a lower bound on the error of the best classifier for this problem.

12 No Free Lunch Generally, there is no algorithm that is consistently better than all other algorithms. The No Free Lunch theorem states that if an algorithm performs well on a certain class of problems, it necessarily pays for that with degraded performance on the set of all remaining problems. D.H. Wolpert and W.G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1997, 1(1):67-82. D.H. Wolpert and W.G. Macready. No free lunch theorems for search. Tech. Rep., Santa Fe Institute, 1995. Different algorithms usually have different pros and cons; therefore it is important to know the strengths and weaknesses of an algorithm and when it should be used.

13 Different types of classifiers Discriminative: aims to model how the data can be separated. Either models the decision boundary directly (e.g., perceptron, neural networks, decision trees, SVM) or models the posterior class probabilities p(c_k | x) directly (e.g., logistic regression). Generative: aims to find the model that generates the data; a model assumption is required, and a mismatch might lead to poor prediction (e.g., naïve Bayes, Bayesian networks).

14 What is a decision tree? A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf represents a class or class distribution. An example (buys_computer):
age?
  <30    -> student?        (no -> no, yes -> yes)
  31..40 -> yes
  >40    -> credit_rating?  (excellent -> no, fair -> yes)
The topmost node in a tree is the root node. To classify an unseen instance, the attribute values of the instance are tested against the decision tree; a path is traced from the root to a leaf, which holds the class prediction for the instance.

15 Brief history of decision trees (I) The first decision tree algorithm is CLS (Concept Learning System) [E. B. Hunt, J. Marin, and P. T. Stone's book Experiments in Induction, Academic Press, 1966]. The algorithm that raised wide interest in decision trees is ID3 [J. R. Quinlan's chapter in Expert Systems in the Micro Electronic Age, edited by D. Michie, Edinburgh University Press, 1979]. J. Ross Quinlan, SIGKDD Innovation Award winner (2011). The most popular decision tree algorithm is C4.5 [J. R. Quinlan's book C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993].

16 Brief history of decision trees (II) The most popular decision tree algorithm that can be used for regression is CART (Classification and Regression Trees) [L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone's book Classification and Regression Trees, Wadsworth, 1984]. The strongest decision-tree-based learning algorithm is Random Forests, a tree ensemble algorithm [L. Breiman's MLJ'01 paper Random Forests]. Leo Breiman, SIGKDD Innovation Award winner (2005).

17 How to construct a decision tree? (I) Basic strategy: a tree is constructed in a top-down, recursive, divide-and-conquer manner. At the start, all the training examples are at the root. Attributes are categorical (continuous-valued attributes are discretized in advance). Examples are partitioned recursively based on selected attributes; a selected attribute is also called a split or a test. Splits are selected based on a heuristic or statistical measure (e.g., information gain). The partitioning terminates when any of the following conditions is met: - all examples falling into a node belong to the same class; the node becomes a leaf whose label is that class - no attribute can be used to further partition the data; the node becomes a leaf whose label is the majority class of the examples falling into it - no instances fall into a node; the node becomes a leaf whose label is the majority class of the examples falling into its parent.

18 How to construct a decision tree? (II) Algorithm of the basic strategy (ID3): generate_decision_tree(samples, attribute_list)
1) create a node N;
2) if samples are all of the same class C, then return N as a leaf node labeled with class C;
3) if attribute_list is empty, then return N as a leaf node labeled with the most common class in samples;
4) select test_attribute, the attribute in attribute_list with the highest information gain (how? see the splitting criteria below);
5) label node N with test_attribute;
6) for each known value a_i of test_attribute:
   a) grow a branch from node N for the condition test_attribute = a_i;
   b) let s_i be the set of samples in samples for which test_attribute = a_i;
   c) if s_i is empty, then attach a leaf labeled with the most common class in samples;
   d) else attach the node returned by generate_decision_tree(s_i, attribute_list - {test_attribute}).

19 Splitting criteria: Information gain S: training set; S_i: training instances of class C_i (i = 1,…,m); a_j: values of attribute A (j = 1,…,v). The information needed to correctly classify the training set is
I(S_1,…,S_m) = − Σ_{i=1}^{m} (|S_i|/|S|) log2(|S_i|/|S|)
Suppose attribute A is selected to partition the training set into the subsets {S^A_1, S^A_2, …, S^A_v}; then the entropy of the partition, i.e., the information still needed to classify all the instances in those subsets, is
E(A) = Σ_{i=1}^{v} (|S^A_i|/|S|) I(S^A_{i1},…,S^A_{im})
where S^A_{ij} is the set of instances of class C_j contained in S^A_i. The information gain of selecting A is then
Gain(A) = I(S_1,…,S_m) − E(A)
The bigger the information gain, the more relevant the attribute A.
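A minimal sketch of the entropy and information-gain computations defined above; the representation of a data set as a list of (attribute_dict, class_label) pairs is an assumption made for illustration.

```python
from collections import Counter
from math import log2

def entropy(examples):
    # I(S_1,...,S_m) = -sum_i (|S_i|/|S|) * log2(|S_i|/|S|)
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    # Gain(A) = I(S_1,...,S_m) - E(A)
    total = len(examples)
    subsets = {}
    for x, y in examples:
        subsets.setdefault(x[attribute], []).append((x, y))   # partition by the value of A
    expected = sum(len(s) / total * entropy(s) for s in subsets.values())   # E(A)
    return entropy(examples) - expected
```

ID3 would call information_gain once per candidate attribute at each node and split on the attribute with the largest value.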

20 Splitting criteria: Example of information gain (I)
Target class: graduate students (total = 120)
gender  major        birth_country  age_range  gpa        count
M       Science      Canada         —          Very_good  16
F       Science      Foreign        —          Excellent  22
M       Engineering  Foreign        —          Excellent  18
F       Science      Foreign        —          Excellent  25
M       Science      Canada         —          Excellent  21
F       Engineering  Canada         —          Excellent  18
Contrasting class: undergraduate students (total = 130)
gender  major        birth_country  age_range  gpa        count
M       Science      Foreign        <20        Very_good  18
F       Business     Canada         <20        Fair       20
M       Business     Canada         <20        Fair       22
F       Science      Canada         —          Fair       24
M       Engineering  Foreign        —          Very_good  22
F       Engineering  Canada         <20        Excellent  24

21 Splitting criteria: Example of information gain (II) The information needed to correctly classify the training set is
I(S_1, S_2) = −(120/250) log2(120/250) − (130/250) log2(130/250) ≈ 0.9988
Suppose the attribute major is selected to partition the training set:
for major = Science: S_11 = 84, S_12 = 42
for major = Engineering: S_21 = 36, S_22 = 46
for major = Business: S_31 = 0, S_32 = 42
Then the entropy of major is
E(major) = (126/250) I(84, 42) + (82/250) I(36, 46) + (42/250) I(0, 42) ≈ 0.7873

22 Splitting criteria: Example of information gain (III) The information gain of major is then
Gain(major) = I(S_1, S_2) − E(major) ≈ 0.9988 − 0.7873 = 0.2115
The information gain of the other attributes (gender, birth_country, gpa, age_range) can be computed in the same way; major has the biggest gain, so it is taken as the split.

23 Splitting criteria: Other kinds of split selection criteria (I) Gain ratio: a shortcoming of information gain is its bias toward attributes with many values. To reduce the influence of this bias, J. R. Quinlan used gain ratio in C4.5 instead of information gain:
Gain_ratio(A) = Gain(A) / IV(A), where IV(A) = − Σ_{j=1}^{v} (|S_j|/|S|) log2(|S_j|/|S|)
and attribute A partitions the instance set into the subsets {S_1, S_2, …, S_v}. Since the ratio becomes unstable when IV(X) is close to 0, J. R. Quinlan recommended selecting the attribute X that maximizes Gain_ratio(X) from among the attributes with average-or-better Gain(X).

24 Splitting criteria: Other kinds of split selection criteria (II) Gini index (used in CART). S: training set; S_i: training instances of class C_i (i = 1,…,m); a_j: values of attribute A (j = 1,…,v). The gini of S is
gini(S) = 1 − Σ_{i=1}^{m} (|S_i|/|S|)^2
Suppose attribute A is selected to partition the instance set into the subsets {S_1, S_2, …, S_v}; then the gini of A is
gini_A(S) = Σ_{j=1}^{v} (|S_j|/|S|) gini(S_j)
The attribute with the smallest gini is chosen to split the node.

25 Why pruning? Overfitting: the trained model fits the training set so closely that it deviates from the true distribution of the instance space. Main reasons: a finite training set and noise. When a decision tree is built, many branches may reflect anomalies in the training set due to noise or outliers. Pruning is used to address the problem of overfitting.

26 How to prune a decision tree? Two popular methods. Prepruning: terminate tree construction early, i.e., do not split a node if splitting would make the goodness measure fall below a threshold; it is hard to choose an appropriate threshold. Postpruning: remove branches from a fully grown tree, progressively pruning the tree as long as the goodness measure improves. The goodness is usually measured with the help of a validation set, a set of data different from the training data. In general, postpruning is more accurate than prepruning, but requires more computation.

27 Mapping a decision tree to rules A rule is created for each path from the root to a leaf: each attribute-value pair along a given path forms a conjunction in the rule antecedent, and the class label held by the leaf forms the rule consequent. For the buys_computer tree above (root age?, branches <30 -> student?, 31..40 -> yes, >40 -> credit_rating?):
IF age = <30 AND student = no THEN buys_computer = no
IF age = <30 AND student = yes THEN buys_computer = yes
IF age = 31..40 THEN buys_computer = yes
IF age = >40 AND credit_rating = excellent THEN buys_computer = no
IF age = >40 AND credit_rating = fair THEN buys_computer = yes

28 Enhancing the basic decision tree algorithm (I) Allow for continuous-valued attributes: a test on a continuous-valued attribute A results in two branches corresponding to A ≤ V and A > V; given v values of A, (v−1) possible splits may be considered. Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values. Incremental induction: it is not good to regenerate the tree from scratch every time new instances arrive; instead, dynamically adjust the splits in the tree. Attribute construction: the initial attributes may not be good for solving the problem; generate new attributes from the initial ones by constructive induction.

29 Enhancing the basic decision tree algorithm (II) Scalable decision tree algorithms: most studies focus on improving the data structures. SLIQ [Mehta et al., EDBT96]: builds an index for each attribute; only the class list and the current attribute list reside in memory. SPRINT [J. Shafer et al., VLDB96]: constructs attribute lists; when a node is partitioned, the attribute lists are also partitioned. RainForest [J. Gehrke et al., VLDB98]: builds an AVC-list (attribute, value, class label), separating the scalability aspects from the criteria that determine the quality of the tree.

30 What is a neural network? Also called an artificial neural network. "Neural networks are massively parallel interconnected networks of simple (usually adaptive) elements and their hierarchical organizations which are intended to interact with the objects of the real world in the same way as biological nervous systems do" [T. Kohonen, NN88]. The basic components of a neural network are neurons and weights (the M-P neuron model): neurons are connected by weights; a neuron is also called a unit; a bias is also called a threshold. The knowledge learned by a neural network is encoded in the weights and biases.

31 Perceptron The model of the perceptron: the output is a thresholded linear function of the inputs, y = f(w^T x). It can be learned by: for each training example (x_i, y_i), update
w ← w − η (w^T x_i − y_i) x_i
and repeat until convergence. The perceptron aims to find a hyperplane that separates the different classes.
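A minimal sketch of the update rule as written on this slide, applied to each training example in turn; the learning rate eta, the stopping tolerance, and the representation of x as a numpy vector (with a constant 1 appended for the bias) are assumptions for illustration.

```python
import numpy as np

def train_perceptron(examples, dim, eta=0.1, max_epochs=100, tol=1e-6):
    w = np.zeros(dim)
    for _ in range(max_epochs):
        old_w = w.copy()
        for x, y in examples:
            w = w - eta * (w @ x - y) * x       # w <- w - eta (w^T x_i - y_i) x_i
        if np.linalg.norm(w - old_w) < tol:     # repeat until convergence
            break
    return w

def predict(w, x):
    return 1 if w @ x > 0 else 0                # which side of the hyperplane w^T x = 0
```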

32 Gradient descent w ← w + Δw, where Δw = −η ∇E(w)

33 Perceptron Limitations of the perceptron: it cannot solve problems that are not linearly separable, e.g., the XOR problem.

34 What is a multilayer feedforward NN? A feedforward neural network is a kind of neural network where a unit is only connected to the units in the next neighboring layer. Hidden units and output units are called functional units, which are usually equipped with a non-linear activation function (e.g., the sigmoid function). There is no rule indicating how to get the best network, so network design is a trial-and-error process.

35 Backpropagation (I) Abbreviated as BP; the most popular neural network algorithm, which can be used for both classification and regression. First proposed by P. Werbos in his Ph.D. dissertation: P. Werbos. Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. dissertation, Harvard University, 1974. Sketch of the BP algorithm: Step 1: feed the input forward from the input layer through the hidden layer to the output layer. Step 2: compute the error of the output layer. Step 3: backpropagate the error from the output layer to the hidden layer. Step 4: adjust the weights and biases. Step 5: if the termination criterion is satisfied, stop; otherwise go to Step 1.

36 Backpropagation (II) Backpropagation procedure: for each training example (x, y), do:
Propagate the input forward through the network:
1. Input the instance x to the network and compute the output value o_u of every output unit u.
Propagate the errors backward through the network:
2. For each output unit k, calculate its error term δ_k = o_k (1 − o_k)(y_k − o_k).
3. For each hidden unit h, calculate its error term δ_h = o_h (1 − o_h) Σ_{k ∈ outputs} w_hk δ_k.
4. Update each network weight w_ji (the weight associated with the i-th input to unit j): w_ji ← w_ji + η δ_j x_ji.

37 Backpropagation (III) Let's derive BP. Write the score function as the squared error over the output units. Since we adopt stochastic gradient descent, we compute the gradient on receiving the training example (x_p, y_p):
E_p(w) = (1/2) Σ_{k ∈ outputs} (y_pk − o_pk)^2,  where Δw_ji = −η ∂E_p/∂w_ji
We then derive ∂E_p/∂w_ji for the different types of units in the network, so that the update rule becomes w_ji ← w_ji + Δw_ji.

38 Backpropagation (IV) For the weights associated with an output unit j, the chain rule gives
∂E_p/∂w_ji = (∂E_p/∂net_j)(∂net_j/∂w_ji) = (∂E_p/∂net_j) x_ji,  where net_j = Σ_i w_ji x_ji
Since o_j = sigmoid(net_j), plugging in o_j yields ∂E_p/∂net_j = −(y_j − o_j) o_j (1 − o_j). Thus
Δw_ji = η (y_j − o_j) o_j (1 − o_j) x_ji = η δ_j x_ji

39 Backpropagation (V) For the weights associated with a hidden unit j, the chain rule gives
∂E_p/∂net_j = Σ_{k ∈ Nextlayer(j)} (∂E_p/∂net_k)(∂net_k/∂net_j) = Σ_{k ∈ Nextlayer(j)} (−δ_k) w_kj o_j (1 − o_j)
where Nextlayer(j) is the set of units whose immediate inputs include the output of unit j. Thus
δ_j = o_j (1 − o_j) Σ_{k ∈ Nextlayer(j)} δ_k w_kj  and  Δw_ji = η δ_j x_ji
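A minimal numpy sketch of the BP updates derived above for a network with one sigmoid hidden layer and sigmoid output units; the layer shapes and the learning rate eta are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, y, W_hidden, W_out, eta=0.1):
    """One stochastic-gradient step for a single training example (x, y)."""
    # forward pass
    h = sigmoid(W_hidden @ x)                            # hidden activations o_h
    o = sigmoid(W_out @ h)                               # output activations o_k
    # error terms
    delta_out = o * (1 - o) * (y - o)                    # delta_k = o_k(1-o_k)(y_k-o_k)
    delta_hidden = h * (1 - h) * (W_out.T @ delta_out)   # delta_h = o_h(1-o_h) sum_k w_hk delta_k
    # weight updates: w_ji <- w_ji + eta * delta_j * x_ji
    W_out += eta * np.outer(delta_out, h)
    W_hidden += eta * np.outer(delta_hidden, x)
    return W_hidden, W_out
```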

40-43 Example (figures only): MSE of the output layer; hidden-unit encoding for the input; weights from the inputs to one hidden unit.

44 Backpropagation (VI) Remarks. Solution of BP: gradient descent on the error surface is only guaranteed to reach a local minimum; random initialization and stochastic gradient descent make it less likely to get stuck in poor local minima. Representation power of neural networks: Boolean functions can be represented exactly with 2 layers of units; (bounded) continuous functions can be approximated with arbitrarily small error with 2 layers of units (1 hidden layer of sigmoid units and 1 layer of linear output units); arbitrary functions can be approximated to arbitrary accuracy with 3 layers of units (2 hidden layers of sigmoid units).

45 What is a support vector machine? Support vector machines are learning systems that use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory. SVM has a close relationship with neural networks, e.g., an SVM with a Gaussian kernel is actually an RBF neural network. Although SVM has been popular since the mid-1990s, the idea of support vectors was proposed by V. Vapnik in 1963, and some key ingredients of Statistical Learning Theory were obtained in the 1960s and 1970s (mainly by V. Vapnik and A. Chervonenkis): the VC dimension (1968) and the structural risk minimization inductive principle (1974).

46 Linear hyperplane Binary classification can be viewed as the task of separating the classes in feature space. Which hyperplane is better?

47 Margin The distance from a point x to the hyperplane is r(x) = (w^T x + b) / ||w||. Margin: the width between the two classes. Assuming all data points are at least distance 1 from the hyperplane, the following two constraints hold for a training set {(x_i, y_i)}:
w^T x_i + b ≥ +1 if y_i = +1,  w^T x_i + b ≤ −1 if y_i = −1,  i.e., y_i (w^T x_i + b) ≥ 1
so the margin is 2/||w||. Support vectors: the examples that are closest to the hyperplane.

48 General model of SVM Model: a linear classifier in a high-dimensional feature space. Score function and optimization (a constrained quadratic optimization problem):
min_{w,b,ξ} (1/2)||w||^2 + C Σ_i ξ_i   (structural risk + empirical risk)
s.t. y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0 for all i
where φ is a mapping from the input space to the high-dimensional feature space; if φ(x) = x, the SVM is a linear SVM. The aim is to find a hyperplane in the high-dimensional feature space that separates the data with the margin between the two classes maximized.

49 From linear to non-linear (I) Why the mapping matters: by mapping the input space to a higher-dimensional feature space, we can achieve nonlinearity by linear means. Map the input to the higher-dimensional space, find a separating hyperplane there, and projecting that hyperplane back to the input space yields a non-linear boundary.

50 From linear to non-linear (II) High dimensionality causes problems for learning (the curse of dimensionality). Solution: the kernel trick. The kernel trick allows us to work in the original space while benefiting from the mapping to a high-dimensional feature space: a kernel function corresponds to an inner product in some high-dimensional feature space, so we can implicitly compute inner products in that space using the kernel function without actually performing the mapping. Mercer's theorem: every positive semi-definite symmetric function is a kernel.
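A small sketch of the kernel idea: the Gaussian (RBF) kernel evaluates an inner product in an implicit high-dimensional feature space without ever constructing the mapping φ explicitly; the bandwidth parameter gamma is an illustrative assumption.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    # k(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel=rbf_kernel):
    # K[i][j] = k(x_i, x_j); pairwise kernel values are all a kernel method needs from the data
    return [[kernel(xi, xj) for xj in points] for xi in points]
```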

51 What is Bayesian classification? Bayesian classification is based on Bayes' rule. Bayesian classifiers have exhibited high accuracy and fast speed when applied to large databases. Bayes' rule:
P(H|X) = P(X|H) P(H) / P(X)
where P(H|X) is the posterior probability of the hypothesis H conditioned on the data sample X, P(H) is the prior probability of H, P(X|H) is the conditional probability of X given H, and P(X) is the prior probability of X.

52 Naïve Bayes classifier (I) Also called the simple Bayes classifier. Class conditional independence: assume that the effect of an attribute value on a given class is independent of the values of the other attributes. Classes C_i (i = 1,…,m); attributes A_k (k = 1,…,n); feature vector X = (x_1, x_2, …, x_n), where x_k is the value of X on A_k. The naïve Bayes classifier returns the maximum a posteriori class C_i, i.e., the class with P(C_i|X) > P(C_j|X) for all 1 ≤ j ≤ m, j ≠ i. According to Bayes' rule, P(C_i|X) = P(X|C_i) P(C_i) / P(X); because P(X) is constant across classes, only P(X|C_i) P(C_i) needs to be maximized.

53 Naïve Bayes classifier (II) To maximize P(X|C_i) P(C_i): P(C_i) can be estimated by P(C_i) = S_i / S, where S_i is the number of training instances of class C_i and S is the total number of training instances. Since the naïve Bayes classifier assumes class conditional independence, P(X|C_i) can be estimated by P(X|C_i) = Π_{k=1}^{n} P(x_k|C_i). If A_k is a categorical attribute, we can take P(x_k|C_i) = S_ik / S_i, where S_ik is the number of training instances of class C_i having value x_k for A_k. If A_k is a continuous attribute, we usually take P(x_k|C_i) = g(x_k, μ_Ci, σ_Ci), where g is the Gaussian density function for A_k, and μ_Ci and σ_Ci are the mean and standard deviation of A_k over the training instances of class C_i.

54 Example of naïve Bayes classifier (I) Training set: C_1: buys_computer = yes, C_2: buys_computer = no
rid  age     income  student  credit_rating  buys_computer
1    <30     high    no       fair           no
2    <30     high    no       excellent      no
3    31..40  high    no       fair           yes
4    >40     medium  no       fair           yes
5    >40     low     yes      fair           yes
6    >40     low     yes      excellent      no
7    31..40  low     yes      excellent      yes
8    <30     medium  no       fair           no
9    <30     low     yes      fair           yes
10   >40     medium  yes      fair           yes
11   <30     medium  yes      excellent      yes
12   31..40  medium  no       excellent      yes
13   31..40  high    yes      fair           yes
14   >40     medium  no       excellent      no

55 Example of naïve Bayes classifier (II) Given an instance to be classified: X = (age = <30, income = medium, student = yes, credit_rating = fair).
P(C_i): P(C_1) = P(buys_computer = yes) = 9/14 = 0.643; P(C_2) = P(buys_computer = no) = 5/14 = 0.357.
P(X|C_i):
P(age = <30 | yes) = 2/9 = 0.222,  P(age = <30 | no) = 3/5 = 0.600
P(income = medium | yes) = 4/9 = 0.444,  P(income = medium | no) = 2/5 = 0.400
P(student = yes | yes) = 6/9 = 0.667,  P(student = yes | no) = 1/5 = 0.200
P(credit_rating = fair | yes) = 6/9 = 0.667,  P(credit_rating = fair | no) = 2/5 = 0.400
Then P(X|C_1) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044 and P(X|C_2) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019.
P(X|C_i) P(C_i): P(X|C_1) P(C_1) = 0.044 × 0.643 = 0.028; P(X|C_2) P(C_2) = 0.019 × 0.357 = 0.007.
Therefore C_1, i.e., buys_computer = yes, is returned.
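A small sketch reproducing the calculation above for categorical attributes; the data layout (one dict per training row, including the class key) is an assumption made for illustration.

```python
from collections import Counter

def naive_bayes_predict(training, x, class_key="buys_computer"):
    class_counts = Counter(row[class_key] for row in training)
    total = len(training)
    best_class, best_score = None, -1.0
    for c, s_i in class_counts.items():
        score = s_i / total                        # P(C_i)
        for attr, value in x.items():
            s_ik = sum(1 for row in training
                       if row[class_key] == c and row[attr] == value)
            score *= s_ik / s_i                    # P(x_k | C_i)
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score
```

Called with the 14-example table above and x = {'age': '<30', 'income': 'medium', 'student': 'yes', 'credit_rating': 'fair'}, this returns 'yes', matching the hand calculation.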

56 A problem with the naïve Bayes classifier (training set as on slide 54). What happens if we remove example 6, the only buys_computer = no example with student = yes? Then P(student = yes | buys_computer = no) = 0, so X = (age = ??, income = ??, student = yes, credit_rating = ??) will always be classified as buys_computer = yes, no matter what values appear in the other features.

57 Laplacian correction to naïve Bayes If some attribute values are never observed with some class in the training data, how can naïve Bayes learning still be performed? E.g., in a small example table with attributes student and credit and class buys, all attribute entries are yes for the buys = no examples, so an estimate such as P(credit = no | buys = no) is 0. Then how should (student = yes, credit = no) be classified? Denoting the number of training instances with property X as #X, the Laplacian correction estimates
P(x_k | C_i) = (#(x_k and C_i) + 1) / (#C_i + v_k)
where v_k is the number of possible values of attribute A_k, so that no estimated probability is exactly zero.

58 Bayesian network also called belief network, Bayesian belief network, or probabilistic network a graphical model which allows the representation of dependencies among subsets of attributes a standard Bayesian network is defined by two components: a directed acyclic graph where each node represents a random variable, and each arc represents a probabilistic dependence a conditional probability table (CPT) for each variable Bayesian network can return a probability distribution for the classes

59 Example of Bayesian network The network contains the variables FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea. The conditional probability table for the variable LungCancer has one column for each configuration of its parents FamilyHistory (FH) and Smoker (S) — (FH, S), (FH, ~S), (~FH, S), (~FH, ~S) — and one row each for LC and ~LC; e.g., P(LungCancer = yes | FamilyHistory = yes, Smoker = yes) = 0.8 and P(LungCancer = no | FamilyHistory = no, Smoker = no) = 0.9.

60 How to construct a Bayesian network? If the network structure and all the variables are known - it is easy to calculate the CPT entries, in the same way as the probabilities in naïve Bayes classifiers are calculated. If the network structure is known but some variables are unknown - gradient descent methods are often used to generate the values of the CPT entries. If the network structure is unknown - discrete optimization techniques are often used to generate the network structure from the known variables - this is one of the core research topics in the area.

61 Turing Award 2011 Judea Pearl (UCLA), pioneer of probabilistic and causal reasoning (Bayesian networks, graphical models). J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

62 What is ensemble learning? Ensemble learning is a machine learning paradigm in which multiple (homogeneous or heterogeneous) individual learners are trained for the same problem.

63 Why ensemble learning? The generalization ability of an ensemble is usually significantly better than that of a corresponding single learner. Ensemble learning was regarded as one of the four current directions in machine learning research [T.G. Dietterich, AIMag97]. The more accurate and the more diverse the individual learners, the better the ensemble [A. Krogh & J. Vedelsby, NIPS94].

64 How to build an ensemble? An ensemble is built in two steps: 1) obtain the base learners; 2) combine their individual predictions.

65 Many ensemble methods According to how the base learners can be generated: Parallel methods Bagging [L. Breiman, MLJ96] Random Subspace [T. K. Ho, TPAMI98] Random Forests [L. Breiman, MLJ01] Sequential methods AdaBoost [Y. Freund & R. Schapire, JCSS97] Arc-x4 [L. Breiman, AnnStat98] LPBoost [A. Demiriz et al., MLJ06]

66 Bagging Bootstrap a set of learners: generate a set of training sets by bootstrap sampling from the original data set, then train one learner on each generated data set. Voting for classification: output the label with the most votes. Averaging for regression: output the average of the individual learners' outputs.
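A minimal bagging sketch under the assumption of a classification task; the `train` (returns a model) and `predict` functions are hypothetical stand-ins for any base learning algorithm.

```python
import random
from collections import Counter

def bagging(data, train, n_learners=25):
    models = []
    for _ in range(n_learners):
        # bootstrap sample: draw |data| examples with replacement
        sample = [random.choice(data) for _ in range(len(data))]
        models.append(train(sample))
    return models

def bagging_predict(models, predict, x):
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]    # the label with the most votes
```

For regression, the last function would instead return the average of the individual predictions.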

67 Boosting Starting from the original training set, training instances that are wrongly predicted by Learner 1 play more important roles in the training of Learner 2; repeating this yields Learners 1..T, whose outputs are combined by a weighted combination. Gödel Prize (2003): Freund & Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997, 55.
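A minimal AdaBoost sketch of this reweight-and-combine idea, under the assumptions that labels are +1/-1 and that `train_weak` (hypothetical) fits a base learner on weighted data and returns a function h(x) -> +1/-1.

```python
import math

def adaboost(data, train_weak, rounds=10):
    n = len(data)
    weights = [1.0 / n] * n                        # start with uniform example weights
    ensemble = []                                  # list of (alpha, h)
    for _ in range(rounds):
        h = train_weak(data, weights)
        err = sum(w for w, (x, y) in zip(weights, data) if h(x) != y)
        if err >= 0.5:                             # weak learner no better than chance: stop
            break
        err = max(err, 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this learner in the combination
        ensemble.append((alpha, h))
        # increase the weights of wrongly predicted examples, decrease the others
        weights = [w * math.exp(-alpha * y * h(x)) for w, (x, y) in zip(weights, data)]
        z = sum(weights)
        weights = [w / z for w in weights]         # renormalize

    def predict(x):                                # weighted combination of the base learners
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict
```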

68 Simple yet effective Boosting can be applied to almost all tasks where one wants to apply machine learning techniques. For example, in computer vision, the Viola-Jones detector uses AdaBoost with Haar-like features in a cascade structure; on average, only 8 features need to be evaluated per image.

69 The Viola-Jones detector The first real-time face detector: comparable accuracy, but 15 times faster than the state-of-the-art face detectors at that time. Longuet-Higgins Prize (2011): Viola & Jones. Rapid object detection using a boosted cascade of simple features. CVPR, 2001.

70 Selective ensemble Many could be better than all: given a set of trained learners, ensembling many of them may be better than ensembling all of them [Z.-H. Zhou et al., AIJ02]. The basic idea of selective ensemble: use multiple solutions and perform some kind of selection. Principles for selection: the effectiveness of the individuals and the complementarity of the individuals. The idea of selective ensemble can be used in other scientific disciplines, not only in machine learning/data mining.

71 More about ensemble methods: Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms. Boca Raton, FL: Chapman & Hall/CRC, 2012.

72 k-nearest neighbor classifier Store all the training examples; each training instance represents a point in the n-dimensional instance space; the nearest neighbors are identified based on some distance measure. For classification, return the most common value among the k training instances nearest to the unseen instance; for regression estimation, return the average value among the k nearest training instances.
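A minimal k-nearest neighbor sketch; the Euclidean distance and the representation of training data as (x, y) pairs with numeric x vectors are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(training, x, k=3, regression=False):
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbors = sorted(training, key=lambda ex: dist(ex[0], x))[:k]   # k nearest training instances
    labels = [y for _, y in neighbors]
    if regression:
        return sum(labels) / len(labels)                 # average value of the k neighbors
    return Counter(labels).most_common(1)[0][0]          # most common label among the k neighbors
```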

73 Case-based reasoning Cases are complex symbolic descriptions. Store all cases; compare stored cases (or components of cases) with an unseen case (or its components); combine the solutions of similar previous cases (or their components) into a solution for the unseen case. Previous cases: case 1: a man who stole $100 was punished with 1 year of imprisonment; case 2: a man who spat in a public area was punished with 10 whips. Unseen case: a man stole $200 and spat in a public area; combining the similar and identical components, he is punished with 1.5 years of imprisonment plus 10 whips.

74 Linear regression In linear regression, data are modeled to fit a straight line y = α + βx, where α and β can be estimated from a training set by least squares:
β = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)^2,  α = ȳ − β x̄
Multiple regression: a response variable Y is modeled as a linear function of multiple predictor variables, Y = α + β_1 X_1 + … + β_p X_p.
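A minimal sketch of the straight-line least-squares fit written above; xs and ys are assumed to be parallel lists of numbers.

```python
def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # beta = sum (x - x_bar)(y - y_bar) / sum (x - x_bar)^2
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
           / sum((x - x_bar) ** 2 for x in xs)
    alpha = y_bar - beta * x_bar          # alpha = y_bar - beta * x_bar
    return alpha, beta
```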

75 Nonlinear regression Polynomial regression: data are modeled to fit a polynomial function, e.g., y = α + β_1 x + β_2 x^2 + β_3 x^3. It can be converted to a linear regression problem, e.g., by letting x_1 = x, x_2 = x^2, x_3 = x^3, which gives y = α + β_1 x_1 + β_2 x_2 + β_3 x_3. There are also nonlinear regression models that cannot be converted to a linear model; for such cases, it may be possible to obtain least-squares estimates through extensive calculations on more complex formulae.

76 Let's move to Part 6


DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]

More information

Data mining techniques: decision trees

Data mining techniques: decision trees Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39

More information

Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel

Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel Copyright 2008 All rights reserved. Random Forests Forest of decision

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

Feedforward Neural Networks and Backpropagation

Feedforward Neural Networks and Backpropagation Feedforward Neural Networks and Backpropagation Feedforward neural networks Architectural issues, computational capabilities Sigmoidal and radial basis functions Gradient-based learning and Backprogation

More information

Machine Learning in Spam Filtering

Machine Learning in Spam Filtering Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov [email protected] Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

More information

Leveraging Ensemble Models in SAS Enterprise Miner

Leveraging Ensemble Models in SAS Enterprise Miner ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

Operations Research and Knowledge Modeling in Data Mining

Operations Research and Knowledge Modeling in Data Mining Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 [email protected]

More information

Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research

More information

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of

More information