CNFSAT: Predictive Models, Dimensional Reduction, and Phase Transition


Neil P. Slagle
College of Computing
Georgia Institute of Technology
Atlanta, GA

Abstract

CNFSAT embodies the P versus NP computational dilemma. Most machine learning research on CNFSAT applies randomized optimization techniques to specific problem instances in search of a satisfying assignment. A novel approach is to treat CNFSAT as a supervised learning and dimension reduction problem, in search of models capable of predicting satisfiability with a relatively small number of features. Herein we present empirical results from applying decision trees to three representations of CNFSAT over four and five variables, and linear regression to three representations of MAXSAT and of the CNFSAT solution count (SATSOL) problem over four and five variables. The representations are bit vectors indicating the clauses included in the formulas, principal component analysis (PCA) applied to the previous representation, and a simple clause count and variable hit representation. Significantly, the first principal component tracks the number of clauses in instances, and decision trees on CNFSAT and linear regression on MAXSAT and SATSOL empirically offer a 20% improvement in error rates after preprocessing the data with PCA, baselining against the full bit vector representation. The clause count and variable hit representation gives the lowest error rates observed in the experiments. Thus, feature selection demonstrably reduces error under decision tree and linear regression prediction models. Notably, PCA and the clause count and variable hit representation reduce the representation size from O(3^n) to O(n^2) and O(n), respectively, where n is the number of variables, while also reducing predictive error. Also of interest, the linear regression prediction model exhibits the phase transition of MAXSAT and SATSOL that appears in randomized optimization approaches to MAXSAT; that is, decision trees, linear regression, and PCA applied to these problems encounter difficulty in prediction over the same regions where randomized optimization algorithms encounter difficulty in solving formulas.

1 Background and related work

CNFSAT, or conjunctive normal form satisfiability, first shown to be NP-complete by Cook [1], is the problem of determining whether a Boolean formula in conjunctive normal form is satisfiable. Much of the research on CNFSAT applies randomized optimization techniques to specific problem instances [2]. MAXSAT is the problem of determining the maximum number of clauses satisfiable in a given instance of CNFSAT. The maximum number of satisfiable clauses in a given instance of CNFSAT is at least 50% of the number of clauses in the instance, so we can characterize the MAXSAT output as the fraction of clauses satisfiable. MAXSAT, like CNFSAT, is NP-hard [3], and similarly, much of the existing research applies stochastic optimization [4]. Counting the number of solutions (SATSOL) of a given instance of CNFSAT is a #P problem (the class is defined in [5]).

The phase transition of CNFSAT and MAXSAT appears in [4], [6], and [7]. Phase transitions in combinatorial problems exhibit regions containing hard and easy problem instances with respect to some key parameter [6]. In CNFSAT, MAXSAT, and, based on results herein, SATSOL, a key parameter is the number of clauses [4].

Finally, motivating the approaches discussed herein is that existing machine learning research on these problems applies randomized optimization to instances of problems rather than building supervised learning models for, and applying feature selection algorithms to, large collections of instances treated as data sets [2], [4]. That is, little or no research exists attempting to transform these problems into instances of supervised learning.

2 Representing CNFSAT, MAXSAT, and SATSOL as data

To represent CNFSAT, MAXSAT, and SATSOL as data, we create bit vectors indicating the hash values of the clauses present in a formula, a dimensionally compressed space over the bit vectors using PCA, and a restrictive clause count and variable hit representation.

2.1 The bit vectors

The bit vector representation is isomorphic to the formula space; that is, we can recover the original formula from this representation. Though the bit vectors capture the complexity of the formulas and thus become infeasible for even low variable counts, we can leverage them to generate the second representation and as a baseline for comparing the performance of the predictive models. (We previously applied other supervised learning techniques, namely SVMs, kNN, boosting, and ANNs, with insignificant results; decision trees and linear regression offer attractive results.)

The space of formulas containing up to n variables contains 3^n − 1 distinct clauses, assuming no clause contains a particular variable more than once. (Each variable can appear in a clause in one of three ways: absent, as a positive literal, or as a negative literal; subtracting one for the empty clause, we obtain 3^n − 1 possible clauses.) Bit vectors over four variables therefore require 80 entries in addition to the label; bit vectors over five variables require 242 entries in addition to the label.

To represent a particular formula as a bit vector, we hash the clauses of the formula so that clauses over, say, the j lowest variables, ordered by indices, exhibit the same hash values irrespective of the value of n. More formally, given the hash function H : clauses → Z^+ and some clause C over x_1, ..., x_(n−1), we define

H(C ∨ x_n) = H(C) + 3^(n−1),
H(C ∨ ~x_n) = H(C) + 2 · 3^(n−1),
H(x_n) = 3^(n−1),
H(~x_n) = 2 · 3^(n−1),

where H(x_1) = 1 and H(~x_1) = 2. Defined recursively, this hashing mechanism allows formulas over n variables to subsume formulas over smaller subsets of more lowly indexed variables in a natural way. That is, if F_n is the bit vector representing a formula F over x_1, ..., x_n, then in the space of formulas over x_1, ..., x_(n+1), F's bit vector F_(n+1) appears identical to F_n in the first 3^n − 1 entries.

As a bit vector example, suppose a formula over just three variables is (x_1 ∨ x_2) ∧ (~x_1 ∨ x_3) ∧ ~x_3. The clause hashes are H(x_1 ∨ x_2) = 4, H(~x_1 ∨ x_3) = 11, and H(~x_3) = 18, so the bit vector representations are

0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0, 1 (CNFSAT)
0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0, 3/3 (MAXSAT)
0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0, 1 (SATSOL)

Since a three variable formula can contain any of 26 clauses, the length of each vector, sans the label, is 26. The locations of the ones in the bit vector indicate the hash values of the clauses in the given formula. The final entry is the label: for CNFSAT, a zero or one indicating whether the formula is satisfiable; for MAXSAT, a floating point number indicating the ratio of the maximum number of satisfiable clauses to the number of clauses in the formula; for SATSOL, the number of solutions of the formula.

2.2 The bit vectors, reduced by PCA

We obtain the PCA representations by multiplying the principal component matrix PC (of dimension p × 26 in the three variable example, where p is the number of principal components and each row represents a principal component) by the bit vector. Results below suggest good performance where p is O(n^2).

2.3 Clause count and variable hits

The clause count and variable hit representation for this instance is

3, 1, 1, 1, 1, 0, 1, 1 (CNFSAT)
3, 1, 1, 1, 1, 0, 1, 3/3 (MAXSAT)
3, 1, 1, 1, 1, 0, 1, 1 (SATSOL)

The first entry is the number of clauses; the second to fourth entries indicate the number of clauses containing the positive literals x_1, x_2, and x_3, respectively; and the fifth to seventh indicate the number of clauses containing the negative literals ~x_1, ~x_2, and ~x_3, respectively. If n is the number of variables, this representation contains 2n + 1, or O(n), entries.
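To make the hashing scheme and the representations concrete, here is a minimal Python sketch (not the paper's code; the encoding of a clause as a dict from 1-based variable index to +1 or -1, and all function names, are illustrative assumptions). Run on the worked example above, it reproduces the hash values 4, 11, and 18 and the clause count and variable hit vector.

```python
# Illustrative sketch: a clause is a dict mapping a 1-based variable index
# to +1 (positive literal) or -1 (negative literal).

def clause_hash(clause):
    """Hash a clause per the recursion above: variable x_n contributes
    3^(n-1) as a positive literal and 2*3^(n-1) as a negative literal."""
    h = 0
    for var, sign in clause.items():
        weight = 3 ** (var - 1)
        h += weight if sign > 0 else 2 * weight
    return h

def bit_vector(formula, n):
    """Length 3^n - 1 bit vector; entry H(C) is 1 iff clause C appears."""
    vec = [0] * (3 ** n - 1)
    for clause in formula:
        vec[clause_hash(clause) - 1] = 1  # hash values are 1-based
    return vec

def clause_count_variable_hits(formula, n):
    """[#clauses, hits for x_1..x_n, hits for ~x_1..~x_n]."""
    pos, neg = [0] * n, [0] * n
    for clause in formula:
        for var, sign in clause.items():
            (pos if sign > 0 else neg)[var - 1] += 1
    return [len(formula)] + pos + neg

# Worked example: (x_1 v x_2) ^ (~x_1 v x_3) ^ (~x_3)
formula = [{1: +1, 2: +1}, {1: -1, 3: +1}, {3: -1}]
print([clause_hash(c) for c in formula])       # [4, 11, 18]
print(clause_count_variable_hits(formula, 3))  # [3, 1, 1, 1, 1, 0, 1]
```

Because the hash doubles as an index into the bit vector, recovering a formula from its bit vector reduces to inverting the base-3 encoding of each set position.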

3 Collecting the data

As stated earlier, the space of formulas containing up to n variables contains 3^n − 1 possible clauses, assuming no clause contains a variable more than once. To generate data over n variables, we randomly select a number of clauses for a particular formula using an exponential distribution with some user-selected mean representing the average ratio of clauses present, say 0.3 or 0.4. The exponential distribution, defined over a positive domain, guarantees that all clause counts are possible while depressing the formula lengths toward O(n^2), yielding a 30 to 35% ratio of positive examples. (In earlier experiments we applied a uniform distribution; unfortunately, the spread of formula sizes significantly diminished the number of satisfiable formulas generated, frustrating supervised learning. Both distributions exhibit the significant PCA result described in the next section.) Next, we randomly select clauses to add to the formula from the set of 3^n − 1 clauses until the formula is of the specified length. We determine satisfiability, the maximum number of satisfiable clauses, and the number of solutions of each formula using a brute force search over the 2^n possible assignments. Finally, we collect 10,000 data points each from the problem spaces in which n is four and five. (2SAT is not NP-complete and contains only 255 formulas; all formulas in 3SAT fit on a typical hard drive, whereas generating all of 4SAT, space notwithstanding, could require 2700 times the eight hours required to generate all of 3SAT. Thus, prediction becomes more essential once n > 3.)
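The data collection step admits a similarly short sketch; the following is a hedged rendering of the generation scheme described above (the mean ratio of 0.35 and the helper names are assumptions, and the rejection loop is one simple way to keep the exponentially drawn clause count in range). `hash_to_clause` inverts the clause hash by reading off base-3 digits, and `labels` brute forces the 2^n assignments exactly as in the text.

```python
import itertools
import random

def hash_to_clause(h):
    """Invert the clause hash by reading base-3 digits:
    0 = variable absent, 1 = positive literal, 2 = negative literal."""
    clause, var = {}, 1
    while h:
        h, digit = divmod(h, 3)
        if digit == 1:
            clause[var] = +1
        elif digit == 2:
            clause[var] = -1
        var += 1
    return clause

def random_formula(n, mean_ratio=0.35, rng=random):
    """Exponentially distributed clause count (redrawn until it lands in
    1..3^n - 1), then distinct clauses sampled uniformly by hash value."""
    total = 3 ** n - 1
    m = 0
    while not 1 <= m <= total:
        m = round(rng.expovariate(1.0 / (mean_ratio * total)))
    return [hash_to_clause(h) for h in rng.sample(range(1, total + 1), m)]

def labels(formula, n):
    """Brute force the 2^n assignments; returns the CNFSAT, MAXSAT,
    and SATSOL labels for one formula."""
    best, solutions = 0, 0
    for bits in itertools.product([False, True], repeat=n):
        sat = sum(any(bits[v - 1] == (s > 0) for v, s in c.items())
                  for c in formula)
        best = max(best, sat)
        solutions += sat == len(formula)
    return int(solutions > 0), best / len(formula), solutions
```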

4 The principal components

Significantly, the first principal component seems always to measure linearly the number of clauses in the formula, a value we can calculate easily in polynomial time. Based on empirical results, we postulate that the first principal component of the data set over n variables is approximately the (3^n − 1)-dimensional vector

PC = ±(1/√(3^n − 1), ..., 1/√(3^n − 1)).

After subtracting the approximate average vector X_avg = (1/2, ..., 1/2) over the data X from X_i, a bit vector representing a formula with m clauses, transforming the new data vector by the principal component gives

PC · (X_i − X_avg) = ±(2m − (3^n − 1)) / (2√(3^n − 1)).

Clearly, a simple linear transformation recovers m from this form.

The change in empirical variance before and after subtracting the first PC is often considerably larger than the corresponding changes for subsequently calculated PCs. (The changes in variance exhibit the eigenvalues of the given eigenvectors, the principal components.) For example, for the data set over four variables, the empirical differences in variances are 74295, 435, 40, 430, and 43 for the first five components. The other data sets exhibit similar behavior, suggesting that PCA interprets the number of clauses to be the most significant feature of a formula. Despite the easy empirical interpretation of the first PC, similar interpretations of higher order PCs are not so easily forthcoming, despite the natural subsumption of lower degree formulas into higher degree formulas (see the hash description above).

The number of PCs necessary to outperform the bit vector representation is surprisingly low. In fact, in the four variable case, PC counts between eight and 20 perform similarly, as exhibited in Illustration 1. Though that graph represents prediction error using linear regression on MAXSAT, decision trees on CNFSAT and linear regression on SATSOL exhibit similar convergence of error among various component counts. In the subsequent data presented, we apply 20 PCs.

Illustration 1: Various PC counts on four variable MAXSAT

5 Decision trees

Upon discretizing the dimensionally reduced data sets, we apply decision trees using information gain and reduced error pruning, baselining against the bit vector representation. Illustration 2 exhibits training and validation errors using the bit vector representation on four variable CNFSAT over various training sizes. Illustration 3 exhibits training and validation errors on four variable CNFSAT using the PCA representation. Illustration 4 exhibits training and validation errors on four variable CNFSAT using the clause count and variable hit representation.

Interestingly, processing 10,000 records each of four variable and five variable CNFSAT with PCA reduces the post-prune training and validation error with decision trees by 20%, even with meager training set sizes below 1,000 and a double split discretization per feature. Also of note, the post-prune node counts remain relatively low for four and five variables using all three representations, as in Illustration 5, suggesting further that decision trees can capture the problem with relatively low model complexity. Finally, the validation error rates using the clause count and variable hit representation are significantly lower than those of either of the other representations, offering better generalization.

Illustration 2: Decision tree on four variable CNFSAT
Illustration 3: Decision tree on four variable CNFSAT, 20 PCs
Illustration 4: Decision tree on four variable CNFSAT using clause counts and variable hits
Illustration 5: Decision tree node counts on five variable CNFSAT
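A sketch of the decision tree experiment, assuming scikit-learn (the paper does not name its tooling): cost-complexity pruning via `ccp_alpha` stands in for the paper's reduced error pruning, and `KBinsDiscretizer` with quantile bins approximates the per-feature double split discretization; all parameter values are illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

def tree_validation_error(X_bits, y_sat, n_components=20,
                          train_size=1000, seed=0):
    """Project bit vectors onto the leading PCs, discretize each component,
    fit an entropy (information gain) tree, and report validation error."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X_bits, y_sat, train_size=train_size, random_state=seed)
    pca = PCA(n_components=n_components).fit(X_tr)
    disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
    Z_tr = disc.fit_transform(pca.transform(X_tr))
    Z_va = disc.transform(pca.transform(X_va))
    tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=1e-3)
    tree.fit(Z_tr, y_tr)
    return 1.0 - tree.score(Z_va, y_va)  # validation error rate
```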

6 Linear regression, phase transition

We apply linear regression to MAXSAT and SATSOL in both the four and five variable cases over the three representations. As before, the PCA dimensionally compressed representation outperforms the full bit vector representation, and the clause count and variable hit representation outperforms both of the former on validation error. Models based on the bit vector representation achieve the lowest training error but fail to generalize as well as the remaining two representations. Errors reported are root mean squared errors. Illustration 6 exhibits linear regression errors on five variable MAXSAT with respect to training set size. Illustration 7 exhibits linear regression errors on five variable MAXSAT with respect to formula size. Illustration 8 exhibits linear regression errors on four variable SATSOL with respect to training set size. Illustration 9 exhibits linear regression errors on four variable SATSOL with respect to formula size.

Illustration 6: Linear regression on five variable MAXSAT
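The regression model here is ordinary least squares, and the reported metric is the root mean squared error; a self-contained NumPy sketch (function and argument names are illustrative) might look as follows.

```python
import numpy as np

def ols_rmse(X_train, y_train, X_val, y_val):
    """Least squares fit with an intercept; the reported error is the
    root mean squared error, the metric used for MAXSAT and SATSOL."""
    def add_bias(X):
        return np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w, *_ = np.linalg.lstsq(add_bias(X_train), y_train, rcond=None)
    residuals = add_bias(X_val) @ w - y_val
    return float(np.sqrt(np.mean(residuals ** 2)))
```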

Illustration 7: Linear regression on five variable MAXSAT by formula length
Illustration 8: Linear regression on four variable SATSOL by training size
Illustration 9: Linear regression on four variable SATSOL by formula size

Of note in Illustration 7 and Illustration 9 is the empirical phase change discussed for MAXSAT in [4]: at a clause to variable ratio of approximately four, the error rates increase in both MAXSAT and SATSOL over four and five variables. The easy-hard-easy transition is more pronounced in SATSOL. Clearly, supervised linear regression encounters difficulty much like the optimization techniques in [4], irrespective of which of the three representations we apply. This suggests a uniformity of difficulty between isolating single solutions using randomized optimization and predicting using supervised learning.
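To reproduce the error-by-formula-size view of Illustrations 7 and 9, one can bin validation errors by the clause to variable ratio; under the phase transition observed above, the per-bin RMSE should bump near a ratio of four. This sketch assumes the clause count sits in column 0 of X, as it does in the clause count and variable hit representation; `predict` is any fitted model's prediction function.

```python
import numpy as np

def rmse_by_clause_ratio(X, y, predict, n, bins=10, clause_col=0):
    """Bin validation examples by clause-to-variable ratio and print RMSE
    per bin; an easy-hard-easy pattern shows up as a bump near ratio 4."""
    ratio = X[:, clause_col] / n          # clause count assumed in clause_col
    errors = predict(X) - y
    edges = np.linspace(ratio.min(), ratio.max(), bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (ratio >= lo) & (ratio < hi)
        if mask.any():
            rmse = np.sqrt(np.mean(errors[mask] ** 2))
            print(f"ratio [{lo:4.1f}, {hi:4.1f}): RMSE {rmse:.3f}")
```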

7 Conclusions and further work

The results herein demonstrate that the novel approach of treating CNFSAT and its constituent problems as instances of supervised learning could be significant, not only in the reduction of space complexity but also in that this reduction in space complexity corresponds to a reduction in prediction error. That the first principal component essentially represents the number of clauses is significant, and further study might demonstrate similar interpretations of the subsequent principal components and whether this result generalizes to instances containing more than five variables.

The dimensionally reduced representations outperform the full bit vector representation in generalization for both decision trees and linear regression. Even if the models achieved merely comparable generalization error, the PCA dimensionally compressed representation and the clause count and variable hit representation would still be superior, since their respective feature spaces are considerably smaller; that is, if the number of features is essentially polynomial in the number of variables, say O(n^2), then comparable prediction error against the full O(3^n) representation is significant. The PCA count of 20 realizes O(n^2); the clause count and variable hit representation realizes O(n). This suggests that predicting satisfiability, the maximum number of satisfiable clauses, and the number of solutions might require much less complexity than the bit vector representation carries.

The phase transition appearing in both randomized optimization of specific instances and the predictive models discussed herein suggests some uniformity of difficulty across approaches, inviting further study.

Acknowledgments

I would like to acknowledge Professor H. Venkateswaran of the Georgia Institute of Technology for his interest in these topics; he graciously participated in many discussions related to this paper.

References

[1] Cook, S. A. (1971). The complexity of theorem-proving procedures. STOC '71: Proceedings of the Third Annual ACM Symposium on Theory of Computing. ACM: New York, NY, USA.

[2] Schöning, U. (1999). A probabilistic algorithm for k-SAT and constraint satisfaction problems. Proceedings of the 40th Annual Symposium on Foundations of Computer Science. IEEE Computer Society: Washington, DC, USA.

[3] Krentel, M. W. (1986). The complexity of optimization problems. Journal of Computer and System Sciences, Structure in Complexity Theory Conference: Orlando, FL, USA.

[4] Qasem, M., Prügel-Bennett, A. (2008). Complexity of MAX-SAT using stochastic algorithms. GECCO '08. ACM: New York, NY, USA.

[5] Valiant, L. G. (1979). The complexity of computing the permanent. Theoretical Computer Science. Elsevier Science Publishers: Essex, UK.

[6] Istrate, G. (1999). The phase transition in random Horn satisfiability and its algorithmic implications. Random Structures & Algorithms. John Wiley & Sons, Inc.: New York, NY, USA.

[7] Istrate, G. (2005). Coarse and sharp thresholds of Boolean constraint satisfaction problems. Discrete Applied Mathematics, Special issue: Typical case complexity and phase transitions. Elsevier Science Publishers: Amsterdam, The Netherlands.
