# Review of some concepts in predictive modeling

Save this PDF as:

Size: px
Start display at page:

Download "Review of some concepts in predictive modeling"

## Transcription

1 Review of some concepts in predictive modeling Brigham and Women s Hospital Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support

2 A disjoint list of topics? Naïve Bayes Bayesian networks Logistic Regression Rough and Fuzzy Sets CART Neural Networks Support Vector Machines Hierarchical clustering, K- means Survival Analysis Evaluation Optimization Essentials of time series Implied knowledge of Linear regression K-nearest neighbors

3 What predictive models do Case 1 Case 2 Predict this Using these and evaluate performance on new cases ???????

4 Predictive Model Considerations Select a model Linear, Nonlinear Parametric, non-parametric Data separability Continuous versus discrete (categorical) outcome Continuous versus discrete variables One class, multiple classes Estimate the parameters (i.e., learn from data ) Evaluate

5 Predictive Modeling Tenets Evaluate performance on a set of new cases Test set should not be used in any step of building the predictive modeling (model selection, parameter estimation) Avoid overfitting Rule of thumb : 2-10 times more cases than attributes Use a portion of the training set for model selection or parameter tuning Start with simpler models as benchmarks

6 Desirable properties of models Good predictive performance (even for non-linearly separable data) Robustness (outliers are ignored) Ability to be interpreted Indicate which variables contribute more for the predictions Indicate the nature of variable interactions Allow visualization Be easily applied, be generalizable to other measurement instruments, and easily communicated

7 Logistic Regression Good for interpretation Works well only if data are linearly separable Interactions need to be entered manually Not likely to overfit if # variables is low Inputs Age 34 Gender 1 Mitoses 4 *.5 *.4 Σ *.8 Output 0.6 Probability of cancer Coefficients Prediction p = e -(Σ+α)

8 Logistic Regression and non-linearly-separable problems Simple form cannot deal with it Y = 1/(1+exp-(ax1+bx2) Adding interaction terms transforms the space such that problem may become linearly separable Y = 1/(1+exp-(ax1 + bx2 + cx1x2))

9 From perceptrons to multilayer perceptrons Why?

10 Neural Networks Work well even with nonlinearly separable data Overfitting control: Few weights Little training

11 Neural Networks Inputs Hidden Layer Outputs Age Gender Mitoses Σ Σ Σ 0.6 Probability of Cancer Weights Weights

12 Classification Trees

13 Support Vector Machines Points on dashed lines satisfy g(x) = +1 resp. g(o) = -1 Support Vectors are at the margin If there are outliers in the margin, then the vectors may be wrong

14 Nonlinear SVM Idea: Nonlinearly project data into higher dimensional space with Φ:R m H Apply optimal hyperplane algorithm in H

15 LARGE data sets In predictive modeling, large data sets have several cases (with few attributes or variables for each case) In some domains, large data sets with several attributes and few cases are subject to analysis (predictive modeling) The main tenets of predictive modeling should be always used

16 Large m small n problem m variables, n cases Underdetermined systems Simple memorization even with simple models Poor generalization to new data Overfitting

17 Reducing Columns Some approaches: Principal Components Analysis (a component is a linear combination of variables with specific coefficients) Variable selection

18 Principal Component Analysis Identify direction with greatest variation (combination of variables with different weights) Identify next direction conditioned on the first one, and so on until the variance accounted for is acceptable

19 PCA disadvantage No class information used in PCA Projected coordinates may be bad for classification

20 Variable Selection Ideal: consider all variable combinations Not feasible: 2 n Greedy Backward: may not work if more variables than cases Greedy Forward: Select most important variable as the first component Select other variables conditioned on the previous ones Stepwise: consider backtracking Other search methods: genetic algorithms that optimize classification performance and # variables

21 Simple Forward Variable Selection Conditional ranking of most important variables is possible Easy interpretation of resulting LR model No artificial axis that is a combination of variables as in PCA No need to deal with too many columns Selection based on outcome variable uses classification problem at hand

22 Optimization Not only for variable or case selection! But it helps here Family of problems in which there is a function to maximize/minimize, and constraints For example, in classification problems you may be minimizing squared errors, cross-entropy Knowing the complexity of the problem helps select algorithm (hill-climbing, etc.) Used in classification and decision making

23 Visualization Capabilities of predictive models in this area are limited Clustering is often good for visualization, but it is generally not very useful to separate data into pre-defined categories Hierarchical trees 2-D or 3-D multidimensional scaling plots Self-organizing maps

24 Visualizing the classification potential of selected inputs Clustering visualization that uses classification information may help display the separation of the cases in a limited number of dimensions Clustering without selection of dimensions important for classification is less expected to display this separation

25 See Khan et al. Nature Medicine, 7(6):

26 Generalizing a model Assume: You have a good classification model You know which variables are important Wouldn t it also be nice to be able to extract simple rules such as: If X 1 is high and X 2 is high, then Y is high If X 3 is high or X 2 is low, then Y is low Maybe there is a role for using fuzzy systems that compute with words

27 Fuzzy Sets Zadeh (1965) introduced Fuzzy Sets where he replaced the characteristic function with membership ψ S : U {0,1} is replaced by m S : U [0,1] Membership is a generalization of characteristic function and gives a degree of membership Successful applications in control theoretic settings (appliances, gearbox)

28 Fuzzy Sets Vague concepts can be represented Example: Let S be the set of people of normal height Normality is clearly not a crisp concept A fuzzy classification system can use simple English rules from inputs to estimate some output

29 Common steps in a simple Fuzzy System Fuzzification of variables (determine the parameters of the membership functions) Fuzzy inference (apply fuzzy operations on the sets) Deffuzification of output (translate membership of outcome variable back to numbers if necessary)

30 Incorporating Knowledge Incorporating prior beliefs: Bayesian framework c Conditional probabilities a b Bayesian networks Structures and probabilities can be obtained from data or experts Structure tells which variables are conditionally independent (d-separation)

31 Sensitivity = 40/50 =.8 Specificity = 45/50 =.9 threshold nl nl 45 D D nl disease TN TP FN FP

32 Sensitivity = 30/50 =.6 Specificity = 1 threshold nl nl 50 D D nl disease TN TP FN

33 Threshold 0.6 Threshold 0.4 nl D nl D nl D nl D Sensitivity ROC curve Threshold 0.7 nl D nl D Specificity

34 Data set I Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks Javed Khan 1, 2, 7, Jun S. Wei 1, 7, Markus Ringnér 1, 3, 7, Lao H. Saal 1, Marc Ladanyi 4, Frank Westermann 5, Frank Berthold 6, Manfred Schwab 5, Cristina R. Antonescu 4, Carsten Peterson 3 & Paul S. Meltzer 1 Nature Medicine, June 2001 Volume 7 Number 6 pp small, round blue cell tumors (SRBCTs) of childhood: neuroblastoma (NB), rhabdomyosarcoma (RMS), non-hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS) genes, 63 training samples (40 cell lines, 23 tumor samples), 25 test samples

35 SRBCT Log-R ANN # genes Correct Unclassified 5 7 Incorrect 2 0

36 Hierarchical Clustering on Khan Data using 8 genes

37 Gene H62098 T54213 * AA *+ R36960 H06992 *+ W AA AA Category BL RMS RMS RMS NB NB EWS EWS + : gene in the list of 96 from Khan model * : strong support from literature

38 A Simple Fuzzy System Membership in Low, Medium, High Triangular function based on min, median, max 1 0 min median max

39 Class memberships Establish a threshold for membership for each class 1 1 t 1 : max of negative cases (low) t 3 : min of positive cases (high) t 2 : medium point between t 1 and t 0 3 Select class with highest membership Reject if more than more than one class with same max membership t 1 t 2 t 3

40 Rule Discovery Ad-hoc human inspection of the training set A rule could have the variables from the LR model, would use HIGH if coefficient positive and LOW otherwise IF H62098 is HIGH then BL If any two of (T54123, AA705225, R36960) are HIGH, then RMS If (H06992 is HIGH and W49619) is HIGH then NB If (AA is HIGH and AA is HIGH) then EWS Gene H62098 T54213 * AA *+ R36960 H06992 *+ W AA AA Category BL RMS RMS RMS NB NB EWS EWS

41 Some reminders Simple models may perform at the same level of complex ones for certain data sets A benchmark can be established with these models, which can be easily accessed Simple rules may have a role in generalizing results to other platforms No model can be proved to be best, need to try all

42 Project guidelines Justify choices for the data and classification model Research what has already been done (benchmarks) Establish your own benchmarks in simple models Evaluate appropriately Acknowledge limitations Indicate what else could have been done (and why you did not do it)

### PCA, Clustering and Classification. By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker

PCA, Clustering and Classification By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker Motivation: Multidimensional data Pat1 Pat2 Pat3 Pat4 Pat5 Pat6 Pat7 Pat8 Pat9 209619_at 7758 4705 5342

### Clustering and Data Mining in R

Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Neural Networks and Support Vector Machines

INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

### Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

### Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

### Comparison of machine learning methods for intelligent tutoring systems

Comparison of machine learning methods for intelligent tutoring systems Wilhelmiina Hämäläinen 1 and Mikko Vinni 1 Department of Computer Science, University of Joensuu, P.O. Box 111, FI-80101 Joensuu

### The Data Mining Process

Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

### Machine learning for algo trading

Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

### Predictive Dynamix Inc

Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished

### Learning. CS461 Artificial Intelligence Pinar Duygulu. Bilkent University, Spring 2007. Slides are mostly adapted from AIMA and MIT Open Courseware

1 Learning CS 461 Artificial Intelligence Pinar Duygulu Bilkent University, Slides are mostly adapted from AIMA and MIT Open Courseware 2 Learning What is learning? 3 Induction David Hume Bertrand Russell

### Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### Application of Data Mining Techniques in Improving Breast Cancer Diagnosis

Application of Data Mining Techniques in Improving Breast Cancer Diagnosis ABSTRACT Breast cancer is the second leading cause of cancer deaths among women in the United States. Although mortality rates

### Introduction to Machine Learning

Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:

### Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

### The WM Method Completed: A Flexible Fuzzy System Approach to Data Mining

768 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 11, NO. 6, DECEMBER 2003 The WM Method Completed: A Flexible Fuzzy System Approach to Data Mining Li-Xin Wang, Member, IEEE Abstract In this paper, the so-called

### Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

### Machine Learning Logistic Regression

Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

### Neural Networks. Introduction to Artificial Intelligence CSE 150 May 29, 2007

Neural Networks Introduction to Artificial Intelligence CSE 150 May 29, 2007 Administration Last programming assignment has been posted! Final Exam: Tuesday, June 12, 11:30-2:30 Last Lecture Naïve Bayes

### NTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling

1 Forecasting Women s Apparel Sales Using Mathematical Modeling Celia Frank* 1, Balaji Vemulapalli 1, Les M. Sztandera 2, Amar Raheja 3 1 School of Textiles and Materials Technology 2 Computer Information

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

### Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition

Brochure More information from http://www.researchandmarkets.com/reports/2171322/ Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition Description: This book reviews state-of-the-art methodologies

### Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

### Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### 1. Classification problems

Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification

### ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

### EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

### Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)

260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case

### Churn problem in retail banking Current methods in churn prediction models Fuzzy c-means clustering algorithm vs. classical k-means clustering

CHURN PREDICTION MODEL IN RETAIL BANKING USING FUZZY C- MEANS CLUSTERING Džulijana Popović Consumer Finance, Zagrebačka banka d.d. Bojana Dalbelo Bašić Faculty of Electrical Engineering and Computing University

### Evaluation of Machine Learning Techniques for Green Energy Prediction

arxiv:1406.3726v1 [cs.lg] 14 Jun 2014 Evaluation of Machine Learning Techniques for Green Energy Prediction 1 Objective Ankur Sahai University of Mainz, Germany We evaluate Machine Learning techniques

### Random forest algorithm in big data environment

Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

### WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

### A Basic Guide to Modeling Techniques for All Direct Marketing Challenges

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

### BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

### SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH

330 SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH T. M. D.Saumya 1, T. Rupasinghe 2 and P. Abeysinghe 3 1 Department of Industrial Management, University of Kelaniya,

### Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues

Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the

### An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

### Audit Analytics. --An innovative course at Rutgers. Qi Liu. Roman Chinchila

Audit Analytics --An innovative course at Rutgers Qi Liu Roman Chinchila A new certificate in Analytic Auditing Tentative courses: Audit Analytics Special Topics in Audit Analytics Forensic Accounting

### Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

### Evaluation of Predictive Models

Evaluation of Predictive Models Assessing calibration and discrimination Examples Decision Systems Group, Brigham and Women s Hospital Harvard Medical School Harvard-MIT Division of Health Sciences and

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

### AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

### Classifiers & Classification

Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin

### Statistics for BIG data

Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

### Effective Analysis and Predictive Model of Stroke Disease using Classification Methods

Effective Analysis and Predictive Model of Stroke Disease using Classification Methods A.Sudha Student, M.Tech (CSE) VIT University Vellore, India P.Gayathri Assistant Professor VIT University Vellore,

### Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2. Tid Refund Marital Status

Data Mining Classification: Basic Concepts, Decision Trees, and Evaluation Lecture tes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Classification: Definition Given a collection of

### Chapter 4: Artificial Neural Networks

Chapter 4: Artificial Neural Networks CS 536: Machine Learning Littman (Wu, TA) Administration icml-03: instructional Conference on Machine Learning http://www.cs.rutgers.edu/~mlittman/courses/ml03/icml03/

### Q1 Define the following: Data Mining, ETL, Transaction coordinator, Local Autonomy, Workload distribution

Q1 Define the following: Data Mining, ETL, Transaction coordinator, Local Autonomy, Workload distribution Q2 What are Data Mining Activities? Q3 What are the basic ideas guide the creation of a data warehouse?

### MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### Statistical Models in Data Mining

Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

### A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model

A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased

### Testing Stock Market Efficiency Using Historical Trading Data and Machine Learning

DEGREE PROJECT, IN COMPUTER SCIENCE, FIRST LEVEL STOCKHOLM, SWEDEN 2015 Testing Stock Market Efficiency Using Historical Trading Data and Machine Learning SAMI PURMONEN & PAUL GRIFFIN KTH ROYAL INSTITUTE

### Neural Nets. General Model Building

Neural Nets To give you an idea of how new this material is, let s do a little history lesson. The origins are typically dated back to the early 1940 s and work by two physiologists, McCulloch and Pitts.

Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

### Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

### Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.

### Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu

Machine Learning CS 6830 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu What is Learning? Merriam-Webster: learn = to acquire knowledge, understanding, or skill

### Trees and Random Forests

Trees and Random Forests Adele Cutler Professor, Mathematics and Statistics Utah State University This research is partially supported by NIH 1R15AG037392-01 Cache Valley, Utah Utah State University Leo

### Journal of Engineering Science and Technology Review 7 (4) (2014) 89-96

Jestr Journal of Engineering Science and Technology Review 7 (4) (2014) 89-96 JOURNAL OF Engineering Science and Technology Review www.jestr.org Applying Fuzzy Logic and Data Mining Techniques in Wireless

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major

### Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

### Machine Learning for NLP

Natural Language Processing SoSe 2015 Machine Learning for NLP Dr. Mariana Neves May 4th, 2015 (based on the slides of Dr. Saeedeh Momtazi) Introduction Field of study that gives computers the ability

### C19 Machine Learning

C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

### 6. Feed-forward mapping networks

6. Feed-forward mapping networks Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2002. Lecture Notes on Brain and Computation Byoung-Tak Zhang Biointelligence Laboratory School of Computer

### Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

### Support Vector Machine (SVM)

Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

### Machine Learning in Spam Filtering

Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

### 3D Ultrasonic Diagnosis of Breast Tumors. Wei-Ming Chen

3D Ultrasonic Diagnosis of Breast Tumors Wei-Ming Chen Three major benefits of ultrasound Ultrasound imaging has been shown to be valuable for differentiating some aspects of benign and malignant diseases.