Innovative Data Mining based approaches for life course analysis

Size: px
Start display at page:

Download "Innovative Data Mining based approaches for life course analysis"

Transcription

1 IPUC, Neuchâtel, February 23-24, 2007 Innovative Data Mining based approaches for life course analysis Gilbert Ritschard Alexis Gabadinho, Nicolas Müller, Matthias Studer University of Geneva, Switzerland Outline 1 Aim of the research project 2 Our first results 2.1 Mobility trees 2.2 Survival trees 2.3 Characteristic sequences 3 Foreseen Developments IPUC07 toc intro mob surv seq conc 22/2/2007gr 1

2 1 Aim of the research project Just started February 1, 2007 FNS project on Mining event histories: Towards new insight on personal Swiss life courses Methodological concern Explore and develop data mining approaches for individual longitudinal data Methods for time to event analysis Methods for sequence data analysis Socio-demographic concern Using mainly SHP data, but also other sources, gain original insight on How familial, professional and other socio-demographic events are entwined, Typical characteristics of Swiss life trajectories, Changes in these characteristics over time. IPUC07 toc intro mob surv seq conc 22/2/2007gr 2

3 What is data mining? Data Mining is the process of finding new and potentially useful knowledge from data Gregory Piatetsky-Shapiro editor of Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (Hand et al., 2001) Also called Knowledge Discovery in Databases, KDD. Origin: IJCAI Workshop, 1989, Piatetsky-Shapiro (1989) Textbooks : Han and Kamber (2001), Hand et al. (2001) IPUC07 toc intro mob surv seq conc 22/2/2007gr 3

4 What is data mining? (2) Concerned with characterization of interesting patterns per se (unsupervised learning) Clustering Frequent itemsets Association rules for classification or prediction purposes (supervised learning) Decision trees Bayesian networks SVM and Kernel Methods CBR (case based reasoning), K-NN (k nearest neighbors) Proceeds mainly heuristically. Unlike statistical modeling, makes no assumptions about process generating the data. IPUC07 toc intro mob surv seq conc 22/2/2007gr 4

5 Typology of methods for individual longitudinal data nature of data questions time stamped event state/event sequences descriptive - Survival curves: - Optimal matching clustering Parametric (Weibull, Gompertz) - Frequencies of typical and non parametric patterns (Kaplan-Meier, Nelson-Aalen) - Discovering typical patterns estimators causality - Hazard regression models - Markov models, Mobility trees - Survival trees - Association rules between subsequences IPUC07 toc intro mob surv seq conc 22/2/2007gr 5

6 2 Our first results Mobility trees Survival trees Characteristic sequences IPUC07 toc intro mob surv seq conc 22/2/2007gr 6

7 2.1 Mobility trees (SHP Data, Waves 1 to 6 ( ), aged between 20 and 64 in 2004.) How does working status (occupied active, unemployed, inactive) in 2004 depend on working status in previous year (1999 to 2003) other factors (attained education level, partner working status, partner education level,...) and what are main interaction effects? Mobility trees are alternative to Markovian transition models. Growing separate classification trees for women and men highlights gender differences. IPUC07 toc intro mob surv seq conc 22/2/2007gr 7

8 Mobility tree, Men Working status 04 Node 0 active occupied unemployed not in labor force Total (100.00) 1283 Working status B, 03 Adj. P-value=0.0000, Chi-square= , df=2 active, full time (>= 80%);active, long part time (50%-80%);active, short part time (< 50%) not in labour force unemployed,<missing> Node 1 active occupied unemployed not in labor force Total (84.57) 1085 Node 2 active occupied unemployed not in labor force Total (4.75) 61 Node 3 active occupied unemployed not in labor force Total (10.68) 137 Partner highest level of education achieved 04 (both grid and individual quest.) Adj. P-value=0.0001, Chi-square= , df=1 Partner actual occupation 04, into 6 Adj. P-value=0.0002, Chi-square= , df=1 <=vocational high school >vocational high school,<missing> at home;part-time paid w ork;full time paid w ork + family company;retired or invalid education,<missing> Node 4 active occupied unemployed not in labor force Total (55.42) 711 Node 5 active occupied unemployed not in labor force Total (29.15) 374 Node 6 active occupied unemployed not in labor force Total (4.68) 60 Node 7 active occupied unemployed not in labor force Total (6.00) 77 IPUC07 toc intro mob surv seq conc 22/2/2007gr 8

9 Mobility tree, Women Working status 04 Node 0 active occupied unemployed not in labor force Total (100.00) 1647 Working status B, 03 Adj. P-value=0.0000, Chi-square= , df=3 not in labour force active, full time (>= 80%);active, long part time (50%-80%) active, short part time (< 50%) unemployed,<missing> Node 1 Node 2 Node 3 Node 4 active occupied active occupied active occupied active occupied unemployed unemployed unemployed unemployed not in labor force not in labor force not in labor force not in labor force Total (18.76) 309 Total (46.51) 766 Total (22.89) 377 Total (11.84) 195 Working status B, 02 Working status B, 99 Working status B, 02 Working status B, 00 Adj. P-value=0.0000, Chi-square= , df=1 Adj. P-value=0.0047, Chi-square= , df=1 Adj. P-value=0.0003, Chi-square= , df=1 Adj. P-value=0.0004, Chi-square= , df=1 not in labour force active, full time (>= 80%);active, short part time (< 50%);unemployed;active, long part time (50%-80%),<missing> active, full time (>= 80%);active, long part time (50%-80%);active, short part time (< 50%) not in labour force;unemployed,<missing> not in labour force;active, full time (>= 80%) active, short part time (< 50%);unemployed;active, long part time (50%-80%),<missing> not in labour force;unemployed;active, long part time (50%-80%),<missing> active, full time (>= 80%);active, short part time (< 50%) Node 5 Node 6 Node 7 Node 8 Node 9 Node 10 Node 11 Node 12 active occupied active occupied active occupied active occupied active occupied active occupied active occupied active occupied unemployed unemployed unemployed unemployed unemployed unemployed unemployed unemployed not in labor force not in labor force not in labor force not in labor force not in labor force not in labor force not in labor force not in labor force Total (14.33) 236 Total (4.43) 73 Total (37.46) 617 Total (9.05) 149 Total (3.52) 58 Total (19.37) 319 Total (6.50) 107 Total (5.34) 88 Highest level of education achieved 04 (both grid and individual quest.) Adj. P-value=0.0292, Chi-square=8.6618, df=1 <=full-time vocational school >full-time vocational school Node 13 active occupied unemployed not in labor force Total (21.25) 350 Node 14 active occupied unemployed not in labor force Total (16.21) 267 Working status B (full time, long part time, short part time, unemployed, inactive) in 2003 used for first split IPUC07 toc intro mob surv seq conc 22/2/2007gr 9

10 Mobility tree, Women: Details for women inactive in 2003 not in labour force Node 1 active occupied unemployed not in labor force Total (18.76) 309 Working status B, 02 Adj. P-value=0.0000, Chi-square= , df=1 not in labour force active, full time (>= 80%);active, short part time (< 50%);unemployed;active, long part time (50%-80%),<missing> activ Node 5 active occupied unemployed not in labor force Total (14.33) 236 Node 6 active occupied unemployed not in labor force Total (4.43) 73 IPUC07 toc intro mob surv seq conc 22/2/2007gr 10

11 2.2 Survival trees (SHP 2002 biographical data, 2002 Wave data for some potential explanatory factors) Which are the most discriminating factors for marriage duration until divorce/separation? Used same variables as for discrete time logistic model in Ritschard and Sauvain-Dugerdil (2007) Tried two methods Maximize differences in KM survival curves using Tarone-Ware (T-W) p-value (Segal, 1988). Cox regression tree: maximize differences in proportionality factors among groups (Leblanc and Crowley, 1992; Therneau and Atkinson, 1997) IPUC07 toc intro mob surv seq conc 22/2/2007gr 11

12 T-W Survival Tree: Marriage until Divorce/Separation Population n = 3619, e = 622 S < 90% at 11 S at 30 = 0.77 TW χ 2 (1) = 54.81, p< <=1940 n = 841, e = 123 S < 90% at 21 S at 30 = 0.86 TW χ 2 (1) = 22.48, p< > 1940 n =2778, e = 499 S <90% at 9 S at 30 = 0.73 TW χ 2 (1) = 37.44, p< <=1940 & Non French L. n = 667, e = 79 S < 90% at 26 S at 30 = 0.89 TW χ 2 (1) = 8.08, p< <=1940 & French L. n = 174, e = 44 S < 90% at 11 S at 30 = 0.74 > 1940 & No Child n = 603, e = 138 S < 90% at 5 S at 30 = 0.64 TW χ 2 (1) = 4.45, p= > 1940 & Child n = 2175, e = 361 S < 90% at 11 S at 30 = 0.75 TW χ 2 (1) = 9.77, p= <=1940 & Non French L. & University n = 51, e = 12 S < 90% at 10 S at 30 = 0.76 > 1940 & No Child & University n = 86, e = 23 S < 90% at 3 S at 30 = 0.59 > 1940 & Child & German or Italian L. n = 1444, e = 217 S < 90% at 13 S at 30 = 0.77 <=1940 & Non French L. & Not University n = 667, e = 79 S < 90% at 29 S at 30 = > 1940 & No Child & Not University n = 517, e = 138 S < 90% at 6 S at 30 = 0.65 > 1940 & Child & French or unknown L. n =731, e = 144 S < 90% at 8 S at 30 = 0.70 IPUC07 toc intro mob surv seq conc 22/2/2007gr 12

13 Noeud finaux Cohorte <=1940 et Allemand, Italien ou inconnu et Université Cohorte <=1940 et Langue Allemand, Italien ou inconnu et Non Université Cohorte <=1940 et Langue Français Cohorte > 1940 et Sans Enfant et Universitaire Cohorte > 1940 et Sans Enfant et Non Universitaire Cohorte > 1940 et Avec Enfant et (Allemand ou Italien) Cohorte > 1940 et Avec Enfant et (Français ou Inconnu) IPUC07 toc intro mob surv seq conc 22/2/2007gr 13

14 Marriage Marriage survival survival probabilities probability until until Divorce/Separation, divorce/separation, by by leaves leaves Survival probability years 10 years 20 years 30 years 0 <=1940 & non French L. & University <=1940 & non French L. & non University <=1940 & French L. > 1940 & no Child & University > 1940 & no Child & non University > 1940 & Child & German or Italian L. > 1940 & Child & French or unknown L. IPUC07 toc intro mob surv seq conc 22/2/2007gr 14

15 Cox Survival Tree: Marriage until Divorce/Separation Population n = 3619, e = 622 Prop. fact. = 1.0 -LL improv.= <=1940 n = 841, e = 123 Prop. fact. = LL improv.= > 1940 n = 2778, e = 499 Prop. fact. = LL improv.= <=1940 & Non French n = 667, e = 79 Prop. fact. = 0.48 <=1940 & French n = 174, e = 44 Prop. fact. = 1.10 > 1940 & Child n = 2175, e = 361 Prop. fact. = 1.06 > 1940 & No Child n = 603, e = 138 Prop. fact. = 1.88 IPUC07 toc intro mob surv seq conc 22/2/2007gr 15

16 Noeud finaux Cohorte <=1940 et Langue Allemand, Italien ou inconnu Cohorte <=1940 et Langue Français Cohorte > 1940 et Avec Enfant Cohorte > 1940 et Sans Enfant IPUC07 toc intro mob surv seq conc 22/2/2007gr 16

17 2.3 Characteristic sequences (SHP 2002 biographical data) Selection of pairs of events, e.g. marriage and first job. For each pair, order of sequence: <, =, >, missing Which are the most typical sequences? Most discriminating sequences between sex birth cohort (1940 and before, after 1940) IPUC07 toc intro mob surv seq conc 22/2/2007gr 17

18 Frequencies of characteristic 2-event sequences 30% 25% 20% 15% 10% 5% 0% Child < Marriage Marriage < Child Child = Marriage Child < Job Job < Child Child = Job Child < Educ end Educ end < Child Child = Educ end Marriage < Job Job < Marriage Marriage = Job Marriage < Educ end Educ end < Marriage Marriage = Educ end Job < Educ end Educ end < Job Job = Educ end IPUC07 toc intro mob surv seq conc 22/2/2007gr 18

19 Cohort discriminating 2-event sequences Naissance Node 0 apres avant Total (100.00) 4755 Départ et mariage Adj. P-value=0.0000, Chi-square= , df=3 < > = <missing> Node 1 Node 2 Node 3 Node 4 apres apres apres apres avant avant avant avant Total (45.95) 2185 Total (1.51) 72 Total (24.44) 1162 Total (28.10) 1336 Mariage et emploi Départ et emploi Mariage et fin des études Adj. P-value=0.0063, Chi-square= , df=1 Adj. P-value=0.0000, Chi-square= , df=1 Adj. P-value=0.0000, Chi-square= , df=2 =,<missing> >;< <;= >,<missing> >;= < <missing> Node 5 Node 6 Node 7 Node 8 Node 9 Node 10 Node 11 apres apres apres apres apres apres apres avant avant avant avant avant avant avant Total (5.62) 267 Total (40.34) 1918 Total (4.48) 213 Total (19.96) 949 Total (4.16) 198 Total (1.18) 56 Total (22.75) 1082 Départ et fin des études Départ et fin des études Départ et fin des études Enfant et emploi Enfant et emploi Enfant et emploi Adj. P-value=0.0249, Chi-square=9.1512, df=1 Adj. P-value=0.0498, Chi-square=7.8866, df=1 Adj. P-value=0.0064, Chi-square= , df=1 Adj. P-value=0.0339, Chi-square=4.5007, df=1 Adj. P-value=0.0007, Chi-square= , df=1 Adj. P-value=0.0000, Chi-square= , df=1 > <;=,<missing> >;< =,<missing> >,<missing> <;= > <missing> >;<;= <missing> >;< =,<missing> Node 12 Node 13 Node 14 Node 15 Node 16 Node 17 Node 18 Node 19 Node 20 Node 21 Node 22 Node 23 apres apres apres apres apres apres apres apres apres apres apres apres avant avant avant avant avant avant avant avant avant avant avant avant Total (1.72) 82 Total (3.89) 185 Total (28.29) 1345 Total (12.05) 573 Total (2.52) 120 Total (1.96) 93 Total (13.31) 633 Total (6.65) 316 Total (1.14) 54 Total (3.03) 144 Total (2.21) 105 Total (20.55) 977 First split variable: {Marriage, Leaving Home} IPUC07 toc intro mob surv seq conc 22/2/2007gr 19

20 Cohort: details for Leaving Home before Marriage < Node 1 apres avant Total (45.95) 2185 > Node 2 apres avant Total (1.51) 72 Mariage et emploi Adj. P-value=0.0063, Chi-square= , df=1 =,<missing> >;< <;= Node 5 apres avant Total (5.62) 267 Node 6 apres avant Total (40.34) 1918 Node 7 apres avant Total (4.48) 213 Départ et fin des études Adj. P-value=0.0249, Chi-square=9.1512, df=1 Départ et fin des études Adj. P-value=0.0498, Chi-square=7.8866, df=1 Départ et fin des études Adj. P-value=0.0064, Chi-square=11.6 > <;=,<missing> >;< =,<missing> >,<missing> Node 12 apres avant Total (1.72) 82 Node 13 apres avant Total (3.89) 185 Node 14 apres avant Total (28.29) 1345 Node 15 apres avant Total (12.05) 573 Node 16 apres avant Total (2.52) 120 Categ apres avant Total IPUC07 toc intro mob surv seq conc 22/2/2007gr 20

21 Sex discriminating 2-event sequences sexe Node 0 masculin féminin Total (100.00) 4755 Emploi et fin des études Adj. P-value=0.0000, Chi-square= , df=3 > < = <missing> Node 1 masculin féminin Total (20.06) 954 Node 2 masculin féminin Total (29.91) 1422 Node 3 masculin féminin Total (20.46) 973 Node 4 masculin féminin Total (29.57) 1406 Enfant et emploi Adj. P-value=0.0028, Chi-square= , df=1 Départ et emploi Adj. P-value=0.0000, Chi-square= , df=1 Départ et emploi Adj. P-value=0.0000, Chi-square= , df=1 >;=,<missing> < <;=,<missing> > <;=,<missing> > Node 5 masculin féminin Total (17.77) 845 Node 6 masculin féminin Total (2.29) 109 Node 7 masculin féminin Total (14.30) 680 Node 8 masculin féminin Total (15.60) 742 Node 9 masculin féminin Total (21.64) 1029 Node 10 masculin féminin Total (7.93) 377 Départ et fin des études Adj. P-value=0.0274, Chi-square=8.9750, df=1 Mariage et fin des études Adj. P-value=0.0000, Chi-square= , df=1 Mariage et fin des études Adj. P-value=0.0019, Chi-square= , df=1 >;<,<missing> = >,<missing> <;= >;=,<missing> < Node 11 masculin féminin Total (16.00) 761 Node 12 masculin féminin Total (1.77) 84 Node 13 masculin féminin Total (10.20) 485 Node 14 masculin féminin Total (5.40) 257 Node 15 masculin féminin Total (20.44) 972 Node 16 masculin féminin Total (1.20) 57 First split variable: {Job, Education End} IPUC07 toc intro mob surv seq conc 22/2/2007gr 21

22 Sex: details for Job after Education end féminin 53.7 Total (100.0 Emploi et fin des é Adj. P-value=0.0000, Chi-squa > Node 1 masculin féminin Total (20.06) 954 < Node 2 masculin féminin Total (29.91) 1422 Enfant et emploi Adj. P-value=0.0028, Chi-square= , df=1 Départ et emploi Adj. P-value=0.0000, Chi-square= , df=1 >;=,<missing> < <;=,<missing> > Node 5 masculin féminin Total (17.77) 845 Node 6 masculin féminin Total (2.29) 109 Node 7 masculin féminin Total (14.30) 680 Node Category masculin féminin Total ( Départ et fin des études Adj. P-value=0.0274, Chi-square=8.9750, df=1 Mariage et fin Adj. P-value=0.0000, Chi >;<,<missing> = >,<missing> Node 11 masculin féminin Total (16.00) 761 Node 12 masculin féminin Total (1.77) 84 Node 13 masculin féminin Total (10.20) 485 IPUC07 toc intro mob surv seq conc 22/2/2007gr 22

23 3 Foreseen Developments Extend tree approaches for Time varying covariates Multilevel contexts Mining typical sequence patterns and association rules Suitable validation criteria Friendly graphical interface for making methods easily accessible Analysis of Swiss life courses Differential impact of various profiles of social insertion Broken lives... IPUC07 toc intro mob surv seq conc 22/2/2007gr 23

24 References Han, J. and M. Kamber (2001). Data Mining: Concept and Techniques. San Francisco: Morgan Kaufmann. Hand, D. J., H. Mannila, and P. Smyth (2001). Principles of Data Mining. Adaptive Computation and Machine Learning. Cambridge MA: MIT Press. Leblanc, M. and J. Crowley (1992). Relative risk trees for censored survival data. Biometrics 48, Piatetsky-Shapiro, G. (Ed.) (1989). Notes of IJCAI 89 Workshop on Knowledge Discovery in Databases (KDD 89), Detroit, MI. Ritschard, G. et C. Sauvain-Dugerdil (2007). L enfant ciment du couple ou le couple comme ciment de la relation du père à l enfant? Quelques enseignements de l enquête rétrospective du Panel Suisse de Ménages. In C. Burton-Jeangros, E. Widmer, et C. Lalive d Epinay (Eds.), Interactions familiales et constructions de l intimité., coll. Questions sociologiques. Paris : L Harmattan. (à paraître). Segal, M. R. (1988). Regression trees for censored data. Biometrics 44, Therneau, T. M. and E. J. Atkinson (1997). An introduction to recursive partitioning using the rpart routines. Technical Report Series 61, Mayo Clinic, Section of Statistics, Rochester, Minnesota. References toc intro mob surv seq conc 22/2/2007gr 24

Course/Seminar Gilbert Ritschard Wednesday 10h15-14h M-5383 Anne-Laure Bertrand (Ass)

Course/Seminar Gilbert Ritschard Wednesday 10h15-14h M-5383 Anne-Laure Bertrand (Ass) Institute for Demographic and Life Course Studies Sequential Data Analysis 4311012 Course by Gilbert Ritschard Master level Sequential, Spring 2014, Info This sequence analysis course (6 ECTS) is given

More information

Outline. Sequential Data Analysis Issues With Sequential Data. How shall we handle missing values? Missing data in sequences

Outline. Sequential Data Analysis Issues With Sequential Data. How shall we handle missing values? Missing data in sequences Outline Sequential Data Analysis Issues With Sequential Data Gilbert Ritschard Alexis Gabadinho, Matthias Studer Institute for Demographic and Life Course Studies, University of Geneva and NCCR LIVES:

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

More information

Perspectives on Data Mining

Perspectives on Data Mining Perspectives on Data Mining Niall Adams Department of Mathematics, Imperial College London n.adams@imperial.ac.uk April 2009 Objectives Give an introductory overview of data mining (DM) (or Knowledge Discovery

More information

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: Jerzy.Stefanowski@cs.put.poznan.pl Data Mining a step in A KDD Process Data mining:

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER Objectives Introduce event history analysis Describe some common survival (hazard) distributions Introduce some useful Stata

More information

Classification and Prediction

Classification and Prediction Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

More information

Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

More information

Mobility Tool Ownership - A Review of the Recessionary Report

Mobility Tool Ownership - A Review of the Recessionary Report Hazard rate 0.20 0.18 0.16 0.14 0.12 0.10 0.08 0.06 Residence Education Employment Education and employment Car: always available Car: partially available National annual ticket ownership Regional annual

More information

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: Prasad_vungarala@yahoo.co.in

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: Prasad_vungarala@yahoo.co.in 96 Business Intelligence Journal January PREDICTION OF CHURN BEHAVIOR OF BANK CUSTOMERS USING DATA MINING TOOLS Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad

More information

Data Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved.

Data Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved. Data Mining with SAS Mathias Lanner mathias.lanner@swe.sas.com Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Data mining Introduction Data mining applications Data mining techniques SEMMA

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept

Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept Statistics 215b 11/20/03 D.R. Brillinger Data mining A field in search of a definition a vague concept D. Hand, H. Mannila and P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge. Some definitions/descriptions

More information

Data Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.

Data Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds. Sept 03-23-05 22 2005 Data Mining for Model Creation Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.com page 1 Agenda Data Mining and Estimating Model Creation

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Data Mining Techniques and Opportunities for Taxation Agencies

Data Mining Techniques and Opportunities for Taxation Agencies Data Mining Techniques and Opportunities for Taxation Agencies Florida Consultant In This Session... You will learn the data mining techniques below and their application for Tax Agencies ABC Analysis

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Philosophies and Advances in Scaling Mining Algorithms to Large Databases

Philosophies and Advances in Scaling Mining Algorithms to Large Databases Philosophies and Advances in Scaling Mining Algorithms to Large Databases Paul Bradley Apollo Data Technologies paul@apollodatatech.com Raghu Ramakrishnan UW-Madison raghu@cs.wisc.edu Johannes Gehrke Cornell

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

Three Perspectives of Data Mining

Three Perspectives of Data Mining Three Perspectives of Data Mining Zhi-Hua Zhou * National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China Abstract This paper reviews three recent books on data mining

More information

200609 - ATV - Lifetime Data Analysis

200609 - ATV - Lifetime Data Analysis Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research 1004 - UB - (ENG)Universitat

More information

A Mechanism for Selecting Appropriate Data Mining Techniques

A Mechanism for Selecting Appropriate Data Mining Techniques A Mechanism for Selecting Appropriate Data Mining Techniques Rose Tinabo School of Computing Dublin institute of technology Kevin Street Dublin 8 Ireland rose.tinabo@mydit.ie ABSTRACT: Due to an increase

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan

Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:

More information

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Knowledge Discovery Process and Data Mining - Final remarks

Knowledge Discovery Process and Data Mining - Final remarks Knowledge Discovery Process and Data Mining - Final remarks Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 14 SE Master Course 2008/2009

More information

Immigrant fertility in West Germany: Socialization effect in transitions to second and third births?

Immigrant fertility in West Germany: Socialization effect in transitions to second and third births? Immigrant fertility in West Germany: Socialization effect in transitions to second and third births? Nadja Milewski Institut national d études démographiques (INED), Paris Conference on Effects of Migration

More information

A Review of Missing Data Treatment Methods

A Review of Missing Data Treatment Methods A Review of Missing Data Treatment Methods Liu Peng, Lei Lei Department of Information Systems, Shanghai University of Finance and Economics, Shanghai, 200433, P.R. China ABSTRACT Missing data is a common

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

Living standards after divorce: does alimony offset gender income inequalities?

Living standards after divorce: does alimony offset gender income inequalities? Living standards after divorce: does alimony offset gender income inequalities? Carole Bonnet (Ined) 1, Bertrand Garbinti (CREST Insee, PSE) 2, Anne Solaz (Ined) 3 Proposal for the XXVII International

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Generational differences in timing of first MSM-specific prevention advice among MSM in Switzerland

Generational differences in timing of first MSM-specific prevention advice among MSM in Switzerland Generational differences in timing of first MSM-specific prevention advice among MSM in Switzerland Presentation at the Swiss Public Health Conference, Basel 2011-08 08-26 André Jeannin, Brenda Spencer,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

Evaluation of Treatment Pathways in Oncology: Modeling Approaches. Feng Pan, PhD United BioSource Corporation Bethesda, MD

Evaluation of Treatment Pathways in Oncology: Modeling Approaches. Feng Pan, PhD United BioSource Corporation Bethesda, MD Evaluation of Treatment Pathways in Oncology: Modeling Approaches Feng Pan, PhD United BioSource Corporation Bethesda, MD 1 Objectives Rationale for modeling treatment pathways Treatment pathway simulation

More information

Data Mining Applications in Fund Raising

Data Mining Applications in Fund Raising Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,

More information

Knowledge Discovery from Data Bases Proposal for a MAP-I UC

Knowledge Discovery from Data Bases Proposal for a MAP-I UC Knowledge Discovery from Data Bases Proposal for a MAP-I UC P. Brazdil 1, João Gama 1, P. Azevedo 2 1 Universidade do Porto; 2 Universidade do Minho; 1 Knowledge Discovery from Data Bases We are deluged

More information

Disparities in Regular Health Care Utilisation in Europe

Disparities in Regular Health Care Utilisation in Europe Séminaire «Santé et Travail dans les Enquêtes Européennes» Lille, (FR) 13 décembre 2010 Disparities in Regular Health Care Utilisation in Europe A Life-time Analysis Nicolas Sirven *, Zeynep Or Institute

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

Predicting Customer Default Times using Survival Analysis Methods in SAS

Predicting Customer Default Times using Survival Analysis Methods in SAS Predicting Customer Default Times using Survival Analysis Methods in SAS Bart Baesens Bart.Baesens@econ.kuleuven.ac.be Overview The credit scoring survival analysis problem Statistical methods for Survival

More information

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

Machine Learning and Statistics: What s the Connection?

Machine Learning and Statistics: What s the Connection? Machine Learning and Statistics: What s the Connection? Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh, UK August 2006 Outline The roots of machine learning

More information

Benchmarking Open-Source Tree Learners in R/RWeka

Benchmarking Open-Source Tree Learners in R/RWeka Benchmarking Open-Source Tree Learners in R/RWeka Michael Schauerhuber 1, Achim Zeileis 1, David Meyer 2, Kurt Hornik 1 Department of Statistics and Mathematics 1 Institute for Management Information Systems

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Abstract This paper considers the modeling of claim durations for existing claimants under income

More information

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

More information

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

CAS CS 565, Data Mining

CAS CS 565, Data Mining CAS CS 565, Data Mining Course logistics Course webpage: http://www.cs.bu.edu/~evimaria/cs565-10.html Schedule: Mon Wed, 4-5:30 Instructor: Evimaria Terzi, evimaria@cs.bu.edu Office hours: Mon 2:30-4pm,

More information

Survival analysis methods in Insurance Applications in car insurance contracts

Survival analysis methods in Insurance Applications in car insurance contracts Survival analysis methods in Insurance Applications in car insurance contracts Abder OULIDI 1-2 Jean-Marie MARION 1 Hérvé GANACHAUD 3 1 Institut de Mathématiques Appliquées (IMA) Angers France 2 Institut

More information

Analyzing Polls and News Headlines Using Business Intelligence Techniques

Analyzing Polls and News Headlines Using Business Intelligence Techniques Analyzing Polls and News Headlines Using Business Intelligence Techniques Eleni Fanara, Gerasimos Marketos, Nikos Pelekis and Yannis Theodoridis Department of Informatics, University of Piraeus, 80 Karaoli-Dimitriou

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

from Larson Text By Susan Miertschin

from Larson Text By Susan Miertschin Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

Knowledge Discovery from Databases

Knowledge Discovery from Databases Encyclopedia of Database Technologies and Applications Knowledge Discovery from Databases Jose Hernandez-Orallo Dep. of Information Systems and Computation Technical University of Valencia, Spain jorallo@dsic.upv.es

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)

Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University) 260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case

More information

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

More information

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social

More information

MARKET SEGMENTATION, CUSTOMER LIFETIME VALUE, AND CUSTOMER ATTRITION IN HEALTH INSURANCE: A SINGLE ENDEAVOR THROUGH DATA MINING

MARKET SEGMENTATION, CUSTOMER LIFETIME VALUE, AND CUSTOMER ATTRITION IN HEALTH INSURANCE: A SINGLE ENDEAVOR THROUGH DATA MINING MARKET SEGMENTATION, CUSTOMER LIFETIME VALUE, AND CUSTOMER ATTRITION IN HEALTH INSURANCE: A SINGLE ENDEAVOR THROUGH DATA MINING Illya Mowerman WellPoint, Inc. 370 Bassett Road North Haven, CT 06473 (203)

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support

Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support Mining Analytics for Business Intelligence and Decision Support Chid Apte, PhD Manager, Abstraction Research Group IBM TJ Watson Research Center apte@us.ibm.com http://www.research.ibm.com/dar Overview

More information

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

Gordon S. Linoff Founder Data Miners, Inc. gordon@data-miners.com

Gordon S. Linoff Founder Data Miners, Inc. gordon@data-miners.com Survival Data Mining Gordon S. Linoff Founder Data Miners, Inc. gordon@data-miners.com What to Expect from this Talk Background on survival analysis from a data miner s perspective Introduction to key

More information

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY Ramon Alemany Montserrat Guillén Xavier Piulachs Lozada Riskcenter - IREA Universitat de Barcelona http://www.ub.edu/riskcenter

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

A New Approach for Evaluation of Data Mining Techniques

A New Approach for Evaluation of Data Mining Techniques 181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty

More information

APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS. email paul@esru.strath.ac.uk

APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS. email paul@esru.strath.ac.uk Eighth International IBPSA Conference Eindhoven, Netherlands August -4, 2003 APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION Christoph Morbitzer, Paul Strachan 2 and

More information

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010. Title Introduction to Data Mining Dr Arulsivanathan Naidoo Statistics South Africa OECD Conference Cape Town 8-10 December 2010 1 Outline Introduction Statistics vs Knowledge Discovery Predictive Modeling

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining a.j.m.m. (ton) weijters (slides are partially based on an introduction of Gregory Piatetsky-Shapiro) Overview Why data mining (data cascade) Application examples Data Mining

More information

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut. Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,

More information

Who Goes to Graduate School in Taiwan? Evidence from the 2005 College Graduate Survey and Follow- Up Surveys in 2006 and 2008

Who Goes to Graduate School in Taiwan? Evidence from the 2005 College Graduate Survey and Follow- Up Surveys in 2006 and 2008 Who Goes to Graduate School in Taiwan? Evidence from the 2005 College Graduate Survey and Follow- Up Surveys in 2006 and 2008 Ping-Yin Kuan Department of Sociology Chengchi Unviersity Taiwan Presented

More information