Statistical Analysis of the EPIRARE survey data

Deliverable D1.4
Michele Santoro, Michele Lipucci, Fabrizio Bianchi and the EPIRARE Work Package 6 Team


CONTENTS
Overview of the documents produced by EPIRARE
Disclaimer
I. Part 1: Descriptive, multivariate and exploratory analyses
   A. Introduction
   B. Methods
   C. Results
   D. Conclusions
   E. References
II. Part 2: Cluster analysis
   A. Introduction
   B. Methods
   C. Results
      Target Population
      Number of diseases
      Geographical Coverage
      Data collected
      Data providers
      Expected Services by EU Platform
   D. Conclusions
   E. References

Overview of the documents produced by EPIRARE

Disclaimer
The contents of this document are the sole responsibility of the Authors; the Executive Agency for Health and Consumers is not responsible for any use that may be made of the information contained herein.

I. Part 1: Descriptive, multivariate and exploratory analyses

A. Introduction
The aim of Work Package 6 (WP6), "Common data set and disease-specific data collection", is to define a dataset structure that Rare Disease Registries can adopt and that allows them to collect information consistent with the targets they define[1,2]. A Registry is built by developing data elements in relation to its aim. The challenge is to identify a set of common data elements, defined consistently with clinical and epidemiological research, that can standardize data collection for rare diseases[3,4].
To provide a common and shareable information platform, we analyzed the needs and the information capacity of existing Registries, using data from the Survey of Rare Disease Registries operating in Europe and other countries. A total of 220 Registries answered the questionnaire. Our analysis focused only on the questions of interest to WP6.

B. Methods
Based on a priori considerations, the analysis focused on a selection of Survey questions and their answers. The question on the "Type of data collected" by the Registry was directly relevant to the aims of WP6, but its answers had to be evaluated against two essential dimensions: the purpose and the classification of the Registry. The information collected must be assessed in relation to the different goals pursued and to the main characteristics of the Registry, as reported in the questionnaires. The variables analyzed were therefore:
- Aims;
- Population Target;
- Number of diseases;
- Data providers;
- Type of data collected;
- Disease Coding System;
- Data sharing.
The univariate distribution of the response modes for these questions was analyzed first. Potential associations among variables were initially investigated by bivariate analysis with the chi-square test and then studied in depth through multivariate analysis using Logistic Regression models[5,6]. Finally, a factor analysis based on Multiple Correspondence Analysis[7,8,9] was performed to identify the structure of latent relationships among the variables.

C. Results
One of the Survey questions concerned the objectives of the Registry; it offered more than 10 response modes and allowed an unlimited number of answers. "Epidemiological Research" was the main goal declared by the Registries (70.8%), followed by "Clinical Research" (61.2%) and "Natural History of Disease" (60.7%). More than half of the Registries deal with "Disease Surveillance" (55.7%), while almost half deal with genetic aspects ("Genotype-phenotype correlation" and "Mutation Database"). "Treatment evaluation" is a target for 42.9% of Registries, "Treatment Monitoring" for 33.3% and "Healthcare service planning" for 33.8%; just 1 Registry out of 5 deals with "Social planning" (19.2%).
One of the classifications that characterize a Registry is based on the target population. The Survey showed that 57.1% of the responding Registries are Population-based, 24.0% are Hospital-based and 18.9% are Case-based. More than 80% of the Registries cover a single disease or a group of diseases, while only 7.3% have set up surveillance of all rare diseases.
With regard to data sources, "Clinical Units" provide data to 83.6% of Registries, "Clinical Genetics Units" to 43.8%, "Central Laboratories and services" to 43.4% and "Centres of Expertise" to 30.6%; almost half of the Registries collect data from "Patients and families" and 21.9% from "Patients' groups". Notably, routine information systems are little used: 31.1% of the Registries use data from "Discharge Registers" and only 12.8% use "Mortality Registers"; other routine information systems are used even less. 51.0% of Registries collaborate and share data with Other Registries, 33.8% with Centres of expertise and 16.2% with Biobanks.
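The questions on Aims, Data providers and Data sharing allowed more than one answer. Each percentage above is therefore the share of responding Registries that ticked that option, and the percentages for one question can add up to more than 100%. Below is a minimal Python sketch of that tabulation. The four records are hypothetical and are not Survey data.

```python
from collections import Counter


def multiple_response_percentages(responses, n_respondents):
    """Share (%) of respondents ticking each option of a multiple-response question.

    The denominator is the number of respondents, not the number of ticks,
    so the percentages of one question may sum to more than 100.
    """
    counts = Counter(option for ticked in responses for option in set(ticked))
    return {option: round(100 * n / n_respondents, 1) for option, n in counts.items()}


# Hypothetical answers (NOT EPIRARE data): one set of ticked Aims per Registry.
registries = [
    {"Epidemiological Research", "Disease surveillance"},
    {"Clinical Research", "Mutation database"},
    {"Epidemiological Research", "Clinical Research"},
    {"Epidemiological Research"},
]
pct = multiple_response_percentages(registries, len(registries))
for aim, share in sorted(pct.items(), key=lambda item: -item[1]):
    print(f"{aim}: {share}%")
print("sum of percentages:", sum(pct.values()))
```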

With regard to the data collected, almost all Registries (95.0%) collect information on diagnosis; 86.6% collect "Clinical data" and 72.3% "Genetic data", the latter accompanied by information on "Family history" (55.0%) and on "Birth and reproductive history" (30.5%). 61.4% collect information on "Medication, devices and health services" and 46.2% "Socio-demographic information", while only 32.3% state that they record the patients' "Anagraphical data" (personal details). The low percentage for this last answer suggests that the question may have been misinterpreted and calls for a specific investigation.
Less than half of the Registries adopt the International Classification of Diseases (ICD-9, ICD-10 or ICD-O) as their disease coding system, and 36.5% use no code but only report the disease name; 13.0% adopt the ORPHA code and the same percentage the MIM code, while 25.0% use their own coding system.

Table 1. Distribution of the answers to the following questions: Aim, Population Target, Number of diseases, Data Providers, Data collected, Data sharing and Disease coding system
Variable | N | %
Aim
Epidemiological Research | | 70.8
Clinical Research | | 61.2
Natural history of disease | | 60.7
Disease surveillance | | 55.7
Genotype-phenotype correlation | |
Mutation database | 94 | 42.9
Treatment evaluation (efficacy) | 94 | 42.9
Healthcare service planning | 74 | 33.8
Treatment monitoring (safety) | 73 | 33.3
Social planning | 42 | 19.2
Other | 18 | 8.2
Population Target
Population-based | | 57.1
Hospital-based | 52 | 24.0
Case-based | 41 | 18.9
Number of diseases
Just one | 75 | 34.3
A group of related RDs | |
Several RDs (not related) | 26 | 11.9
All rare diseases | 16 | 7.3
Data Providers
Clinical Units | | 83.6
Patients and families | |
Clinical genetic Units | 96 | 43.8
Laboratories/central services | 95 | 43.4
Discharge Registers | 68 | 31.1
Centres of Expertise | 67 | 30.6
Patients' groups | 48 | 21.9
Mortality Registers | 28 | 12.8
Birth Registers | 8 | 3.7
Disability Registers | 7 | 3.2
Other Registers | 15 | 6.9
Data sharing
Other Registers | | 51.0
Biobanks | 32 | 16.2
Centre of expertise | 67 | 33.8
Data Collected
Diagnosis | | 95.0
Clinical | | 86.6
Genetic | | 72.3
Medications, devices and health services | | 61.4
Family history | | 55.0
Socio-demographic information | | 46.2
Patient-reported outcomes | 78 | 35.5
Anthropometric information | 72 | 32.7
Anagraphical | 71 | 32.3
Birth and reproductive history | 67 | 30.5
Clinical research participation and bio-specimen donation | 67 | 30.5
Patient's preferences for communication | 28 | 12.7
Disease Coding System
ORPHA code | 27 | 13.0
MIM code | 27 | 13.0
ICD-O | 13 | 6.3
ICD | |
ICD | |
Own code system | 52 | 25.0
No coding system, just disease name | 76 | 36.5

Taking into consideration the goals of WP6, a cross-analysis was performed to identify possible dependencies and associations among some of the selected variables. A particularly relevant result of the bivariate analysis concerns the two main targets declared by the Registries: "Epidemiological Research" and "Clinical Research". These two objectives largely characterize the mission of a Registry. Only 6.9% of the Registries pursue neither of them, 38.8% claim to pursue both, and the remaining 54.4% pursue only one: 32.0% "Epidemiological Research" and 22.4% "Clinical Research". The two types of research activity show diverging trends. The chi-square test rejects the hypothesis of independence between the two variables (p=0.003) and highlights an inverse association (PHI coefficient = -0.20). The inverse association also emerges from the Odds Ratio of "Clinical Research" with respect to "Epidemiological Research", which is 0.4 (95%CI= ). The analysis thus shows that Registries tend to follow divergent development strategies according to the two lines of research. This divergence may lead to a different Registry structure with respect to its further purposes and to a different body of information being collected.
In relation to the specific aim of WP6, we also investigated whether the two types of activity are characterized by different datasets. The associations of the answers "Epidemiological Research" and "Clinical Research" with the other declared objectives were calculated and compared. Logistic Regression models were used to evaluate these associations, with "Epidemiological Research" and "Clinical Research" alternately as outcome variables.
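The bivariate figures in this paragraph can be cross-checked with a small Python sketch. The report gives only percentages, so the 2x2 counts below are back-calculated under the assumption that 219 Registries answered the Aims question. With those counts, the uncorrected Pearson chi-square, the phi coefficient and the cross-product odds ratio come out close to the reported p=0.003, PHI=-0.20 and OR=0.4. The sketch does not compute a confidence interval.

```python
import math

# ASSUMPTION: 219 Registries answered the Aims question. The counts below are
# back-calculated from the reported percentages; the report does not print them.
N = 219
both = round(0.388 * N)        # Epidemiological and Clinical Research
epi_only = round(0.320 * N)    # Epidemiological Research only
clin_only = round(0.224 * N)   # Clinical Research only
neither = round(0.069 * N)     # neither objective


def two_by_two_stats(a, b, c, d):
    """Pearson chi-square (no continuity correction), phi and odds ratio.

    Rows: Epidemiological Research yes/no; columns: Clinical Research yes/no.
    a = yes/yes, b = yes/no, c = no/yes, d = no/no.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    phi = (a * d - b * c) / math.sqrt(margins)
    chi2 = n * phi ** 2                         # for a 2x2 table, chi2 = n * phi^2
    p_value = math.erfc(math.sqrt(chi2 / 2))    # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, phi, odds_ratio


chi2, p, phi, odds_ratio = two_by_two_stats(both, epi_only, clin_only, neither)
print(f"chi2={chi2:.2f} p={p:.4f} phi={phi:.2f} OR={odds_ratio:.2f}")
```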
The results are expressed as Odds Ratios (OR) and p-values (Table 2).

Table 2. Epidemiological Research and Clinical Research Odds Ratios compared to other objectives
Aim | Epidemiological Research OR | p-value | Clinical Research OR | p-value
Disease surveillance | 3.8 | 0.0004 | 0.5 | 0.099
Treatment evaluation (efficacy) | 1.3 | 0.636 | 1.2 | 0.685
Mutation database or Genotype-phenotype correlation | 0.5 | 0.039 | 4.0 | <0.0001
Social planning | 2.9 | 0.075 | 0.6 | 0.302
Healthcare service planning | 0.9 | 0.843 | 1.8 | 0.193
Natural History of disease | 1.9 | 0.093 | 1.7 | 0.129
Treatment Monitoring | 0.7 | 0.438 | 2.5 | 0.047

Registries dealing with "Epidemiological Research" show a strong association with "Disease Surveillance" and with "Social planning". They are significantly and inversely associated with study aims on genetic aspects, which are combined here into the joint variable Mutation Database/Genotype-phenotype correlation. There is an evident association with "Natural history of disease" and, although weak, with "Treatment evaluation". There is no association with "Healthcare service planning", and there is a weak inverse association with "Treatment Monitoring".
Conversely, Registries dealing with Clinical Research are inversely associated with Disease Surveillance and Social Planning, and therefore tend not to pursue these objectives. They are strongly associated with the genetic aims (OR 4.0) and significantly associated with Treatment Monitoring (OR 2.5). A positive association, not statistically significant, emerges with Healthcare service planning, Natural history of disease and Treatment evaluation.
The same statistical model was applied to the "Target Population", "Data providers", "Data sharing", "Data Collected" and "Disease coding system" dimensions. For "Target Population" too (Table 3), the two types of activity show opposite associations. Registries concerned with "Epidemiological Research" mainly have a "Population-based" structure, and among them "Hospital-based" structures are more frequent than "Case-based" ones. Registries dealing with "Clinical Research" are mainly "Case-based", less often "Hospital-based" and even less often "Population-based".

Table 3. Epidemiological Research and Clinical Research Odds Ratios compared to the Target Population
Population target | Epidemiological Research OR | p-value | Clinical Research OR | p-value
Population-based | 3.2 | 0.002 | 0.5 | 0.084
Hospital-based | 1.6 | 0.253 | 0.8 | 0.584
Case-based (reference) | 1 | | 1 |

Table 4 reports the data sources used. Registries dealing with "Epidemiological Research" obtain data from "Clinical Units" and "Centres of Expertise" more often than Registries dealing with "Clinical Research". The latter instead draw mainly on information from patients (patients, families and patients' groups). "Epidemiological Research" is significantly and inversely associated with "Clinical genetic units", which are instead significantly associated with "Clinical Research". All the Registries using "Mortality registers" deal with "Epidemiological Research". Neither line of research shows significant associations with the other routine information systems.
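The Odds Ratios in Tables 2 to 7 come from logistic regression models with "Epidemiological Research" or "Clinical Research" as the outcome. Below is a minimal Python/NumPy sketch of such a fit by Newton-Raphson (IRLS). It is not the authors' software, which the report does not name. The report also does not list the covariates of each model, so the sketch fits a single binary predictor. The cell counts are back-calculated from the percentages reported for the two main aims, under the assumption of 219 responding Registries. With one binary predictor, the fitted OR exp(β1) equals the cross-product odds ratio of the 2x2 table (about 0.37, reported as 0.4). If several aims were entered together, each would be an extra 0/1 column of X and exp(β_j) would be an adjusted OR.

```python
import math

import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X: (n, k) design matrix, first column = intercept; y: (n,) array of 0/1.
    Returns the coefficients, their Wald standard errors and two-sided Wald p-values.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = prob * (1.0 - prob)
        information = X.T @ (X * weights[:, None])   # Fisher information matrix
        step = np.linalg.solve(information, X.T @ (y - prob))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    weights = prob * (1.0 - prob)
    cov = np.linalg.inv(X.T @ (X * weights[:, None]))
    se = np.sqrt(np.diag(cov))
    p_values = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
    return beta, se, p_values


# ASSUMPTION: counts back-calculated from the reported percentages with n = 219.
# Each tuple: (Epidemiological Research, Clinical Research, number of Registries)
cells = [(1, 1, 85), (1, 0, 70), (0, 1, 49), (0, 0, 15)]
counts = [n for _, _, n in cells]
epi = np.repeat([e for e, _, _ in cells], counts)
clinical = np.repeat([c for _, c, _ in cells], counts).astype(float)

X = np.column_stack([np.ones(len(epi)), epi])   # intercept + Epidemiological Research
beta, se, p_values = fit_logistic(X, clinical)
print(f"OR = {math.exp(beta[1]):.2f}, Wald p = {p_values[1]:.4f}")
```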

Table 4. Epidemiological Research and Clinical Research Odds Ratios compared to the Data Providers
Data provider | Epidemiological Research OR | p-value | Clinical Research OR | p-value
Clinical Units | 2.3 | 0.060 | 1.0 | 0.936
Clinical genetic Units | 0.3 | 0.001 | 2.1 | 0.041
Patients and families | 1.3 | 0.478 | 2.1 | 0.020
Patients' groups | 0.7 | 0.430 | 1.7 | 0.238
Laboratories/central services | 1.5 | 0.290 | 0.7 | 0.309
Centres of expertise | 3.3 | 0.003 | 1.3 | 0.495
Discharge Registers | 1.1 | 0.774 | 1.2 | 0.616
Mortality Registers | * | | 0.6 | 0.255
Birth, Disability and other Registers | 0.8 | 0.795 | 0.7 | 0.519
* All Registries using Mortality Registers deal with Epidemiological Research.

Data sharing with Other Registers and Centres of expertise is more frequent among Registries dealing with Epidemiological Research. Sharing with Biobanks tends to be more frequent among Registries dealing with Clinical Research (Table 5).

Table 5. Epidemiological Research and Clinical Research Odds Ratios compared to Data sharing
Data sharing | Epidemiological Research OR | p-value | Clinical Research OR | p-value
Other Registers | 1.5 | 0.185 | 1.3 | 0.351
Biobanks | 1.1 | 0.803 | 1.7 | 0.248
Centres of expertise | 1.9 | 0.101 | 1.3 | 0.473

With regard to the type of data collected (Table 6), the analysis revealed considerably different information profiles depending on the type of research pursued by the Registry. The response mode "Diagnosis" was excluded from the analysis because diagnosis data are collected by almost all Registries, which makes this variable non-discriminating. "Epidemiological Research" shows a positive and statistically significant association with "Socio-demographic information". It is also positively associated with "Anagraphical data" and "Clinical data", and strongly and inversely associated with "Genetic data". Conversely, "Clinical Research" is significantly associated with "Genetic data", "Medications, devices and health services" and "Clinical Research participation and bio-specimen donation". It is also associated, though not significantly, with "Clinical data", and it shows a statistically significant inverse association with "Socio-demographic information".

Table 6. Epidemiological Research and Clinical Research Odds Ratios compared to the Data collected
Data | Epidemiological Research OR | p-value | Clinical Research OR | p-value
Anagraphical | 1.8 | 0.130 | 1.2 | 0.618
Socio-demographic information | 4.2 | <0.0001 | 0.5 | 0.050
Genetic | 0.2 | 0.003 | 4.4 | 0.001
Clinical | 2.8 | 0.085 | 1.9 | 0.234
Medications, devices and health services | 0.9 | 0.792 | 2.6 | 0.015
Patient-reported outcomes | 0.9 | 0.861 | 1.3 | 0.524
Family history | 1.3 | 0.459 | 1.3 | 0.468
Anthropometric information | 1.6 | 0.236 | 0.6 | 0.264
Birth and reproductive history | 1.0 | 0.931 | 0.7 | 0.459
Clinical research participation and bio-specimen donation | 0.6 | 0.257 | 3.0 | 0.013

Finally, Registries dealing with Epidemiological Research tend to use the ICD coding system and not the MIM code, while the pattern is reversed for Registries dealing with Clinical Research. Epidemiological Research is also positively, but not significantly, associated with the use of the ORPHA code and of an own coding system (Table 7).

Table 7. Epidemiological Research and Clinical Research Odds Ratios compared to the Disease Coding System
Coding system | Epidemiological Research OR | p-value | Clinical Research OR | p-value
ORPHA code | 1.7 | 0.413 | 0.6 | 0.295
MIM code | 0.2 | 0.003 | 5.3 | 0.003
ICD | 4.2 | 0.048 | 0.3 | 0.054
Own code system | 3.4 | 0.060 | 0.8 | 0.720
No coding system | 2.2 | 0.292 | 0.9 | 0.787

The factor analysis substantially confirmed the picture that emerged from the logistic regression models. A Multiple Correspondence Analysis was performed to build a factorial plane able to highlight latent structures of relationships in the data. The following variables were selected as active variables defining the factorial axes:
- Aims;
- Population target;
- Number of diseases.
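Multiple Correspondence Analysis can be computed as a correspondence analysis of the indicator (complete disjunctive) matrix of the active variables. Below is a minimal Python/NumPy sketch of that computation for the principal inertias (eigenvalues). The records are hypothetical. Aims is simplified to a single main aim per Registry, whereas in the Survey it was a multiple-response question. The projection of supplementary variables is not shown. A built-in check is that the inertias of an indicator-matrix MCA add up to J/Q - 1, where J is the total number of categories and Q the number of active variables.

```python
import numpy as np


def indicator_matrix(records):
    """Complete disjunctive (one-hot) coding of categorical records.

    records: list of tuples, one tuple of category labels per Registry.
    Returns the 0/1 matrix Z (n_records x total number of categories).
    """
    n_vars = len(records[0])
    columns = []
    for j in range(n_vars):
        for level in sorted({r[j] for r in records}):
            columns.append([1.0 if r[j] == level else 0.0 for r in records])
    return np.array(columns).T


def mca_eigenvalues(Z):
    """Principal inertias of MCA = correspondence analysis of the indicator matrix."""
    P = Z / Z.sum()
    r = P.sum(axis=1)                          # row masses
    c = P.sum(axis=0)                          # column (category) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)     # standardized residuals
    singular_values = np.linalg.svd(S, compute_uv=False)
    return singular_values ** 2


# Hypothetical records (NOT Survey data): (population target, main aim, number of diseases)
records = [
    ("Population-based", "Epidemiological", "One RD"),
    ("Case-based", "Clinical", "One RD"),
    ("Hospital-based", "Clinical", "Group of RDs"),
    ("Population-based", "Epidemiological", "All/several RDs"),
    ("Case-based", "Clinical", "Group of RDs"),
    ("Population-based", "Epidemiological", "One RD"),
    ("Hospital-based", "Epidemiological", "Group of RDs"),
    ("Case-based", "Clinical", "One RD"),
]
Z = indicator_matrix(records)
Q = len(records[0])    # number of active variables
J = Z.shape[1]         # total number of categories
eigs = mca_eigenvalues(Z)
print(eigs.round(3), eigs.sum(), J / Q - 1)
```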

The other variables considered in the statistical model were then projected onto the factorial plane as supplementary variables:
- Data providers;
- Data sharing;
- Data collected;
- Disease coding system.
Figure 1 shows the plane defined by the first two factorial axes. With Benzécri's correction[10], the first axis explains 56.44% of the inertia and the second axis 41.79%, so that the plane explains a total of 98.23% of the variability.
The objective variables that contribute most to the definition of the first factorial axis are "Treatment evaluation", "Treatment Monitoring", "Social planning", "Healthcare service planning", "Disease surveillance" and "Natural History of Disease". The first factorial axis could therefore be interpreted as a measure of monitoring and evaluation activity. The upper part of the second axis reflects the contribution of "Epidemiological Research". The lower part reflects the contributions of "Clinical Research", "Genetic Research" and "Natural History of Disease". The second axis could thus be interpreted as running from research on disease (downward) to research on populations (upward). Indeed, the "Population-based" mode lies at the top of the second axis, but it is not associated with the monitoring dimension expressed by the first axis. The "Case-based" mode lies in the diametrically opposite part of the plane. The "Hospital-based" mode lies in the lower part but is also associated with the first factorial axis. The "All/Several diseases" mode lies in the upper part of the second axis, while the "One disease" and "A group of diseases" modes lie along the second axis, with a greater contribution from the latter. In the upper part of the plane, "Epidemiological research", "Disease surveillance", "Healthcare service planning" and "Social planning" point in the same direction and are therefore correlated. In the lower part of the plane, "Clinical Research" is correlated with "Genetic Research", and "Treatment evaluation" with "Treatment Monitoring".
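For reference, Benzécri's correction[10] is usually stated as follows. Q is the number of active variables and λ_s are the principal inertias of the indicator matrix. Only the axes with λ_s > 1/Q are kept, their inertias are re-scaled, and the percentages τ_s are computed over the retained axes only. Because the denominator excludes the axes below 1/Q, the corrected percentages are typically much larger than the raw ones. This is why two axes can account for 98.23% of the corrected inertia. The report gives neither the eigenvalues nor the value of Q, so this is only the general formula.

```latex
% Q = number of active variables; \lambda_s = principal inertias of the indicator-matrix MCA
\lambda_s^{*} = \left(\frac{Q}{Q-1}\right)^{2}\left(\lambda_s - \frac{1}{Q}\right)^{2},
\qquad \lambda_s > \frac{1}{Q}

\tau_s = 100\,\frac{\lambda_s^{*}}{\sum_{t:\,\lambda_t > 1/Q} \lambda_t^{*}}

\tau_1 + \tau_2 = 56.44\% + 41.79\% = 98.23\%
```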

Figure 1. Factorial plane determined by the active variables

Figure 2 shows the plane onto which the Data providers and Data sharing variables are projected. Data from routine information systems (mortality, discharge and other registries) lie in the first quadrant of the factorial plane, the one oriented towards monitoring in the public health field. "Laboratories/central services" also lies in the first quadrant, but it is shifted towards the origin of the axes, which represents the centre of gravity. "Clinical Units" and, more clearly, "Centres of Expertise" tend to be strongly associated with the first factor but do not discriminate with respect to the second. "Patients and families", "Patients' groups" and "Genetic Units" lie in the direction of clinical and genetic research. Sharing with Other Registers lies in the upper part of the plane, while sharing with Biobanks lies in the lower part.
Figure 3 shows the projections of Data collected and Disease coding system on the factorial plane. "Anagraphical data" and "Socio-demographic information" lie in the first quadrant. "Genetic data", "Family history", "Clinical research participation and biospecimen donation", "Anthropometric information", "Medications, devices and health services" and "Clinical data" lie in the lower quadrant. "Diagnosis" is confirmed to be a non-discriminating variable. "Birth and reproductive history", "Anthropometric information" and "Clinical research participation and biospecimen donation" tend to be collected mostly by Registries that deal with monitoring and evaluation in the clinical field.

The use of the ICD code lies in the upper quadrant and is associated with Epidemiological Research and with the "Population-based" mode; the ORPHA code lies in the same quadrant. The use of the MIM code and the "No coding system" variable lie in the opposite quadrant, associated with clinical and genetic research.

Figure 2. Projection of the Data providers and of the Data sharing on the factorial plane

Figure 3. Projection of the Data collected and of the Disease coding system on the factorial plane

D. Conclusions
The analysis of the Survey data, focused on the specific objectives of WP6, provided useful information on the features of existing Registries that can support the definition of the common dataset. In particular, Registries tend to separate into distinct lines of research, and this divergence affects the entire body of information they collect. The divergence between Registries pursuing epidemiological research and those pursuing clinical research points to two clearly different types of Registry. On the one hand, there are Registries, potentially population-based, whose main target is "Epidemiological Research". These Registries deal with disease surveillance and social impact, interface with other information systems, use the ICD system and collect personal data. On the other hand, there are Registries, potentially case-based and/or hospital-based, whose main goal is "Clinical Research". These Registries pursue goals related to genetic aspects and to the assessment and monitoring of treatments. Their information is largely based on genetic and clinical data and, obviously, on diagnostic data.
In short, there are Registries that we could define as population-oriented, that potentially pursue public health objectives, and Registries disease-oriented with clinical-genetic research objectives.

It is worth noting that these Registries also showed several common traits, which should be explored further with a view to identifying a common data set. The differences observed may indicate limitations in how the overall research activities are oriented, activities that should characterize a Rare Disease Registry aiming to be a useful public health tool. Such research must necessarily use both epidemiological and clinical data, and at the same time should provide the basis for developing specialized registries. From this perspective, identifying a common dataset is of strategic relevance: it allows consistent data to be collected and a solid, flexible platform to be built for research and public health activities.

E. References
1. Nadkarni PM, Brandt CA (2006) The common data elements for cancer research: remarks on functions and structure. Methods Inf Med 45(6):
2. Carter J, Evans J, Tuttle M, Weida T, White T, Harvell J, Shipley S (2006) Making the minimum data set compliant with health information technology standards. Executive summary. U.S. Department of Health and Human Services. Accessed: 2nd September
3. Richesson RL, Krischer JP (2007) Data standards in clinical research: gaps, overlaps, challenges and future directions. J Am Med Inform Assoc 14(6):
4. AHRQ (2010) Registries for evaluating patient outcomes: a user's guide. In: Gliklich RE, Dreyer N (eds). Agency for Healthcare Research and Quality, Rockville, MD.
5. McCullagh P, Nelder JA (1989) Generalized Linear Models, Second Edition. London: Chapman and Hall.
6. Woodward M (2005) Epidemiology: Study Design and Data Analysis, Second Edition. New York: Chapman & Hall/CRC.
7. Benzécri JP (1973) L'Analyse des Données: T. 2, L'Analyse des Correspondances. Paris: Dunod.
8. Greenacre MJ (1984) Theory and Applications of Correspondence Analysis. London: Academic Press.
9. Greenacre MJ (1994) Multiple and Joint Correspondence Analysis. In: Greenacre MJ, Blasius J (eds) Correspondence Analysis in the Social Sciences. London: Academic Press.
10. Benzécri JP (1979) Sur le calcul des taux d'inertie dans l'analyse d'un questionnaire. Addendum et erratum à [BIN.MULT.]. Cahiers de l'Analyse des Données 4,

II. Part 2: Cluster analysis

A. Introduction
The objective of WP6 is to develop a proposal for a common dataset for Rare Disease Registries, following a bottom-up approach. A questionnaire was sent to 220 Registries operating in different European countries. The data it collected represent a large body of information, crucial for understanding the characteristics and weaknesses of the Registries active in the field of rare diseases. The analysis of these data generates relevant information supporting the definition of a common dataset.
An initial analysis of the Survey, focused on the specific objective of WP6, has already provided important information on the actual differentiation of Rare Disease Registries, which reflects their different objectives. The previous analysis (see WP6 Interim Technical Report) identified different patterns. In particular, a multivariate analysis was conducted to identify relationships between the variables that define the objectives and other characteristic elements of the Registries. This analysis, carried out with Multiple Correspondence Analysis, identified relationships among groups of variables that revealed two macro-groups of Registries ("there are Registries that we could define as population-oriented, that potentially pursue public health objectives, and Registries disease-oriented with clinical-genetic research objectives").
We performed further analyses on the Survey database to better understand the profile of the Registries and to define the common information needs (common dataset) more accurately. A cluster analysis was carried out to identify groups of Registries with common traits and characteristics.
It should be underlined that this second analytical approach shifts the point of observation from the variables (Correspondence Analysis) to the units of observation, i.e. the Registries (Cluster Analysis).

B. Methods
Cluster Analysis is a set of statistical techniques that identify, through iterative processes, groups of observations that are similar with respect to specific characteristics. We performed a hierarchical Cluster Analysis. This computational process starts from the individual observations and, step by step, merges those that are "closest" and most similar to each other, until a single group is obtained. Priority was given to the Aims declared by the Registries, since the objectives of a Registry are the main feature associated with its information needs.
We carried out additional analyses that used as explanatory variables, in addition to the Aims, other variables collected by the questionnaire: Number of diseases, Population target and Geographical Coverage. These additional analyses either did not provide meaningful grouping patterns or produced so many clusters that a clear interpretation was not possible. This result is important. It means that, for the identification of a small number of clusters, the results obtained with one variable or with several variables are quite similar, which is an indirect measure of the consistency of the result achieved.
The Cluster Analysis performed in our study used the following Aims declared by the Registries:
- Epidemiological Research
- Clinical Research
- Natural history of disease
- Disease surveillance
- Genotype-phenotype correlation + Mutation database
- Healthcare service planning
- Social planning
- Treatment evaluation (efficacy)
- Treatment monitoring (safety)
To interpret the Clusters identified by the statistical analysis, we analyzed the distribution of the Aims in each Cluster and evaluated the deviation of the observed frequency from an expected value. The expected value for each Aim was estimated by applying to each Cluster the overall percentage calculated on all Registries. For example, the percentage of the Aim Clinical Research across all Registries is 61.2%, and this value was used to calculate the expected number of Registries with Clinical Research in each Cluster. The deviation of the observed value from the expected value is expressed as a percentage deviation:
%Deviation = 100 * (Number Observed - Number Expected) / Number Expected
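Below is a short Python sketch of the expected value and the percentage deviation defined above. It uses the observed counts per Cluster reported in Table 2 of the Results (52, 86 and 81 Registries, 219 in total). The overall share of an Aim is taken as its total count across the three Clusters divided by 219. With this choice, the sketch reproduces, for example, the expected value of 37 and the +28% reported for Epidemiological research in Cluster 1.

```python
# Cluster sizes and observed counts for two Aims, as reported in the Results (Table 2).
cluster_sizes = {"Cluster 1": 52, "Cluster 2": 86, "Cluster 3": 81}
observed = {
    "Epidemiological research": {"Cluster 1": 47, "Cluster 2": 48, "Cluster 3": 60},
    "Treatment evaluation": {"Cluster 1": 9, "Cluster 2": 5, "Cluster 3": 80},
}


def expected_and_deviation(obs_by_cluster, sizes):
    """Expected count = overall share of the Aim x cluster size;
    %Deviation = 100 * (observed - expected) / expected."""
    total_registries = sum(sizes.values())
    overall_share = sum(obs_by_cluster.values()) / total_registries
    result = {}
    for cluster, n in sizes.items():
        expected = overall_share * n
        deviation = 100 * (obs_by_cluster[cluster] - expected) / expected
        result[cluster] = (expected, deviation)
    return result


for aim, counts in observed.items():
    for cluster, (exp_n, dev) in expected_and_deviation(counts, cluster_sizes).items():
        print(f"{aim:26s} {cluster}: expected {exp_n:5.1f}, deviation {dev:+.0f}%")
```

The same function applied to the other Aims gives the expected counts and deviations for every cell of Table 2.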
To validate the interpretation of the clusters and to gather additional information characterizing the different types of Registries, we also estimated the distribution and the percentage deviation for the following questions: Target Population, Number of diseases, Geographical Coverage, Collected Data, Data Providers and Services expected from the EU platform.
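The report does not state which distance or linkage was used for the hierarchical clustering. Below is a minimal Python sketch of agglomerative clustering under an assumption of ours: squared Euclidean distance between the 0/1 Aim vectors and Ward's criterion, applied through the Lance-Williams update. The eight vectors are hypothetical, not Survey records.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def ward_clustering(points, n_clusters):
    """Agglomerative (hierarchical) clustering with Ward's criterion.

    Starts with one cluster per point and repeatedly merges the pair with the
    smallest Ward distance, updated with the Lance-Williams formula, until
    n_clusters remain. Returns the clusters as sorted lists of point indices.
    """
    def pair(a, b):
        return (a, b) if a < b else (b, a)

    clusters = {i: [i] for i in range(len(points))}
    dist = {pair(i, j): squared_distance(points[i], points[j])
            for i in clusters for j in clusters if i < j}
    next_id = len(points)
    while len(clusters) > n_clusters:
        i, j = min(dist, key=dist.get)        # closest pair of clusters
        d_ij = dist[(i, j)]
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update: Ward distance from the new cluster to each remaining one.
        new_dist = {}
        for m, members in clusters.items():
            n_m = len(members)
            new_dist[pair(m, next_id)] = ((n_i + n_m) * dist[pair(i, m)]
                                          + (n_j + n_m) * dist[pair(j, m)]
                                          - n_m * d_ij) / (n_i + n_j + n_m)
        dist = {k: v for k, v in dist.items() if i not in k and j not in k}
        dist.update(new_dist)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical 0/1 Aim vectors (NOT Survey records). Columns: Epidemiological Research,
# Clinical Research, Disease surveillance, Healthcare service planning,
# Genotype-phenotype/mutation database, Treatment evaluation.
points = [
    (1, 0, 1, 1, 0, 0), (1, 0, 1, 1, 0, 0), (1, 0, 1, 0, 0, 0),   # public-health-like
    (0, 1, 0, 0, 1, 0), (0, 1, 0, 0, 1, 0), (1, 1, 0, 0, 1, 0),   # clinical/genetic-like
    (1, 1, 1, 1, 1, 1), (1, 1, 1, 0, 1, 1),                       # treatment-oriented
]
print(ward_clustering(points, 3))
```

Stopping at three clusters mirrors the number of groups reported in the Results. In practice, the full merge history (the dendrogram) is kept and the number of clusters is chosen afterwards.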

C. Results
The iterative clustering process can be viewed in the Dendrogram (Figure 1), which highlights three major clusters. The statistical tests aimed at defining the number of clusters (Cubic Clustering Criterion and Pseudo F test) also confirm the identification of 3 clusters (Figure 2). The number of Registries in each cluster (Table 1) is fairly balanced: Cluster 1 includes 52 Registries (23.7%), Cluster 2 86 Registries (39.3%) and Cluster 3 81 Registries (37.0%).

Table 1. Composition of the Clusters
Cluster | Number | %
Cluster 1 | 52 | 23.7
Cluster 2 | 86 | 39.3
Cluster 3 | 81 | 37.0
Total | 219* | 100.0
* 1 Registry not analyzed: the Aims were reported as missing.

Figure 1. Dendrogram of Cluster Analysis
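The pseudo F statistic (Calinski-Harabasz) compares between-cluster and within-cluster dispersion. Higher values indicate better-separated partitions, and a local peak across candidate numbers of clusters supports that choice. Below is a minimal Python sketch on hypothetical 0/1 Aim vectors, not Survey records. The cubic clustering criterion, which the report also used, is more involved and is not reproduced here.

```python
def pseudo_f(points, clusters):
    """Calinski-Harabasz pseudo F = [B / (k - 1)] / [W / (n - k)].

    B: between-cluster sum of squares, W: within-cluster sum of squares,
    k: number of clusters, n: number of observations.
    """
    n, k, dim = len(points), len(clusters), len(points[0])

    def centroid(indices):
        indices = list(indices)
        return [sum(points[i][d] for i in indices) / len(indices) for d in range(dim)]

    grand = centroid(range(n))
    within = between = 0.0
    for members in clusters:
        c = centroid(members)
        within += sum((points[i][d] - c[d]) ** 2 for i in members for d in range(dim))
        between += len(members) * sum((c[d] - grand[d]) ** 2 for d in range(dim))
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 0/1 Aim vectors (NOT Survey records), one row per Registry.
points = [
    (1, 0, 1, 1, 0, 0), (1, 0, 1, 1, 0, 0), (1, 0, 1, 0, 0, 0),
    (0, 1, 0, 0, 1, 0), (0, 1, 0, 0, 1, 0), (1, 1, 0, 0, 1, 0),
    (1, 1, 1, 1, 1, 1), (1, 1, 1, 0, 1, 1),
]
grouped = [[0, 1, 2], [3, 4, 5], [6, 7]]   # Registries with similar profiles together
mixed = [[0, 3, 6], [1, 4, 7], [2, 5]]     # the same Registries, deliberately mixed
print(pseudo_f(points, grouped), pseudo_f(points, mixed))
```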

Figure 2. Criteria for the definition of the number of Clusters: Cubic Clustering Criterion and Pseudo F

To facilitate the interpretation of the identified Clusters, we analyzed the distribution of the Aims in each group. Table 2 shows, for each of the three clusters, the distribution of the Aims and the percentage deviation from the expected value (see Methods). In Cluster 1, 90.4% of Registries perform Epidemiological Research, 75.0% pursue Disease Surveillance and 63.5% Healthcare Service Planning. In Cluster 1 the expected number of Registries with the Aim of Epidemiological Research is 37, whereas the observed number is 47, corresponding to +28%. Cluster 1 also shows positive deviations for Disease Surveillance (+35%), Healthcare Service planning (+88%) and Social Planning (+70%). It shows negative deviations for Clinical research (-65%), Natural History of disease (-81%) and Mutation database or Genotype-phenotype correlation (-93%). For the same Aims, Cluster 2 shows the opposite trend. Its positive deviations are for Clinical research (+14%), Natural History of disease (+9%) and Mutation database or Genotype-phenotype correlation (+38%). Its negative deviations are for Epidemiological research (-21%), Disease Surveillance (-54%), Healthcare Service planning (-79%) and Social Planning (-88%). Both Clusters show lower-than-expected percentages for Treatment evaluation (Cluster 1: -60%; Cluster 2: -86%) and Treatment Monitoring (Cluster 1: -83%; Cluster 2: -86%). Cluster 3 shows higher-than-expected percentages for all the Aims, but especially for the Aims relating to Treatment. 98.8% of the Registries in Cluster 3 (+130% compared to the expected value) declare that they perform Treatment evaluation, and 81.5% (+144%) Treatment Monitoring.

Table 2. Number and percentage of Registries observed, number of Registries expected and percentage deviation from expected, by Cluster and Aim
Aim | Cluster 1 (n=52) N (%) | Exp | Dev | Cluster 2 (n=86) N (%) | Exp | Dev | Cluster 3 (n=81) N (%) | Exp | Dev
Clinical research | 11 (21.2) | 32 | -65% | 60 (69.8) | 53 | +14% | 63 (77.8) | 50 | +27%
Disease surveillance | 39 (75.0) | 29 | +35% | 22 (25.6) | 48 | -54% | 61 (75.3) | 45 | +35%
Epidemiological research | 47 (90.4) | 37 | +28% | 48 (55.8) | 61 | -21% | 60 (74.1) | 57 | +5%
Genotype-phenotype/mutation database | 2 (3.8) | 30 | -93% | 69 (80.2) | 50 | +38% | 56 (69.1) | 47 | +19%
Healthcare services planning | 33 (63.5) | 18 | +88% | 6 (7.0) | 29 | -79% | 35 (43.2) | 27 | +28%
Natural history of disease | 6 (11.5) | 32 | -81% | 57 (66.3) | 52 | +9% | 70 (86.4) | 49 | +42%
Social planning | 17 (32.7) | 10 | +70% | 2 (2.3) | 16 | -88% | 23 (28.4) | 16 | +48%
Treatment evaluation | 9 (17.3) | 22 | -60% | 5 (5.8) | 37 | -86% | 80 (98.8) | 35 | +130%
Treatment monitoring | 3 (5.8) | 17 | -83% | 4 (4.7) | 29 | -86% | 66 (81.5) | 27 | +144%
N = number of Registries observed in the Cluster; Exp = number of Registries expected in the Cluster; Dev = percentage deviation of the observed value from the expected value (see Methods).

The interpretation of these results seems quite clear for the first two Clusters: Cluster 1 is characterised by Registries pursuing Aims related to Public Health activities, whereas Cluster 2 identifies Registries more oriented towards Clinical and Genetic Research. The interpretation of Cluster 3 is more complex: it seems to include Registries focused on research for the assessment and monitoring of Treatment. Cluster 3 also shows higher-than-expected percentages for all the other Aims, which could reflect a bias due to these Registries declaring multiple objectives. Overall, the results of the Cluster Analysis agree with the findings of the previous Multiple Correspondence Analysis, which explored the associations among the variables.
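As a sketch, assuming the expected count for an Aim is its total across Clusters multiplied by the Cluster's share of the 219 analysed Registries, and that Dev = 100 x (O - E) / E (an assumption consistent with the printed values; the Methods section is authoritative), the Exp and Dev columns of Table 2 can be recomputed from the observed counts alone. The names `CLUSTER_SIZES`, `OBSERVED` and `expected_and_deviation` are illustrative.

```python
# Observed number of Registries declaring each Aim in Clusters 1, 2 and 3 (Table 2).
CLUSTER_SIZES = (52, 86, 81)
OBSERVED = {
    "Clinical research": (11, 60, 63),
    "Disease surveillance": (39, 22, 61),
    "Epidemiological research": (47, 48, 60),
    "Genotype-phenotype/mutation database": (2, 69, 56),
    "Healthcare services planning": (33, 6, 35),
    "Natural history of disease": (6, 57, 70),
    "Social planning": (17, 2, 23),
    "Treatment evaluation": (9, 5, 80),
    "Treatment monitoring": (3, 4, 66),
}


def expected_and_deviation(observed, sizes):
    """Return (rounded expected count, rounded % deviation) for each cluster.

    Registries may declare several Aims, so the expected count allocates the
    Aim's total to each cluster in proportion to the cluster size.
    """
    n_total = sum(sizes)
    aim_total = sum(observed)
    result = []
    for obs, size in zip(observed, sizes):
        expected = aim_total * size / n_total
        deviation = 100 * (obs - expected) / expected
        result.append((round(expected), round(deviation)))
    return result


for aim, counts in OBSERVED.items():
    print(f"{aim:<38}", expected_and_deviation(counts, CLUSTER_SIZES))
```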
The factorial plane obtained by Multiple Correspondence Analysis (Figure 3) clearly identified the association among the three groups of Aims, which is also confirmed by the Cluster Analysis.
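The Multiple Correspondence Analysis is described in Part 1. As a minimal sketch of the underlying computation, and not the software actually used: each Registry's answers are coded as a complete disjunctive (0/1 indicator) matrix, the standardized residuals of that matrix are formed, and the leading principal axis is extracted by power iteration. The two-question toy data and all function names are hypothetical. For Q questions coded into J categories, the total inertia of the indicator matrix is J/Q - 1, which the example uses as a check.

```python
import math


def indicator_matrix(records, variables):
    """Complete disjunctive coding: one 0/1 column per category of each variable."""
    columns = [(var, cat) for var in variables
               for cat in sorted({rec[var] for rec in records})]
    rows = [[1.0 if rec[var] == cat else 0.0 for var, cat in columns]
            for rec in records]
    return columns, rows


def standardized_residuals(z):
    """S = D_r^(-1/2) (P - r c^T) D_c^(-1/2), with P = Z / grand total."""
    total = sum(map(sum, z))
    p = [[v / total for v in row] for row in z]
    r = [sum(row) for row in p]
    c = [sum(col) for col in zip(*p)]
    s = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
         for i in range(len(r))]
    return s, c


def first_axis(s, iterations=500):
    """Leading eigenpair of S^T S (principal inertia, axis) by power iteration."""
    m = len(s[0])
    sts = [[sum(row[a] * row[b] for row in s) for b in range(m)] for a in range(m)]
    v = [float(j + 1) for j in range(m)]  # arbitrary start, assumed not orthogonal to the axis
    for _ in range(iterations):
        w = [sum(sts[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    inertia = sum(v[a] * sts[a][b] * v[b] for a in range(m) for b in range(m))
    return inertia, v


# Hypothetical Registries answering two yes/no questions on their Aims.
records = [
    {"epidemiology": "yes", "treatment": "no"},
    {"epidemiology": "yes", "treatment": "no"},
    {"epidemiology": "no", "treatment": "yes"},
    {"epidemiology": "no", "treatment": "yes"},
    {"epidemiology": "yes", "treatment": "yes"},
]
columns, z = indicator_matrix(records, ["epidemiology", "treatment"])
s, c = standardized_residuals(z)
total_inertia = sum(x * x for row in s for x in row)
inertia, axis = first_axis(s)
# Principal coordinates of the categories on the first axis: sqrt(lambda) * v_j / sqrt(c_j).
coords = {col: math.sqrt(inertia) * v / math.sqrt(cj) for col, v, cj in zip(columns, axis, c)}
print(round(total_inertia, 6), round(inertia, 6), coords)
```

A factorial plane like Figure 3 uses the first two axes; extracting the second one would require deflating S^T S, which is omitted here for brevity. The sign of each axis is arbitrary.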

Figure 3. Factorial Plane from the Multiple Correspondence Analysis

The Cluster Analysis indicated the presence of three macro-types of Registries, whose Aims are clearly differentiated mainly for the first two types. The joint interpretation of the results of the Cluster Analysis and of the Multiple Correspondence Analysis suggests naming the three clusters of Registries as follows:
Cluster 1: Public Health Registries
Cluster 2: Clinical and Genetic Research Registries
Cluster 3: Treatment Registries.

On the basis of this interpretation, we analysed the distribution of the other variables collected by the Survey within the three types of Registries. The results are expressed as the percentage distribution within each group and as the percentage deviation from the expected value, calculated in the same way as for the question on Aims. Results are reported for the questions on Target Population, Number of diseases, Geographical Coverage, Data collected, Data providers and Services expected by the EU Platform.

1. Target Population
Of the Registries in the "Public Health" group, 78.8% are Population-based, whereas only a small number are Case-based (5.8%) or Hospital-based (15.4%). The Registries of the other two types are also mainly Population-based, but with values lower than expected. "Clinical-Genetic Research" Registries show a greater tendency to be Case-based (+30% compared to the expected value), whereas "Treatment" Registries tend to be Hospital-based (+26%). This result is consistent with the interpretation given to the Clusters.

Table 3. Number and percentage of Registries observed, number of Registries expected and percentage deviation from expected, by Cluster and Population target
Population target | Public Health N (%) | Exp | Dev | Clinical-Genetic Research N (%) | Exp | Dev | Treatment N (%) | Exp | Dev
Case based | 3 (5.8) | 10 | -70% | 21 (24.7) | 16 | +30% | 17 (21.5) | 15 | +13%
Hospital based | 8 (15.4) | 13 | -36% | 20 (23.5) | 20 | -2% | 24 (30.4) | 19 | +26%
Population based | 41 (78.8) | 30 | +38% | 44 (51.8) | 48 | -9% | 38 (48.1) | 45 | -16%

2. Number of diseases
"Public Health" Registries, unlike the other two types, show a greater tendency to cover all diseases (+73% compared to the expected value). Of the "Clinical-Genetic Research" Registries, 47.7% collect data on a group of diseases, 34.9% on a single disease and 17.4% on all diseases; this distribution is close to the expected one. "Treatment" Registries deal mainly with a group of rare diseases (49.4%) or with one rare disease (38.3%, 13% above the expected value).

Table 4. Number and percentage of Registries observed, number of Registries expected and percentage deviation from expected, by Cluster and Number of diseases
Number of diseases | Public Health N (%) | Exp | Dev | Clinical-Genetic Research N (%) | Exp | Dev | Treatment N (%) | Exp | Dev
A group / several | 21 (41.2) | 24 | -12% | 41 (47.7) | 40 | +2% | 40 (49.4) | 38 | +6%
All | 17 (33.3) | 10 | +73% | 15 (17.4) | 17 | -9% | 10 (12.3) | 16 | -36%
Just one | 13 (25.5) | 17 | -25% | 30 (34.9) | 29 | +3% | 31 (38.3) | 27 | +13%

3. Geographical Coverage
National coverage is the most common in all three types, but with different distributions: although 50% of the "Public Health" Registries have national coverage, this value is 19% lower than expected; "Clinical-Genetic Research" Registries show a frequency of national coverage equal to the expected one; "Treatment" Registries show a higher value than expected (+12%). Regional coverage is reported by 22 "Public Health" Registries, compared with 9 expected (+148%), and this deviation is reversed for the other two types ("Clinical-Genetic Research" -58%, "Treatment" -35%). International coverage is provided by only a small number of "Public Health" Registries (-79% compared to the expected value), whereas "Clinical-Genetic Research" Registries show a positive deviation (+52%).

Table 5. Number and percentage of Registries observed, number of Registries expected and percentage deviation from expected, by Cluster and Geographical coverage
Geographical coverage | Public Health N (%) | Exp | Dev | Clinical-Genetic Research N (%) | Exp | Dev | Treatment N (%) | Exp | Dev
International | 2 (3.8) | 9 | -79% | 23 (27.4) | 15 | +52% | 14 (17.3) | 15 | -4%
Local | 2 (3.8) | 2 | +19% | 3 (3.6) | 3 | +11% | 2 (2.5) | 3 | -23%
National | 26 (50.0) | 32 | -19% | 52 (61.9) | 52 | 0% | 56 (69.1) | 50 | +12%
Regional | 22 (42.3) | 9 | +148% | 6 (7.1) | 14 | -58% | 9 (11.1) | 14 | -35%

4. Data collected
Diagnosis is collected by almost all Registries (95%), so there are no substantial differences among the three types. "Public Health" Registries tend to collect more anagraphic (personal identification) data (+26% compared to the expected value) and socio-demographic data (+28%), whereas other information is collected less often than expected. Of the "Clinical-Genetic Research" Registries, 88.4% collect clinical data and 86.0% genetic data; these Registries also collect data on Family history (+17%) and Patient's preferences for communication (+27%) more often than expected. Of the "Treatment" Registries, 97.5% collect Clinical data, and they show a positive percentage deviation for all types of information except anagraphic data; the highest values are for anthropometric info (+68%), Clinic research participation and biospecimen donation (+39%), Birth and reproductive history (+49%), Family history (+22%), Medications, devices and health services (+37%) and Patient-reported outcomes (+65%).
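For single-choice questions such as Geographical Coverage, the expected counts in Tables 3 to 5 are consistent with the familiar independence model, expected = row total x column total / grand total (the same expected counts used by a Pearson chi-square test). A minimal sketch under that assumption, recomputing Table 5; `COVERAGE` and `deviation_table` are illustrative names.

```python
# Rows: coverage level; columns: Public Health, Clinical-Genetic Research, Treatment (Table 5).
COVERAGE = {
    "International": (2, 23, 14),
    "Local": (2, 3, 2),
    "National": (26, 52, 56),
    "Regional": (22, 6, 9),
}


def deviation_table(table):
    """Return {row: [(rounded expected, rounded % deviation), ...]} under independence."""
    row_totals = {name: sum(counts) for name, counts in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(row_totals.values())
    result = {}
    for name, counts in table.items():
        cells = []
        for obs, col_total in zip(counts, col_totals):
            expected = row_totals[name] * col_total / grand_total
            cells.append((round(expected), round(100 * (obs - expected) / expected)))
        result[name] = cells
    return result


for name, cells in deviation_table(COVERAGE).items():
    print(f"{name:<14}", cells)
```

Unlike the multiple-response Aims, each Registry falls in exactly one coverage category, so the cluster sizes are simply the column totals of the table.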

Table 6. Number and percentage of Registries observed, number of Registries expected and percentage deviation from expected, by Cluster and Data collected
Data collected | Public Health N (%) | Exp | Dev | Clinical-Genetic Research N (%) | Exp | Dev | Treatment N (%) | Exp | Dev
Anagraphic | 21 (40.4) | 17 | +26% | 26 (30.2) | 27 | -5% | 23 (28.4) | 26 | -11%
Anthropometric info | 8 (15.4) | 17 | -53% | 19 (22.1) | 28 | -32% | 44 (54.3) | 26 | +68%
Clinic research participation and biospecimen donation | 6 (11.5) | 16 | -62% | 26 (30.2) | 26 | 0% | 34 (42.0) | 24 | +39%
Birth and reproductive history | 9 (17.3) | 16 | -43% | 21 (24.4) | 26 | -20% | 37 (45.7) | 25 | +49%
Clinical data | 35 (67.3) | 45 | -22% | 76 (88.4) | 75 | +2% | 79 (97.5) | 70 | +12%
Diagnosis | 52 (100) | 49 | +5% | 77 (89.5) | 82 | -6% | 79 (97.5) | 77 | +3%
Family history | 11 (21.2) | 28 | -61% | 55 (64.0) | 47 | +17% | 54 (66.7) | 44 | +22%
Genetic data | 21 (40.4) | 38 | -44% | 74 (86.0) | 62 | +19% | 63 (77.8) | 58 | +8%
Medications, devices and health services | 22 (42.3) | 32 | -31% | 44 (51.2) | 53 | -16% | 68 (84.0) | 50 | +37%
Patient's preferences for communication | 3 (5.8) | 7 | -55% | 14 (16.3) | 11 | +27% | 11 (13.6) | 10 | +6%
Patient-reported outcomes | 9 (17.3) | 18 | -51% | 21 (24.4) | 30 | -31% | 47 (58.0) | 28 | +65%
Socio demographic info | 32 (61.5) | 25 | +28% | 26 (30.2) | 41 | -37% | 47 (58.0) | 39 | +21%

5. Data providers
Clinical units are the most frequent data providers for all three types of Registries. "Public Health" Registries tend to use Health Information Systems (Hospital databases, Mortality registers and other registers), whereas their use of information from Clinical Genetic Units (-34%), patients and their families (-29%) and patients' organisations (-30%) is limited. In contrast, "Clinical-Genetic Research" Registries tend to use data sources close to the patients (Patients and families +15%, Patients' groups +16%) and Clinical Genetic Units (+14%), while they use Health Information Systems much less than expected. The data providers used more often than expected by "Treatment" Registries are Centres of Expertise (+18%), Clinical units (+11%), Clinical genetic units (+8%) and Hospital databases (+12%).
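The text above singles out the data types with the largest positive deviations for each type of Registry. A small sketch, assuming the same size-proportional expected-value rule as for the Aims and cluster sizes of 52, 86 and 81 Registries (as implied by the percentages in Table 6), ranks the data types for one cluster. The 30% threshold and the names `deviation` and `top_positive` are arbitrary illustrations.

```python
CLUSTERS = ("Public Health", "Clinical-Genetic Research", "Treatment")
CLUSTER_SIZES = (52, 86, 81)
# Observed Registries collecting each data type, per cluster (Table 6).
DATA_COLLECTED = {
    "Anagraphic": (21, 26, 23),
    "Anthropometric info": (8, 19, 44),
    "Clinic research participation and biospecimen donation": (6, 26, 34),
    "Birth and reproductive history": (9, 21, 37),
    "Clinical data": (35, 76, 79),
    "Diagnosis": (52, 77, 79),
    "Family history": (11, 55, 54),
    "Genetic data": (21, 74, 63),
    "Medications, devices and health services": (22, 44, 68),
    "Patient's preferences for communication": (3, 14, 11),
    "Patient-reported outcomes": (9, 21, 47),
    "Socio demographic info": (32, 26, 47),
}


def deviation(observed, sizes, cluster):
    """Percentage deviation of one cluster's count from its size-proportional expected value."""
    expected = sum(observed) * sizes[cluster] / sum(sizes)
    return 100 * (observed[cluster] - expected) / expected


def top_positive(table, sizes, cluster, threshold=30.0):
    """Items whose deviation in `cluster` is at least `threshold` %, largest first."""
    scored = [(item, deviation(counts, sizes, cluster)) for item, counts in table.items()]
    kept = [(item, dev) for item, dev in scored if dev >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


treatment = CLUSTERS.index("Treatment")
for item, dev in top_positive(DATA_COLLECTED, CLUSTER_SIZES, treatment):
    print(f"{item:<56} {dev:+.0f}%")
```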

Table 7. Number and percentage of Registries observed, number of Registries expected and percentage deviation from expected, by Cluster and Data provider
Data provider | Public Health N (%) | Exp | Dev | Clinical-Genetic Research N (%) | Exp | Dev | Treatment N (%) | Exp | Dev
Centres of expertise | 13 (25.0) | 16 | -19% | 25 (29.1) | 26 | -5% | 29 (36.3) | 25 | +18%
Clinical genetic units | 15 (28.8) | 23 | -34% | 43 (50.0) | 38 | +14% | 38 (47.5) | 35 | +8%
Clinical units | 43 (82.7) | 43 | -1% | 65 (75.6) | 72 | -9% | 74 (92.5) | 67 | +11%
Hospital databases | 23 (44.2) | 16 | +42% | 17 (19.8) | 27 | -37% | 28 (35.0) | 25 | +12%
Laboratories/central services | 26 (50.0) | 23 | +15% | 32 (37.2) | 37 | -15% | 37 (46.3) | 35 | +6%
Mortality registers | 17 (32.7) | 7 | +155% | 1 (1.2) | 11 | -91% | 10 (12.5) | 10 | -3%
Other registers | 14 (26.9) | 6 | +135% | 3 (3.5) | 10 | -70% | 8 (10.0) | 9 | -13%
Patients and families | 18 (34.6) | 25 | -29% | 48 (55.8) | 42 | +15% | 40 (50.0) | 39 | +3%
Patients' groups | 8 (15.4) | 11 | -30% | 22 (25.6) | 19 | +16% | 18 (22.5) | 18 | +2%

6. Expected Services by EU Platform
The main service expected from the Platform by "Public Health" Registries is a Quality control system (66.0%), whereas for "Clinical-Genetic Research" and "Treatment" Registries it is IT tools. Consistent with their epidemiological function, "Public Health" Registries have higher-than-expected expectations for Facilitated access to data sources (+11%).

Table 8. Number and percentage of Registries observed, number of Registries expected and percentage deviation from expected, by Cluster and Expected service by EU Platform
Expected service by EU Platform | Public Health N (%) | Exp | Dev | Clinical-Genetic Research N (%) | Exp | Dev | Treatment N (%) | Exp | Dev
Expert technical advice | 15 (31.9) | 18 | -18% | 28 (35.4) | 31 | -9% | 33 (47.8) | 27 | +23%
Facilitated access to data sources | 23 (48.9) | 21 | +11% | 22 (27.8) | 35 | -37% | 41 (59.4) | 30 | +35%
IT tools | 27 (57.4) | 32 | -16% | 56 (70.9) | 54 | +3% | 51 (73.9) | 47 | +8%
Legal advice | 21 (44.7) | 23 | -7% | 37 (46.8) | 38 | -3% | 36 (52.2) | 33 | +8%
Model documents | 21 (44.7) | 22 | -5% | 47 (59.5) | 37 | +26% | 24 (34.8) | 33 | -26%
Quality control systems | 31 (66.0) | 27 | +17% | 37 (46.8) | 45 | -17% | 42 (60.9) | 39 | +8%
Tools for networking among partners and Registries | 24 (51.1) | 27 | -10% | 45 (57.0) | 45 | 0% | 42 (60.9) | 39 | +7%
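The "main expected service" for each type of Registry is the item with the highest within-cluster percentage. A short sketch, assuming cluster denominators of 47, 79 and 69 responding Registries (inferred from the percentages in Table 8, not stated in the report); `SERVICE_COUNTS`, `RESPONDENTS` and `main_service` are illustrative names.

```python
SERVICE_COUNTS = {
    "Expert technical advice": (15, 28, 33),
    "Facilitated access to data sources": (23, 22, 41),
    "IT tools": (27, 56, 51),
    "Legal advice": (21, 37, 36),
    "Model documents": (21, 47, 24),
    "Quality control systems": (31, 37, 42),
    "Tools for networking among partners and Registries": (24, 45, 42),
}
CLUSTERS = ("Public Health", "Clinical-Genetic Research", "Treatment")
RESPONDENTS = (47, 79, 69)  # inferred from the percentages in Table 8


def main_service(counts, respondents, cluster):
    """Return (service, percentage) with the highest share of respondents in `cluster`.

    The denominator is constant within a cluster, so the largest count is also
    the largest percentage.
    """
    service, observed = max(counts.items(), key=lambda kv: kv[1][cluster])
    return service, round(100 * observed[cluster] / respondents[cluster], 1)


for idx, name in enumerate(CLUSTERS):
    print(name, main_service(SERVICE_COUNTS, RESPONDENTS, idx))
```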

"Clinical-Genetic Research" Registries report higher-than-expected interest in services related to Model documents (+26%). "Treatment" Registries report higher-than-expected interest in Facilitated access to data sources (+35%) and Expert technical advice (+23%).

D. Conclusions
Cluster analysis identified three main typologies of Registries whose Aims show a clear pattern of differentiation, particularly for Clusters 1 and 2. By analysing the distribution of the Aims and the percentage deviations from the expected values, it was possible to define three types of Registries, named "Public Health", "Clinical-Genetic Research" and "Treatment" Registries. Among the Registries in the Survey, "Public Health" Registries are less represented (23.7%) than "Clinical-Genetic Research" (39.3%) and "Treatment" Registries (37.0%). The analysis of the Survey questions revealed a framework largely consistent with the definition of the Clusters and provided a large amount of useful information for characterising the different typologies of Registries.

"Public Health" Registries pursue aims of Epidemiological Research, Social Planning, Healthcare Services Planning and Disease Surveillance. They show a greater tendency than the other types to be population-based, to collect information on all diseases and to have regional coverage, closer to the targets of health policy. Consistent with their epidemiological function, they tend to collect anagraphic and socio-demographic data and make greater use of Health Information Systems as data sources; their expectations of the Platform services focus on a Quality control system and Facilitated access to data sources.

"Clinical-Genetic Research" Registries mainly pursue clinical and genetic research aims. They are mostly population-based but show a tendency to be case-based; they usually collect data on one disease or a group of diseases; they have national coverage, with a tendency towards international coverage as well; they mainly collect clinical and genetic data, family history and patient's preferences for communication; their main data providers are clinical and genetic units, patients and their families, and patients' organisations; their main expectations of the Platform services focus on IT tools and Model documents.

"Treatment" Registries mainly pursue aims related to Treatment Evaluation and Treatment Monitoring. They show a stronger tendency to be hospital-based and focused on one disease; their geographical coverage is usually national; they collect many types of data, mainly Clinical data, Clinic research participation and biospecimen donation, anthropometric info, Birth and reproductive history, Family history, Medications, devices and health services, and Patient-reported outcomes; their most important data providers are Clinical Units and Centres of Expertise; their main expectations of the Platform services concern IT tools, with notable interest in Facilitated access to data sources and Expert technical advice.

The statistical analysis made it possible to explore the complex and fragmented landscape of rare disease Registries, highlighting structures with differentiated yet interrelated profiles. These results are a useful source of information for targeted planning that can facilitate the interoperability and interconnection of Registries according to the different profiles identified. This appears fundamental to the construction of the platform for rare disease registries.

E. References
1. Anderberg M.R. (1973), Cluster Analysis for Applications, New York: Academic Press, Inc.
2. EPIRARE Work Package 6 (2012), Statistical Analysis of the EPIRARE survey database,
3. Interim Technical Report Appendix 1
4. EPIRARE Work Package 8 (2012), Developing a European Platform for Rare Disease Registries, Draft 08/11/
5. EPIRARE (2012), EPIRARE Survey,
6. Everitt B.S. (1980), Cluster Analysis, 2nd Edition, London: Heinemann Educational Books Ltd.
7. Fabbris L. (1983), Analisi esplorativa di dati multidimensionali, Cleup Editore.
8. Fabbris L. (1997), Statistica multivariata, Milano: McGraw-Hill Libri Italia.
9. Milligan G.W. and Cooper M.C. (1985), An Examination of Procedures for Determining the Number of Clusters in a Data Set, Psychometrika, 50.
10. SAS Institute Inc. (2009), SAS OnlineDoc 9.2, Cary, NC: SAS Institute Inc.
11. SAS Institute Inc. (1999), SAS/STAT User's Guide, Version 8, Cary, NC: SAS Institute Inc.
12. Sarle W.S. (1983), Cubic Clustering Criterion, SAS Technical Report A-108, Cary, NC: SAS Institute Inc.


Does referral from an emergency department to an. alcohol treatment center reduce subsequent. emergency room visits in patients with alcohol Does referral from an emergency department to an alcohol treatment center reduce subsequent emergency room visits in patients with alcohol intoxication? Robert Sapien, MD Department of Emergency Medicine

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Environmental Health Science. Brian S. Schwartz, MD, MS

Environmental Health Science. Brian S. Schwartz, MD, MS Environmental Health Science Data Streams Health Data Brian S. Schwartz, MD, MS January 10, 2013 When is a data stream not a data stream? When it is health data. EHR data = PHI of health system Data stream

More information

University College London Staff survey 2013: results presentation

University College London Staff survey 2013: results presentation University College London Staff survey 2013: results presentation Classification: Private Agenda Headline results Employee engagement Key drivers of engagement within UCL Other key themes Summary and next

More information

Crash Outcome Data Evaluation System

Crash Outcome Data Evaluation System Crash Outcome Data Evaluation System HEALTH AND COST OUTCOMES RESULTING FROM TRAUMATIC BRAIN INJURY CAUSED BY NOT WEARING A HELMET, FOR MOTORCYCLE CRASHES IN WISCONSIN, 2011 Wayne Bigelow Center for Health

More information

RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS

RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS I. Basic Course Information A. Course Number and Title: MATH 111H Statistics II Honors B. New or Modified Course:

More information

DISCRIMINANT FUNCTION ANALYSIS (DA)

DISCRIMINANT FUNCTION ANALYSIS (DA) DISCRIMINANT FUNCTION ANALYSIS (DA) John Poulsen and Aaron French Key words: assumptions, further reading, computations, standardized coefficents, structure matrix, tests of signficance Introduction Discriminant

More information

ANALYTIC AND REPORTING GUIDELINES

ANALYTIC AND REPORTING GUIDELINES ANALYTIC AND REPORTING GUIDELINES The National Health and Nutrition Examination Survey (NHANES) Last Update: December, 2005 Last Correction, September, 2006 National Center for Health Statistics Centers

More information

Future Biobanking- Developing Smart, Sustainable And Ethically Compliant Biorepositories Market Research By MarketResearchReports.

Future Biobanking- Developing Smart, Sustainable And Ethically Compliant Biorepositories Market Research By MarketResearchReports. Future Biobanking- Developing Smart, Sustainable And Ethically Compliant Biorepositories Market Research By MarketResearchReports.Biz MarketResearchReports.Biz Recently Announced Research Report And Forecast

More information

Dealing with Missing Data

Dealing with Missing Data Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January

More information

HEALTH INSURANCE COVERAGE AND ADVERSE SELECTION

HEALTH INSURANCE COVERAGE AND ADVERSE SELECTION HEALTH INSURANCE COVERAGE AND ADVERSE SELECTION Philippe Lambert, Sergio Perelman, Pierre Pestieau, Jérôme Schoenmaeckers 229-2010 20 Health Insurance Coverage and Adverse Selection Philippe Lambert, Sergio

More information

Access Provided by your local institution at 02/06/13 5:22PM GMT

Access Provided by your local institution at 02/06/13 5:22PM GMT Access Provided by your local institution at 02/06/13 5:22PM GMT brief communication Reducing Disparities in Access to Primary Care and Patient Satisfaction with Care: The Role of Health Centers Leiyu

More information

A revalidation of the SET37 questionnaire for student evaluations of teaching

A revalidation of the SET37 questionnaire for student evaluations of teaching Educational Studies Vol. 35, No. 5, December 2009, 547 552 A revalidation of the SET37 questionnaire for student evaluations of teaching Dimitri Mortelmans and Pieter Spooren* Faculty of Political and

More information

How to Measure Customer Satisfaction in a Public Transport Survey

How to Measure Customer Satisfaction in a Public Transport Survey 2 nd UITP International Marketing Conference 12-14 November 2003, Paris Customer Satisfaction as an Element of Strategic Business Decisions Werner Brög Thomas Kahn Socialdata Institute for Transport and

More information

A Cross-Sectional Study of Asbestos- Related Morbidity and Mortality in Vermonters Residing Near an Asbestos Mine November 3, 2008

A Cross-Sectional Study of Asbestos- Related Morbidity and Mortality in Vermonters Residing Near an Asbestos Mine November 3, 2008 A Cross-Sectional Study of Asbestos- Related Morbidity and Mortality in Vermonters Residing Near an Asbestos Mine 108 Cherry Street, PO Box 70 Burlington, VT 05402 802.863.7200 healthvermont.gov A Cross-Sectional

More information

Basic research methods. Basic research methods. Question: BRM.2. Question: BRM.1

Basic research methods. Basic research methods. Question: BRM.2. Question: BRM.1 BRM.1 The proportion of individuals with a particular disease who die from that condition is called... BRM.2 This study design examines factors that may contribute to a condition by comparing subjects

More information

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88)

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Introduction The National Educational Longitudinal Survey (NELS:88) followed students from 8 th grade in 1988 to 10 th grade in

More information

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

More information

Attrition in Online and Campus Degree Programs

Attrition in Online and Campus Degree Programs Attrition in Online and Campus Degree Programs Belinda Patterson East Carolina University pattersonb@ecu.edu Cheryl McFadden East Carolina University mcfaddench@ecu.edu Abstract The purpose of this study

More information

2. Filling Data Gaps, Data validation & Descriptive Statistics

2. Filling Data Gaps, Data validation & Descriptive Statistics 2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Alex Vidras, David Tysinger. Merkle Inc.

Alex Vidras, David Tysinger. Merkle Inc. Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT

More information

Study Plan Master in Public Health ( Non-Thesis Track)

Study Plan Master in Public Health ( Non-Thesis Track) Study Plan Master in Public Health ( Non-Thesis Track) I. General Rules and conditions : 1. This plan conforms to the regulations of the general frame of the Graduate Studies. 2. Specialties allowed to

More information

School of Public Health and Health Services Department of Epidemiology and Biostatistics

School of Public Health and Health Services Department of Epidemiology and Biostatistics School of Public Health and Health Services Department of Epidemiology and Biostatistics Master of Public Health and Graduate Certificate Biostatistics 0-04 Note: All curriculum revisions will be updated

More information

Multiple logistic regression analysis of cigarette use among high school students

Multiple logistic regression analysis of cigarette use among high school students Multiple logistic regression analysis of cigarette use among high school students ABSTRACT Joseph Adwere-Boamah Alliant International University A binary logistic regression analysis was performed to predict

More information

EURORDIS Position Paper on Centres of Expertise and European Reference Networks for Rare Diseases

EURORDIS Position Paper on Centres of Expertise and European Reference Networks for Rare Diseases EURORDIS Position Paper on Centres of Expertise and European Reference Networks for Rare Diseases EURORDIS - the European Organisation for Rare Diseases represents 310 rare disease organisations from 34

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation.

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation. MS Biostatistics MS Biostatistics Competencies Study Development: Work collaboratively with biomedical or public health researchers and PhD biostatisticians, as necessary, to provide biostatistical expertise

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

FACILITATOR/MENTOR GUIDE

FACILITATOR/MENTOR GUIDE FACILITATOR/MENTOR GUIDE Descriptive analysis variables table shells hypotheses Measures of association methods design justify analytic assess calculate analysis problem stratify confounding statistical

More information

Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE

Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE CRUK Stratified Medicine Initiative Somatic mutation testing for prediction of treatment response in patients with solid tumours:

More information

Introduction to Exploratory Data Analysis

Introduction to Exploratory Data Analysis Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,

More information

A Guide for the Utilization of HIRA National Patient Samples. Logyoung Kim, Jee-Ae Kim, Sanghyun Kim. Health Insurance Review and Assessment Service

A Guide for the Utilization of HIRA National Patient Samples. Logyoung Kim, Jee-Ae Kim, Sanghyun Kim. Health Insurance Review and Assessment Service A Guide for the Utilization of HIRA National Patient Samples Logyoung Kim, Jee-Ae Kim, Sanghyun Kim (Health Insurance Review and Assessment Service) Jee-Ae Kim (Corresponding author) Senior Research Fellow

More information

How To Understand Multivariate Models

How To Understand Multivariate Models Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models

More information

Systematic Reviews and Meta-analyses

Systematic Reviews and Meta-analyses Systematic Reviews and Meta-analyses Introduction A systematic review (also called an overview) attempts to summarize the scientific evidence related to treatment, causation, diagnosis, or prognosis of

More information

Methods Commission CLUB DE LA SECURITE DE L INFORMATION FRANÇAIS. 30, rue Pierre Semard, 75009 PARIS

Methods Commission CLUB DE LA SECURITE DE L INFORMATION FRANÇAIS. 30, rue Pierre Semard, 75009 PARIS MEHARI 2007 Overview Methods Commission Mehari is a trademark registered by the Clusif CLUB DE LA SECURITE DE L INFORMATION FRANÇAIS 30, rue Pierre Semard, 75009 PARIS Tél.: +33 153 25 08 80 - Fax: +33

More information

Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

More information

EHR Databases and Their Role in Health & Innovation

EHR Databases and Their Role in Health & Innovation 8. New approaches to promoting innovation 8.4 Real-life data and learning from practice to advance innovation See Background Paper 8.4 (BP8_4Data.pdf) The costs of pharmaceutical R&D are high, with clinical

More information

Using News Articles to Predict Stock Price Movements

Using News Articles to Predict Stock Price Movements Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 gyozo@cs.ucsd.edu 21, June 15,

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996) MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

More information

Cancer in Ireland 2013: Annual report of the National Cancer Registry

Cancer in Ireland 2013: Annual report of the National Cancer Registry Cancer in 2013: Annual report of the National Cancer Registry ABBREVIATIONS Acronyms 95% CI 95% confidence interval APC Annual percentage change ASR Age standardised rate (European standard population)

More information

Examining Early Preventive Dental Visits: The North Carolina Experience

Examining Early Preventive Dental Visits: The North Carolina Experience Examining Early Preventive Dental Visits: The North Carolina Experience Jessica Y. Lee DDS, MPH, PhD Departments of Pediatric Dentistry & Health Policy and Administration University of North Carolina at

More information

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,

More information

GENETIC DATA ANALYSIS

GENETIC DATA ANALYSIS GENETIC DATA ANALYSIS 1 Genetic Data: Future of Personalized Healthcare To achieve personalization in Healthcare, there is a need for more advancements in the field of Genomics. The human genome is made

More information