Statistical Analysis of the EPIRARE survey data
Deliverable D1.4: Statistical Analysis of the EPIRARE survey data
Michele Santoro, Michele Lipucci, Fabrizio Bianchi and the EPIRARE Work Package 6 Team
CONTENTS
Overview of the documents produced by EPIRARE
Disclaimer
I. Part 1: Descriptive, multivariate and exploratory analyses
  A. Introduction
  B. Methods
  C. Results
  D. Conclusions
  E. References
II. Part 2: Cluster analysis
  A. Introduction
  B. Methods
  C. Results
    Target Population
    Number of diseases
    Geographical Coverage
    Data collected
    Data providers
    Expected Services by EU Platform
  D. Conclusions
  E. References
Overview of the documents produced by EPIRARE

Disclaimer

The contents of this document are the sole responsibility of the Authors; the Executive Agency for Health and Consumers is not responsible for any use that may be made of the information contained herein.
I. Part 1: Descriptive, multivariate and exploratory analyses

A. Introduction

The aim of Work Package 6 (WP6), "Common data set and disease-specific data collection", is to define a dataset structure that Registries of Rare Diseases can adopt and that allows them to collect information consistent with the targets they have set [1,2]. A Registry is built by developing data elements in relation to its aim. The challenge is to identify a set of common data elements, defined consistently with clinical and epidemiological research, that can standardize data collection on rare diseases [3,4].

An analysis was performed to understand the needs and the information capacity of the existing Registries, in order to provide a common and shareable information platform. To this end, data from the Survey of Rare Disease Registries operating in Europe and in other countries were analyzed. A total of 220 Registries answered the questionnaire. Our analysis focused only on the questions of interest to WP6.

B. Methods

A priori considerations oriented the analysis towards selected Survey questions and their answers. The question on "Type of data collected" by the Registry was directly relevant to the aims of WP6, but the answers had to be evaluated as a function of two essential dimensions: the purpose and the classification of the Registry. The information collected must be evaluated against the goals pursued and related to the main characteristics of the Registry, as reported in the questionnaires. The variables analyzed were therefore the following:

- Aims;
- Population Target;
- Number of diseases;
- Data providers;
- Type of data collected;
- Disease Coding System;
- Data sharing.

The univariate distribution of the response modes for the questions listed above was analyzed. Potential associations among variables were first investigated by bivariate analysis with the chi-square test and then studied in depth by multivariate analysis using logistic regression models [5,6]. Finally, a factor analysis based on Multiple Correspondence Analysis was performed, mainly to uncover the structure of latent relationships among variables [7,8,9].

C. Results

One of the Survey questions concerned the objectives of the Registry, with more than 10 response modes and no limit on the number of answers. "Epidemiological Research" was the main goal declared by the Registries (70.8%), followed by "Clinical Research" (61.2%) and "Natural History of Disease" (60.7%). More than half of the Registries deal with "Disease Surveillance" (55.7%), while almost half deal with genetic aspects ("Genotype-phenotype correlation" and "Mutation Database"). "Treatment evaluation" is a target for 42.9% of Registries, "Treatment Monitoring" for 33.3% and "Healthcare service planning" for 33.8%; just 1 Registry out of 5 deals with "Social planning" (19.2%).

One of the classifications that characterize a Registry is based on the target population. The Survey showed that 57.1% of responding Registries are Population-based, 24.0% are Hospital-based and 18.9% are Case-based. More than 80% of the Registries cover a single disease or a group of diseases, while only 7.3% cover all rare diseases.
With regard to data sources, "Clinical Units" provide data to 83.6% of Registries, "Clinical Genetics Units" to 43.8%, "Central Laboratories and services" to 43.4% and "Centres of Expertise" to 30.6%; almost half of the Registries collect data from "Patients and families" and 21.9% from "Patients' groups". An interesting aspect is the limited use of routine information systems: 31.1% of the Registries use data from "Discharge Registers" and only 12.8% use "Mortality Registers"; other routine information systems are used even less. Collaboration and data sharing take place with other Registries for 51.0% of Registries, with Centres of Expertise for 33.8% and with Biobanks for 16.2%.
With regard to the data collected, almost all Registries (95.0%) collect information on diagnosis; 86.6% collect "Clinical data" and 72.3% collect "Genetic data"; the latter is accompanied by information on "Family history" (55.0%) and on "Birth and reproductive history" (30.5%). "Medication, devices and health services" information is collected by 61.4% and "Socio-demographic information" by 46.2%, while only 32.3% state that they record the "Anagraphical data" (personal identification data) of patients. The very low percentage for this last answer suggests a possible misinterpretation of the question and calls for a specific investigation.

Fewer than half of the Registries adopt the International Classification of Diseases (ICD-9, ICD-10 or ICD-O) as their disease coding system, and 36.5% do not use any code but simply report the disease name; 13.0% of the Registries adopt the ORPHA code and the same percentage adopt the MIM code, while 25.0% use their own coding system.

Table 1. Distribution of the answers to the following questions: Aim, Population Target, Number of diseases, Data Providers, Data sharing, Data collected and Disease coding system
Variable | N. | %
Aim
Epidemiological Research | [...] | 70.8
Clinical Research | [...] | 61.2
Natural history of disease | [...] | 60.7
Disease surveillance | [...] | 55.7
Genotype-phenotype correlation | [...] | [...].4
Mutation database | 94 | 42.9
Treatment evaluation (efficacy) | 94 | 42.9
Healthcare service planning | 74 | 33.8
Treatment monitoring (safety) | 73 | 33.3
Social planning | 42 | 19.2
Other | 18 | 8.2
Population Target
Population-based | [...] | 57.1
Hospital-based | 52 | 24.0
Case-based | 41 | 18.9
Number of diseases
Just one | 75 | 34.3
A group of related RDs | [...] | [...].6
Several RDs (not related) | 26 | 11.9
All rare diseases | 16 | 7.3
Data Providers
Clinical Units | [...] | 83.6
Patients and families | [...] | [...].4
Clinical genetic Units | 96 | 43.8
Laboratories/central services | 95 | 43.4
Discharge Registers | 68 | 31.1
Centres of Expertise | 67 | 30.6
Patients' groups | 48 | 21.9
Mortality Registers | 28 | 12.8
Birth Registers | 8 | 3.7
Disability Registers | 7 | 3.2
Other Registers | 15 | 6.9
Data sharing
Other Registers | [...] | 51.0
Biobanks | 32 | 16.2
Centres of expertise | 67 | 33.8
Data Collected
Diagnosis | [...] | 95.0
Clinical | [...] | 86.6
Genetic | [...] | 72.3
Medications, devices and health services | [...] | 61.4
Family history | [...] | 55.0
Socio-demographic information | [...] | 46.2
Patient-reported outcomes | 78 | 35.5
Anthropometric information | 72 | 32.7
Anagraphical | 71 | 32.3
Birth and reproductive history | 67 | 30.5
Clinical research participation and bio-specimen donation | 67 | 30.5
Patient's preferences for communication | 28 | 12.7
Disease Coding System
ORPHA code | 27 | 13.0
MIM code | 27 | 13.0
ICD-O | 13 | 6.3
ICD [...] | [...] | [...].1
ICD [...] | [...] | [...].4
Own code system | 52 | 25.0
No coding system, just disease name | 76 | 36.5
In view of the goals of WP6, a cross-analysis was performed to find possible dependencies and associations among some of the selected variables. A particularly relevant result of the bivariate analysis concerns the two main targets declared by the Registries, "Epidemiological Research" and "Clinical Research". Together these objectives largely describe the mission of a Registry: only 6.9% of the Registries identify with neither objective, 38.8% claim to pursue both, and the remaining 54.4% are divided between the two lines of research, with 32.0% dealing only with "Epidemiological Research" and 22.4% only with "Clinical Research".

The two types of research activity show a diverging trend: the chi-square test on the two variables rejects the hypothesis of independence (p=0.003) and highlights an inverse association (phi coefficient = -0.20). The inverse association also emerges from the Odds Ratio of "Clinical Research" with respect to "Epidemiological Research", which is 0.4 (95% CI: [...]). The analysis indicates that Registries tend to follow divergent development strategies along the two lines of research; this divergence can shape the structure of the Registry, its further purposes and the information it collects.

In relation to the specific aim of WP6, we also investigated whether the datasets generated by the two types of activity can be characterized differently. The associations of the answers "Epidemiological Research" and "Clinical Research" with the other declared objectives were calculated and compared. Logistic regression models were used to evaluate these associations, taking "Epidemiological Research" and "Clinical Research" in turn as outcome variables.
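As an illustration, the bivariate figures quoted above (Odds Ratio, phi coefficient and chi-square p-value) can be recomputed from a 2x2 table. The Python sketch below rests on an assumption: the cell counts are not given in the text and are reconstructed here from the reported percentages (38.8%, 32.0%, 22.4% and 6.9%) applied to 219 responding Registries, so they should be read as hypothetical. The function name `two_by_two_stats` is introduced only for this example, and the chi-square is computed without continuity correction, which the report does not specify.

```python
import math


def two_by_two_stats(both, epi_only, clin_only, neither):
    """Odds ratio, phi coefficient, chi-square and p-value for a 2x2 table.

    Rows: Epidemiological Research yes/no.
    Columns: Clinical Research yes/no.
    """
    a, b, c, d = both, epi_only, clin_only, neither
    n = a + b + c + d
    # Cross-product ratio of the 2x2 table.
    odds_ratio = (a * d) / (b * c)
    # Phi is the Pearson correlation between two binary variables.
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # Pearson chi-square without continuity correction (1 degree of freedom).
    chi2 = n * phi ** 2
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, phi, chi2, p_value


# ASSUMPTION: counts reconstructed from the reported percentages with N = 219.
odds_ratio, phi, chi2, p_value = two_by_two_stats(
    both=85, epi_only=70, clin_only=49, neither=15
)
print(f"OR = {odds_ratio:.1f}, phi = {phi:.2f}, p = {p_value:.3f}")
```

Under these assumed counts the sketch gives values that round to the reported OR of 0.4, phi of -0.20 and p of 0.003; it does not reproduce the confidence interval, which depends on the method used by the Authors.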
The results of the logistic regression models are expressed as Odds Ratios (OR) with p-values (Table 2).

Table 2. Epidemiological Research and Clinical Research Odds Ratios compared to other objectives
Aim | Epidemiological Research: OR | p-value | Clinical Research: OR | p-value
Disease surveillance | 3.8 | 0.0004 | 0.5 | 0.099
Treatment evaluation (efficacy) | 1.3 | 0.636 | 1.2 | 0.685
Mutation database or Genotype-phenotype correlation | 0.5 | 0.039 | 4.0 | <0.0001
Social planning | 2.9 | 0.075 | 0.6 | 0.302
Healthcare service planning | 0.9 | 0.843 | 1.8 | 0.193
Natural History of disease | 1.9 | 0.093 | 1.7 | 0.129
Treatment Monitoring | 0.7 | 0.438 | 2.5 | 0.047

Registries dealing with "Epidemiological Research" show a strong association with "Disease Surveillance" (OR 3.8) and with "Social planning" (OR 2.9); they are significantly and inversely associated with aims on genetic aspects, summarized here in the joint variable "Mutation Database/Genotype-phenotype correlation" (OR 0.5). There is an evident association with "Natural history of disease" (OR 1.9) and a weak one with "Treatment evaluation" (OR 1.3). There is no association with "Healthcare service planning" and a weak inverse association with "Treatment Monitoring".

On the contrary, Registries dealing with "Clinical Research" are inversely associated with "Disease Surveillance" and "Social planning" and therefore tend not to pursue these objectives. A positive but not statistically significant association emerges with "Healthcare service planning", "Natural history of disease" and "Treatment evaluation".

The same statistical model was applied to the "Target Population", "Data providers", "Data sharing", "Data Collected" and "Disease coding system" dimensions.

With respect to the "Target Population" (Table 3), the two types of activity again show opposite associations. Registries engaged in "Epidemiological Research" are mainly "Population-based", and "Hospital-based" Registries are more frequent among them than "Case-based" ones. Registries dealing with "Clinical Research" are mainly "Case-based", less often "Hospital-based" and even less often "Population-based".

Table 3. Epidemiological Research and Clinical Research Odds Ratios compared to the Target Population
Population target | Epidemiological Research: OR | p-value | Clinical Research: OR | p-value
Population-based | 3.2 | 0.002 | 0.5 | 0.084
Hospital-based | 1.6 | 0.253 | 0.8 | 0.584
Case-based | 1 | | 1 |

With respect to the data sources used (Table 4), Registries that deal with "Epidemiological Research" obtain data from "Clinical Units" and from "Centres of Expertise" more than Registries dealing with "Clinical Research" do; the latter draw mainly on information from patients (patients, families and groups). "Epidemiological Research" has a statistically significant inverse association with "Clinical genetic units", which are instead significantly associated with "Clinical Research". All the Registries using "Mortality registers" deal with "Epidemiological Research", while there are no significant associations with the other types of routine information systems for either area of research.
Table 4. Epidemiological Research and Clinical Research Odds Ratios compared to the Data Providers
Data provider | Epidemiological Research: OR | p-value | Clinical Research: OR | p-value
Clinical Units | 2.3 | 0.060 | 1.0 | 0.936
Clinical genetic Units | 0.3 | 0.001 | 2.1 | 0.041
Patients and families | 1.3 | 0.478 | 2.1 | 0.020
Patients' groups | 0.7 | 0.430 | 1.7 | 0.238
Laboratories/central services | 1.5 | 0.290 | 0.7 | 0.309
Centres of expertise | 3.3 | 0.003 | 1.3 | 0.495
Discharge Registers | 1.1 | 0.774 | 1.2 | 0.616
Mortality Registers | * | | 0.6 | 0.255
Birth, Disability and other Registers | 0.8 | 0.795 | 0.7 | 0.519
* All Registries using Mortality Registers deal with Epidemiological Research.

Sharing with other Registers and with Centres of expertise is more intense for Registries that deal with "Epidemiological Research", while sharing with Biobanks tends to be greater for Registries dealing with "Clinical Research" (Table 5).

Table 5. Epidemiological Research and Clinical Research Odds Ratios compared to the Data sharing
Data sharing | Epidemiological Research: OR | p-value | Clinical Research: OR | p-value
Other Registers | 1.5 | 0.185 | 1.3 | 0.351
Biobanks | 1.1 | 0.803 | 1.7 | 0.248
Centres of expertise | 1.9 | 0.101 | 1.3 | 0.473

With regard to the type of data collected (Table 6), the analysis revealed considerably different information profiles depending on the type of research pursued by the Registry. The response mode "Diagnosis" was excluded from the analysis, since diagnosis data are collected by almost all the Registries and the variable is therefore not explanatory. The variable "Epidemiological Research" shows a positive and statistically significant association with "Socio-demographic information"; it is also positively associated with "Anagraphical data" and with "Clinical data", whereas it is strongly and inversely associated with "Genetic data".
On the contrary, the variable "Clinical Research" is significantly associated with "Genetic data", as well as with "Medications, devices and health services" and "Clinical research participation and bio-specimen donation"; it is also associated, but not significantly, with "Clinical data", and its inverse association with "Socio-demographic information" is statistically significant.
Table 6. Epidemiological Research and Clinical Research Odds Ratios compared to the Data collected
Data | Epidemiological Research: OR | p-value | Clinical Research: OR | p-value
Anagraphical | 1.8 | 0.130 | 1.2 | 0.618
Socio-demographic information | 4.2 | <0.0001 | 0.5 | 0.050
Genetic | 0.2 | 0.003 | 4.4 | 0.001
Clinical | 2.8 | 0.085 | 1.9 | 0.234
Medications, devices and health services | 0.9 | 0.792 | 2.6 | 0.015
Patient-reported outcomes | 0.9 | 0.861 | 1.3 | 0.524
Family history | 1.3 | 0.459 | 1.3 | 0.468
Anthropometric information | 1.6 | 0.236 | 0.6 | 0.264
Birth and reproductive history | 1.0 | 0.931 | 0.7 | 0.459
Clinical research participation and bio-specimen donation | 0.6 | 0.257 | 3.0 | 0.013

Finally, the Registries that deal with "Epidemiological Research" tend to use the ICD coding system and not the MIM code, while the opposite holds for Registries that deal with "Clinical Research". Furthermore, "Epidemiological Research" is positively but not significantly associated with the use of the ORPHA code and with the use of an own coding system (Table 7).

Table 7. Epidemiological Research and Clinical Research Odds Ratios compared to the Disease Coding System
Coding system | Epidemiological Research: OR | p-value | Clinical Research: OR | p-value
ORPHA code | 1.7 | 0.413 | 0.6 | 0.295
MIM code | 0.2 | 0.003 | 5.3 | 0.003
ICD | 4.2 | 0.048 | 0.3 | 0.054
Own code system | 3.4 | 0.060 | 0.8 | 0.720
No coding system | 2.2 | 0.292 | 0.9 | 0.787

The factor analysis substantially confirmed the picture that emerged from the logistic regression models. A Multiple Correspondence Analysis was performed to build a factorial plane able to highlight latent structures of relationship in the data; the following variables were selected as active variables to define the factorial axes:

- Aims;
- Population target;
- Number of diseases.
The other variables considered in the statistical model were then projected onto the resulting factorial plane as supplementary variables:

- Data providers;
- Data sharing;
- Data collected;
- Disease coding system.

Figure 1 reports the plane defined by the first two factorial axes. The inertia explained by the first axis, after the Benzécri correction [10], is 56.44%, while the second axis explains 41.79%, so that the plane explains 98.23% of the total variability.

The variables related to Registry objectives that contribute most to the definition of the first factorial axis are "Treatment evaluation", "Treatment Monitoring", "Social planning", "Healthcare service planning", "Disease surveillance" and "Natural History of Disease". The first axis can therefore be interpreted as a measure of monitoring and evaluation activity. The upper part of the second axis carries the contribution of "Epidemiological Research", while the lower part carries the contributions of "Clinical Research", "Genetic Research" and "Natural History of Disease". The orientation of this axis can be interpreted as follows: downward, research on the disease; upward, research on the population. Indeed, the "Population-based" mode is located at the top of the second axis but is not associated with the monitoring dynamic expressed by the first axis; the "Case-based" mode is located in the diametrically opposite part; the "Hospital-based" mode is located downward but is also associated with the first factorial axis. The mode "All/Several diseases" is located upward along the second axis, while "One disease" and "A group of diseases" are located along the second axis, with a greater contribution from the latter.

In the upper part of the plane, "Epidemiological research", "Disease surveillance", "Healthcare service planning" and "Social planning" lie in the same direction and are therefore correlated.
In the lower part of the plane, "Clinical Research" and "Genetic Research" are correlated, as are "Treatment evaluation" and "Treatment Monitoring".
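The Benzécri correction of the explained inertia can be sketched as follows. In its usual formulation, with Q active variables only the principal inertias larger than 1/Q are retained, each is rescaled as (Q/(Q-1))^2 (lambda - 1/Q)^2, and the percentages are taken over the sum of the rescaled values. The eigenvalues and the number of active variables in this Python sketch are hypothetical, since the report does not list them, and the function name `benzecri_percentages` is introduced only for illustration.

```python
def benzecri_percentages(eigenvalues, n_active):
    """Percentage of corrected inertia per retained axis (Benzecri correction).

    eigenvalues: principal inertias of the indicator-matrix analysis.
    n_active: number Q of active variables.
    """
    threshold = 1.0 / n_active
    scale = n_active / (n_active - 1.0)
    # Keep only the axes whose inertia exceeds the average inertia 1/Q,
    # then rescale each retained inertia.
    corrected = [
        (scale * (ev - threshold)) ** 2 for ev in eigenvalues if ev > threshold
    ]
    total = sum(corrected)
    return [100.0 * c / total for c in corrected]


# HYPOTHETICAL eigenvalues and number of active variables, for illustration only.
percentages = benzecri_percentages([0.60, 0.50, 0.30, 0.20], n_active=3)
print([round(p, 2) for p in percentages])
```

Because the axes at or below 1/Q are discarded, the first few axes typically account for a much larger share of the corrected inertia than of the raw inertia, which is consistent with the high percentage reported for the first two axes.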
Figure 1. Factorial plane determined by the active variables
[Figure: points plotted on the plane: all/several RDs, Health service planning, Social planning, Population-based, Epidemiological research, Disease surveillance, Case-based, one RD, a group of RDs, Hospital-based, Clinical research, Genopheno/Mutation, History of disease, Treatment evaluation, Treatment monitoring]

Figure 2 shows the plane onto which the information on Data providers and Data sharing is projected. Data from routine information systems (mortality, discharge and other registers) are located in the first quadrant of the factorial plane, the one oriented towards monitoring in the public health field. The variable "Laboratories/central services" is also located in the first quadrant, but it is shifted towards the origin of the axes, which represents the centre of gravity. "Clinical Units" and, more clearly, "Centres of Expertise" tend to be strongly associated with the first factor but do not discriminate with respect to the second; the variables "Patients and families", "Patients' groups" and "Genetic Units" lie in the direction of clinical and genetic research. Sharing with other Registers is located in the upper part of the plane, while sharing with Biobanks is located in the lower part.

Figure 3 shows the projections of the Data collected and of the Disease coding system on the factorial plane. The first quadrant contains the "Anagraphical data" and "Socio-demographic information" modes, while the lower quadrant contains "Genetic data", "Family history", "Clinical research participation and bio-specimen donation", "Anthropometric information", "Medications, devices and health services" and "Clinical data"; "Diagnosis" is confirmed to be a non-discriminating variable. "Birth and reproductive history", "Anthropometric information" and "Clinical research participation and bio-specimen donation" are data that tend to be collected mostly by the Registries that deal with monitoring and evaluation in a clinical setting.
The use of the ICD code, located in the upper quadrant, highlights an association with "Epidemiological Research" and with the "Population-based" mode; the ORPHA code lies in the same quadrant, while the opposite quadrant, associated with clinical and genetic research, contains the MIM code and the "No coding system" mode.

Figure 2. Projection of the Data providers and of the Data sharing on the factorial plane
[Figure: supplementary points projected over the active variables of Figure 1: mortality register, other register, discharge register, genetic units, laboratories, clinic units, centre expert, patient family, patient group, share other register, share biobank]
Figure 3. Projection of the Data collected and of the Disease coding system on the factorial plane
[Figure: supplementary points projected over the active variables of Figure 1: no code, MIM code, ORPHA code, ICD code, own code, anagraphic, socio demo, genetic, diagnosis, birth reprod, clinical, medic/health serv, anthropometric, family history, biospecimen]

D. Conclusions

The analysis of the Survey data, focused on the specific objectives of WP6, provided useful and important information on the features of the existing Registries that can assist the process of defining the common dataset. In particular, the Registries show a tendency to separate into distinct lines of research, and this divergence in turn influences all the information they collect. The divergence between Registries pursuing epidemiological research and those pursuing clinical research clearly identifies two distinct types of Registry.

On the one hand, there are Registries, potentially population-based, whose main target is "Epidemiological Research"; these Registries deal with disease surveillance and social impact, interface with other information systems, use the ICD system and collect personal data.

On the other hand, there are Registries, potentially case-based and/or hospital-based, whose main goal is "Clinical Research"; these Registries pursue goals concerning genetic aspects and the assessment and monitoring of treatments, for which the bulk of the information consists of genetic and clinical data and, obviously, data on the diagnosis.

In short, there are Registries that we could define as population-oriented, which potentially pursue public health objectives, and disease-oriented Registries with clinical-genetic research objectives.
It is worth noting that these Registries also showed several common traits, which should be explored further with a view to identifying a common data set. The differences described above may point to possible limitations in orienting the overall research activity that should characterize a Registry of Rare Diseases intended to be a useful tool for public health. This kind of research must necessarily use both epidemiological and clinical data, and at the same time should provide the basis for the development of specialized registries. In this perspective, the identification of a common dataset is of strategic relevance for collecting consistent data that can support a solid and flexible platform for research and public health activities.

E. References

1. Nadkarni PM, Brandt CA (2006) The common data elements for cancer research: remarks on functions and structure. Methods Inf. Med. 45(6): [...]
2. Carter J, Evans J, Tuttle M, Weida T, White T, Harvell J, Shipley S (2006) Making the minimum data set compliant with health information technology standards. Executive summary. U.S. Department of Health and Human Services. Accessed: 2nd September [...]
3. Richesson RL, Krischer JP (2007) Data standards in clinical research: gaps, overlaps, challenges and future directions. J. Am. Med. Inform. Assoc. 14(6): [...]
4. AHRQ (2010) Registries for evaluating patient outcomes: a user's guide. In: Gliklich RE, Dreyer N (eds) Agency for Healthcare Research and Quality, Rockville, MD.
5. McCullagh P, Nelder JA (1989) Generalized Linear Models, Second Edition, London: Chapman and Hall.
6. Woodward M (2005) Epidemiology: Study Design and Data Analysis, Second Edition, New York: Chapman & Hall/CRC.
7. Benzécri JP (1973) L'Analyse des Données: T. 2, l'Analyse des Correspondances, Paris: Dunod.
8. Greenacre MJ (1984) Theory and Applications of Correspondence Analysis, London: Academic Press.
9. Greenacre MJ (1994) Multiple and Joint Correspondence Analysis, in: MJ Greenacre and J Blasius (eds), Correspondence Analysis in the Social Sciences, London: Academic Press.
10. Benzécri JP (1979) Sur le calcul des taux d'inertie dans l'analyse d'un questionnaire, Addendum et erratum à [BIN.MULT.], Cahiers de l'Analyse des Données 4, [...]
II. Part 2: Cluster analysis

A. Introduction

The objective of WP6 is to develop a proposal for a common dataset for Registries on Rare Diseases, exploring a bottom-up approach. The data collected through a questionnaire sent to 220 Registries operating in different European countries represent a large amount of information, crucial for understanding the characteristics and weaknesses of the Registries active in the field of rare diseases. The analysis of these data is used to generate relevant information to support the definition of a common dataset.

An initial analysis of the Survey, focused on the specific objective of WP6, has already provided important information on the actual differentiation of the Registries on Rare Diseases, which reflects their different objectives. That analysis (see WP6 Interim Technical Report) identified different patterns. In particular, a multivariate analysis was conducted to identify relations between the variables that define the objectives and other characteristic elements of the Registries. The multivariate analysis, carried out with the Multiple Correspondence technique, identified relations among groups of variables that pointed to two macro-groups of Registries ("there are Registries that we could define as population-oriented, that potentially pursue public health objectives, and Registries disease-oriented with clinical-genetic research objectives").

We performed a further analysis on the Survey database with the aim of obtaining useful information to understand the profile of the Registries and to define more accurately the common information needs (common dataset). A cluster analysis was carried out to identify groups of Registries with common traits and characteristics.
It should be stressed that in this second analytical approach the point of observation has moved from the variables (Correspondence Analysis) to the units of observation, the Registries (Cluster Analysis).

B. Methods

Cluster Analysis is a set of statistical techniques that identify, through iterative processes, groups of observations that are similar with respect to specific characteristics. We performed a Cluster Analysis using a hierarchical model, a computational process that progressively merges the observations that are closest and most similar to each other, starting from the individual observations and continuing until a single group is reached. Priority was given to the Aims declared by the Registries, since the objectives of a registry are the main feature associated with its information needs.

We carried out additional analyses that used as explanatory variables, in addition to the Aims, other variables collected by the questionnaire: Number of diseases, Population target and Geographical Coverage. These additional analyses either did not yield meaningful grouping patterns or produced so many clusters that no clear interpretation was possible. This finding matters: with one variable or with several, the identification of a small number of clusters remains much the same, which indirectly supports the consistency of the result achieved.

The Cluster Analysis in our study was carried out on the following Aims declared by the Registries:

- Epidemiological Research
- Clinical Research
- Natural history of disease
- Disease surveillance
- Genotype-phenotype correlation + Mutation database
- Healthcare service planning
- Social planning
- Treatment evaluation (efficacy)
- Treatment monitoring (safety)

To interpret the Clusters identified by the statistical analysis, we analyzed the distribution of the Aims in each Cluster and evaluated the deviation of the observed frequency from an expected value. The expected value for each Aim was estimated by applying to each Cluster the overall percentage calculated on all Registries (e.g. the percentage of the Aim "Clinical Research" in all Registries is 61.2%; this value was used to calculate the expected value of "Clinical Research" for each Cluster). The deviation of the observed value from the expected value is expressed as a percentage deviation:

%Deviation = 100 * (Number Observed - Number Expected) / Number Expected
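The percentage deviation defined above can be sketched in a few lines of Python. The example uses figures reported in this document for Cluster 1 (52 Registries, 47 of which declare "Epidemiological Research", against an overall percentage of 70.8%). The function name `percentage_deviation` is introduced only for this illustration, and the sketch assumes that the deviation is computed on the unrounded expected value, which the text does not state explicitly.

```python
def percentage_deviation(observed, cluster_size, overall_percentage):
    """Return (expected count, % deviation) for one Aim within one Cluster.

    observed: number of Registries in the Cluster declaring the Aim.
    cluster_size: number of Registries in the Cluster.
    overall_percentage: percentage of all Registries declaring the Aim.
    """
    expected = cluster_size * overall_percentage / 100.0
    # ASSUMPTION: the deviation uses the unrounded expected value.
    deviation = 100.0 * (observed - expected) / expected
    return expected, deviation


# Cluster 1: 47 of 52 Registries declare Epidemiological Research (70.8% overall).
expected, deviation = percentage_deviation(47, 52, 70.8)
print(f"expected = {expected:.0f}, deviation = {deviation:+.0f}%")
```

With these inputs the sketch gives an expected count that rounds to 37 and a deviation that rounds to +28%, the values reported for this Aim in the Results.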
To validate the consistency of the interpretation of the Clusters and to produce additional information on the characterization of the different types of Registries, we also estimated the distribution and the percentage deviation for the following questions: Target Population, Number of diseases, Geographical Coverage, Collected Data, Data Providers and Services expected by the EU platform.
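The hierarchical procedure described in these Methods can be illustrated with a minimal agglomerative sketch in Python. It is not the Authors' implementation: the report does not state the distance or the linkage criterion, so the Hamming distance and average linkage used here are assumptions, the toy data are hypothetical, and the function names are introduced only for this example. Each Registry is coded as a tuple of 0/1 answers, one per Aim.

```python
from itertools import combinations


def hamming(a, b):
    """Number of Aims on which two Registries give different answers."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(hamming(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(data, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    # Start with every observation in its own cluster.
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], data),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# HYPOTHETICAL toy data: columns are (epidemiological research,
# disease surveillance, clinical research, genetic aims).
toy = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 0, 0),
    (0, 0, 1, 1),
    (0, 0, 1, 1),
    (0, 0, 1, 0),
]
print(agglomerate(toy, n_clusters=2))
```

On this toy input the first three Registries (public-health aims) and the last three (clinical and genetic aims) end up in separate clusters. Recording the merge distances at each step would give the information needed to draw a dendrogram.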
C. Results

The iterative clustering process can be viewed in the dendrogram (Figure 1), which shows three major clusters. The statistical tests aimed at defining the number of clusters (cubic clustering criterion and pseudo F test) also confirm the identification of 3 clusters (Figure 2). The number of Registries in each cluster (Table 1) is fairly balanced: Cluster 1 includes 52 Registries (23.7%), Cluster 2 includes 86 (39.3%) and Cluster 3 includes 81 (37.0%).

Table 1. Composition of the Clusters
Cluster | Number | %
Cluster 1 | 52 | 23.7
Cluster 2 | 86 | 39.3
Cluster 3 | 81 | 37.0
Total | 219* |
* 1 Registry not analysed: the Aims were reported as missing.

Figure 1. Dendrogram of Cluster Analysis
Figure 2. Criteria for the definition of the number of Clusters: Cubic Clustering and Pseudo F

We analyzed the distribution of the Aims in each group to facilitate the interpretation of the identified Clusters. Table 2 shows, for each of the three Clusters, the distribution of the Aims and the percentage deviation (see Methods) from the expected value.

In Cluster 1, 90.4% of Registries perform Epidemiological Research, 75.0% pursue the objective of Disease Surveillance and 63.5% that of Healthcare Service Planning. In Cluster 1 the expected number of Registries with the Aim of Epidemiological Research is 37, whereas the observed number is 47, corresponding to +28%. Cluster 1 also shows a positive deviation for Disease Surveillance (+35%), Healthcare Service planning (+88%) and Social Planning (+70%); it shows negative deviations for Clinical research (-65%), Natural History of disease (-81%) and Mutation database or Genotype-phenotype correlation (-93%).

For the same Aims, Cluster 2 shows the opposite trend, with positive deviations for Clinical research (+14%), Natural History of disease (+9%) and Mutation database or Genotype-phenotype correlation (+38%), and negative deviations for Epidemiological research (-21%), Disease Surveillance (-54%), Healthcare Service planning (-79%) and Social Planning (-88%).

Both Clusters show lower percentages than expected for Treatment evaluation (Cluster 1: -60%; Cluster 2: -86%) and for Treatment Monitoring (Cluster 1: -83%; Cluster 2: -86%).

Cluster 3 shows a higher percentage than expected for all the Aims, but especially for the Aims relating to Treatment: 98.8% of the Registries in Cluster 3 (+130% compared to the expected value) declare that they carry out Treatment evaluation and 81.5% (+144%) Treatment Monitoring.
Table 2. Number and percentage of Registries observed, Number of Registries Expected, Percentage Deviation from Expected, by Cluster and Aim

Aim                                   Cluster 1 (n=52)        Cluster 2 (n=86)        Cluster 3 (n=81)
                                      N (%)      Exp  Dev     N (%)      Exp  Dev     N (%)      Exp  Dev
Clinical research                     11 (21.2)  32   -65%    60 (69.8)  ...  +14%    63 (77.8)  ...  ...
Disease surveillance                  39 (75.0)  ...  +35%    22 (25.6)  48   -54%    61 (75.3)  ...  ...
Epidemiological research              47 (90.4)  37   +28%    48 (55.8)  61   -21%    60 (74.1)  57   +5%
Genotype-phenotype/mutation database  2 (3.8)    30   -93%    69 (80.2)  ...  +38%    56 (69.1)  ...  ...
Healthcare services planning          33 (63.5)  ...  +88%    6 (7.0)    29   -79%    35 (43.2)  ...  ...
Natural history of disease            6 (11.5)   32   -81%    57 (66.3)  52   +9%     70 (86.4)  ...  ...
Social planning                       17 (32.7)  ...  +70%    2 (2.3)    16   -88%    23 (28.4)  ...  ...
Treatment evaluation                  9 (17.3)   22   -60%    5 (5.8)    37   -86%    80 (98.8)  ...  +130%
Treatment monitoring                  3 (5.8)    17   -83%    4 (4.7)    29   -86%    66 (81.5)  ...  +144%

N = Number of Registries observed in the Cluster
Exp = Number of Registries expected in the Cluster
Dev = Percentage Deviation of Observed value from Expected value (see Methods)

The interpretation of these results seems quite clear for the first two Clusters: Cluster 1 is characterized by Registries pursuing Aims related to Public Health activities, while Cluster 2 identifies Registries more oriented towards Clinical and Genetic Research. The interpretation of Cluster 3 is more complex: it seems to include Registries based on research for the assessment and monitoring of Treatment. In Cluster 3 we also observed a higher percentage for all the other Aims, which could be explained as a bias due to Registries declaring multiple objectives. Overall, the results of the Cluster Analysis agree with the findings of the previous Multiple Correspondence Analysis, which aimed to identify the correlations among the variables.
The factorial plane produced by the Multiple Correspondence Analysis (Figure 3) clearly identified the association among the 3 groups of Aims, as also confirmed by the Cluster Analysis.
Figure 3. Factorial Plane from the Multiple Correspondence Analysis

The Cluster Analysis indicated the presence of three macro-types of Registries whose Aims tend to differentiate, mainly for the first 2 types. The joint interpretation of the results of the Cluster Analysis and the Multiple Correspondence Analysis suggests naming the three clusters of Registries as follows:

Cluster 1: Public Health Registries
Cluster 2: Clinical and Genetic Research Registries
Cluster 3: Treatment Registries

On the basis of this interpretation we analyzed the distribution of the other variables collected by the Survey within the three types of Registries identified. The results are expressed as the percentage distribution within the group and as the percentage deviation from the expected value, calculated in the same way as for the question on Aims. The results are reported for the questions: Target Population, Number of diseases, Geographical Coverage, Data collected, Data providers and Services expected by EU Platform. In the tables below, "..." marks a value that is missing from the source.

1. Target Population

Of the Registries belonging to the "Public Health" group, 78.8% are Population-based, whereas only a small number are Case-based (5.8%) or Hospital-based (15.4%). Registries of the other two types are also predominantly Population-based, but with values lower than expected. "Clinical-Genetic Research" Registries showed a greater tendency to be Case-based (+30% compared to the expected), whereas "Treatment" Registries tended to be Hospital-based (+26%). This result is consistent with the interpretation given to the Clusters.

Table 3. Number and percentage of Registries observed, Number of Registries Expected, Percentage Deviation from Expected, by Cluster and Population target

Population target   Public Health               Clinical-Genetic Research   Treatment
                    N (%)      Exp  Dev         N (%)      Exp  Dev         N (%)      Exp  Dev
Case based          3 (5.8)    10   -70%        21 (24.7)  ...  +30%        17 (21.5)  ...  ...
Hospital based      8 (15.4)   13   -36%        20 (23.5)  20   -2%         24 (30.4)  ...  +26%
Population based    41 (78.8)  ...  ...         44 (51.8)  48   -9%         38 (48.1)  45   -16%

2. Number of diseases

"Public Health" Registries, in contrast to the other two types, show a greater tendency to cover all diseases (+73% compared to the expected value). Among "Clinical-Genetic Research" Registries, 47.7% collect data on a group of diseases, 34.9% on a single disease and 17.4% cover all diseases; this closely reflects the expected distribution. "Treatment" Registries deal mainly with a group of rare diseases (49.4%) or with one rare disease (38.3%, a value 13% above the expected).

Table 4. Number and percentage of Registries observed, Number of Registries Expected, Percentage Deviation from Expected, by Cluster and Number of diseases

Number of diseases  Public Health               Clinical-Genetic Research   Treatment
                    N (%)      Exp  Dev         N (%)      Exp  Dev         N (%)      Exp  Dev
A group / several   21 (41.2)  24   -12%        41 (47.7)  40   +2%         40 (49.4)  38   +6%
All                 17 (33.3)  ...  +73%        15 (17.4)  17   -9%         10 (12.3)  16   -36%
Just one            13 (25.5)  17   -25%        30 (34.9)  29   +3%         31 (38.3)  ...  +13%

3. Geographical Coverage

In all 3 typologies the most common coverage is national, but with different distributions: although 50% of "Public Health" Registries have national coverage, this value is 19% lower than expected; "Clinical-Genetic Research" Registries show a frequency of national coverage equal to the expected; "Treatment" Registries show a higher value than expected (+12%). Regional coverage is reported by 22 "Public Health" Registries, compared with 9 expected (+148%), and this deviation is reversed for the other two types ("Clinical-Genetic Research" -58%, "Treatment" -35%). International coverage is provided by a small number of "Public Health" Registries (-79% compared to expected), whereas "Clinical-Genetic Research" Registries show a positive deviation (+52%).

Table 5. Number and percentage of Registries observed, Number of Registries Expected, Percentage Deviation from Expected, by Cluster and Geographical coverage

Geographical Coverage   Public Health               Clinical-Genetic Research   Treatment
                        N (%)      Exp  Dev         N (%)      Exp  Dev         N (%)      Exp  Dev
International           2 (3.8)    9    -79%        23 (27.4)  ...  +52%        14 (17.3)  15   -4%
Local                   2 (3.8)    2    +19%        3 (3.6)    3    +11%        2 (2.5)    3    -23%
National                26 (50.0)  32   -19%        52 (61.9)  52   0%          56 (69.1)  ...  +12%
Regional                22 (42.3)  9    +148%       6 (7.1)    14   -58%        9 (11.1)   14   -35%

4. Data collected

Diagnosis is collected by almost all of the Registries (95%), so there are no substantial differences among the three types of Registries. "Public Health" Registries tend to collect more anagraphic data (+26% compared to the expected) and socio-demographic data (+28%), whereas the other types of information are collected less often than expected. Among "Clinical-Genetic Research" Registries, 88.4% collect clinical data and 86.0% genetic data; these Registries also show a higher frequency than expected for the collection of data on Family history (+17%) and Patient's preferences for communication (+27%). Among "Treatment" Registries, 97.5% collect Clinical data, and the percentage deviation is positive for all types of information except anagraphic data; the highest values are found for: Anthropometric info (+68%), Clinic research participation and biospecimen donation (+39%), Birth and reproductive history (+49%), Family history (+22%), Medications, devices and health services (+37%) and Patient-reported outcomes (+65%).
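The deviations quoted in this subsection can be reproduced from the observed counts reported in Table 6. The sketch below assumes that the expected count is the row total multiplied by the cluster's share of the 219 analysed Registries (cluster sizes from Table 1); this formula is an assumption consistent with the published values rather than a quotation from the Methods, and the names `expected_counts` and `percentage_deviations` are illustrative.

```python
# Cluster sizes from Table 1, in the order:
# Public Health, Clinical-Genetic Research, Treatment.
CLUSTER_SIZES = (52, 86, 81)

# Observed counts for three of the items reported in Table 6.
OBSERVED = {
    "Anthropometric info": (8, 19, 44),
    "Diagnosis": (52, 77, 79),
    "Family history": (11, 55, 54),
}


def expected_counts(observed, sizes=CLUSTER_SIZES):
    """Expected count per cluster if the item were spread proportionally
    to cluster size (assumed definition of the expected value)."""
    total = sum(sizes)
    row_total = sum(observed)
    return [row_total * size / total for size in sizes]


def percentage_deviations(observed, sizes=CLUSTER_SIZES):
    """Percentage deviation of observed from expected, rounded to integers."""
    return [
        round(100 * (obs - exp) / exp)
        for obs, exp in zip(observed, expected_counts(observed, sizes))
    ]


for item, counts in OBSERVED.items():
    print(item, percentage_deviations(counts))
```

For Anthropometric info, for example, this gives -53%, -32% and +68% for the three types of Registries, matching the values reported for that item.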
Table 6. Number and percentage of Registries observed, Number of Registries Expected, Percentage Deviation from Expected, by Cluster and Data collected

Data collected                                          Public Health               Clinical-Genetic Research   Treatment
                                                        N (%)      Exp  Dev         N (%)      Exp  Dev         N (%)      Exp  Dev
Anagraphic                                              21 (40.4)  ...  +26%        26 (30.2)  27   -5%         23 (28.4)  26   -11%
Anthropometric info                                     8 (15.4)   17   -53%        19 (22.1)  28   -32%        44 (54.3)  ...  +68%
Clinic research participation and biospecimen donation  6 (11.5)   16   -62%        26 (30.2)  26   0%          34 (42.0)  ...  +39%
Birth and reproductive history                          9 (17.3)   16   -43%        21 (24.4)  26   -20%        37 (45.7)  ...  +49%
Clinical data                                           35 (67.3)  45   -22%        76 (88.4)  75   +2%         79 (97.5)  ...  ...
Diagnosis                                               52 (100)   49   +5%         77 (89.5)  82   -6%         79 (97.5)  77   +3%
Family history                                          11 (21.2)  28   -61%        55 (64.0)  ...  +17%        54 (66.7)  ...  +22%
Genetic data                                            21 (40.4)  38   -44%        74 (86.0)  ...  ...         63 (77.8)  58   +8%
Medications, devices and health services                22 (42.3)  32   -31%        44 (51.2)  53   -16%        68 (84.0)  ...  +37%
Patient's preferences for communication                 3 (5.8)    7    -55%        14 (16.3)  ...  +27%        11 (13.6)  10   +6%
Patient-reported outcomes                               9 (17.3)   18   -51%        21 (24.4)  30   -31%        47 (58.0)  ...  +65%
Socio demographic info                                  32 (61.5)  ...  +28%        26 (30.2)  41   -37%        47 (58.0)  ...  ...

5. Data providers

Clinical Units are the most common data providers for all three types of Registries. "Public Health" Registries show a tendency to use Health Information Systems (Hospital databases, Mortality and other Registries), whereas they make limited use of information from Clinical Genetic Units (-34%), from patients and their families (-29%) and from patients' organisations (-30%). In contrast, "Clinical-Genetic Research" Registries tend to use as data sources the patients' own context (Patients and families +15%, Patients' groups +16%) and the Clinical Genetic Units (+14%), while Health Information Systems are used far less than expected. The main data providers for "Treatment" Registries are the Centres of Expertise (+18%), Clinical units (+11%), Clinical genetic units (+8%) and Hospital databases (+12%).
Table 7. Number and percentage of Registries observed, Number of Registries Expected, Percentage Deviation from Expected, by Cluster and Data provider

Data provider                   Public Health               Clinical-Genetic Research   Treatment
                                N (%)      Exp  Dev         N (%)      Exp  Dev         N (%)      Exp  Dev
Centres of expertise            13 (25.0)  16   -19%        25 (29.1)  26   -5%         29 (36.3)  ...  +18%
Clinical genetic units          15 (28.8)  23   -34%        43 (50.0)  ...  +14%        38 (47.5)  35   +8%
Clinical units                  43 (82.7)  43   -1%         65 (75.6)  72   -9%         74 (92.5)  ...  +11%
Hospital databases              23 (44.2)  ...  ...         17 (19.8)  27   -37%        28 (35.0)  ...  +12%
Laboratories/central services   26 (50.0)  ...  ...         32 (37.2)  37   -15%        37 (46.3)  35   +6%
Mortality registers             17 (32.7)  ...  ...         1 (1.2)    11   -91%        10 (12.5)  10   -3%
Other registers                 14 (26.9)  ...  ...         3 (3.5)    10   -70%        8 (10.0)   9    -13%
Patients and families           18 (34.6)  25   -29%        48 (55.8)  ...  +15%        40 (50.0)  39   +3%
Patients' groups                8 (15.4)   11   -30%        22 (25.6)  ...  +16%        18 (22.5)  18   +2%

6. Expected Services by EU Platform

The service most expected from the Platform by "Public Health" Registries is the Quality control system (66.0%), whereas for "Clinical-Genetic Research" and "Treatment" Registries it is IT tools. In relation to their epidemiological function, "Public Health" Registries have a higher expectation of services for Facilitated access to data sources (+11%).

Table 8. Number and percentage of Registries observed, Number of Registries Expected, Percentage Deviation from Expected, by Cluster and Expected service by EU Platform

Expected service by EU Platform                     Public Health               Clinical-Genetic Research   Treatment
                                                    N (%)      Exp  Dev         N (%)      Exp  Dev         N (%)      Exp  Dev
Expert technical advice                             15 (31.9)  18   -18%        28 (35.4)  31   -9%         33 (47.8)  ...  +23%
Facilitated access to data sources                  23 (48.9)  ...  +11%        22 (27.8)  35   -37%        41 (59.4)  ...  +35%
IT tools                                            27 (57.4)  32   -16%        56 (70.9)  54   +3%         51 (73.9)  47   +8%
Legal advice                                        21 (44.7)  23   -7%         37 (46.8)  38   -3%         36 (52.2)  33   +8%
Model documents                                     21 (44.7)  22   -5%         47 (59.5)  ...  +26%        24 (34.8)  33   -26%
Quality control systems                             31 (66.0)  ...  ...         37 (46.8)  45   -17%        42 (60.9)  39   +8%
Tools for networking among partners and Registries  24 (51.1)  27   -10%        45 (57.0)  45   0%          42 (60.9)  39   +7%
"Clinical-Genetic Research" Registries declare a higher expectation of services related to Model documents (+26%). "Treatment" Registries express a higher expectation of services for Facilitated access to data sources (+35%) and Expert technical advice (+23%).

D. Conclusions

Cluster analysis identified three main typologies of Registries, whose Aims showed a clear pattern of differentiation, in particular for Clusters 1 and 2. From the distribution of the Aims and the percentage deviation from the expected values, it was possible to define three types of Registries, named "Public Health", "Clinical-Genetic Research" and "Treatment". Relative to the number of Registries in the Survey, "Public Health" Registries were less frequent (23.7%) than "Clinical-Genetic Research" (39.3%) and "Treatment" (37.0%) Registries. The analysis of the other Survey questions showed a picture largely consistent with the definition of the Clusters and provided a large amount of useful information for characterizing the different typologies of Registries.

"Public Health" Registries pursue the aims of Epidemiological Research, Social Planning, Healthcare Services Planning and Disease Surveillance; they show a greater tendency than the other types of Registries to be population-based, to collect information on all diseases and to have a regional coverage, closer to the target of health policy; consistently with their epidemiological function, they tend to collect information such as anagraphic and socio-demographic data, making more use of Health Information Systems as a data source; their expectations of the Platform services were directed to the Quality control system and Facilitated access to data sources.
"Clinical-Genetic Research" Registries mainly pursue aims concerning clinical and genetic research; they are mostly population-based, but show a tendency to be structured as case-based; they usually collect data on one disease or on groups of diseases; they have a national geographical coverage, with a tendency towards international coverage as well; they mainly collect clinical and genetic data, family history and patient's preferences for communication; their main data providers are clinical and genetic units, patients and their families and patients' organizations; their main expectations of the Platform services are focused on IT tools and Model documents.

"Treatment" Registries mainly pursue aims relating to Treatment Evaluation and Treatment Monitoring; they have a higher tendency to be hospital-based and focused on one disease; their geographical coverage is usually national; they collect different types of data, mainly Clinical data, Clinic research participation and biospecimen donation, Anthropometric info, Birth and reproductive history, Family history, Medications, devices and health services and Patient-reported outcomes; their most important data providers are the Clinical Units and the Centres of Expertise; their major expectations of the Platform services are directed to IT tools, with a relevant interest in Facilitated access to data sources and Expert technical advice.

The statistical analysis made it possible to explore the complex and fragmented landscape of rare disease Registries, highlighting structures with differentiated and interrelated profiles. These results are a useful source of information for targeted planning that can facilitate the interoperability and interconnection of Registries in accordance with the different profiles identified. This appears fundamental to the construction of the platform for rare disease registries.

E. References

1. Anderberg M.R. (1973), Cluster Analysis for Applications, New York: Academic Press, Inc.
2. EPIRARE Work Package 6 (2012), Statistical Analysis of the EPIRARE survey database,
3. Interim Technical Report Appendix 1
4. EPIRARE Work Package 8 (2012), Developing a European Platform for Rare Disease Registries, Draft 08/11/
5. EPIRARE (2012), EPIRARE Survey,
6. Everitt B.S. (1980), Cluster Analysis, 2nd Edition, London: Heinemann Educational Books Ltd.
7. Fabbris L. (1983), Analisi esplorativa di dati multidimensionali, Cleup Editore.
8. Fabbris L. (1997), Statistica multivariata, Milano: McGraw-Hill Libri Italia.
9. Milligan G.W. and Cooper M.C. (1985), An Examination of Procedures for Determining the Number of Clusters in a Data Set, Psychometrika, 50,
10. SAS Institute Inc. (2009), SASOnlineDoc 9.2, Cary, NC: SAS Institute Inc.
11. SAS Institute Inc. (1999), SAS/STAT User's Guide, Version 8, Cary, NC: SAS Institute Inc.
12. Sarle W.S. (1983), Cubic Clustering Criterion, SAS Technical Report A-108, Cary, NC: SAS Institute Inc.
More informationDealing with Missing Data
Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January
More informationHEALTH INSURANCE COVERAGE AND ADVERSE SELECTION
HEALTH INSURANCE COVERAGE AND ADVERSE SELECTION Philippe Lambert, Sergio Perelman, Pierre Pestieau, Jérôme Schoenmaeckers 229-2010 20 Health Insurance Coverage and Adverse Selection Philippe Lambert, Sergio
More informationAccess Provided by your local institution at 02/06/13 5:22PM GMT
Access Provided by your local institution at 02/06/13 5:22PM GMT brief communication Reducing Disparities in Access to Primary Care and Patient Satisfaction with Care: The Role of Health Centers Leiyu
More informationA revalidation of the SET37 questionnaire for student evaluations of teaching
Educational Studies Vol. 35, No. 5, December 2009, 547 552 A revalidation of the SET37 questionnaire for student evaluations of teaching Dimitri Mortelmans and Pieter Spooren* Faculty of Political and
More informationHow to Measure Customer Satisfaction in a Public Transport Survey
2 nd UITP International Marketing Conference 12-14 November 2003, Paris Customer Satisfaction as an Element of Strategic Business Decisions Werner Brög Thomas Kahn Socialdata Institute for Transport and
More informationA Cross-Sectional Study of Asbestos- Related Morbidity and Mortality in Vermonters Residing Near an Asbestos Mine November 3, 2008
A Cross-Sectional Study of Asbestos- Related Morbidity and Mortality in Vermonters Residing Near an Asbestos Mine 108 Cherry Street, PO Box 70 Burlington, VT 05402 802.863.7200 healthvermont.gov A Cross-Sectional
More informationBasic research methods. Basic research methods. Question: BRM.2. Question: BRM.1
BRM.1 The proportion of individuals with a particular disease who die from that condition is called... BRM.2 This study design examines factors that may contribute to a condition by comparing subjects
More informationChapter 5: Analysis of The National Education Longitudinal Study (NELS:88)
Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Introduction The National Educational Longitudinal Survey (NELS:88) followed students from 8 th grade in 1988 to 10 th grade in
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationAttrition in Online and Campus Degree Programs
Attrition in Online and Campus Degree Programs Belinda Patterson East Carolina University pattersonb@ecu.edu Cheryl McFadden East Carolina University mcfaddench@ecu.edu Abstract The purpose of this study
More information2. Filling Data Gaps, Data validation & Descriptive Statistics
2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationAlex Vidras, David Tysinger. Merkle Inc.
Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT
More informationStudy Plan Master in Public Health ( Non-Thesis Track)
Study Plan Master in Public Health ( Non-Thesis Track) I. General Rules and conditions : 1. This plan conforms to the regulations of the general frame of the Graduate Studies. 2. Specialties allowed to
More informationSchool of Public Health and Health Services Department of Epidemiology and Biostatistics
School of Public Health and Health Services Department of Epidemiology and Biostatistics Master of Public Health and Graduate Certificate Biostatistics 0-04 Note: All curriculum revisions will be updated
More informationMultiple logistic regression analysis of cigarette use among high school students
Multiple logistic regression analysis of cigarette use among high school students ABSTRACT Joseph Adwere-Boamah Alliant International University A binary logistic regression analysis was performed to predict
More informationEURORDIS Position Paper on Centres of Expertise and European Reference Networks for Rare Diseases
EURORDIS Position Paper on Centres of Expertise and European Reference Networks for Rare Diseases EURORDIS - the European Organisation for Rare Diseases represents 310 rare disease organisations from 34
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationUse advanced techniques for summary and visualization of complex data for exploratory analysis and presentation.
MS Biostatistics MS Biostatistics Competencies Study Development: Work collaboratively with biomedical or public health researchers and PhD biostatisticians, as necessary, to provide biostatistical expertise
More informationAnalysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk
Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
More informationCHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,
More informationFACILITATOR/MENTOR GUIDE
FACILITATOR/MENTOR GUIDE Descriptive analysis variables table shells hypotheses Measures of association methods design justify analytic assess calculate analysis problem stratify confounding statistical
More informationDigital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE
Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE CRUK Stratified Medicine Initiative Somatic mutation testing for prediction of treatment response in patients with solid tumours:
More informationIntroduction to Exploratory Data Analysis
Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
More informationA Guide for the Utilization of HIRA National Patient Samples. Logyoung Kim, Jee-Ae Kim, Sanghyun Kim. Health Insurance Review and Assessment Service
A Guide for the Utilization of HIRA National Patient Samples Logyoung Kim, Jee-Ae Kim, Sanghyun Kim (Health Insurance Review and Assessment Service) Jee-Ae Kim (Corresponding author) Senior Research Fellow
More informationHow To Understand Multivariate Models
Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models
More informationSystematic Reviews and Meta-analyses
Systematic Reviews and Meta-analyses Introduction A systematic review (also called an overview) attempts to summarize the scientific evidence related to treatment, causation, diagnosis, or prognosis of
More informationMethods Commission CLUB DE LA SECURITE DE L INFORMATION FRANÇAIS. 30, rue Pierre Semard, 75009 PARIS
MEHARI 2007 Overview Methods Commission Mehari is a trademark registered by the Clusif CLUB DE LA SECURITE DE L INFORMATION FRANÇAIS 30, rue Pierre Semard, 75009 PARIS Tél.: +33 153 25 08 80 - Fax: +33
More informationDimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
More informationEHR Databases and Their Role in Health & Innovation
8. New approaches to promoting innovation 8.4 Real-life data and learning from practice to advance innovation See Background Paper 8.4 (BP8_4Data.pdf) The costs of pharmaceutical R&D are high, with clinical
More informationUsing News Articles to Predict Stock Price Movements
Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 gyozo@cs.ucsd.edu 21, June 15,
More informationQuantitative Methods for Finance
Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain
More informationMULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)
MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part
More informationCancer in Ireland 2013: Annual report of the National Cancer Registry
Cancer in 2013: Annual report of the National Cancer Registry ABBREVIATIONS Acronyms 95% CI 95% confidence interval APC Annual percentage change ASR Age standardised rate (European standard population)
More informationExamining Early Preventive Dental Visits: The North Carolina Experience
Examining Early Preventive Dental Visits: The North Carolina Experience Jessica Y. Lee DDS, MPH, PhD Departments of Pediatric Dentistry & Health Policy and Administration University of North Carolina at
More informationPRACTICAL DATA MINING IN A LARGE UTILITY COMPANY
QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,
More informationGENETIC DATA ANALYSIS
GENETIC DATA ANALYSIS 1 Genetic Data: Future of Personalized Healthcare To achieve personalization in Healthcare, there is a need for more advancements in the field of Genomics. The human genome is made
More information