INTEGRATION OF ELECTRONIC HEALTH RECORDS AND PUBLIC BIOLOGICAL REPOSITORIES ILLUMINATES HUMAN PATHOPHYSIOLOGY AND UNDERLYING MOLECULAR RELATIONSHIPS

A DISSERTATION SUBMITTED TO THE PROGRAM IN BIOMEDICAL INFORMATICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DAVID P. CHEN
AUGUST 2011

© 2011 by David Pei-Ann Chen. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
This dissertation is online at:

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Atul Butte, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Russ Altman

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Michael Walker

Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


ABSTRACT

Secondary use of electronic health record (EHR) data has the potential to unlock novel insight into human pathophysiology. While EHR data has often been used in retrospective studies, in the management of public health, and to improve patient safety, its use in discovering the underlying molecular mechanisms of human disease and pathophysiology has been limited. Much of this can be attributed to the differing priorities of healthcare providers and basic biological researchers. The advent of biobanks that collect physiological measurements as well as tissue samples and molecular measurements promises to address this issue. However, the sheer number of different biological and clinical measurement modalities hinders the generation of a truly complete view of the human organism.

The increased adoption of EHRs, together with growing biological data repositories, enables researchers to answer biological questions applicable to the human population. The goal is not to treat humans as experimental organisms, but rather to gain as much knowledge as possible from every patient seen. By viewing EHRs as a repository of perturbations and their associated physiological consequences, we can begin to design experiments that leverage EHR data to generate hypotheses that can be further evaluated.

This thesis describes methods to summarize EHR biomarker data in a systematic way to enable downstream analysis, as well as methods for integrating EHR data with disparate biological data. I will describe the creation of the clinarray and its application to specific disease populations to differentiate patients by severity and to discover latent physiological factors associated with disease. I will also describe how to aggregate and analyze clinarrays from across the EHR to build models of aging. Finally, I will discuss the use of diseases to integrate EHR data with gene expression data from a disparate biological data source to discover genes related to aging and to generate hypotheses for relationships between biomarkers and genes.

The integration of readily available clinical and biological data promises to improve our understanding of phenomics without impacting patient care or adding an unnecessary burden to the healthcare system. It is important for biological research to leverage the increasing amount of molecular and environmental data stored in EHRs to build a more complete view of the human organism.

ACKNOWLEDGEMENTS

Throughout my graduate career I have had the pleasure and privilege to work with many extraordinary individuals in the Biomedical Informatics Training Program and the greater Stanford community who have made my time here both intellectually stimulating and life-changing.

I would first like to recognize and thank some of my fellow students in the Biomedical Informatics Training Program: Alex Morgan, Sarah Aerni, Noah Zimmerman, Marina Sirota, Chirag Patel, and Joel Dudley, for their collaboration, stimulating conversation, advice, and support throughout these years. I would like to thank the following members of the Butte lab who have created and maintained the various repositories and computational resources that have enabled my research: Rong Chen and Alex Skrenchuk. I would also like to thank Susan Aptekar for making the Butte lab run smoothly.

I would like to thank my academic advisor, David Paik, for keeping me on track during the early years of my graduate career. I would also like to thank the members of my thesis defense committee for their time and academic advice: Russ Altman, Michael Walker, Henry Lowe, and Gary Peltz. I would like to thank Kenneth Weinberg for his collaboration and for bringing a critical biological perspective to computational results.

I would like to especially thank the many members of the Biomedical Informatics community at Stanford without whom nothing would occur smoothly or on time: Larry Fagan, Betty Cheng, and Mary Jeanne Oliva. I would also like to thank Darlene Vian, who, while no longer with us, has an enduring legacy in the BMI program.

I would also like to thank my close friends and family who have supported me throughout these years. First, I would like to thank my partner Justin for always being present, loving, and supportive. Next, I'd like to thank my parents Tom and Emily, and my brother Brian, for the examples they set and a lifetime of love, guidance, encouragement, and support. I would also like to thank my Aunt Blossom, Uncle Frank, and cousins Peimei and Peili for always being there for me.

Lastly, I would like to thank my mentor and advisor Atul Butte. From the moment I met Atul I knew that he was who I wanted to emulate. Thank you for being such a great person, mentor, and life coach.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1: Natural Experiments Using Electronic Medical Records Enable Biological Discoveries

Chapter 2: The Current Paradigm for Using Electronic Health Records for Molecular Discoveries
    EHR Use in Biobanks, Academic Research Institutions, and Primary Care Institutions
    Hurdles of Using EHR Data: Data Quality and Privacy

Chapter 3: Clinical Arrays of Laboratory Measures, or Clinarrays, Built from an Electronic Health Record Enable Disease Subtyping by Severity
    Methods
    Data Collection, Processing and the Clinarray
    Metric for Severity
    Hierarchical Clustering of Patients and Lab Tests
    Normalization
    Results
    Discussion

Chapter 4: Latent Physiological Factors of Complex Human Diseases Revealed by Independent Component Analysis of Clinarrays
    Methods
    Building Clinarrays from Patient Lab Tests
    Independent Component Analysis
    Extracting Physiological Factors
    Results
    Creation of Disease-Specific Clinarrays
    Independent Component Analysis of Clinarrays
    Discussion
    Conclusions

Chapter 5: Validating Pathophysiological Models of Aging Using Clinical Electronic Medical Records
    Methods and Results
    Model Building Using NHANES
    Model Evaluation
    Model Validation on Clinical Samples
    Discussion
    Conclusion

Chapter 6: Novel Integration of Hospital Electronic Medical Records and Gene Expression Measurements to Identify Genetic Markers of Maturation
    Methods
    Data Collection and Processing
    Finding Biomarkers for Maturation Using Analysis of Variance
    Using Diseases to Model Maturation
    Results
    Clinical Biomarker for Maturation
    Finding Maturation and Aging Related Genes
    Discussion

Chapter 7: Bedside to Bench Reverse Translation Enables Discovery of Phenomic Relationships Between Immune Related Molecules and Pathophysiology
    Methods
    Gene Expression Data and Annotation
    Mapping UMLS CUIs To ICD-9-CM Codes
    Generating Disease-Biomarker State Vectors
    Weighted Least Squares Regression Between Biomarkers and Genes
    Multiple Hypothesis Testing
    Results
    Evaluation
    Discussion

Chapter 9: Conclusion

LIST OF TABLES

Table 1: Sample biological questions and their associated problem solving methods
Table 2: Diseases and the number of patients used in our clinarray study
Table 3: Population after data pruning
Table 4: Number of patients and biomarkers remaining after pruning
Table 5: Significant biomarkers after ICA analysis
Table 6: Top 10 most informative biomarkers
Table 7: Results between predicted and expected chronological age
Table 8: Diseases and relevant GEO data sets
Table 9: Top aging related genes
Table 10: MamPhEA Enrichment Results
Table 11: Biomarkers compared between IL-4R -/- and Balb/c mice

LIST OF FIGURES

Figure 1: Information flow between clinicians and basic science researchers
Figure 2: Example Clinarrays
Figure 3: Hierarchical cluster of laboratory tests
Figure 4: Clustered clinarrays for cystic fibrosis
Figure 5: Visual schematic of the ICA model of disease pathophysiology
Figure 6: Schematic of the model building and prediction pipeline
Figure 7: Aging model feature selection
Figure 8: Actual age versus predicted age
Figure 9: Comparisons between error distributions across datasets
Figure 10: Distribution of laboratory measurements at different ages
Figure 11: Comparison of different rates of decay across 11 diseases and the baseline
Figure 12: Schematic for integrating clinical and gene expression data
Figure 13: Clinarray creation
Figure 14: Relationship between Neutrophil(%) and IL-4R
Figure 15: Gene-Biomarker networks


CHAPTER 1: NATURAL EXPERIMENTS USING ELECTRONIC MEDICAL RECORDS ENABLE BIOLOGICAL DISCOVERIES

Imagine a major primate core facility at a top-tier academic medical school. This primate facility has studied 60 thousand primates over the past seven years, modeling over 8000 different phenotypes. Now imagine that this facility is linked to a data warehouse, where 7 million quantitative measurements of nearly 1500 different types have already been made, and 5 million new measurements are made, time-stamped, and stored each year. The quality of the measurements made on these primates is ensured, and repeatedly checked and verified. More recently, this facility has begun to record all interventions ordered and performed on these primates, including medication doses administered, imaging studies conducted, and procedures performed, with 10 million recorded interventions of 2800 different types, and 20 million new interventions time-stamped and recorded each year. Though easily imaginable, this primate facility would dwarf nearly every existing animal care facility in the United States in terms of breadth, depth, and relevance. The basic research that could be enabled by this facility would already be boundless. But the most amazing component of this facility is that it covers humans.

In fact, this facility is exactly what is provided at an academic medical hospital. For over thirty years patient phenotypes, measurements, and interventions have been collected by physicians, nurses, healthcare technicians, social workers, and other allied health professionals, and stored in electronic health records (EHRs), the depth and breadth of which grow exponentially year upon year. While the primary use of the data has been and continues to be clinical documentation, patient management, decision support, and reimbursement [1], the secondary uses have historically been limited to performing retrospective clinical studies, monitoring public health, and improving technologies that impact patient care and safety [2, 3]. Due to the increasing adoption of the EHR and the ever-growing amounts of biological data being captured, there now exists an important potential for reuse of clinical data for biological research relevant to humans. By realizing that data stored in EHRs can be used for natural experiments, and by developing new problem solving methods, we can begin to take advantage of EHR data to better understand underlying human biology. Deviating from the primate core facility analogy, the goal is not to treat humans as experimental organisms, but rather to gain as much knowledge as possible from every patient seen.

Natural experiments are observational studies where the experimenter observes how nature impacts subjects, rather than methodically introducing perturbations, as is the norm with controlled experiments [4]. The main difference between a natural experiment and an ordinary observational study is that one can claim, at most, that assignment to specific groups is "as if" random [5]. Despite this seeming limitation, major biomedical discoveries have resulted from natural experiments. The classic example of such a natural experiment is John Snow's 1855 study that linked sewage dumped into the River Thames to cholera outbreaks in London [6]. Similarly, the discovery of blood groups that provide resistance to smallpox can be thought of as a natural experiment. Four decades ago, Vogel and colleagues measured rural Indian villagers' blood groups and observed who was more susceptible to the disease in a natural smallpox epidemic [7]. Epidemiologists, economists, and social scientists have used the concept of the natural experiment to infer relationships between variables that often cannot be manipulated, or that it would be immoral and illegal to manipulate [8]. One cannot simply inject individuals with most viruses and observe their response.

Whereas the smallpox study was done with one disease and one physiological marker, the data stored in an EHR represents thousands of physiological changes from thousands of perturbations arising from different conditions, interventions, demographics, and environmental effects. Almost without exception, these perturbations are not related to any experimental protocol and, with the right precautions, could arguably be considered "as if" random. The voluminous amounts of data stored in an EHR consist of demographic information, physicians' notes, laboratory values, imaging data, pathology reports, pharmacological data, insurance claims, and much more. Included within these data are perturbations associated with an individual. As an example, SNOMED pathology or ICD-9-CM codes associated with a medical visit could represent the patient's condition. Similarly, smoking status may be found in a physician's note [9]. As an illustration of the breadth of data that is available, laboratory data collected from Lucile Packard Children's Hospital and Stanford Hospitals and Clinics consists of over 3000 different laboratory tests. Much of this data is seldom used after a patient encounter.

Furthermore, an increasing amount of molecular data is available to individuals outside the clinical environment. The technologies involved include genome profiling, provided by companies like 23andMe, and complete genome sequencing, which is moving quickly towards consumer uptake. While these types of data may reside in separate databases, the virtual unified view of the EHR encapsulates these different elements. Even though complete integration between diverse measurements of human physiology is currently lacking, the current data stored in EHRs suggest what can be accomplished.

As an example, we can consider the act of bone marrow donation (ICD-9-CM code V59.3) as a differentiating factor that can separate a paired population into two groups, pre-donation and post-donation. Using this design we can begin to examine characteristics of cell populations that differ between pre- and post-donation. In fact, we see that the percentage of neutrophils in the blood increases post-donation while platelet count decreases (Figure 1b). While these are only two biological examples that show significant differences, we can extend this analysis to systematically compare other types of biomarkers stored in an EHR to see if there are more differences that may be associated with bone marrow donation.
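The paired pre- and post-donation comparison described above can be illustrated with a minimal sketch. The text does not state which statistical test was used, so the exact sign test here is an assumption chosen for simplicity, and the function name `paired_sign_test` and all example values are hypothetical rather than taken from the study.

```python
from math import comb
from statistics import median


def paired_sign_test(pre, post):
    """Exact two-sided sign test on paired measurements.

    Assumption: each position in `pre` and `post` refers to the same
    individual (e.g. a biomarker before and after donation).
    Returns (median of post - pre differences, two-sided p-value).
    Pairs with a zero difference are ignored when computing the p-value.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [after - before for before, after in zip(pre, post)]
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 0.0, 1.0
    increases = sum(1 for d in nonzero if d > 0)
    # Probability of a split at least this lopsided if increases and
    # decreases were equally likely (binomial with p = 0.5).
    smaller = min(increases, n - increases)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return median(diffs), min(1.0, 2 * tail)


# Hypothetical neutrophil percentages for eight donors.
pre_donation = [50, 55, 48, 60, 52, 58, 49, 51]
post_donation = [62, 63, 59, 66, 61, 64, 57, 60]
median_change, p_value = paired_sign_test(pre_donation, post_donation)
print(median_change, p_value)
```

Running the same function over every biomarker recorded for the donors, followed by a correction for multiple testing, would be one way to extend the comparison systematically.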

Figure 1: Information flow between clinicians and basic science researchers. a) Biological areas that may benefit from cross-domain analyses of de-identified EHR data. b) Bone marrow donor data used to examine differences in cell populations between paired individuals pre- and post-donation.

A key strength of using EHR data for natural experimentation is the ability to ask questions that span clinical specialties (Figure 1a). A prime example is a study by Andrey Rzhetsky and colleagues, in which 1.5 million patient records from the Columbia University Medical Center were used to infer genetic overlap between 161 disorders [10]. The results of their study showed many known and unknown correlations between diseases and suggest that autism, bipolar disorder, and schizophrenia share significant genetic overlap.

The study of human aging is another example of a research area that would benefit from aggregating data across different clinical departments. Most biological studies of aging are currently done in model organisms including yeast, worm, and mouse. As a consequence, many of the discoveries have questionable relevance to humans. Using human physiological data, we and others have built models of human development and aging [11, 12]. A combination of alkaline phosphatase, creatinine, hematocrit, and mean red cell volume are the best predictors of male development between years of age while alkaline phosphatase, creatinine, and total serum globulin are the best for female development. Other examples of this methodology include phenome-wide association studies conducted by Vanderbilt to detect associations between single nucleotide polymorphisms and disease [13] and genome-wide association studies from the Mayo Clinic examining red blood cell traits [14], both of which use EHR data and genotyped individuals.

As molecular diagnostics are increasingly incorporated into the standard of care, experiments that were previously accomplished in controlled settings can now be done by analyzing EHR data. For example, studies have been published showing how gender and genetic differences affect white blood cell counts, the distribution of cell types, and other physiological biomarkers [15]. Additionally, investigators have probed the relationship between clinical imaging features and gene expression [16]. It is therefore desirable that more investigators become informed of the types of analyses that can be done. It is also crucial that infrastructure and resources be developed for these types of analyses in those clinical settings that have already adopted EHRs.

While there are many challenges in using EHR data to its fullest potential, including data quality and privacy concerns, which will be discussed in subsequent chapters, perhaps the most significant challenge is determining the new kinds of questions that are enabled by this data. Knowledge of the specific medical domain or domains of interest, as well as of the types of EHR data available, is essential. This differs from the traditional scientific approach in which a hypothesis is first generated, followed by data acquisition and analysis. A researcher who is planning a natural experiment with EHR data must first know the types, amount, and coverage of the data collected. Next the researcher can, given domain expertise or consultation with a domain expert, propose biological questions that could be tested with this data. The researcher must make sure that the selected groups are otherwise representative and comparable and, if they are not, take the covariates involved into consideration. Finally, the researcher can test the hypotheses on the data with the appropriate quantitative methods. While results may show statistically significant relationships, it is still crucial that controlled experiments are subsequently used to validate any observations and to fully understand the mechanism. Table 1 provides a glimpse at possible questions and problem solving methods that take advantage of EHR data. The advantage of using natural experiments is that they can reveal unexpected relationships that are not just an incremental gain in the knowledge of human biology, but can fundamentally change what we know about it.

In order to prepare the next generation of researchers to fully use the potential of EHRs, better collaborative dialogue between clinicians and scientists must be achieved. Scientists need to understand the types, amounts, and pitfalls of the EHR data available in order to propose valid questions. Clinicians need to be actively engaged in the research endeavor, to teach investigators about diseases and how they affect humans, and to facilitate access to clinical data. Another significant consideration for clinicians is to be cognizant of and to improve data quality as it is entered into EHRs in order to facilitate downstream analyses. Training programs that are proponents of this systems medicine approach to research should be adapted to teach graduate students about the clinical environment, perhaps including morning rounds or even weeklong rotations in a hospital. These programs should also have greater medical student participation and interaction so that collaborative dialogue can be fostered. From a sociological perspective, it is often the case that collaboration leads to questions and resources that would be difficult to come by otherwise [17].

Table 1: Sample biological questions and their associated problem solving methods

Question: How do genotypes in HLA regions affect lifespan?
Method: Examine genotype data on HLA regions from normal healthy bone marrow donors to see if any genotype is over-represented in an older population. Look for the relationship between genotype and immune senescence.

Question: How does the size of the hypothalamus affect BMI?
Method: Retrieve all patients that have an MRI scan for their hypothalamus and have BMI recorded. Use methods similar to Cypress and colleagues which linked brown adipose tissue to BMI [18].

Question: What are the relationships between gene expression and physiological biomarkers (e.g. red blood cell count, blood urea nitrogen, etc.)?
Method: Examine the relationship between physiological biomarker differences and gene expression differences using disease groups as an intermediary to join clinical and experimental data [19].

Question: What chronic diseases accelerate human development and aging?
Method: Build a disease-specific aging model using patients diagnosed with specific ICD-9-CM codes [20].

Question: What non-chemotherapeutic drug significantly reduces WBC in the short term?
Method: Remove chemotherapeutic drugs from the drug corpus and examine differences between white blood count prior to and after administration of all other drugs.

Question: Is there any correlation between physiological pain and laboratory measurements?
Method: Examine correlations between pain scores and measured laboratory values.

In the following chapters I will present examples of informatics methods that allow for the aggregation, evaluation, and use of electronic health records to better understand human health and underlying pathophysiology. In chapter two I will give a brief overview of the current state of integrating electronic health records and molecular data for reverse translational medicine, and examine some potential hurdles of using EHR data. In chapter three I will discuss the creation of the clinarray, a representation of individuals' biomarker data aggregated across an EHR, and its application to differentiate individuals by disease severity. The creation of the clinarray as a virtual platform enables many methods that have been widely used in molecular data analyses to be applied to clinical data. We leverage this platform in chapter four, where I will discuss the application of independent component analysis to disease-specific aggregations of clinarrays for the discovery of known and unknown physiological factors. The goal in this chapter is twofold: first, to show that methods used in molecular research can be applied to clinical data, and second, to shed new light on physiological processes for multifactorial and complex diseases.

Whereas chapters three and four focus on specific disease conditions, the remaining chapters will discuss the use of EHR data in aggregate from many different patients across many different conditions. While I will have shown that molecular methods can be applied to clinical data, the question of consistency in the results of analyses using clinical data still remains. In chapter five I address this concern in the context of age prediction. I will discuss the use of normal biomarker values found in EHR data to validate models of maturation that were built using data from the National Health and Nutrition Examination Survey.

The final two chapters will present examples of problem solving methods that incorporate molecular data with clinical data to shed new light on physiological processes and their underlying mechanisms. Both of these chapters use the idea that diseases are perturbations of human physiology to derive relationships between physiological changes and molecular changes. Chapter six will discuss methods to intersect EHR data and gene expression data from a nonintersecting population to find genes related to maturation and aging. Chapter seven will describe methods for the integration of EHR data and gene expression data to discover novel relationships between molecules and immune related pathophysiology. It is my hope that the following chapters will lay the groundwork for understanding the value of electronic health records and present problem solving methods that have enabled us to discover new clinical and molecular features of diseases.

CHAPTER 2: THE CURRENT PARADIGM FOR USING ELECTRONIC HEALTH RECORDS FOR MOLECULAR DISCOVERIES

EHR Use in Biobanks, Academic Research Institutions, and Primary Care Institutions

The most comprehensive stores of coupled clinical and molecular data currently reside in biobanks, repositories of biological samples that are connected to clinical data, the majority of which are found at academic research institutions. For example, Vanderbilt University, Mayo Clinic, Marshfield Clinic, Northwestern University, and the Group Health Cooperative with the University of Washington host biobanks that have recently become involved in the eMERGE Network, an NHGRI-funded consortium that explores the utility of DNA repositories linked to EHRs [21, 22]. The data stored in these biobanks have enabled numerous genome-wide association studies (GWAS) that examine the relationship between clinical features and the single nucleotide polymorphisms (SNPs) that may affect them. In 2009 Minerva Carrasquillo and colleagues from Mayo Clinic determined that a genetic variant in PCDH11X is associated with susceptibility to late-onset Alzheimer's disease [23]. In this study, 313,504 SNPs were examined for 844 cases and 1,255 controls aggregated from clinically ascertained individuals from two different Mayo clinics and the Mayo brain bank. Other examples include the investigation into blood cell traits associated with various SNPs [14]. Blood cell trait data was collected from EHR records spanning 15 years for 3,012 patients. Features of the EHR, including billing codes, were used to control for hematological diseases and comorbidities that may affect downstream analyses. The results identified 11 significant SNPs within 4 genomic loci (HBS1L/MYB, TMPRSS6, HFE, and SLC17A1) associated with 4 blood cell traits (RBC, MCV, MCH, and MCHC). Three of these four loci were previously identified in a GWAS using a much larger number of participants.

While the majority of these studies have dealt with blood markers and specific diseases, one can envision these methods being applied to other clinical disciplines like radiology and pathology, in which researchers can examine imaging and tissue features in conjunction with polymorphisms. The lack of studies that integrate these domains with high throughput molecular measurements stems from difficulties in extracting and digitizing features from radiological images and pathology reports. While radiological images are stored in Picture Archiving and Communication Systems, the interpretations of the images, once they are read, are usually the only easily accessible information stored in an EHR. These interpretations, while useful for diagnosis and treatment, are the tip of the iceberg with regard to the information that can be extracted from raw image files. The storage of pathology data in electronic format lags behind even radiology, as systems that store such data are still in development. However, with digitization and the ability to extract richer features from these commonly used clinical diagnostic tools, their integration into GWAS will provide greater insight into the physiological effects of polymorphisms.

Although biobanks have an enormous amount of genetic, clinical, and sometimes environmental data, the diversity of molecular measurements is somewhat lacking. Most of the emphasis has been placed on the collection of genomic information and storage of tissue samples for future analyses. There exist many other molecular modalities that are continuously being developed and used for basic molecular research and that provide different perspectives on disease and pathophysiology.

The lack of infrastructure and institutional support for the creation of more academic biobanks, in which high throughput measurements are taken in tandem with clinical measurements gathered from standard healthcare practices, has not precluded these academic research hospitals and institutions from taking advantage of EHRs. While there are no omnibus plans at these institutions to gather molecular data from patients, individual investigators have collected and used treasure troves of molecular data from research projects and clinical trials, including genetic sequence, gene expression, mass spectrometry, flow cytometry, and microRNA data, to name just a few. These data, in conjunction with clinical EHR data, have enabled researchers to examine the relationships between clinical and molecular features beyond that of genetic sequence. In 2007 Eran Segal and colleagues showed, in an example of non-invasive molecular profiling, that 28 imaging traits from lung CT scans can be predictive of 78% of the global lung cancer gene expression profile [16]. While there have recently been many GWAS examining blood cell traits and SNPs, Whitney and colleagues in 2003 profiled these traits as well as circadian cycles in the context of gender and age using gene expression microarrays [15].

The greatest amount of EHR data is collected by primary care facilities that are not related to academic research institutions. These institutions focus solely on the care and treatment of patients. However, there exist primary care facilities that are using their ability to gather clinical, molecular, and environmental information to better understand disease processes and the effects of treatments. Kaiser Permanente's Research Program on Genes, Environment, and Health is a prime example of this [24]. Their current plan is to collect clinical data from EHRs, environmental exposure and behavioral data, and genetic information for 500,000 consenting members to examine the genetic and environmental factors that influence common diseases. As of 2010, over 130,000 members have consented, and projects studying bipolar disorder among different ethnicities and prostate cancer in African American men are now being initiated.

While the prevalence of EHR adoption in the United States is relatively low (~1.5 percent of all U.S. hospitals have comprehensive electronic records [1]), many European countries have implemented nationwide EHRs. Denmark, for example, has a national health network, MedCom, which is used by over three quarters of the healthcare sector, more than 5,000 different organizations [25]. Over 98 percent of primary care practices use clinical EHRs. Patient data, including laboratory information and pharmacy orders, are easily accessible to healthcare providers as well as patients. The connectivity of this amount of information can potentially be an enabling factor for molecular research when molecular diagnostics become the standard of care.

Hurdles of Using EHR Data: Data Quality and Privacy

While a full discussion of EHR data quality and the privacy concerns surrounding the use of EHR data goes beyond the scope of this thesis, I will briefly outline the major concerns and how they are being addressed in practice. Much of the concern with EHR data quality involves incorrect data entry and coding, institution-specific coding practices, the quality of natural language processing with regard to unstructured text and uncontrolled terminologies, and covariates including comorbidities, drug usage, procedures, environmental effects, and socioeconomic groups.

The amount of data stored in an EHR plays an important role in dealing with some of these data quality issues. Because so much data is available, researchers can be very stringent when devising exclusionary rules. For example, Kullo and colleagues, when examining the relationships between SNPs and blood cell traits, developed an algorithm that uses billing codes and natural language processing of unstructured clinical notes to exclude data affected by comorbidities, medications, or blood loss [14]. Using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, procedural ICD-9 codes, and Current Procedural Terminology codes, Kullo excluded data from patients with comorbidities that included hematological and solid-organ malignancies, hereditary anemias, and cirrhosis, and with medications that included chemotherapeutic agents and immunosuppressive drugs. As a result, 12,864 values in 1,165 patients were excluded from the original 35,159 RBC trait values. This resulted in excluding 200 patients out of the original 3,411. The heuristics used to exclude data are domain dependent and require expertise with the question that is being asked. The large amounts of data also enable methods like propensity score matching, which attempts to derive similar populations based on selected covariates. Given the often non-normal distributions [26] and the potential for outliers in clinical measurements, analyses should favor distributions that more closely fit the data and more robust statistics, such as the median and quantiles, over the mean and variance.

Privacy concerns remain a significant issue in the use of EHR data. However, institutions have shown that, with informed consent, the number of people who allow their health information to be used for research is not insubstantial (up to 75,000 individuals at Vanderbilt and over 130,000 at Kaiser Permanente). Multiple safeguards are in place at many academic and non-academic institutions, including the Health Insurance Portability and Accountability Act and Institutional Review Boards, to ensure that data from people who opt in to these programs are used appropriately.

De-identified data also represents a valuable data source for molecular research. An example of such a data source is NHANES, the National Health and Nutrition Examination Survey, a biannual survey conducted by the Centers for Disease Control and Prevention on a sample of the noninstitutionalized population of the United States [27]. This data set consists of behavioral, environmental, and clinical data that is freely available for download. Access to matched genetic information must be requested separately. Private institutions have also started to use and release de-identified health data. The Heritage Health Prize, for example, is a competition that aims to use aggregated de-identified medical claims data for 100,000 individuals to predict hospital admissions.

The drawbacks of using de-identified data, however, include the risk of re-identification if the de-identification process is not thorough enough. Such data will undoubtedly be less complete because certain data types, like physicians' notes, are extremely difficult to de-identify and are therefore excluded. De-identification may also lead to abstraction of data. For example, rather than using specific ICD-9-CM codes, the de-identification process may abstract them into high-level terms, such as disease groups, which may or may not be suitable for a particular study.

While technical, sociological, and legal obstacles prevent the use of EHRs to their fullest potential, we believe that many of them can be surmounted. New methods, such as the text mining of physician notes to generate useful computational data [28, 29], continue to be developed and tested on clinical data. Given the concerns about healthcare privacy, new methodologies for the de-identification of EHRs need to be prioritized. Institutions should maintain repositories and implement procedures to streamline the use of de-identified data in accordance with the Health Insurance Portability and Accountability Act and institutional requirements. Institutions need to build systems that facilitate scientific inquiry and data retrieval, examples of which include the Stanford Translational Research Integrated Database Environment [30] and i2b2 [31]. Healthcare institutions such as Vanderbilt [32], federal institutions like the Department of Health and Human Services and the Centers for Disease Control and Prevention, and private corporations like Cerner have started to collect, use, and make available clinical and health data for research. Funding mechanisms like the Clinical and Translational Science Awards promise to democratize this approach [33]. National structures that aggregate de-identified clinical data, similar to the NCBI Gene Expression Omnibus for gene expression experiments [34] or dbGaP for genome-wide association studies [35], would enable more scientists to use this kind of data for research.

As EHR data becomes more accessible, higher in quality, and richer, it promises to contribute significantly to biological research. The ability to integrate EHR data into biological research will fundamentally change how human biological research is performed. It is imperative for researchers to recognize this potential value and to work to put in place policies and procedures that facilitate the use of EHR data. As the United States moves from 1.5% [1] to 100% adoption of EHRs, along with the rest of the world, and as more individualized health data become available, scientific use of this data will shed new light on basic science and ultimately improve human health.

CHAPTER 3: CLINICAL ARRAYS OF LABORATORY MEASURES, OR CLINARRAYS, BUILT FROM AN ELECTRONIC HEALTH RECORD ENABLE DISEASE SUBTYPING BY SEVERITY

This work was done in collaboration with Susan Weber, Philip Constantinou, Todd Ferris, and Henry Lowe. Susan provided clinical laboratory data from the Stanford Translational Research Integrated Database Environment system under the supervision of Philip and the leadership of Henry. Todd ensured that de-identification met institutional requirements. This work has been published: Chen DP, Weber SC, Constantinou PS, Ferris TA, Lowe HJ, Butte AJ (2007) Clinical Arrays of Laboratory Measures, or Clinarrays, Built from an Electronic Health Record Enable Disease Subtyping by Severity. American Medical Informatics Association.

Applying biological methods and techniques to clinical data can help narrow the gap between basic science and its clinical relevance, which is the underpinning of translational research. For the past decade, a major modality of research in the biosciences has been microarray technology. Microarrays and gene expression profiling have been used to gain valuable insight into biological processes through the measurement of tens of thousands of genes, and they have paved the way for novel prognostic tests and disease-subclass determination [36]. This platform has made it possible to quantify gene expression under differing experimental conditions, and various algorithms can use these measurements to classify, learn, or predict biologically relevant processes.

In 1999, Todd Golub and colleagues showed that supervised classification of microarray samples could distinguish between acute myeloid leukemia and acute lymphoblastic leukemia [37]. Alizadeh and colleagues used an unsupervised algorithm to discover subtypes with differing severities among samples of a single disorder, B-cell lymphoma, a difference that can directly affect clinical outcome [38]. Laura van 't Veer and colleagues used supervised classification of gene expression to determine a signature that is indicative of the clinical outcome of breast cancer [39]. More recently, Nathan Price and colleagues used gene expression data to create a highly accurate two-gene classifier for differentiating between gastrointestinal stromal tumors and leiomyosarcomas [40]. The application of supervised and unsupervised algorithms to high-bandwidth gene expression data has had a direct impact at both the bench and the bedside, furthering our understanding and treatment of singular human diseases [41].

However, diseases like cystic fibrosis or Crohn's disease, which have environmental influences, social influences, or both, make classification based on microarray data imprecise [42]. Hence, predicting the clinical outcome of patients with more complex diseases requires examining other variables that are often considered only qualitatively. An often-overlooked metric that directly measures phenotypic information is clinical laboratory data. Stoll and colleagues previously created physiological profiles from measurements in rats, but this approach has yet to be translated to humans [43]. While data collected during clinical care were prone to transcription errors in the past, the movement towards electronic medical records (EMR) has improved data quality by eliminating transcription and omission errors [44].

In this paper we propose the aggregation of clinical laboratory tests gathered from EMR data on a per-patient basis to create what we term a clinarray, enabling quantitative methods traditionally used on gene expression microarrays to be applied to clinical data. The clinarray is a platform that quantifies phenotypic expression for a patient, across a panel of pathophysiological measurements obtained through clinical laboratory tests, in the same way that the microarray is a platform that quantifies genome-wide expression for an experiment. We first show that we can apply unsupervised clustering methods to aggregations of clinarrays to retrieve pathophysiologically relevant laboratory groupings. We then show that unsupervised methods used in microarray analysis can be directly applied to clinarrays to distinguish patients with severe and less-severe forms of cystic fibrosis and Crohn's disease.

Methods

Data collection, processing and the clinarray

Quantitative clinical laboratory data, consisting of 317,338 measurements across 553 distinct lab tests, originally obtained at the Lucile Packard Children's Hospital, were collected in a de-identified manner from the Stanford Translational Research Integrated Database Environment (STRIDE). In total, these data represented 966 patients across all ages who were diagnosed with one or more of 3 chronic diseases (Table 2). The use of de-identified clinical laboratory data in this manner was approved by the Institutional Review Board of the Stanford University School of Medicine.

Diseases            Number of Patients
Crohn's disease     154
Cystic fibrosis     449
Down syndrome       366

Table 2: Diseases and the number of patients used in our clinarray study

We averaged the values for each individual lab test across all time points subsequent to a patient being diagnosed with any of the three diseases. Each average represents one value in what we term the clinarray. The clinarray thus represents the collection of average laboratory values for one patient.

Metric for Severity

Measurements of severity have often been derived from direct clinical or pathological examination of patients or patient samples. The drawback of using de-identified quantitative laboratory measurements is that direct indicators of disease severity are not available. However, it has previously been shown that the number of blood samples drawn for laboratory tests increases for intensive care patients with more severe illness, based on APACHE III scores [45], so we believe that we can calculate a similar proxy for severity using de-identified laboratory test data.

Our proxy for the severity of chronic disease is the average number of laboratory tests measured on a patient per year after their first recorded diagnosis of a disease. For each patient, we sum the number of laboratory tests measured on that patient regardless of type. We then divide by the number of years over which the patient has had laboratory measurements taken. We propose that the greater the number of laboratory tests measured on a patient per year, the more severe the form of chronic disease. We associate this severity score with each patient.

Hierarchical clustering of patients and lab tests

After construction, all clinarrays were grouped by disease type. Clinarrays for patients with more than one disorder were considered in each disorder. For each disease, we created a disease-specific matrix in which columns represented individual clinarrays and rows represented laboratory tests. Each cell represented a clinarray value for that specific patient/laboratory pair.

Normalization

For each disease-specific matrix, we normalized the values by laboratory type. We calculated the mean measurement for each laboratory test among all clinarrays. We then assigned a z-score to each cell in our matrix by calculating the number of standard deviations a particular laboratory/patient clinarray value was from its respective laboratory mean. Any value more than three standard deviations from the mean was set to three standard deviations. We then removed labs if no patients had the lab measured (Figure 2).

We used the normalized laboratory values to examine the coherence of applying hierarchical clustering to laboratory tests across the clinarrays. We calculated pairwise Pearson's correlation coefficients (cor) as a measure of similarity between laboratory tests within each disease-specific matrix [46]. As our disease-specific matrix is sparse, correlations between individual clinarrays may not always be possible. To rectify this, we pruned each disease-specific matrix by removing clinarrays that had fewer than three overlapping laboratory tests with all other clinarrays, thereby yielding a disease-specific matrix in which all correlations between clinarrays were meaningful. We then removed any laboratory test missing values in 20% or more of the clinarrays in each disease-specific matrix. The resulting disease-specific matrices were fairly dense.

We then applied hierarchical clustering algorithms using average agglomeration methods to examine the clustering of laboratory tests and to distinguish between patients with differing disease severities. We first clustered laboratory tests across each disease-specific matrix, with a distance measure of 1 minus the correlation coefficient, where negative correlations were considered as zero. Clusters of laboratory tests were then manually examined.
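The clinarray construction and the severity proxy described in the Methods can be illustrated with a minimal sketch. This is not the code used in the study: the tuple layout `(lab_name, time_in_years, value)`, the function names, the inclusion of measurements taken exactly at diagnosis, and the one-year floor on the observation span are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def build_clinarray(measurements, diagnosis_time):
    """Average each lab test over time points at or after diagnosis.

    measurements: iterable of (lab_name, time_in_years, value) for ONE patient
                  (hypothetical layout, not the STRIDE schema).
    diagnosis_time: time of the first recorded diagnosis, in the same units.
    Returns a dict {lab_name: average value}, i.e. one clinarray.
    """
    by_lab = defaultdict(list)
    for lab, t, value in measurements:
        if t >= diagnosis_time:  # assumption: diagnosis-day labs are included
            by_lab[lab].append(value)
    return {lab: mean(values) for lab, values in by_lab.items()}


def severity_score(measurements, diagnosis_time):
    """Proxy severity: lab tests per year after diagnosis, regardless of type."""
    times = [t for _, t, _ in measurements if t >= diagnosis_time]
    if not times:
        return 0.0
    span_years = max(times) - min(times)
    # Assumption: spans shorter than one year count as one year, which
    # avoids dividing by a near-zero observation window.
    return len(times) / max(span_years, 1.0)


patient = [
    ("ALT", 0.5, 40.0),  # before diagnosis, ignored
    ("ALT", 1.0, 30.0),
    ("ALT", 3.0, 50.0),
    ("WBC", 2.0, 7.0),
    ("WBC", 3.0, 9.0),
]
clinarray = build_clinarray(patient, diagnosis_time=1.0)
score = severity_score(patient, diagnosis_time=1.0)
```

In this toy example the clinarray holds one averaged value per lab test, and the severity score is four post-diagnosis tests spread over two years.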

Figure 2: Example Clinarrays. Top: Clinarrays from patients with cystic fibrosis. Columns correspond to patients, as represented by clinarrays, and rows represent laboratory tests. Laboratory tests measured in no cystic fibrosis patients were removed. Clinarrays missing a specific laboratory test measurement are white. Gray scale indicates the degree of deviation from the mean. Bottom: Magnified portion of the matrix.
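The normalization and lab-pruning steps behind a matrix like the one in Figure 2 might be sketched as follows. The row-per-lab dictionary layout, the use of `None` for missing cells, and the choice of the population standard deviation are assumptions; the text does not say which standard deviation estimator was used.

```python
from statistics import mean, pstdev


def normalize_lab(values, clip=3.0):
    """Z-score one lab test across clinarrays; None marks a missing value.

    Values beyond +/- clip standard deviations are set to the clip limit.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # a lab measured in no patient is dropped elsewhere
    mu = mean(observed)
    sd = pstdev(observed)  # assumption: population standard deviation
    normalized = []
    for v in values:
        if v is None:
            normalized.append(None)
        elif sd == 0:
            normalized.append(0.0)  # constant lab: no deviation from the mean
        else:
            z = (v - mu) / sd
            normalized.append(max(-clip, min(clip, z)))
    return normalized


def prune_labs(matrix, max_missing=0.20):
    """Keep labs (rows) missing in fewer than max_missing of the clinarrays.

    matrix: dict {lab_name: [value or None for each clinarray]}.
    """
    kept = {}
    for lab, row in matrix.items():
        missing_fraction = sum(v is None for v in row) / len(row)
        if missing_fraction < max_missing:
            kept[lab] = row
    return kept


matrix = {
    "ALT": [10.0, 20.0, 30.0, 20.0, 20.0],
    "WBC": [5.0, None, 7.0, 6.0, 6.0],  # missing in 20% of clinarrays
}
dense = {lab: normalize_lab(row) for lab, row in prune_labs(matrix).items()}
```

Here `WBC` is missing in exactly 20% of clinarrays and is therefore removed, matching the "20% or more" rule stated in the text. Pruning of clinarrays with too few overlapping labs is not covered by this sketch.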

We then hierarchically clustered each disease-specific matrix to search for natural subtypes of disease. The distance between clinarrays was again computed as 1 minus the correlation coefficient, where negative correlations were considered as zero. We then examined major subtypes for each disease by comparing the severity scores assigned to patients found in each cluster, to assess whether the major clusters significantly distinguished between patients with differing disease severities.

Results

Three disease-specific matrices were created, with rows representing laboratory tests and columns representing clinarrays. We pruned our matrices as described above, which removed a number of patients and laboratory tests (Table 3).

Diseases            # patients after pruning    # of labs at 80% threshold
Crohn's disease     141 (92%)                   29
Cystic fibrosis     352 (78%)                   32
Down syndrome       320 (87%)                   9

Table 3: Population after data pruning. Diseases, number of patients after pruning, and number of laboratory tests for which at least 80% of patients have measurements

We first clustered laboratory tests using the cystic fibrosis disease matrix, correlating measurements of labs across all clinarrays to see if logical and coherent clusters could be retrieved (Figure 3). As expected, liver function tests clustered together, as did blood markers.
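The clustering procedure and the severity comparison between clusters can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, treating pairs with fewer than three shared values as uncorrelated is an assumption, and the rank-sum p-value uses a normal approximation without tie or continuity correction, which may differ from the exact test used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation over positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return 0.0  # assumption: too little overlap counts as uncorrelated
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def distance(x, y):
    """1 minus the correlation, with negative correlations treated as zero."""
    return 1.0 - max(pearson(x, y), 0.0)


def average_linkage(items, k=2):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    d = [[distance(a, b) for b in items] for a in items]
    clusters = [[i] for i in range(len(items))]

    def between(c1, c2):
        # Average linkage: mean pairwise distance between cluster members.
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda p: between(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


def rank_sum_p(a, b):
    """Two-sided rank-sum p-value via the normal approximation."""
    values = sorted(a + b)
    rank = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank[values[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    n1, n2 = len(a), len(b)
    w = sum(rank[v] for v in a)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(w - mu) / sigma / math.sqrt(2))
```

A caller would pass the normalized clinarrays (or lab-test rows) to `average_linkage`, split the severity scores by the two resulting clusters, and compare them with `rank_sum_p`.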

Figure 3: Hierarchical cluster of laboratory tests

We next examined whether clustering patients by correlating clinarrays yielded significant subtypes of disease, and whether these subtypes corresponded to patients with differing disease severity. After calculating a severity score for each patient, we applied hierarchical clustering with average agglomeration to each disease-specific matrix to cluster patients based on their clinarrays, as described above (Figure 4). The resulting hierarchical clustering of patients broadly demonstrated two subtypes of disease in each of the three chronic diseases. We retrieved the severity score for all patients within both subtypes and applied the Wilcoxon test to determine if there was any significant difference between the two groups. We find that the patients in the two discovered disease subtypes have statistically significant differences in severity of cystic fibrosis (mean severe = , mean less-severe = 50.81, p = 4.29 × 10^-9) as


More information

ITT Advanced Medical Technologies - A Programmer's Overview

ITT Advanced Medical Technologies - A Programmer's Overview ITT Advanced Medical Technologies (Ileri Tip Teknolojileri) ITT Advanced Medical Technologies (Ileri Tip Teknolojileri) is a biotechnology company (SME) established in Turkey. Its activity area is research,

More information

Fiscal Year 2013 (FY13) Prostate Cancer Research Program (PCRP) Reference Table of Award Mechanisms and Submission Requirements

Fiscal Year 2013 (FY13) Prostate Cancer Research Program (PCRP) Reference Table of Award Mechanisms and Submission Requirements Fiscal Year 2013 (FY13) Prostate Cancer Research Program (PCRP) Reference Table of Award Mechanisms and Submission Requirements PCRP AWARD MECHANISMS WITH EMPHASIS ON RESOURCES Clinical Consortium Award

More information

A Primer of Genome Science THIRD

A Primer of Genome Science THIRD A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:

More information

Electronic Health Record (EHR) Data Analysis Capabilities

Electronic Health Record (EHR) Data Analysis Capabilities Electronic Health Record (EHR) Data Analysis Capabilities January 2014 Boston Strategic Partners, Inc. 4 Wellington St. Suite 3 Boston, MA 02118 www.bostonsp.com Boston Strategic Partners is uniquely positioned

More information

Roche Position on Human Stem Cells

Roche Position on Human Stem Cells Roche Position on Human Stem Cells Background Stem cells and treating diseases. Stem cells and their applications offer an enormous potential for the treatment and even the cure of diseases, along with

More information

Test Content Outline Effective Date: February 9, 2016. Family Nurse Practitioner Board Certification Examination

Test Content Outline Effective Date: February 9, 2016. Family Nurse Practitioner Board Certification Examination February 9, 2016 Board Certification Examination There are 200 questions on this examination. Of these, 175 are scored questions and 25 are pretest questions that are not scored. Pretest questions are

More information

Find your future in the history

Find your future in the history Find your future in the history Is your radiology practice ready for the future? Demands are extremely high as radiology practices move from a fee-for-service model to an outcomes-based model centered

More information

Masters of Science in Nursing Curriculum Guide Course Descriptions

Masters of Science in Nursing Curriculum Guide Course Descriptions Masters of Science in Nursing Curriculum Guide Course Descriptions Core Courses (26 credits total) N502 Theoretical Foundations of Nursing Practice (3 credits) Theoretical Foundations of Nursing Practice

More information

Acute Myeloid Leukemia

Acute Myeloid Leukemia Acute Myeloid Leukemia Introduction Leukemia is cancer of the white blood cells. The increased number of these cells leads to overcrowding of healthy blood cells. As a result, the healthy cells are not

More information

A Genetic Analysis of Rheumatoid Arthritis

A Genetic Analysis of Rheumatoid Arthritis A Genetic Analysis of Rheumatoid Arthritis Introduction to Rheumatoid Arthritis: Classification and Diagnosis Rheumatoid arthritis is a chronic inflammatory disorder that affects mainly synovial joints.

More information

Essential Nursing Competencies and Curricula Guidelines for Genetics and Genomics: Outcome Indicators

Essential Nursing Competencies and Curricula Guidelines for Genetics and Genomics: Outcome Indicators Essential Nursing Competencies and Curricula Guidelines for Genetics and Genomics: Outcome Indicators Introduction The Outcome Indicators are an adjunct to the Essential Nursing Competencies and Curricula

More information

Your Cord Blood Donation Options

Your Cord Blood Donation Options Your Cord Blood Donation Options What is cord blood? Cord blood is the blood that remains in the placenta after a baby is born. Cord blood has been found to be a rich source of stem cells and can be used

More information

Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives

Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives Dirk.Repsilber@oru.se 2015-05-21 Functional Bioinformatics, Örebro University Vad är bioinformatik och varför

More information

Research Resources at Partners Hospitals

Research Resources at Partners Hospitals Research Resources at Partners Hospitals Barbara E. Bierer, M.D. SVP Research, BWH bbierer@partners.org (617) 732-8990 Rick Bringhurst, M.D. SVP Research, MGH rbringhurst@partners.org (617) 724-8549 Agenda

More information

Factors for success in big data science

Factors for success in big data science Factors for success in big data science Damjan Vukcevic Data Science Murdoch Childrens Research Institute 16 October 2014 Big Data Reading Group (Department of Mathematics & Statistics, University of Melbourne)

More information

NP/PA Clinical Hepatology Fellowship Summary of Year-Long Curriculum

NP/PA Clinical Hepatology Fellowship Summary of Year-Long Curriculum OVERVIEW OF THE FELLOWSHIP The goal of the AASLD NP/PA Fellowship is to provide a 1-year postgraduate hepatology training program for nurse practitioners and physician assistants in a clinical outpatient

More information

FACULTY OF ALLIED HEALTH SCIENCES

FACULTY OF ALLIED HEALTH SCIENCES FACULTY OF ALLIED HEALTH SCIENCES 102 Naresuan University FACULTY OF ALLIED HEALTH SCIENCES has focused on providing strong professional programs, including Medical established as one of the leading institutes

More information

Prognosis for Healthcare: The Future of Medicine

Prognosis for Healthcare: The Future of Medicine Prognosis for Healthcare: The Future of Medicine Bruce M. Cohen, M.D., Ph.D. Director, Frazier Research Institute, McLean Hospital President and Psychiatrist in Chief Emeritus, McLean Hospital Robertson-Steele

More information

A Multi-locus Genetic Risk Score for Abdominal Aortic Aneurysm

A Multi-locus Genetic Risk Score for Abdominal Aortic Aneurysm A Multi-locus Genetic Risk Score for Abdominal Aortic Aneurysm Zi Ye, 1 MD, Erin Austin, 1,2 PhD, Daniel J Schaid, 2 PhD, Iftikhar J. Kullo, 1 MD Affiliations: 1 Division of Cardiovascular Diseases and

More information

The Big Picture: IDNT in Electronic Records Glossary

The Big Picture: IDNT in Electronic Records Glossary TERM DEFINITION CCI Canada Health Infoway Canadian Institute for Health Information EHR EMR EPR H L 7 (HL7) Canadian Classification of Interventions is the Canadian standard for classifying health care

More information

Personalized Medicine: Humanity s Ultimate Big Data Challenge. Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences

Personalized Medicine: Humanity s Ultimate Big Data Challenge. Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences Personalized Medicine: Humanity s Ultimate Big Data Challenge Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences 2012 Oracle Corporation Proprietary and Confidential 2 3 Humanity

More information

Where to Begin? Auditing the Current EHR System

Where to Begin? Auditing the Current EHR System Chapter 1 Where to Begin? Auditing the Current EHR System After implementation, allow for a period of stabilization, so physicians and employees can gain more comfort using the electronic health record

More information

Big Data for Population Health and Personalised Medicine through EMR Linkages

Big Data for Population Health and Personalised Medicine through EMR Linkages Big Data for Population Health and Personalised Medicine through EMR Linkages Zheng-Ming CHEN Professor of Epidemiology Nuffield Dept. of Population Health, University of Oxford Big Data for Health Policy

More information

Consultation Response Medical profiling and online medicine: the ethics of 'personalised' healthcare in a consumer age Nuffield Council on Bioethics

Consultation Response Medical profiling and online medicine: the ethics of 'personalised' healthcare in a consumer age Nuffield Council on Bioethics Consultation Response Medical profiling and online medicine: the ethics of 'personalised' healthcare in a consumer age Nuffield Council on Bioethics Response by the Genetic Interest Group Question 1: Health

More information

I was just diagnosed, so my doctor and I are deciding on treatment. My doctor said there are several

I was just diagnosed, so my doctor and I are deciding on treatment. My doctor said there are several Track 3: Goals of therapy I was just diagnosed, so my doctor and I are deciding on treatment. My doctor said there are several factors she ll use to decide what s best for me. Let s talk about making treatment

More information

The Ohio State University College of Medicine. Biomedical Sciences Graduate Program (BSGP) The Biology of Human Disease. medicine.osu.

The Ohio State University College of Medicine. Biomedical Sciences Graduate Program (BSGP) The Biology of Human Disease. medicine.osu. (BSGP) The Biology of Human Disease medicine.osu.edu/bsgp Welcome from BSGP Leadership Thank you for your interest in the at The Ohio State University Wexner Medical Center. Our goal is to train talented,

More information

Mastering the Data Game: Accelerating

Mastering the Data Game: Accelerating Mastering the Data Game: Accelerating Integration and Optimization Healthcare systems are breaking new barriers in analytics as they seek to meet aggressive quality and financial goals. Mastering the Data

More information

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources 1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools

More information

BioVisualization: Enhancing Clinical Data Mining

BioVisualization: Enhancing Clinical Data Mining BioVisualization: Enhancing Clinical Data Mining Even as many clinicians struggle to give up their pen and paper charts and spreadsheets, some innovators are already shifting health care information technology

More information

Clinical Trials: Questions and Answers

Clinical Trials: Questions and Answers Clinical Trials: Questions and Answers Key Points Clinical trials are research studies that test how well new medical approaches work in people (see Question 1). Every clinical trial has a protocol, which

More information

CLINICAL TRIALS SHOULD YOU PARTICIPATE? by Gwen L. Nichols, MD

CLINICAL TRIALS SHOULD YOU PARTICIPATE? by Gwen L. Nichols, MD CLINICAL TRIALS SHOULD YOU PARTICIPATE? by Gwen L. Nichols, MD Gwen L. Nichols, M.D., is currently the Oncology Site Head of the Roche Translational Clinical Research Center at Hoffman- LaRoche. In this

More information

Biomedical Big Data and Precision Medicine

Biomedical Big Data and Precision Medicine Biomedical Big Data and Precision Medicine Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago October 8, 2015 1 Explosion of Biomedical Data 2 Types

More information

Exploring the Challenges and Opportunities of Leveraging EMRs for Data-Driven Clinical Research

Exploring the Challenges and Opportunities of Leveraging EMRs for Data-Driven Clinical Research White Paper Exploring the Challenges and Opportunities of Leveraging EMRs for Data-Driven Clinical Research Because some knowledge is too important not to share. Exploring the Challenges and Opportunities

More information

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik Leading Genomics Diagnostic harma Discove Collab Shanghai Cambridge, MA Reykjavik Global leadership for using the genome to create better medicine WuXi NextCODE provides a uniquely proven and integrated

More information

NIH s Genomic Data Sharing Policy

NIH s Genomic Data Sharing Policy NIH s Genomic Data Sharing Policy 2 Benefits of Data Sharing Enables data generated from one study to be used to explore a wide range of additional research questions Increases statistical power and scientific

More information

Integrating Genetic Data into Clinical Workflow with Clinical Decision Support Apps

Integrating Genetic Data into Clinical Workflow with Clinical Decision Support Apps White Paper Healthcare Integrating Genetic Data into Clinical Workflow with Clinical Decision Support Apps Executive Summary The Transformation Lab at Intermountain Healthcare in Salt Lake City, Utah,

More information

NIH Genomic Data Sharing (GDS) Policy Guidance Memo #2 1

NIH Genomic Data Sharing (GDS) Policy Guidance Memo #2 1 MEMORANDUM TO: Principal Investigators and Research Staff DATE: 2/22/15 FROM: Anne Klibanski, MD, Partners Chief Academic Officer (CAO) Paul Anderson, MD, PhD, BWH CAO Harry Orf, PhD, MGH Sr. Vice President-Research

More information

Big Data for Patients (BD4P) Stakeholder Engagement Plan

Big Data for Patients (BD4P) Stakeholder Engagement Plan Big Data for Patients (BD4P) Stakeholder Engagement Plan Index I. BD4P Program Background a. Goals and Objectives II. Participation a. How will stakeholders be engaged? i. Stakeholders ii. Workgroups III.

More information

Introduction and Invitation for Public Comment

Introduction and Invitation for Public Comment 2 of 22 Introduction and Invitation for Public Comment The Patient-Centered Outcomes Research Institute (PCORI) is an independent, non-profit health research organization. Its mission is to fund research

More information

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation.

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation. MS Biostatistics MS Biostatistics Competencies Study Development: Work collaboratively with biomedical or public health researchers and PhD biostatisticians, as necessary, to provide biostatistical expertise

More information

FACULTY OF MEDICAL SCIENCE

FACULTY OF MEDICAL SCIENCE Doctor of Philosophy in Biochemistry FACULTY OF MEDICAL SCIENCE Naresuan University 73 Doctor of Philosophy in Biochemistry The Biochemistry Department at Naresuan University is a leader in lower northern

More information

> Semantic Web Use Cases and Case Studies

> Semantic Web Use Cases and Case Studies > Semantic Web Use Cases and Case Studies Case Study: Applied Semantic Knowledgebase for Detection of Patients at Risk of Organ Failure through Immune Rejection Robert Stanley 1, Bruce McManus 2, Raymond

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Degree Level Expectations, Learning Outcomes, Indicators of Achievement and the Program Requirements that Support the Learning Outcomes

Degree Level Expectations, Learning Outcomes, Indicators of Achievement and the Program Requirements that Support the Learning Outcomes Department/Academic Unit: Pathology and Molecular Medicine Degree Program: PhD Degree Level Expectations, Learning Outcomes, Indicators of Achievement and the Program Requirements that Support the Learning

More information

The Future of the Electronic Health Record. Gerry Higgins, Ph.D., Johns Hopkins

The Future of the Electronic Health Record. Gerry Higgins, Ph.D., Johns Hopkins The Future of the Electronic Health Record Gerry Higgins, Ph.D., Johns Hopkins Topics to be covered Near Term Opportunities: Commercial, Usability, Unification of different applications. OMICS : The patient

More information

Summary of the Proposed Rule for the Medicare and Medicaid Electronic Health Records (EHR) Incentive Program (Eligible Professionals only)

Summary of the Proposed Rule for the Medicare and Medicaid Electronic Health Records (EHR) Incentive Program (Eligible Professionals only) Summary of the Proposed Rule for the Medicare and Medicaid Electronic Health Records (EHR) Incentive Program (Eligible Professionals only) Background Enacted on February 17, 2009, the American Recovery

More information

Information for patients and the public and patient information about DNA / Biobanking across Europe

Information for patients and the public and patient information about DNA / Biobanking across Europe Information for patients and the public and patient information about DNA / Biobanking across Europe BIOBANKING / DNA BANKING SUMMARY: A biobank is a store of human biological material, used for the purposes

More information

Graduate Curriculum Guide Course Descriptions: Core and DNP

Graduate Curriculum Guide Course Descriptions: Core and DNP Graduate Curriculum Guide Course Descriptions: Core and DNP APN Core Courses (35 credits total) N502 Theoretical Foundations of Nursing Practice (3 credits) Theoretical Foundations of Nursing Practice

More information

InteliChart. Putting the Meaningful in Meaningful Use. Meeting current criteria while preparing for the future

InteliChart. Putting the Meaningful in Meaningful Use. Meeting current criteria while preparing for the future Putting the Meaningful in Meaningful Use Meeting current criteria while preparing for the future The Centers for Medicare & Medicaid Services designed Meaningful Use (MU) requirements to encourage healthcare

More information

The Electronic Health Record: What is it's potential for enabling health?

The Electronic Health Record: What is it's potential for enabling health? The Electronic Health Record: What is it's potential for enabling health? W. Ed Hammond Professor Emeritus Duke University PEP 2007 Sāo Paulo, Brazil 7 October 2007 Chair, HL7 2008-2009 What I m I m going

More information

Executive Summary Think Research. Using Electronic Medical Records to Bridge Patient Care and Research

Executive Summary Think Research. Using Electronic Medical Records to Bridge Patient Care and Research Executive Summary Think Research Using Electronic Medical Records to Bridge Patient Care and Research Introduction Letter from Greg Simon, President of FasterCures At FasterCures our mission is to save

More information

Opportunities for advancing biomarkers for patient stratification and early diagnosis in liver disease

Opportunities for advancing biomarkers for patient stratification and early diagnosis in liver disease Opportunities for advancing biomarkers for patient stratification and early diagnosis in liver disease 80 Liver Disease and Obesity Liver disease biomarkers: Percent of adult population (United States)

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Human Health Sciences

Human Health Sciences Human Health Sciences WITH PLYMOUTH UNIVERSITY DISCOVER MORE If you would like to visit Plymouth and meet our staff, then why not come along to one of our open days. Human Health Sciences WITH PLYMOUTH

More information

School of Nursing. Presented by Yvette Conley, PhD

School of Nursing. Presented by Yvette Conley, PhD Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression

More information

GE Global Research. The Future of Brain Health

GE Global Research. The Future of Brain Health GE Global Research The Future of Brain Health mission statement We will know the brain as well as we know the body. Future generations won t have to face Alzheimer s, TBI and other neurological diseases.

More information