Introduction to translational and clinical bioinformatics


1 Introduction to translational and clinical bioinformatics Connecting complex molecular information to clinically relevant decisions using molecular profiles Constantin F. Aliferis M.D., Ph.D., FACMI Director, NYU Center for Health Informatics and Bioinformatics Informatics Director, NYU Clinical and Translational Science Institute Alexander Statnikov Ph.D. Director, Computational Causal Discovery laboratory Assistant Professor, NYU Center for Health Informatics and Bioinformatics, General Internal Medicine Director, Molecular Signatures Laboratory, Associate Professor, Department of Pathology, Adjunct Associate Professor in Biostatistics and Biomedical Informatics, Vanderbilt University 1

2 Goals Understand spectrum of Bioinformatics and Medical informatics activities Understand basic concepts of clinical/translational Bioinformatics Understand basic concepts of molecular profiling Introduction to high-throughput assays enabling molecular profiling Introduction to computational data analytics/bioinformatics enabling molecular profiling Understand analytic challenges and pitfalls/interpretation issues Discuss case study of profiles used to diagnose/treat patients Perform hands-on development of a molecular profile, finding novel biomarkers and testing profile/markers accuracy Discussion supported by general literature and heavily grounded on: - NYUMC informatics experts/research projects/grants/papers/entities/software systems - Commercially available modalities & assays 2

3 Overview Session #1: Basic Concepts Session #2: High-throughput assay technologies Session #3: Computational data analytics Session #4: Case study / practical applications Session #5: Hands-on computer lab exercise 3

4 Session #1: Basic Concepts Understand spectrum of Bioinformatics and Medical informatics activities - NYUMC informatics Understand basic concepts of clinical/translational Bioinformatics Understand basic concepts of molecular profiling ALSO: - s/names/interests - adjustments to plan 4

5 NYU Center for Health Informatics & Bioinformatics: Broad Plan
Health Informatics: Educational informatics; Evidence-based medicine and information retrieval informatics; Library Collaborative; Data Integration & Mining (data warehouse & interfacing with EMR, omics LIMS, genomic EMR, biospecimen management, research protocol database systems and management team, data mining service, data mining software)
Infrastructure & Integrative Methods/Activities: CTSI High Performance Computing Facility; Research labs (Kluger, Molecular Signatures, EBM, IR & Scientometrics, Computational Causal Discovery); MS/PhD (& Post-doc Fellowship) Program; Continuing Education (workshops & tutorials, paper digest, Research Colloquium, invited speakers); Integrate/Focus Existing Informatics and Increase Collaborations
Bioinformatics: BPIC (Best Practices Integrative Consultation Core/Service: literature synthesis & benchmarking studies, method-problem matchmaking, design and execution of studies); Cancer Center; Genetics-Genomics COEs; Multi-modal & Integrative studies; Proteomics Informatics; Microarray Informatics (i. upstream, ii. differential expression, iii. pathway inference, iv. molecular profiles); Next-gen sequencing informatics (upstream analyses: i. ChIP-seq, ii. RNA-seq, iii. epigenetics, iv. microbiomics, v. microRNA studies, vi. CNV & splice variation studies, vii. digital RNA, viii. de novo sequencing & resequencing; downstream analyses)

6 Current Capabilities: Areas
Health Informatics: Educational informatics; Evidence-based medicine and information retrieval informatics; Library Collaborative; Data Integration & Mining (data warehouse & interfacing with EMR, omics LIMS, genomic EMR, biospecimen management, research protocol database systems and management team, data mining service, data mining software)
Infrastructure & Integrative Methods/Activities: CTSI High Performance Computing Facility; Research labs (Kluger, Molecular Signatures, EBM, IR & Scientometrics, Computational Causal Discovery); MS/PhD (& Post-doc Fellowship) Program; Continuing Education (workshops & tutorials, paper digest, Research Colloquium, invited speakers); Integrate/Focus Existing Informatics and Increase Collaborations
Bioinformatics: BPIC (Best Practices Integrative Consultation Core/Service: literature synthesis & benchmarking studies, method-problem matchmaking, design and execution of studies); Cancer Center; Genetics-Genomics COEs; Multi-modal & Integrative studies; Proteomics Informatics; Microarray Informatics (i. upstream, ii. differential expression, iii. pathway inference, iv. molecular profiles); Next-gen sequencing informatics (upstream analyses: i. ChIP-seq, ii. RNA-seq, iii. epigenetics, iv. microbiomics, v. microRNA studies, vi. CNV & splice variation studies, vii. digital RNA, viii. de novo sequencing & resequencing; downstream analyses)

7 Current & Future capabilities Health Informatics Educational informatics Evidence based medicine, and Information retrieval Informatics Content management, medical simulations Filter Medline according to content and quality Filter Web for health advice quality Predict future citations of articles Classify individual citations as instrumental or not Identify special types of articles Construct citation histories & Analyze impact of articles Integrate and manage queries and related content Combine and optimize knowledge source searches New find a researcher Find a collaborator Library Collaborative Apply, evaluate, refine next-gen IR methods Data Integration & Mining: -Data warehouse & interfacing with EMR - Omics LIMS - Genomic EMR - Biospecimen management -research protocol database systems and management team -Data mining service -Data Mining software -Data warehouse needs; software acquisition; implementation - OMICS LIMS needs capture; vendor product assessment; funds; software purchase and implementation; integration with billing and EMR -Biospecimen management -Research protocol database system (evelos) -Database management team -Data mining service -Data mining engine: faculty; funds; prototype; implementation; evaluation 7

8 Current & Future capabilities Infrastructure & Integrative Methods/Activities CTSI High Performance Computing Facility Research labs Kluger Molecular Signatures EBM, IR & Scientometrics Computational Causal Discovery MS/PhD (& Post-doc Fellowship) Program (supported by rest of objectives) Sequencing server; hectar1; hectar2; Funds; needs; grants; personnel post; specs; room/networking/access; Personnel hires; hw install; licenses; BP; launch Kluger TF /Regulation studies; high-throughput outcome prediction, specialized clustering methods Molecular Signatures development of molecular signatures for diagnosis outcome prediction and personalized medicine, discovery of diagnostic/imaging biomarkers and putative drug targets, deployment of signatures, automated software, new methods EBM, IR & Scientometrics development and evaluation of next-gen IR and scientometric models and studies Computational Causal Discovery discovery of pathways; studies of causal validity of bioinformatics discovery methods, multiplicity studies, automated software, active learning/experiment number minimization Formal Training in Biomedical Informatics at pre and post-doctoral levels Continuing Education Workshops & tutorials Paper digest Research Colloquium Invited Speakers Workshops & tutorials Paper digest Research Colloquium Invited Speakers Continuing Education Integrate/Focus Existing Informatics and Increase Collaborations Faculty and Staff career development; Informatics Affiliates; Working Collaborations with Courant, Polytechnic, NYC Informatics and other non-nyumc entities 8

9 Current & Future capabilities Bioinformatics BPIC (Best Practices Integrative Consultation Core/Service) Literature Synthesis & Benchmarking studies Method-problem matchmaking Design and execution of studies Study publication assistance Area-specific (Disease, Assay) Informatics Genetics-Genomics COEs Cancer Center Microarray Informatics: Experiment design, assay execution, differential expression, pathway mapping, pathway-specific testing (GSEA/GSA), de novo pathway discovery, phylogeny, clustering, hybrid experimental/observational designs; SNP arrays; ChIP-on-chip analyses, aCGH, tiled arrays, etc. Sequencing Informatics: ChIP-Seq analysis, digital gene expression, de novo sequence assembly & reassembly, CNV analysis, epigenomic studies, microbiomics Proteomics Informatics: platform-specific pre-processing, differential abundance, peptide-protein mapping, protein identification, de novo protein interaction network inference, protein modification and structure studies Multi-modal Integrative and Higher-level Informatics: Molecular Signatures & linking high-dimensional data to phenotype: development of molecular signatures for diagnosis, outcome prediction and personalized medicine; in silico signature scanning, in silico signature equivalence, discovery of diagnostic/imaging biomarkers and putative drug targets, deployment of signatures, automated software, novel methods Mechanistic/causative studies: discovery of pathways; multiplicity studies, TS/DBN designs, automated software, active learning/experiment number minimization Integrating clinical lab, text, imaging and high-throughput data in CTs/prospective studies or exploratory retrospective ones 9

10 Summary Contacts (Until Centralized Consultation Service is Launched)
Area | Contacts
Management of clinical and protocol data | James Robinson
Educational Informatics | Mark Triola
Next-Gen Information Retrieval | Lawrence Fu, Constantin Aliferis, TBD
Informatics for Data Mining | Alexander Statnikov, Constantin Aliferis
Data Integration & Warehousing | John Chelico, Ross Smith, Constantin Aliferis
High Performance Computing | Constantin Aliferis, Ross Smith
Best Practices in Bioinformatics | Constantin Aliferis, Alexander Statnikov
Sequencing Informatics | Upstream: Stuart Brown, Alexander Alekseyenko, Yuval Kluger, Jinhua Wang, TBD, TBD. Downstream: Alexander Alekseyenko, Yuval Kluger, Jinhua Wang, Alexander Statnikov, Constantin Aliferis
Microarray Informatics | Jiri Zafadil, Yuval Kluger, Jinhua Wang, Constantin Aliferis, Alexander Statnikov
Cancer Informatics | Yuval Kluger, Jinhua Wang, Stuart Brown, Jiri Zafadil, Constantin Aliferis
Proteomics Informatics | Stuart Brown, Jinhua Wang, Constantin Aliferis, Alexander Statnikov, TBD
General Tools | Stuart Brown
Specialized applications (Genetics, Regulation, Pathways) | Stuart Brown, Yuval Kluger, Alexander Statnikov, Constantin Aliferis
Molecular Signatures development, biomarker discovery, multi-modal and integrative studies | Constantin Aliferis, Alexander Statnikov, Yuval Kluger

11 Molecular Signatures Definition = computational or mathematical models that link high-dimensional molecular information to a phenotype of interest 11

12 Molecular Signatures Gene markers New drug targets 12

13 Molecular Signatures: Main Uses 1. Direct benefits: Models of disease phenotype/clinical outcome & estimation of the model performance Diagnosis Prognosis, long-term disease management Personalized treatment (drug selection, titration) ( predictive models) 2. Ancillary benefits 1: Biomarkers for diagnosis, or outcome prediction Make the above tasks resource efficient, and easy to use in clinical practice Helps next-generation molecular imaging Leads for potential new drug candidates 3. Ancillary benefits 2: Discovery of structure & mechanisms (regulatory/interaction networks, pathways, sub-types) Leads for potential new drug candidates 13

14 Molecular Signatures The FDA calls them in vitro diagnostic multivariate index assays 1. Class II Special Controls Guidance Document: Gene Expression Profiling Test System for Breast Cancer Prognosis : - addresses device classification 2. The Critical Path to New Medical Products : - identifies pharmacogenomics as crucial to advancing medical product development and personalized medicine. 3. Draft Guidance on Pharmacogenetic Tests and Genetic Tests for Heritable Markers & Guidance for Industry: Pharmacogenomic Data Submissions - identifies 3 main goals (dose, ADEs, responders), - define IVDMIA, - encourages fault-free sharing of pharmacogenomic data, - separates probable from valid biomarkers, - focuses on genomics (and not other omics), 14

15 Less Conventional Uses of Molecular Signatures Increased Clinical Trial sample efficiency, and decreased costs or both, using placebo responder signatures ; In silico signature-based candidate drug screening; Drug resurrection Establishing existence of biological signal in very small sample situations where univariate signals are too weak; Assess importance of markers and of mechanisms involving those Choosing the right animal model? 15

16 Recent molecular signatures available for patient care: Agendia, Clarient, Prediction Sciences, LabCorp OvaSure, University Genomics, Genomic Health, Veridex, BioTheranostics, Applied Genomics, Power3, Correlogic Systems 16

17 Molecular signatures in the market (examples)
Company | Product | Disease | Purpose
Agendia | MammaPrint | Breast cancer | Risk assessment for the recurrence of distant metastasis in a breast cancer patient.
Agendia | TargetPrint | Breast cancer | Quantitative determination of the expression level of estrogen receptor, progesterone receptor and HER2 genes. This product is supplemental to MammaPrint.
Agendia | CupPrint | Cancer | Determination of the origin of the primary tumor.
University Genomics | Breast Bioclassifier | Breast cancer | Classification of ER-positive and ER-negative breast cancers into expression-based subtypes that more accurately predict patient outcome.
Clarient | Insight Dx Breast Cancer Profile | Breast cancer | Prediction of disease recurrence risk.
Clarient | Prostate Gene Expression Profile | Prostate cancer | Diagnosis of grade 3 or higher prostate cancer.
Prediction Sciences | RapidResponse c-Fn Test | Stroke | Identification of the patients that are safe to receive tPA and those at high risk for HT, to help guide the physician's treatment decision.
Genomic Health | OncotypeDx | Breast cancer | Individualized prediction of chemotherapy benefit and 10-year distant recurrence to inform adjuvant treatment decisions in certain women with early-stage breast cancer.
bioTheranostics | CancerTYPE ID | Cancer | Classification of 39 types of cancer.
bioTheranostics | Breast Cancer Index | Breast cancer | Risk assessment and identification of patients likely to benefit from endocrine therapy, and whose tumors are likely to be sensitive or resistant to chemotherapy.
Applied Genomics | MammoStrat | Breast cancer | Risk assessment of cancer recurrence.
Applied Genomics | PulmoType | Lung cancer | Classification of non-small cell lung cancer into adenocarcinoma versus squamous cell carcinoma subtypes.
Applied Genomics | PulmoStrat | Lung cancer | Assessment of an individual's risk of lung cancer recurrence following surgery for helping with adjuvant therapy decisions.
Correlogic | OvaCheck | Ovarian cancer | Early detection of epithelial ovarian cancer.
LabCorp | OvaSure | Ovarian cancer | Assessment of the presence of early stage ovarian cancer in high-risk women.
Veridex | GeneSearch BLN Assay | Breast cancer | Determination of whether breast cancer has spread to the lymph nodes.
Power3 | BC-SeraPro | Breast cancer | Differentiation between breast cancer patients and control subjects.

18 MammaPrint Developed by Agendia. 70-gene signature to stratify women with breast cancer that hasn't spread into low risk and high risk for recurrence of the disease. Independently validated in >1,000 patients. So far performed 12,000 tests. Cost of the test is $3,200. In February 2007, the FDA cleared the MammaPrint test for marketing in the U.S. for node-negative women under 61 years of age with tumors of less than 5 cm. TIME Magazine's 2007 medical invention of the year. 18

19 CupPrint Developed by Agendia. ~500-gene (~1900 probes) signature to identify the primary site of 49 different types of carcinomas as well as other types of cancer such as sarcoma and melanoma. Several independent validation studies. 19

20 ColoPrint In development & validation by Agendia. Multi-gene expression signature to determine the risk for recurrence in colorectal cancer patients. Planning to seek FDA approval. References:

21 Oncotype DX Development synopsis. Main reference: Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004; 351(27). Developed by Genomic Health. 21-gene signature to predict whether a woman with localized, ER+ breast cancer is at risk of relapse. Independently validated in >1,000 patients. So far performed 55,000 tests. Cost of the test is $3,650. Reimbursement: Information about reimbursement for molecular signatures from Aetna: Oncotype DX did not undergo FDA review. Here is an article that mentions FDA review of Oncotype DX (slightly outdated): The following paper shows the health benefits and cost-effectiveness benefits of using Oncotype DX: 21

22 CancerType ID Developed by AviaraDX. 92-gene signature to classify 39 tumor types. Signature developed by GA/KNN. Compressed version of CupPrint. 22

23 Breast Cancer Index Developed by AviaraDX. Uses 7 genes (combines 5-gene MGI signature and 2-gene H/I signature). Stratifies breast cancer patients into groups with low or high risk of cancer recurrence and good or poor response to endocrine therapy. Validated in thousands of patients (treated & untreated). 23

24 GeneSearch Breast Lymph Node (BLN) Assay Developed by Veridex, a Johnson & Johnson company. Test to detect if breast cancer has spread to the lymph nodes. The GeneSearch BLN uses real-time reverse transcriptase-polymerase chain reaction (RT-PCR) to detect mammaglobin (MG) and cytokeratin 19 (CK 19) in lymph nodes. FDA approved. Featured in TIME's 2007 Top 10 Medical Breakthroughs list. 24

25 MammoStrat Developed by Applied Genomics. The test is based on 5 biomarkers. The test is used to classify individual patients as having an AGI-defined high-, moderate-, or low-risk of breast cancer recurrence following surgical removal of their primary tumor and treatment with tamoxifen alone. Independently validated in >1000 patients. 25

26 NuroPro Developed by Power3. Early detection of neurodegenerative diseases: Alzheimer's disease, ALS (Lou Gehrig's disease), and Parkinson's disease. Validation study in progress. Based on 59 proteins. 26

27 BC-SeraPro Developed by Power3. Test for diagnosis of breast cancer (breast cancer case vs. control). Validation study in progress. Based on 22 proteins. Uses linear discriminant analysis; outputs a probability score. 27

28 Key ingredients for developing a molecular signature Well-defined clinical problem & access to patients Computational & Biostatistical Analysis Molecular Signature High-throughput assays 28

29 Challenges in Computational Analysis of omics data for development of molecular signatures. It is relatively easy to develop a predictive model, and even easier to believe that a model is good when it is not → false sense of security. Several problems exist: some theoretical and some practical. Omics data has many special characteristics and is tricky to analyze! 29

30 OvaCheck Developed by Correlogic. Blood test for the early detection of epithelial ovarian cancer. Failed to obtain FDA approval. Looks for subtle changes in patterns among the tens of thousands of proteins, protein fragments and metabolites in the blood. Signature developed by genetic algorithm. Significant artifacts in data collection & analysis questioned validity of the signature: - Results are not reproducible - Data collected differently for different groups of patients 30

31 Problem with OvaCheck [Figure from Baggerly et al. (Bioinformatics, 2004): Data Set 1 (top) and Data Set 2 (bottom), panels A-F showing Cancer, Normal and Other samples; horizontal axis: clock tick] 31

32 Molecular Signatures Gene markers New drug targets 32

33 Brief History of main omics technology: gene expression microarrays 1988: Edwin Southern files UK patent applications for in situ synthesized, oligonucleotide microarrays 1991: Stephen Fodor and colleagues publish photolithographic array fabrication method 1992: Undeterred by NIH naysayers, Patrick Brown develops spotted arrays 1993: Affymax begets Affymetrix 1995: Mark Schena publishes first use of microarrays for gene expression analysis Edwin Southern founds Oxford Gene Technologies 1996: First human gene expression microarray study published Affymetrix releases its first catalog GeneChip microarray, for HIV, in April 1997: Stanford researchers publish the first whole-genome microarray study, of yeast 33

34 Brief History of main omics technology: gene expression microarrays (The scientist 2005) 1998: Brown's lab develops CLUSTER, a statistical tool for microarray data analysis; red and green "thermal plots" start popping up everywhere 1999: Todd Golub and colleagues use microarrays to classify cancers, sparking widespread interest in clinical applications 2000: Affymetrix spins off Perlegen, to sequence multiple human genomes and identify genetic variation using arrays 2001: The Microarray Gene Expression Data Society develops MIAME standard for the collection and reporting of microarray data 2003: Joseph DeRisi uses a microarray to identify the SARS virus Affymetrix, Applied Biosystems, and Agilent Technologies individually array human genome on a single chip 2004: Roche releases Amplichip CYP450, the first FDA-approved microarray for diagnostic purposes 34

35 An early kind of analysis: learning disease sub-types by clustering patient profiles p53 Rb 35

36 Clustering: seeking natural groupings & hoping that they will be useful p53 Rb 36

37 E.g., for treatment Respond to treatment Tx1 p53 Do not Respond to treatment Tx1 Rb 37

38 E.g., for diagnosis Adenocarcinoma p53 Squamous carcinoma Rb 38

39 Another use of clustering Cluster genes (instead of patients): Genes that cluster together may belong to the same pathways Genes that cluster apart may be unrelated 39
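The clustering idea on the preceding slides can be sketched in a few lines. This is a minimal illustration, not the method used in any study cited here: it runs plain k-means on six hypothetical patients described by two made-up expression values (p53, Rb). The data, the function names, and the choice of k-means with a fixed start are all assumptions made for the example. Passing genes as rows instead of patients would cluster genes.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means; returns (labels, centers)."""
    # Assumption for reproducibility: start from the first k points.
    # Real analyses typically use several random restarts.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical (p53, Rb) expression values for six patients.
profiles = [(1.0, 1.2), (8.0, 8.5), (1.1, 0.9),
            (0.8, 1.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

Note that the algorithm never sees a phenotype: the groups it finds are "natural" only with respect to distance in expression space.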

40 Unfortunately clustering is a non-specific method and falls into the one-solution fits all trap when used for prediction Do not Respond to treatment Tx2 p53 Respond to treatment Tx2 Rb 40

41 Clustering is also non-specific when used to discover pathway membership, regulatory control, or other causation-oriented relationships G1 Ph G2 G3 It is entirely possible in this simple illustrative counter-example for G3 (a causally unrelated gene to the phenotype) to be more strongly associated and thus cluster with the phenotype (or its surrogate genes) more strongly than the true oncogenic genes G1, G2 41
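A small simulation can make this counter-example concrete. The causal structure below is an assumption chosen for illustration (the slide's diagram is not recoverable from the transcript): G1 and G2 cause the phenotype Ph, while G3 is driven by G1 and G2 but neither causes nor is caused by Ph. All coefficients and noise levels are made up.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
n = 2000
g1 = [rng.gauss(0, 1) for _ in range(n)]
g2 = [rng.gauss(0, 1) for _ in range(n)]
# Assumed mechanism: the phenotype is caused by G1 and G2 plus noise.
ph = [a + b + rng.gauss(0, 1) for a, b in zip(g1, g2)]
# Assumed mechanism: G3 is a co-effect of G1 and G2, unrelated causally to Ph.
g3 = [a + b + rng.gauss(0, 0.3) for a, b in zip(g1, g2)]

r1, r2, r3 = pearson(g1, ph), pearson(g2, ph), pearson(g3, ph)
print(round(r1, 2), round(r2, 2), round(r3, 2))
```

Under these assumptions G3 tracks the phenotype more closely than either true cause does, so any method that groups variables by association strength would place G3 nearest to Ph.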

42 Two improved classes of methods Supervised learning predictive signatures and markers Regulatory network reverse engineering pathways 42

43 Supervised learning: use the known phenotypes (a.k.a. labels) in training data to build signatures or find markers highly specific for that phenotype [Diagram: training instances (A1, B1, C1, D1, E1), (A2, B2, C2, D2, E2), ..., (An, Bn, Cn, Dn, En) → inductive algorithm → classifier or regression model → applied to application instances → classification performance] 43
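The train/apply flow in this diagram can be sketched with a deliberately simple classifier. The example below uses a nearest-centroid rule, which is a stand-in chosen for brevity and is not one of the classifiers evaluated in this deck; the data values and function names are hypothetical.

```python
def train_nearest_centroid(X, y):
    """Inductive step: learn one centroid per label from training instances."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(model, row):
    """Application step: assign the label of the nearest centroid."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda label: sqdist(row, model[label]))

# Hypothetical training instances: two markers per patient, known labels.
train_X = [(2.0, 1.0), (2.5, 1.5), (3.0, 1.0),
           (7.0, 6.0), (7.5, 6.5), (8.0, 6.0)]
train_y = ["normal"] * 3 + ["cancer"] * 3
model = train_nearest_centroid(train_X, train_y)

# Hypothetical application instances, held out from training.
test_X = [(2.2, 1.1), (7.8, 6.2), (3.1, 1.4), (6.9, 5.8)]
test_y = ["normal", "cancer", "normal", "cancer"]
predictions = [predict(model, row) for row in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

The important design point is the separation of the two data sets: performance is computed only on instances that played no part in building the model.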

44 Regulatory network reverse engineering [Diagram: training instances (A1, B1, C1, D1, E1), (A2, B2, C2, D2, E2), ..., (An, Bn, Cn, Dn, En) → inductive algorithm → inferred network over variables A, B, C, D, E, compared against the true network → performance] 44

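45 Supervised learning: a geometrical interpretation [Figure: cancer patients (+) and normals plotted on p53 vs. Rb axes, separated by an SVM classifier boundary; one new case (?) falls on the cancer side and is classified as cancer, another falls on the normal side and is classified as normal] 45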

46 In 2-D this looks good, but what happens in: 10,000-50,000 dimensions (regular gene expression microarrays, aCGH, and early SNP arrays); >500,000 (tiled microarrays, new SNP arrays); 10, ,000 (regular MS proteomics); >10,000,000 (LC-MS proteomics)? This is the curse of dimensionality problem 46

47 High-dimensionality (especially with small samples) causes: Some methods do not run at all (classical regression). Some methods give bad results (KNN, decision trees). Very slow analysis. Very expensive/cumbersome clinical application. Tends to overfit. 47
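One way to see why many variables and few samples invite overfitting is to count how often a feature made of pure noise separates two small groups perfectly. The sketch below is an illustration under assumed settings (3 patients per class, Gaussian noise, hypothetical function name), not an analysis from the deck. With 3 versus 3 samples, a noise feature separates the classes with no overlap with probability 2 x (3! x 3!) / 6! = 0.1, so thousands of features yield hundreds of spurious "perfect markers".

```python
import random

def perfectly_separating(n_features, n_per_class=3, seed=0):
    """Count pure-noise features whose values split two classes with no overlap."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_features):
        # Both classes are drawn from the same distribution: no real signal.
        class_a = [rng.gauss(0, 1) for _ in range(n_per_class)]
        class_b = [rng.gauss(0, 1) for _ in range(n_per_class)]
        if max(class_a) < min(class_b) or max(class_b) < min(class_a):
            count += 1
    return count

few = perfectly_separating(n_features=10)
many = perfectly_separating(n_features=10000)
print(few, many)
```

None of these features would predict anything in fresh patients, which is why performance must be estimated on data not used for selecting markers.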

48 Two (very real and very unpleasant) problems: Over-fitting & Under-fitting. Over-fitting (a model to your data) = building a model that is good in original data but fails to generalize well to fresh data. Under-fitting (a model to your data) = building a model that is poor in both original data and fresh data. 48

49 Intuitive explanation of overfitting & underfitting. Play the game: find a rule to predict who the instructors are in any given class (use today's class to find a general rule) 49

50 Over/under-fitting are directly related to the complexity of the decision surface and how well the training data is fit [Figure: outcome of interest Y vs. predictor X, with training data and future data; one line is labeled "This line is good!", the other "This line overfits!"] 50

51 Over/under-fitting are directly related to the complexity of the decision surface and how well the training data is fit [Figure: outcome of interest Y vs. predictor X, with training data and future data; one line is labeled "This line is good!", the other "This line underfits!"] 51

52 Very Important Concept: Successful data analysis methods balance training data fit with complexity. Too complex a signature (to fit training data well) → overfitting (i.e., the signature does not generalize). Too simplistic a signature (to avoid overfitting) → underfitting (it will generalize, but the fit to both the training and future data will be low and predictive performance small). 52
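The trade-off described above can be reproduced with a toy experiment. The sketch below swaps in a k-nearest-neighbor classifier on one simulated predictor instead of the fitted lines shown on the slides, because k gives a single knob for complexity: k = 1 memorizes the training data, k equal to the training-set size ignores the predictor entirely. The data generator, the 20% label noise, and the values of k are assumptions made for illustration.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

def make_data(n, rng, flip=0.2):
    """Assumed truth: label is 1 when x > 0.5; a fraction `flip` is noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < flip:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
train = make_data(101, rng)
fresh = make_data(1000, rng)

# For each k: (accuracy on training data, accuracy on fresh data).
results = {k: (accuracy(train, train, k), accuracy(train, fresh, k))
           for k in (1, 15, 101)}
for k, (train_acc, fresh_acc) in results.items():
    print(k, round(train_acc, 2), round(fresh_acc, 2))
```

With k = 1 the training accuracy is perfect because every point is its own nearest neighbor, yet the noise it memorized does not carry over to fresh data (overfitting). With k = 101 the model predicts the same class for everyone and is poor on both sets (underfitting). An intermediate k is expected to do best on fresh data.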

53 Part of the Solution: feature selection [Figure: network diagram over the variables A, B, C, D, E, H, I, J, K, L, M, N, O, P, Q and T] 53

54 How well does supervised learning work in practice? 54

55 Datasets
Bhattacharjee2 - Lung cancer vs normals [GE/DX]
Bhattacharjee2_I - Lung cancer vs normals on common genes between Bhattacharjee2 and Beer [GE/DX]
Bhattacharjee3 - Adenocarcinoma vs Squamous [GE/DX]
Bhattacharjee3_I - Adenocarcinoma vs Squamous on common genes between Bhattacharjee3 and Su [GE/DX]
Savage - Mediastinal large B-cell lymphoma vs diffuse large B-cell lymphoma [GE/DX]
Rosenwald4 - 3-year lymphoma survival [GE/CO]
Rosenwald5 - 5-year lymphoma survival [GE/CO]
Rosenwald6 - 7-year lymphoma survival [GE/CO]
Adam - Prostate cancer vs benign prostate hyperplasia and normals [MS/DX]
Yeoh - Classification between 6 types of leukemia [GE/DX-MC]
Conrads - Ovarian cancer vs normals [MS/DX]
Beer_I - Lung cancer vs normals (common genes with Bhattacharjee2) [GE/DX]
Su_I - Adenocarcinoma vs squamous (common genes with Bhattacharjee3) [GE/DX]
Banez - Prostate cancer vs normals [MS/DX]

56 Methods: Gene and Peak Selection Algorithms
ALL - No feature selection
LARS - LARS
HITON_PC - HITON_PC
HITON_PC_W - HITON_PC + wrapping phase
HITON_MB - HITON_MB
HITON_MB_W - HITON_MB + wrapping phase
GA_KNN - GA/KNN
RFE - RFE with validation of feature subset with optimized polynomial kernel
RFE_Guyon - RFE with validation of feature subset with linear kernel (as in Guyon)
RFE_POLY - RFE (with polynomial kernel) with validation of feature subset with optimized polynomial kernel
RFE_POLY_Guyon - RFE (with polynomial kernel) with validation of feature subset with linear kernel (as in Guyon)
SIMCA - SIMCA (Soft Independent Modeling of Class Analogy): PCA-based method
SIMCA_SVM - SIMCA (Soft Independent Modeling of Class Analogy): PCA-based method with validation of feature subset by SVM
WFCCM_CCR - Weighted Flexible Compound Covariate Method (WFCCM) applied as in Clinical Cancer Research paper by Yamagata (analysis of microarray data)
WFCCM_Lancet - Weighted Flexible Compound Covariate Method (WFCCM) applied as in Lancet paper by Yanagisawa (analysis of mass-spectrometry data)
UAF_KW - Univariate with Kruskal-Wallis statistic
UAF_BW - Univariate with ratio of between-group to within-group sum of squares
UAF_S2N - Univariate with signal-to-noise statistic
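As an illustration of the simplest family in this list, the univariate filters, the sketch below ranks genes by a signal-to-noise statistic. It assumes the commonly used two-class form, (mean of class 1 - mean of class 0) / (SD of class 0 + SD of class 1); whether UAF_S2N was computed exactly this way in the benchmark is not stated in the slides. Gene names and values are hypothetical.

```python
import statistics

def s2n(class0, class1):
    """Signal-to-noise statistic for one gene (assumed two-class form)."""
    return (statistics.mean(class1) - statistics.mean(class0)) / (
        statistics.stdev(class0) + statistics.stdev(class1))

def rank_genes(expression):
    """Return gene names sorted by |S2N|, largest first, plus the raw scores."""
    scores = {gene: s2n(c0, c1) for gene, (c0, c1) in expression.items()}
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked, scores

# Hypothetical expression values: gene -> (class 0 samples, class 1 samples).
expression = {
    "geneA": ([1.0, 1.2, 0.8], [5.0, 5.2, 4.8]),   # large shift, low spread
    "geneB": ([3.0, 5.0, 1.0], [3.5, 0.5, 6.5]),   # small shift, high spread
    "geneC": ([2.0, 2.1, 1.9], [2.0, 2.2, 1.8]),   # no shift
}
ranked, scores = rank_genes(expression)
print(ranked)
```

A filter like this scores each gene in isolation, which is fast but cannot detect genes that are informative only in combination; that is one motivation for the multivariate methods (RFE, HITON, LARS) in the same list.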

57 Classification Performance (average over all tasks/datasets) 57

58 How well does gene selection work in practice? 58

59 Number of Selected Features (average over all tasks/datasets) [Figure: one bar per method: ALL, LARS, HITONgp_PC, HITONgp_MB, HITONgp_PC_W, HITONgp_MB_W, GA_KNN, RFE, RFE_Guyon, RFE_POLY, RFE_POLY_Guyon, SIMCA, SIMCA_SVM, WFCCM_CCR, UAF_KW, UAF_BW, UAF_S2N]

60 Number of Selected Features (zoom on most powerful methods) [Figure: one bar per method: LARS, HITONgp_PC, HITONgp_MB, HITONgp_PC_W, HITONgp_MB_W, GA_KNN, RFE, RFE_Guyon, RFE_POLY, RFE_POLY_Guyon] 60

61 Number of Selected Features (average over all tasks/datasets) 61

62 Conclusions so far Special classifiers (with inherent complexity control) combined with feature selection & careful parameterization protocols overcome over-fitting & estimate future performance accurately. Caveats: analysis is typically complex and error prone. Need: (a) an experienced analyst on the team, or (b) a validated software system designed for nonexperts. 62

63 Software: Causal Explorer, GEMS, FAST-AIMS 63

64 Causal Explorer Matlab library of computational causal discovery and variable selection algorithms Introductory-level library to our causal algorithms (~3% of our algorithms) Discover the direct causal or probabilistic relations around a response variable of interest (e.g., disease is directly caused by and directly causes a set of variables/observed quantities). Discover the set of all direct causal or probabilistic relations among the variables. Discover the Markov blanket of a response variable of interest, i.e., the minimal subset of variables that contains all necessary information to optimally predict the response variable. Code emphasizes efficiency, scalability, and quality of discovery Requires relatively deep understanding of underlying theory and how the algorithms operate 64
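To make the Markov blanket concept concrete: in a causal graph (under the usual faithfulness assumption) the Markov blanket of a variable consists of its parents, its children, and the other parents of its children. The sketch below reads that set off a known, hypothetical graph. It is not Causal Explorer code and does not use its API; the algorithms in that library address the much harder task of learning this set from data when the graph is unknown.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set of parents}."""
    direct_causes = set(parents.get(target, ()))
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses: other parents of the target's children.
    spouses = {p for child in children for p in parents[child]} - {target}
    return direct_causes | children | spouses

# Hypothetical graph: F -> A, A -> T, B -> T, T -> C, D -> C, C -> E.
dag = {
    "A": {"F"},
    "T": {"A", "B"},
    "C": {"T", "D"},
    "E": {"C"},
}
blanket = markov_blanket(dag, "T")
print(sorted(blanket))
```

In this example F and E are associated with T but add no predictive information once A, B, C and D are known, which is why the Markov blanket is a natural target for feature selection.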

65 Statistics of Registered Users 739 registered users in >50 countries. 402 (54%) users are affiliated with educational, governmental, and non-profit organizations 337 (46%) users are either from private or commercial sectors. Major commercial organizations that have registered users of Causal Explorer include: IBM Intel SAS Institute Texas Instruments Siemens GlaxoSmithKline Merck Microsoft 65

66 Statistics of Registered Users. Major U.S. institutions that have registered users of Causal Explorer: Boston University; Brandeis University; Carnegie Mellon University; Case Western Reserve University; Central Washington University; College of William and Mary; Cornell University; Duke University; Harvard University; Illinois Institute of Technology; Indiana University-Purdue University Indianapolis; Johns Hopkins University; Louisiana State University; M. D. Anderson Cancer Center; Massachusetts Institute of Technology; Medical College of Wisconsin; Michigan State University; Naval Postgraduate School; New York University; Northeastern University; Northwestern University; Oregon State University; Pennsylvania State University; Princeton University; Rutgers University; Stanford University; State University of New York; Tufts University; University of Arkansas; University of California Berkeley; University of California Los Angeles; University of California San Diego; University of California Santa Cruz; University of Cincinnati; University of Colorado Denver; University of Delaware; University of Houston-Clear Lake; University of Idaho; University of Illinois at Chicago; University of Illinois at Urbana-Champaign; University of Kansas; University of Maryland Baltimore County; University of Massachusetts Amherst; University of Michigan; University of New Mexico; University of Pennsylvania; University of Pittsburgh; University of Rochester; University of Tennessee Chattanooga; University of Texas at Austin; University of Utah; University of Virginia; University of Washington; University of Wisconsin-Madison; University of Wisconsin-Milwaukee; Vanderbilt University; Virginia Tech; Yale University 66

67 Other systems for supervised analysis of microarray data
There exist many good software packages for supervised analysis of microarray data, but: No system provides a protocol for data analysis that precludes overfitting. A typical software package either offers an overabundance of algorithms or algorithms with unknown performance. The software packages address the needs only of experienced analysts.
Name | Version | Developer | Automatic model selection for classifier and gene selection methods
ArrayMiner ClassMarker | 5.2 | Optimal Design, Belgium | No
Avadis Prophetic | 3.3 | Strand Genomics, USA | No
BRB ArrayTools | 3.2 Beta | National Cancer Institute, USA | No
caGEDA | (accessed 10/2004) | University of Pittsburgh and University of Pittsburgh Medical Center, USA | No
Cleaver | 1.0 (accessed 10/2004) | Stanford University, USA | (not legible)
GeneCluster | (not legible) | Broad Institute, Massachusetts Institute of Technology, USA | No
GeneLinker Platinum | 4.5 | Predictive Patterns Software, Canada | No
GeneMaths XT | 1.02 | Applied Maths, Belgium | No
GenePattern | (not legible) | Broad Institute, Massachusetts Institute of Technology, USA | No
Genesis | (not legible) | Graz University of Technology, Austria | No
GeneSpring | 7 | Silicon Genetics, USA | No
GEPAS | 1.1 (accessed 10/2004) | National Center for Cancer Research (CNIO), Spain | Limited (for number of genes)
MultiExperiment Viewer | (not legible) | The Institute for Genomic Research, USA | No
PAM | 1.21a | Stanford University, USA | Limited (for a single parameter of the classifier)
Partek Predict | 6.0 | Partek, USA | Limited (does not allow optimization of the choice of gene selection algorithms)
Weka Explorer | (not legible) | University of Waikato, New Zealand | No

68 Purpose of GEMS (model generation & performance estimation mode) Input: gene expression data and outcome variable (e.g., Normal or Cancer for each sample). Optional input: gene names & IDs (e.g., ring finger protein 1; tubulin, beta, 5; glucose-6-phosphate dehydrogenase; glutathione S-transferase M5; carnitine acetyltransferase; Rho GTPase activating protein 4; SMA3; mannose phosphate isomerase; mitogen-activated protein kinase 3; leukotriene A4 hydrolase; chromosome 21 open reading frame 1; dihydropyrimidinase-like 2; beta-2-microglobulin; discs, large (Drosophila) homolog 4). Output: classification model; cross-validation performance estimate; reduced set of genes (e.g., Rho GTPase activating protein 4; SMA3; mannose phosphate isomerase; mitogen-activated protein kinase 3); links to literature. 68

69 Purpose of GEMS (model application mode) Input: gene expression data with unknown outcome variable (?), passed to an existing classification model. Output: model predictions (e.g., Normal or Cancer for each sample); performance estimate. 69

70 Methods Implemented in GEMS
Cross-Validation Designs: N-Fold CV, LOOCV
Normalization Techniques: [a, b]; (x - MEAN(x)) / STD(x); x / STD(x); x / MEAN(x); x / MEDIAN(x); x / NORM(x); x - MEAN(x); x - MEDIAN(x); ABS(x); x + ABS(x)
Classifiers (MC-SVM): One-Versus-Rest, One-Versus-One, DAGSVM, Method by WW, Method by CS
Gene Selection Methods: S2N One-Versus-Rest, S2N One-Versus-One, Non-param. ANOVA, BW ratio, HITON_MB, HITON_PC
Performance Metrics: Accuracy, RCI, AUC ROC
70
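To make a few of the listed techniques concrete, here is a minimal sketch of three of the normalization techniques and of signal-to-noise (S2N) gene ranking. It is not GEMS code. The function names are hypothetical, STD(x) is assumed to be the sample standard deviation, and S2N is assumed to follow the common (mean difference) / (sum of standard deviations) definition.

```python
from statistics import mean, median, stdev

def scale_to_range(x, a=0.0, b=1.0):
    """Linearly rescale the values of one gene to the interval [a, b]."""
    lo, hi = min(x), max(x)
    if hi == lo:                      # constant gene: map every value to a
        return [a] * len(x)
    return [a + (v - lo) * (b - a) / (hi - lo) for v in x]

def standardize(x):
    """(x - MEAN(x)) / STD(x); assumes the sample standard deviation is non-zero."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]

def center_by_median(x):
    """x - MEDIAN(x)."""
    med = median(x)
    return [v - med for v in x]

def signal_to_noise(class_a, class_b):
    """Assumed S2N definition: difference of class means over sum of class STDs."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

def rank_genes_by_s2n(expr, labels, positive):
    """Rank genes by |S2N| of the positive class versus all other samples.

    expr maps a gene name to its expression values (one per sample);
    labels holds the class of each sample, in the same order.
    Genes with zero variance in both groups are not handled in this sketch.
    """
    scores = {}
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == positive]
        neg = [v for v, y in zip(values, labels) if y != positive]
        scores[gene] = abs(signal_to_noise(pos, neg))
    return sorted(scores, key=scores.get, reverse=True)
```

In a one-versus-rest setting the ranking would be repeated with each class in turn as the positive class. To avoid an optimistic performance estimate, normalization parameters and gene rankings should be computed on training samples only.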

71 Software Architecture of GEMS
GEMS 2.0: Wizard-Like User Interface; Computational Engine; Report Generator
Tasks: Estimate classification performance; Generate a classification model and estimate its performance; Generate a classification model; Apply existing model to a new set of patients
Cross-Validation Loop for Performance Est.: N-Fold CV, LOOCV
Cross-Validation Loop for Model Selection: N-Fold CV, LOOCV
Performance Computation: Accuracy, RCI, AUC ROC
Normalization
Gene Selection: S2N One-Versus-Rest, S2N One-Versus-One, Non-param. ANOVA, BW ratio, HITON_PC, HITON_MB
Classification by MC-SVM: One-Versus-Rest, One-Versus-One, DAGSVM, Method by WW, Method by CS
71
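The two cross-validation loops in this architecture can be sketched as nested cross-validation: an inner loop chooses model settings using training samples only, and an outer loop estimates performance on samples never used for that choice. The sketch below is an illustration under stated assumptions, not the GEMS implementation. A k-nearest-neighbour classifier stands in for MC-SVM, the candidate values of k stand in for the classifier and gene selection parameters, and all function names are hypothetical.

```python
def k_folds(n, n_folds):
    """Split sample indices 0..n-1 into n_folds disjoint test sets.

    Setting n_folds == n gives leave-one-out cross-validation (LOOCV).
    """
    return [list(range(i, n, n_folds)) for i in range(n_folds)]

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training samples closest to x (stand-in classifier)."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = [train_y[i] for i in by_distance[:k]]
    return max(sorted(set(votes)), key=votes.count)

def split(xs, ys, test_idx):
    """Return the training samples and labels that are not in test_idx."""
    held_out = set(test_idx)
    train = [i for i in range(len(xs)) if i not in held_out]
    return [xs[i] for i in train], [ys[i] for i in train]

def cv_accuracy(xs, ys, k, n_folds):
    """Plain cross-validated accuracy for one fixed setting k (inner loop)."""
    correct = 0
    for test_idx in k_folds(len(xs), n_folds):
        tr_x, tr_y = split(xs, ys, test_idx)
        correct += sum(knn_predict(tr_x, tr_y, xs[i], k) == ys[i] for i in test_idx)
    return correct / len(xs)

def nested_cv_accuracy(xs, ys, candidate_ks, outer_folds, inner_folds):
    """Outer loop estimates performance; inner loop selects k on training data only."""
    correct = 0
    for test_idx in k_folds(len(xs), outer_folds):
        tr_x, tr_y = split(xs, ys, test_idx)
        # Model selection never sees the outer test samples.
        best_k = max(candidate_ks, key=lambda k: cv_accuracy(tr_x, tr_y, k, inner_folds))
        correct += sum(knn_predict(tr_x, tr_y, xs[i], best_k) == ys[i] for i in test_idx)
    return correct / len(xs)
```

Keeping model selection inside the outer training folds is what prevents the selected settings from being tuned to the samples later used to report accuracy.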

72 GEMS 2.0: Wizard-Like Interface Task selection; Dataset specification; Cross-validation design; Normalization; Logging; Performance metric; Gene selection; Classification; Report generation; Analysis execution 72

73 GEMS 2.0: Wizard-Like Interface Input microarray gene expression dataset; File with gene names; File with gene accession numbers; Output model 73

74 Statistics of registered users 800 users in >50 countries; 350 academic & non-profit users; 450 private & commercial users; 205 scientific citations of the major paper that introduced GEMS. Major commercial organizations that have registered users of Causal Explorer include: Eli Lilly, Novartis, IBM, GE, Genedata, Nuvera Biosciences, GenomicTree, Cogenetics, Pronota 74

75 FAST-AIMS FAST-AIMS is a system to support automatic development of high-quality classification models and biomarker discovery in mass spectrometry proteomics data. It incorporates the automated data analysis protocols of GEMS and deals with additional challenges of MS data analysis. 75

76 System Workflow 76

77 Evaluation in multiple user study 77


More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

NIH 2009* Total $ Awarded. NIH 2009 Rank

NIH 2009* Total $ Awarded. NIH 2009 Rank Organization Name (Schools of Nursing) 2009* Total $ Awarded 2009 2008 Total $ Awarded 2008 2007 UNIVERSITY OF PENNSYLVANIA $10,908,657 1 $7,721,221 2 4 5 UNIVERSITY OF CALIFORNIA SAN FRANCISCO $8,780,469

More information

A NEW COLLABORATIVE WEB-BASED DATABASE ARCHITECTURE FOR COMMUNITY- BASED PHARMACEUTICAL RESEARCH

A NEW COLLABORATIVE WEB-BASED DATABASE ARCHITECTURE FOR COMMUNITY- BASED PHARMACEUTICAL RESEARCH A NEW COLLABORATIVE WEB-BASED DATABASE ARCHITECTURE FOR COMMUNITY- BASED PHARMACEUTICAL RESEARCH Collaborative Drug Discovery, Inc. Sean Ekins PhD, DSc. Using CDD can save time, money and improve discovery

More information

Dental School Additional Required Courses Job Shadowing/ # of hours Alabama

Dental School Additional Required Courses Job Shadowing/ # of hours Alabama Dental Schools with Math and/or Advanced Science Requirements for the 2015 Application Cycle For nearly all U.S. dental schools, the minimum required science courses for admission include one year each

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. ankitanandurkar2394@gmail.com

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. ankitanandurkar2394@gmail.com IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR Bharti S. Takey 1, Ankita N. Nandurkar 2,Ashwini A. Khobragade 3,Pooja G. Jaiswal 4,Swapnil R.

More information

Independent Validation of the Prognostic Gene Expression Ratio Test in Formalin Fixed, Paraffin Embedded (FFPE) Mesothelioma Tumor Tissue Specimens

Independent Validation of the Prognostic Gene Expression Ratio Test in Formalin Fixed, Paraffin Embedded (FFPE) Mesothelioma Tumor Tissue Specimens Independent Validation of the Prognostic Gene Expression Ratio Test in Formalin Fixed, Paraffin Embedded (FFPE) Mesothelioma Tumor Tissue Specimens Assunta De Rienzo, Ph.D. 1, Robert W. Cook, Ph.D. 2,

More information

M.S. AND PH.D. IN BIOMEDICAL ENGINEERING

M.S. AND PH.D. IN BIOMEDICAL ENGINEERING M.S. AND PH.D. IN BIOMEDICAL ENGINEERING WHEREAS, the Board of Visitors recently approved the Virginia Tech-Wake Forest University School of Biomedical Engineering and Sciences (SBES) to form a joint research

More information

Core Facility Genomics

Core Facility Genomics Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray

More information

National Bureau for Academic Accreditation And Education Quality Assurance LINGUISTICS # UNIVERSITY CITY STATE DEGREE MAJOR SPECIALTY RESTRICTION

National Bureau for Academic Accreditation And Education Quality Assurance LINGUISTICS # UNIVERSITY CITY STATE DEGREE MAJOR SPECIALTY RESTRICTION 1 UNIVERSITY OF MASSACHUSETTS - BOSTON ~ BOSTON MA M 1 ARIZONA STATE UNIVERSITY - TEMPE TEMPE AZ MD ~ M for Linguistics is for Residential Program ONLY. The online option is not ~ M in Linguistics is for

More information

How To Change Medicine

How To Change Medicine P4 Medicine: Personalized, Predictive, Preventive, Participatory A Change of View that Changes Everything Leroy E. Hood Institute for Systems Biology David J. Galas Battelle Memorial Institute Version

More information

Clinical Trial Designs for Incorporating Multiple Biomarkers in Combination Studies with Targeted Agents

Clinical Trial Designs for Incorporating Multiple Biomarkers in Combination Studies with Targeted Agents Clinical Trial Designs for Incorporating Multiple Biomarkers in Combination Studies with Targeted Agents J. Jack Lee, Ph.D. Department of Biostatistics 3 Primary Goals for Clinical Trials Test safety and

More information

Hacking Brain Disease for a Cure

Hacking Brain Disease for a Cure Hacking Brain Disease for a Cure Magali Haas, CEO & Founder #P4C2014 Innovator Presentation 2 Brain Disease is Personal The Reasons We Fail in CNS Major challenges hindering CNS drug development include:

More information

Changes in Breast Cancer Reports After Second Opinion. Dr. Vicente Marco Department of Pathology Hospital Quiron Barcelona. Spain

Changes in Breast Cancer Reports After Second Opinion. Dr. Vicente Marco Department of Pathology Hospital Quiron Barcelona. Spain Changes in Breast Cancer Reports After Second Opinion Dr. Vicente Marco Department of Pathology Hospital Quiron Barcelona. Spain Second Opinion in Breast Pathology Usually requested when a patient is referred

More information

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

More information