Big Challenges of Big Data - What are the statistical tasks for the precision medicine era?
1 Big Challenges of Big Data - What are the statistical tasks for the precision medicine era? Oct 18, 2015 Yu Shyr, Ph.D. Vanderbilt Center for Quantitative Sciences
2 Highlights Overview of the BIG data in biomedical research Future of the BIG data in biomedical research Statistical challenges & tasks Vanderbilt University Precision Medicine Initiative
5 President Obama's Precision Medicine Initiative January 30th, 2015 The President's 2016 Budget will provide a $215 million investment to support this effort, including: $130 million to NIH for development of a voluntary national research cohort of a million or more volunteers to propel our understanding of health and disease and set the foundation for a new way of doing research through engaged participants and open, responsible data sharing. $70 million to the National Cancer Institute (NCI), part of NIH, to scale up efforts to identify genomic drivers in cancer and apply that knowledge in the development of more effective approaches to cancer treatment.
6 President Obama's Precision Medicine Initiative January 30th, 2015 $10 million to FDA to acquire additional expertise and advance the development of high quality, curated databases to support the regulatory structure needed to advance innovation in precision medicine and protect public health. $5 million to ONC to support the development of interoperability standards and requirements that address privacy and enable secure exchange of data across systems.
7 Omics biomedical research Microarray: cDNA (about 5,000 variables), Affymetrix U133 Plus 2.0 (about 45,000 variables) SNPs (about 500,000 to 2,000,000 variables) Next Generation Sequencing (?)
8 Storage of the Data? cDNA, Microarray, SNPs NGseq raw imaging data: > 2 TB per sample RNAseq or Exome seq data: 10 GB per sample (raw data), GB during the processing. Whole genome seq: 200 GB per sample (raw data), GB during the processing.
9 Raw 1:N:0:ATCACG NTGGAGTCCTAGGCACAGCTCTAAGCCTCCTTATTCGAGCCGAGCTGGGCC + #4=DDDDDDDDDDE<DAEEEIDFEIEIEIEIIIIIIDEDDDDA@DDDDII@
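The last line of the raw read above is its per-base quality string. As a minimal sketch, assuming the common Phred+33 encoding (an assumption; the slide does not state the encoding), each character maps to a quality score Q, and Q maps to an estimated base-call error probability of 10^(-Q/10). The function names are hypothetical.

```python
def phred33_scores(quality):
    """Convert a FASTQ quality string to Phred scores, assuming Phred+33."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# First seven characters of the quality line shown on the slide.
quality_prefix = "#4=DDDD"
scores = phred33_scores(quality_prefix)
print(scores)  # [2, 19, 28, 35, 35, 35, 35]

# A low score such as 2 (the leading '#', paired with the base call 'N')
# flags a base the sequencer could not call reliably.
for q in sorted(set(scores)):
    print(q, round(error_probability(q), 5))
```

The leading `#` lines up with the `N` at the start of the read, which is why quality-aware trimming or filtering is usually an early processing step.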
11 RNA Sequencing
12 Why is RNAseq data more difficult to analyze? There are a lot of zeros in the data (count data) The range of the count data is very wide Large variation Usually a small sample size Need to ensure fair comparisons between conditions, sometimes also between genes.
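One simple way to make counts comparable between samples sequenced to different depths is to rescale them to counts per million (CPM). This is only an illustrative sketch with made-up counts, not necessarily the normalization the speaker had in mind; CPM corrects for library size only and does not address gene length or composition effects.

```python
def counts_per_million(counts):
    """Rescale raw read counts so that each sample sums to one million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]


# Hypothetical counts for four genes in two samples. Sample B was
# sequenced ten times deeper, so its raw counts are ten times larger
# even though the relative expression is identical.
sample_a = [0, 10, 90, 900]
sample_b = [0, 100, 900, 9000]

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a)
print(cpm_b)
print(cpm_a == cpm_b)
```

The zero count stays zero after scaling, which illustrates why the many zeros in RNAseq data remain a modeling problem even after normalization.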
13 NGS Data Analysis
14 Culture of Reproducibility In 2015, the Institute of Medicine of the National Academies formed a committee to study the Clinical Development and Use of Biomarkers for Molecularly Targeted Therapies. In testimony before Congress on March 5th, 2013, Bruce Alberts, then the editor-in-chief of Science, outlined what needs to be done to bolster the credibility of the scientific enterprise. Journals must do more to enforce standards. Budding scientists must be taught technical skills, including statistics, and must be imbued with skepticism towards their own results and those of others.
15 This should have been a warning that the big data were over-fitting the small number of cases, a standard concern in data analysis.
16 Using the NCI60 to Predict Sensitivity Potti et al (2006), Nature Medicine, 12: The main conclusion is that we can use microarray data from cell lines (the NCI60) to define drug response signatures, which can be used to predict whether patients will respond. They provide examples using 7 commonly used agents.
17 Top Headlines The Cancer Letter (7/23/2010) Thirty-three biostatisticians sent a letter to NCI Director Harold Varmus urging the organization to suspend three trials until a more rigorous investigation of Potti's work is completed.
18 Top Headlines The Cancer Letter (7/23/2010) A Baron, K Bandeen-Roche, D Berry, J Bryan, V Carey, K Chaloner, M Delorenzi, B Efron, R Elston, D Ghosh, J Goldberg, S Goodman, F Harrell, S Hilsenbeck, W Huber, R Irizarry, C Kendziorski, M Kosorok, T Louis, JS Marron, M Newton, M Ochs, G Parmigiani, J Quackenbush, G Rosner, I Ruczinski, Y Shyr, S Skates, TP Speed, JD Storey, Z Szallasi, R Tibshirani, S Zeger
19 From: William T Barry [mailto:bill.barry@duke.edu] Sent: Thursday, November 18, :10 AM To: Shyr, Yu Subject: Request from Duke University's Institute for Genome Sciences and Policy Dear Dr Shyr, Duke University's Institute for Genome Sciences and Policy (Duke IGSP) currently has 3 actively enrolling genomics cancer trials that are monitored by an independent, 5-member Data Safety and Monitoring Board-Oversight Committee (DSMB-OC). The primary objective of these trials is validation of genomic biomarkers in a prospective clinical setting. I invite your participation to serve on this Board. Duke IGSP seeks members with specific professional expertise and who are completely independent of financial or scientific interest or other potential conflict of interest with the clinical genomic studies or Duke University. The DSMB-OC meets three times a year not only to assure patient safety by reviewing enrollment and safety data, but also to review trial procedures and processes. Duke IGSP would welcome your participation to serve on its DSMB-OC.
20 What did we learn? The most common mistakes are simple. Confounding in the experimental design: mixing up the sample labels, mixing up the gene labels, mixing up the group labels. 26 papers in very top journals were withdrawn (13 completely and 13 partially). You need at least one quantitative scientist on your team.
21 The log files of the statistical analyses (not the results) should be added to the supplemental data. This will help readers understand the detailed statistical analysis procedures.
22 Recent issues in the reproducibility of computational research have surfaced: Scientific papers commonly leave out experimental details necessary for reproduction Studies have shown difficulty replicating published experimental results Recent increase in retracted papers High number of failing clinical trials
23 Culture of Reproducibility To increase the trust in computational research, it is necessary for individual researchers, institutions, funding bodies, and journals to establish a culture of reproducibility. At a minimum, research should be sufficiently documented for the researchers themselves to reproduce their results.
24 Rule 1: For every result, keep track of how it was produced Rule 2: Avoid manual data manipulation steps Rule 3: Archive the exact versions of all external programs used Rule 4: Version control all custom scripts (Subversion, Git) Rule 5: Record all intermediate results, when possible in standardized formats
25 Rule 6: For analyses with randomness, note underlying random seed Rule 7: Always store raw data behind plots Rule 8: Generate hierarchical analysis output, allowing layers of increasing detail to be inspected Rule 9: Connect textual statements to underlying results Rule 10: Provide public access to scripts, runs, and results
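Several of these rules can be automated. The following is a minimal sketch, using only the Python standard library, of Rules 1, 3 and 6: it fixes a random seed and stores it with the interpreter version, platform and command line next to the result. The function names, the seed value and the toy "analysis" are hypothetical placeholders.

```python
import json
import platform
import random
import sys
from datetime import datetime, timezone


def run_analysis(seed):
    """Toy stand-in for an analysis that involves randomness."""
    rng = random.Random(seed)  # Rule 6: a seeded generator, not global state
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    return sum(sample) / len(sample)


def provenance_record(seed, result):
    """Collect what is needed to reproduce and audit the result."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,     # Rule 3: exact interpreter version
        "platform": platform.platform(),
        "command_line": sys.argv,          # Rule 1: how the result was produced
        "random_seed": seed,               # Rule 6
        "result": result,
    }


seed = 20151018  # arbitrary illustrative value
result = run_analysis(seed)
record = provenance_record(seed, result)
print(json.dumps(record, indent=2))

# Rerunning with the recorded seed reproduces the result exactly.
print(run_analysis(record["random_seed"]) == result)
```

In practice such a record would be written to a file beside the output it describes and committed with the analysis scripts (Rule 4).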
26 Microbiome and PheWAS
28 The launch of the US BRAIN and European Human Brain Projects coincides with growing international efforts toward transparency and increased access to publicly funded research in the neurosciences. However, big science efforts are not the only drivers of data-sharing needs, as neuroscientists across the full spectrum of research grapple with the overwhelming volume of data being generated daily and a scientific environment that is increasingly focused on collaboration.
29 The authors consider the issue of sharing the richly diverse and heterogeneous small data sets produced by individual neuroscientists, so-called long-tail data. They discuss the utility of these data, the diversity of repositories and options available for sharing such data, and emerging best practices.
30 Ridge Regression Analysis Ridge regression reduces the variability of the coefficient estimates by shrinking the coefficients, resulting in higher prediction accuracy, usually at the cost of only a small increase in bias. In Ridge regression, the coefficients are shrunken towards zero, but will never become exactly zero. So, when the number of predictors is large, Ridge regression will not provide a sparse model that is easy to interpret.
31 Regression Analysis The Lasso was developed by Tibshirani (1996) to improve both prediction accuracy and model interpretability by combining the nice features of Ridge regression and subset selection. The Lasso reduces the variability of the estimates by shrinking the coefficients and at the same time produces interpretable models by shrinking some coefficients to exactly zero.
32 Elastic Net Analysis Zou and Hastie (2005) proposed the Elastic Net to overcome the limitations of the Lasso in some situations. The Elastic Net also combines shrinkage and variable selection, and in addition encourages grouping of variables: groups of highly correlated variables tend to be selected together, where the Lasso would only select one variable of the group.
33 Regression Analysis Also, in the case P >> N, Lasso algorithms are limited because at most N variables can be selected. Zou and Hastie (2005) conjecture that, whenever Ridge regression improves on OLS, the Elastic Net will improve on the Lasso.
34 Lasso and Elastic Net Elastic net is a related technique. Elastic net is a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies have suggested that the elastic net technique can outperform lasso on data with highly correlated predictors.
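The contrast between the three penalties can be seen on a toy problem. This is a minimal sketch in pure Python, not the speaker's code: it minimizes (1/(2N)) Σ (y_i - x_i'β)² + λ[(1-α)/2 Σ β_j² + α Σ |β_j|] by cyclic coordinate descent, assuming no intercept. The function names and the data are hypothetical, and the design is deliberately orthogonal so the behavior is easy to check by hand.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 within the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_sweeps=50):
    """Cyclic coordinate descent; alpha=1 is the lasso, alpha=0 is ridge."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(x * x for x in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes beta[j].
            rho = sum(col[i] * (residual[i] + col[i] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new_beta - beta[j]
            residual = [residual[i] - col[i] * delta for i in range(n)]
            beta[j] = new_beta
    return beta


# Hypothetical orthogonal design; y = 3*x1 + 0.1*x2 - 0.2*x3 exactly.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [2.9, 2.7, 3.3, 3.1]

ridge = elastic_net_cd(X, y, lam=0.5, alpha=0.0)
lasso = elastic_net_cd(X, y, lam=0.5, alpha=1.0)
enet = elastic_net_cd(X, y, lam=0.5, alpha=0.5)
print("ridge:", ridge)  # all shrunken, none exactly zero
print("lasso:", lasso)  # the two weak coefficients are exactly zero
print("enet: ", enet)

# Larger lambda leaves fewer nonzero lasso coefficients.
for lam in (0.05, 0.15, 0.5):
    fit = elastic_net_cd(X, y, lam=lam, alpha=1.0)
    print(lam, sum(1 for b in fit if b != 0.0))
```

The exact zeros come from the soft-thresholding step, which is only active when α > 0; with α = 0 the update is a pure rescaling, so ridge coefficients shrink but stay nonzero.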
35 Definition of Ridge Regression, Lasso, EN The loss functions for Ridge regression, the Lasso, and the Elastic Net can be viewed as constrained versions of the ordinary least squares (OLS) regression loss function. In Ridge regression, the sum of squares of the coefficients is constrained as follows:
36 Definition of Ridge Regression, Lasso, EN The Lasso constrains the sum of the absolute values of the coefficients: with $t_1$ the Lasso tuning parameter.
37 Definition of Ridge Regression, Lasso, EN Finally, the Elastic Net combines the Ridge regression and the Lasso constraints:
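The constraint formulas on these three slides did not survive transcription. A reconstruction in the standard form, consistent with the wording above, is sketched below. It assumes a centered response $y$ and standardized predictors $X$ (so no intercept appears), and the name $t_2$ for the Ridge tuning parameter is an assumption made by analogy with $t_1$.

```latex
% Ridge regression: constrain the sum of squared coefficients
\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
  \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t_2

% Lasso: constrain the sum of absolute values of the coefficients
\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
  \quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t_1

% Elastic Net: impose both constraints at once
\hat{\beta}^{\mathrm{EN}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
  \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t_2
  \ \text{and} \ \sum_{j=1}^{p} \lvert \beta_j \rvert \le t_1
```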
38 Summary Lasso The lasso technique solves this regularization problem. For a given value of λ, a nonnegative parameter, lasso solves the problem
39 Summary Lasso As λ increases, the number of nonzero components of β decreases. The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
40 Summary Elastic Net The elastic net technique solves this regularization problem. For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem where
41 Summary Elastic Net Elastic net is the same as lasso when α = 1. As α shrinks toward 0, elastic net approaches ridge regression. For other values of α, the penalty term P_α(β) interpolates between the L1 norm of β and the squared L2 norm of β.
42 Limitations of the lasso The group lasso and sparse group lasso act like the lasso at the group level, depending on λ. In fact, if the group sizes are all one, the group lasso reduces to the lasso. In group lasso, if a group of parameters is non-zero, they will all be non-zero. The sparse group lasso yields sparsity at both the group and individual feature levels, in order to select groups and predictors within a group.
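The optimization problems referred to on the lasso and elastic net summary slides were lost in transcription. One widely used parametrization that matches the text (α = 1 gives the lasso, α near 0 approaches ridge) is sketched below. The $1/(2N)$ scaling and the explicit intercept $\beta_0$ are assumptions; here $N$ is the number of observations, $x_i$ the predictor vector of observation $i$, and $p$ the number of predictors.

```latex
% Lasso, for a given nonnegative \lambda
\min_{\beta_0,\,\beta} \left( \frac{1}{2N} \sum_{i=1}^{N}
  \bigl( y_i - \beta_0 - x_i^{T}\beta \bigr)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right)

% Elastic net, for 0 < \alpha < 1 and nonnegative \lambda
\min_{\beta_0,\,\beta} \left( \frac{1}{2N} \sum_{i=1}^{N}
  \bigl( y_i - \beta_0 - x_i^{T}\beta \bigr)^2
  + \lambda P_{\alpha}(\beta) \right)

% where the penalty mixes the squared L2 norm and the L1 norm
P_{\alpha}(\beta) = \frac{1-\alpha}{2} \lVert \beta \rVert_2^2
  + \alpha \lVert \beta \rVert_1
  = \sum_{j=1}^{p} \left( \frac{1-\alpha}{2}\, \beta_j^2
  + \alpha \lvert \beta_j \rvert \right)
```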
43 Definition of Ridge Regression, Lasso, EN These constrained loss functions can also be written as penalized loss functions:
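The penalized forms themselves are missing from the transcription. In the standard formulation they read as follows, where the multiplier names $\lambda_1$ (for the absolute-value penalty) and $\lambda_2$ (for the squared penalty) are assumptions, as is the use of a centered response and standardized predictors. Each bound in the constrained version corresponds to a value of the matching multiplier.

```latex
L^{\mathrm{ridge}}(\beta) = \lVert y - X\beta \rVert_2^2
  + \lambda_2 \sum_{j=1}^{p} \beta_j^2

L^{\mathrm{lasso}}(\beta) = \lVert y - X\beta \rVert_2^2
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert

L^{\mathrm{EN}}(\beta) = \lVert y - X\beta \rVert_2^2
  + \lambda_2 \sum_{j=1}^{p} \beta_j^2
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
```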
44 NATURE REVIEWS CANCER VOLUME 13 NOVEMBER 2013
45 Microbiome research is just one of many flavors of the big data projects that have become ubiquitous in the life sciences. Brain scientists are attempting to map all of the 86 billion neurons in the human brain and catalog the trillions of connections they make with other neurons. As science moves toward big data endeavors, so grows the concern that much of what is discovered is fool's gold.
46 Studying the microbiome: 16S rDNA gene sequencing The 16S rRNA gene is found in all bacterial species. Its variable sequence can be thought of as a molecular fingerprint and can be used to identify bacterial genera and species. Degenerate primers are designed from the conserved region. Large public databases are available for comparison.
47 Sequence clustering into OTUs (Operational Taxonomic Units)
50 Statistical methods Sparse Dirichlet-multinomial Regression for simultaneous selection of microbiome-associated variables and their affected taxa Kernel-based Regression Methods for testing the effect of microbiome composition on the clinical/biological outcome(s). Network analysis
51 END
52 Questions
More informationMASCOT Search Results Interpretation
The Mascot protein identification program (Matrix Science, Ltd.) uses statistical methods to assess the validity of a match. MS/MS data is not ideal. That is, there are unassignable peaks (noise) and usually
More informationGenomic Medicine The Future of Cancer Care. Shayma Master Kazmi, M.D. Medical Oncology/Hematology Cancer Treatment Centers of America
Genomic Medicine The Future of Cancer Care Shayma Master Kazmi, M.D. Medical Oncology/Hematology Cancer Treatment Centers of America Personalized Medicine Personalized health care is a broad term for interventions
More informationFlorida Study of Career and Technical Education
Florida Study of Career and Technical Education Final Report Louis Jacobson, Ph.D. Christine Mokher, Ph.D. 2014 IRM-2014-U-008790 Approved for Distribution Unlimited This document represents the best opinion
More informationMachine Learning Methods for Demand Estimation
Machine Learning Methods for Demand Estimation By Patrick Bajari, Denis Nekipelov, Stephen P. Ryan, and Miaoyu Yang Over the past decade, there has been a high level of interest in modeling consumer behavior
More information6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
More informationHow Can Institutions Foster OMICS Research While Protecting Patients?
IOM Workshop on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials How Can Institutions Foster OMICS Research While Protecting Patients? E. Albert Reece, MD, PhD, MBA Vice
More informationFocusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
More informationSummary of Discussion on Non-clinical Pharmacology Studies on Anticancer Drugs
Provisional Translation (as of January 27, 2014)* November 15, 2013 Pharmaceuticals and Bio-products Subcommittees, Science Board Summary of Discussion on Non-clinical Pharmacology Studies on Anticancer
More informationIndustry Environment and Concepts for Forecasting 1
Table of Contents Industry Environment and Concepts for Forecasting 1 Forecasting Methods Overview...2 Multilevel Forecasting...3 Demand Forecasting...4 Integrating Information...5 Simplifying the Forecast...6
More informationBIOS 6660: Analysis of Biomedical Big Data Using R and Bioconductor, Fall 2015 Computer Lab: Education 2 North Room 2201DE (TTh 10:30 to 11:50 am)
BIOS 6660: Analysis of Biomedical Big Data Using R and Bioconductor, Fall 2015 Computer Lab: Education 2 North Room 2201DE (TTh 10:30 to 11:50 am) Course Instructor: Dr. Tzu L. Phang, Assistant Professor
More informationCurrent reporting in published research
Current reporting in published research Doug Altman Centre for Statistics in Medicine, Oxford, UK and EQUATOR Network Research article A published research article is a permanent record that will be used
More informationUsing the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova
Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel
More informationThe degrees of freedom of the Lasso in underdetermined linear regression models
The degrees of freedom of the Lasso in underdetermined linear regression models C. Dossal (1), M. Kachour (2), J. Fadili (2), G. Peyré (3), C. Chesneau (4) (1) IMB, Université Bordeaux 1 (2) GREYC, ENSICAEN
More informationWir schaffen Wissen heute für morgen. Workshop Research Integrity at PSI 2013 Data management Tuesday June 4 2013, 13.30 17.00. Louis Tiefenauer, PSI
Wir schaffen Wissen heute für morgen Workshop Research Integrity at PSI 2013 Data management Tuesday June 4 2013, 13.30 17.00 Louis Tiefenauer, PSI PSI, 10. Juni 2013 Program Dur. End Welcome by Thierry
More informationOctober 17, 2005. Elias Zerhouni, M.D. Director National Institutes of Health One Center Drive Suite 126 MSC 0148 Bethesda, MD 20892
October 17, 2005 Elias Zerhouni, M.D. Director National Institutes of Health One Center Drive Suite 126 MSC 0148 Bethesda, MD 20892 Dear Dr. Zerhouni: The undersigned nonprofit medical and scientific societies
More informationData Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial
Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds Overview In order for accuracy and precision to be optimal, the assay must be properly evaluated and a few
More informationDouble Degree Track in Neuroscience &International Public Policy at the University of Wisconsin-Madison
Double Degree Track in Neuroscience &International Public Policy at the University of Wisconsin-Madison Purpose: The Neuroscience and Public Policy Program offers a double degree track that leads to the
More informationRT 2 Profiler PCR Array: Web-Based Data Analysis Tutorial
RT 2 Profiler PCR Array: Web-Based Data Analysis Tutorial Samuel J. Rulli, Jr., Ph.D. qpcr-applications Scientist Samuel.Rulli@QIAGEN.com Pathway Focused Research from Sample Prep to Data Analysis! -2-
More informationSTATE OF MICHIGAN DEPARTMENT OF INSURANCE AND FINANCIAL SERVICES Before the Director of Insurance and Financial Services
STATE OF MICHIGAN DEPARTMENT OF INSURANCE AND FINANCIAL SERVICES Before the Director of Insurance and Financial Services In the matter of: Petitioner, v Blue Care Network of Michigan, Respondent. File
More informationIntegrated Resource Plan
Integrated Resource Plan March 19, 2004 PREPARED FOR KAUA I ISLAND UTILITY COOPERATIVE LCG Consulting 4962 El Camino Real, Suite 112 Los Altos, CA 94022 650-962-9670 1 IRP 1 ELECTRIC LOAD FORECASTING 1.1
More informationFrom Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data
From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNA-seq data experimental design data collection modeling statistical testing biological heterogeneity
More informationShiny Server Pro: Regulatory Compliance and Validation Issues
Shiny Server Pro: Regulatory Compliance and Validation Issues A Guidance Document for the Use of Shiny Server Pro in Regulated Clinical Trial Environments June 19, 2014 RStudio, Inc. 250 Northern Ave.
More information200627 - AC - Clinical Trials
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2014 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research MASTER'S DEGREE
More informationBasic Analysis of Microarray Data
Basic Analysis of Microarray Data A User Guide and Tutorial Scott A. Ness, Ph.D. Co-Director, Keck-UNM Genomics Resource and Dept. of Molecular Genetics and Microbiology University of New Mexico HSC Tel.
More informationG E N OM I C S S E RV I C ES
GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E
More informationIntroduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)
Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) A typical RNA Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,
More information