Big Data Integration in Biomedical Studies: A Few Personal Views and Experiences
|
|
- Garry Hubbard
- 8 years ago
- Views:
Transcription
1 Big Data Integration in Biomedical Studies: A Few Personal Views and Experiences Hongtu Zhu, Ph.D Department of Biostatistics and Biomedical Research Imaging Center The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
2 Big Data Outline BIAS and Big Data Integration Image-on-Scalar Models Image-on-Genetic Association Models Prediction Models
3 Big Data
4 What is Big Data? 5V=Volume, Velocity, Variety, Value, and Veracity The size of big data is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Alzheimer s Disease Neuroimaging Initiative (US$134 millions) Philadelphia Neurodevelopmental Cohort (PNC) Human Connectome Project (HCP) Big Data in Boxes
5 How to promote statistics in Big Data industry? Closely collaborate with people who are collecting Big Data Work as a team to develop new methods, packages, and textbooks with nice case studies Organize more workshops and short courses Train next-generation statisticians: training grants and new courses a data scientist; an excellent programmer; an applied mathematician
6 How to make important contributions to Big Data? Start with a few big data bases Start with a few methodological and clinical projects Develop a package with a set of good computational and statistical tools to efficiently extract important information from large Big data
7 BIAS: Biostatistics and Imaging AnalysiS and Big Data Integration
8 BIAS: Biostatistics and Imaging Analysis Lab Man Power Computer Power Programming Power Statistics Mathematics
9 aims to simulate the complete human brain on Supercomputers to better understand how it functions. The Brain Research through Advancing Innovative Neurotechnologies or BRAIN, aims to reconstruct the activity of every single neuron as they fire simultaneously in different brain circuits, or perhaps even whole brains.
10 Big Neuroimaging Data NIH normal brain development 1000 Functional Connectome Project Alzheimer s Disease Neuroimaging Initiative National Database for Autism Research (NDAR) Human Connectome Project Philadelphia Neurodevelopmental Cohort Genome superstruct Project
11 Complex Study Design cross-sectional studies; clustered studies including longitudinal and twin/familial studies; ρ 1 5 ρ 5 9 ρ 1 9
12 Neuroimaging research example neuroscience Neuroimaging Applications Structural MRI Structural MRI Diffusion MRI Functional MRI Overview Complementary techniques Diffusion MRI - Variety of acquisitions - Measurement basics - Limitations & artefacts - Analysis principles - Acquisition tips Functional MRI (task) Functional MRI (resting)
13 Complex Data Structure Multivariate Imaging Measures Smooth Functional Imaging Measures Whole-brain Imaging Measures 4D-Time Series Imaging Measures
14 Medical Informatics & Management Big Data Integration Disease Etiology Prevention Treatment Medical Industry Care Policy System Science Insurance Economics Pharmaceutical
15 Big Data Integration E B E: environmental factors G D G: genetic markers D: disease Selection
16 Image-on-Scalar Models
17 Big Data Integration E B E: environmental factors G D G: genetic markers D: disease Selection
18 The NIMH Strategic Plan Strategic Objective 1: Promote Discovery in the Brain and Behavioral Sciences to Fuel Research on the Causes of Mental Disorders Identifying and validating high sensitivity and specificity biomarkers that define valid subtypes of the major mental illnesses. Strategic Objective 2: Chart Mental Illness Trajectories to Determine When, Where, and How to Intervene Conducting longitudinal studies that track changes in behavior with brain structure, connectivity, and function, in order to characterize the progression from primary changes to subsequent clinical presentation, and to identify predictors of divergence from the typical trajectory.
19 Smoothed Functional Data Covariates (e.g., age, gender, diagnostic)
20 Case 1: DTI Fiber Tract Data Data (e) Diffusion properties (e.g., FA, RA) Grids Y i (s j ) = (y i,1 (s j ),, y i,m (s j )) T {s 1,,s ng } Covariates (e.g., age, gender, diagnostic) x 1,, x n FA MD
21 Longitudinal Data Longitudinal Tract Data Spatial-temporal Process t y i (s,t 3 ) y i (s,t 2 ) y i (s,t 1 ) Functional Mixed Effect Models s y i (s,t) = x i (t) T B (s)+ z i (t) T ξ i (s)+η i (s,t)+ε i (s,t) Objectives: Dynamic functional effects of covariates of interest on functional response.
22 Ex 1: Longitudinal Tract Data genu DTImaging parameters: TR/TE = 5200/73 ms Slice thickness = 2mm In-plane resolution = 2x2 mm^2 b = 1000 s/mm^2 One reference scan b = 0 s/mm^2 Repeated 5 times when 6 gradient directions applied.
23 Ex 1: Longitudinal Tract Data
24 Ex 1: Longitudinal Tract Data
25 Neuroimaging Data with Discontinuity Noisy Piecewise Smooth Function with Unknown Jumps and Edges Subject1 Subject2 Covariates (e.g., age, gender, diagnostic, stimulus)
26 Case 2: Piecewise Smooth Data Mathematics. Noisy Piecewise Smooth Functions with Unknown Jumps and Edges Image is the point or set of points in the range corresponding to a designated point in the domain of a given function. Ω is a compact set. f ( x ) M R m x Ω R k f : Ω M R m f ( x) k d x < for some k>0 Ω
27 M2: Spatial Varying Coefficient Model Decomposition: y i (d) = f (x i,b(d) +η i (d)) +ε i (d),d D Piecewise Smooth Short-range Correlation Varying Coefficients Long-range Correlation B(d) L K η ij ()~ SP(0, Ση ) ε ij ( ) ~ SP(0,Σ ε ), 3D volume/ 2D surface Covariance operator: Σ y (d,d ') = Σ η (d,d ') + Σ ε (d,d) Li, Zhu, Shen, Lin, Gilmore, and Ibrahim (2011). JRSSB. Zhu, Fan, and Kong (2014) JASA
28 Spatial Varying Coefficient Model Cartoon Model B k (d) Disjoint Partition D L = l= 1 Dl and Dl Dl ' = φ Piecewise Smoothness: Lipschitz condition Smoothed Boundary Local Patch Degree of Jumps
29 Kernel-based Smoothing Methods Arias-Casto, Salmon, Willett (2011) Local constant/linear Yaroslavsky/Bilateral Filter Nonlocal Means PS
30 Kernel-based Smoothing Methods Propogation-Seperation Method J. Polzehl and V. Spokoiny, (2000,2005) Features Increasing Bandwidth 0 < h < h <! < h = S r 0 1 Adaptive Weights 0 Adaptive Estimates
31 Simulation
32 Simulation
33 ADNI PET data ADNI PET data EX2: ADNI PET Data Data were obtained from the Alzheimer s Disease Neuroimaging Initiative (ADNI) database. We consider PET scans obtained at baseline, 6 months, and 12Data months. were obtained from the Alzheimer s Disease Subjects Neuroimaging are classified Initiative as having (ADNI) mild database. cognitive impairment (MCI), We consider as AD patients, PET scans or as obtained Normal at Controls baseline, (NC). 6 months, and 12 months. Subjects are classified At as screening: Cont d having mild cognitive impairment (MCI), as AD patients, or as Normal Controls (NC). Diagnostic status age (years) N male N female AD 75.9± At screening: MCI 76.3± NC 77.0± Diagnostic status age (years) N male N female We randomly chose 80 subjects for the training set to develop AD 75.9± the prediction model. MCI 76.3± We predicted NC the PET scans 77.0± at 4.2 month 12, 30 based on 20 the baseline and 6-month scans for 79 subjects in the test set. We used gender, diagnostic status (MCI, AD, NC), and age (55-90 years) as covariates for the semi-parametric model. Hyun,J.W., Li, Y. M., J. H. Gilmore, Z. Lu, M. Styner, H. Zhu (2014) SGPP. NeuroImage
34 Prediction results-cont d EX2: ADNI PET Data (a) (b) (c) Figure : Observed (upper panel) and predicted (bottom panel) PET images at month 12 for (a) an AD patient, (b) an MCI subject, and (c) a NC subject. One selected slice is shown.
35 EX2: Prediction ADNI results-cont d PET Data (a) (b) (c) Prediction results-cont d Figure : rtmspe maps for prediction of ADNI PET images at month 12 for 79 test subjects. Selected slices are shown for (a) Semi-parametric model; (b) Semi-parametric Table : rtmspe model+fpca; for ADNI PET(c) images Semi-parametric model+fpca+spatial-temporal model. Semi-parametric model Semi-parametric mode+fpca Semi-parametric model+fpca+spatial-temporal model
36 Image-on-Genetic Association Models
37 The NIMH Strategic Plan Strategic Objective 1: Promote Discovery in the Brain and Behavioral Sciences to Fuel Research on the Causes of Mental Disorders Identify the genetic and environmental factors associated with mental disorders. Strategic Objective 2: Chart Mental Illness Trajectories to Determine When, Where, and How to Intervene When identifying behavioral, neural, and/or genetic markers along the trajectory of illness, design the studies to consider variation in relation to age, sex, gender, race, ethnicity, and other important socio-demographic factors.
38 Big Data Integration E B E: environmental factors G D G: genetic markers D: disease Selection
39 Statistical Methods Hibar, et al. HBM 2012
40 Data Structure Person No.1. Person No.1000 Imaging: 3D Matrix. 3D Matrix Genetic : Person No Person No. 100 SNP1 SNP2. SNP
41 M3: High Dimensional Regression Model Data {(Y i, X i ) :i = 1,,n} Y i = {y i (v) :v V} {X i (g) : g G 0 } Phenotype Genotype Error Y X B E n p y n p x p x p y n p y (p x,n, p y )
42 Sparse Projection Regression Model Sun, Zhu, Liu, and Ibrahim (2014) JASA
43 Sparse Projection Regression Model
44 Sparse and Low-rank Representation Sparsity and Structure on B. Low Rank B b X Sparsity E B p x p y Regularization Methods Lasso 1, 2, 3,. SCAD, MCP,.. b Y p λ (B) p λ (b X ) p λ (b Y ) p λ (E B ) Shen, Shen, and Zhu (201?)
45 E n p y Factor Model Long-range Correlation Short-range Correlation E i Λ ξ i η i p y 1 p y q p y 1 q 1 Σ E Λ Σ η Zhu, Zakaria, Lu, and Ibrahim (2014) JASA Λ T
46 M4: Voxel-wise GWAS Hibar, et al. HBM 2012
47 M4: Voxel-wise GWAS Fast Sure-Independence Screening Procedure WC2S SNP Each SNP Huang,. and Zhu (2014)
48 EX3: 93 ROI-GWAS
49 EX4: Whole Brain-GWAS
50 Prediction Models
51 Alzheimers Disease Big Data DREAM Challenge 1 Its goal is to apply an open science approach to rapidly identify accurate predictive AD biomarkers that can be used by the scientific, industrial and regulatory communities to improve AD diagnosis and treatment. Sub 1: Predict the change in cognitive scores 24 months after initial assessment. Sub 2: Predict the set of cognitively normal individuals whose biomarkers are suggestive of amyloid perturbation. Sub 3: Classify individuals into diagnostic groups using MR imaging.
52 Big Data Integration E B E: environmental factors G D G: genetic markers D: disease Selection
53 HRM versus FRM Data {(y i, X i ) :i =1,, n} X i = {X i (d) : d D} y i =< X i,θ > +ε i Strategy 1: Discrete Approach (High-dimension Regression Model (HRM)) Strategy 2: Functional Regression Model (FRM) y i =θ 0 + D θ(d) X i (d)m(d)+ε i
54 High-dimension Regression Model Approach 1: Regularization Methods Key Conditions: Sparsity of S Restricted null-space property for design matrix X
55 High-dimension Regression Model Tensor Structure: Ultra-high dimensionality (256^3) Spatial structure Zhou, Li, and Zhu (2013) Li, Zhou, and Li (2013) CP decomposition Tucker decomposition θ
56 Scalar-on-Image Models Simulations Key Conditions: Tensor Approximation B Restricted space property for X and B
57 Scalar-on-Image Models Strategy 2: Functional Approach y i =θ 0 + y i =θ 0 + D θ k k=1 θ(d) X i (d)m(d)+ε i D θ(d) = θ k ψ k (d) k=1 ψ k (d) X i (d)m(d)+ε i Basis Methods: fixed and data-driven basis functions K θ = {θ(d) = θ k ψ k (d) : (θ 1, ) 2 } C(d, d ') = Cov(X(d), X(d ')) = λ k ζ k (d)ζ k (d ') k=1 k=1
58 Key Conditions Key Conditions: an excellent set of basis functions Sparsity of basis representation Decay rate of spectral of C or {θ k : k =1, } K 1/2 CK 1/2 θ(d) K θ k ψ k (d) k=1 K << n
59 Extensions M5: Functional linear Cox regression models M6: Generalized scalar-to-image regression models M7: Multiscale Functional Linear models
60 M5: Functional Linear Cox Regression Model Data {(y i, X i ) :i =1,, n} X i = {X i (d) : d D} Model y i = min(t i,c i ) h(t) = f (t) / S(t) = h 0 (t)exp(z i T γ + X i (s) = µ(s) + Consistency j=1 T i : failure time; C i :censored time ξ ij φ j (s) + ε i (s) Asymptotic distribution of score test X i (s)β(s)ds) S
61 M5: Functional Linear Cox Regression Model Mild Cognitive Impairment subjects Interested in predicting the timing of an MCI patient that converts to AD by integrating the imaging data, the clinical variables, and genetic covariates. Full Model: AUC=0.96 Partial Model: AUC=0.82
62 jects (Rudin et al. 1992). Most current research on functional regression has focused on ensional functional predictors. Exceptions for two-dimensional predictors using splines uillas and Lai (2010) and Reiss and Ogden (2010). Some recent work (Zhou et al. 2013; M6: Generalized scalar-to-image regression models Data {(y i, X i ) :i =1,, n} X i = {X i (d) : d D} th et al. 2013; Huang et al. 2013; Reiss et al. 2013) has considered neuroimaging appliith two-dimensional and three-dimensional predictors. We propose a new TV framework ssion with 2-d image predictors. This approach can be easily extended to k-d image preith k 2. Further, most of theoretical work on functional regression has focused on asymptotic Model convergence y ~ exponential rate for the excess family(µ,φ) risk. In this paper, we derive nonasymptotic nds on the risk by exploiting various techniques from compressive sensing, as well as oximation theory. g(µ) These techniques = θ T 0 Z+ allows < X,βus to 0 > obtain finite-sample bounds that hold h probability, and are specified explicitly in terms of the sample size n Total n, the Variation image size and the image smoothness. Estimation: we have a new observation (y X (n+1) i ;µ(x, we i ;γ would,β( ))) like to + use λ X β (n+1) TV to predict Y n+1. Let ˆ imate of the coefficient i=1image from the training data which are n i.i.d. copies of (Y,X), ) ),...,(Y n,x (n) ). The accuracy of the prediction can be measured by its excess risk: Non-asymptotic Error Bound: R 2n = n E D X (n+1), ˆ 0E 2 o 1/2, represents taking expectation over (Y n+1,x (n+1) ) only. In this paper, we are interested
63 M6: Generalized scalar-to-image regression models TV Lasso Lasso on WL
64 M7: Multiscale Functional Linear models Data {(y i, X i ) :i =1,, n} X i = {X i (d) : d D} Models (A1) D = ( K D k ) D 0 k=1 Informative sets + Irrelevant set (A2) (A3) y {X(d) :d D 0 } y p({x(d) :d D 1 },,{X(d) :d D K })
65 Simulation I: Classification Class 0 Class 1 0 White 1 Green 2 Red Figure 1: Simulation Study I: the left and right panels are respectively true images for classes 0 and 1. X i (d) = β 0 (d) + β 1 (d)y i + ε i (d) Type I Type II Type III The white and green color in the left panel respectively corresponds to 0(g) = 0 and 1 for g = (g 1,g 2,g 3 ) The white, green and red N(0,4) Short-range correlation Long-range correlation color in the right panel respectively corresponds to 0(g) + 1 (g) = 0, 1, and 2 for g =(g 1,g 2,g 3 ) The di erence between Classes 0 and 1 images lies in the red cuboid region.
66 red cuboid region. For each type of noise, we simulate 100 images by linear model (1), where 60 images Simulation I: Classification are from Class 0 and the rest 40 images are from Class 1. Then we use SWPCA to get Table 1: Misclassification rates for PCA and SWPCA under the di erent number of PCs. Noise Number of PCs PCA SWPCA1 SWPCA2 SWPCA3 Type I Type II Type III
67 ension reduction and then do classification analysis on the low dimensional space. ults for PCA is shown on the third column of Table 1. The performance of SWPC rall much better than PCA. Simulation I: Classification ble 2: Classification performance comparison for SWPCA and other classification m Noise slda spls SLR SVM ROAD PCA SWPCA Type I Type II Type III slda: sparse discriminant analysis spls: sparse partial least squares analysis SLR: sparse logistic regression SVM: support vector machine ROAD: Here we also compare classification performance of SWPCA with other classifica thods. The first compared classification method is sparse discriminant analysis (Cl nsen et al., 2011) with the corresponding software slda. The classification errors A in the second column of Table 2 are more than or equal to 0.27 for three di e
68 PET mance of SWPCA with other classification methods. The real data is Alzheimer s Disease Neuroimaging Initiative (ADNI) pet data (Jagust et al., 2010). Alzheimer s disease (AD) EX5: ADNI-PET is the most common form of dementia and results in the loss of memory, thinking and language skills. ADNI is a worldwide project, launched in 2003, and is to develop biological AD NC
69 of the biological markers to the understanding of the progression of Alz n the human brain. Here we consider the real data, which contains es. ADNI e one out test: *95*69 3D-images are split into a training set with images and a test set of one image. Simulation is repeated 196times, erro 94 AD subjects and 104 NC subjects age error on predicting testing image. Table 3: Results of Real Data: average misclassification rates. slda spls slogistic SVM ROAD PCA SWPCA
70 ASA: Statistics in Imaging Section SAMSI 2013 Neuroimaging Data Analysis Challenges in Computational Neuroscience July Summer School August Opening Workshop Shape Analysis Spike Train Analysis Big Data Integration Compressed Sensing Functional Data Analysis Thank You!!
Functional Data Analysis of Big Neuroimaging Data
Functional Data Analysis of Big Neuroimaging Data Hongtu Zhu, Ph.D Department of Biostatistics and Biomedical Research Imaging Center The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599,
More informationBig Data Integration in Biomedical Studies
Big Data Integration in Biomedical Studies Hongtu Zhu, Ph.D Department of Biostatistics and Biomedical Research Imaging Center The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
More informationBayesian Penalized Methods for High Dimensional Data
Bayesian Penalized Methods for High Dimensional Data Joseph G. Ibrahim Joint with Hongtu Zhu and Zakaria Khondker What is Covered? Motivation GLRR: Bayesian Generalized Low Rank Regression L2R2: Bayesian
More informationNon-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationEHz Network Analysis and phonemic Diagrams
Contribute 22 Classification of EEG recordings in auditory brain activity via a logistic functional linear regression model Irène Gannaz Abstract We want to analyse EEG recordings in order to investigate
More informationHacking Brain Disease for a Cure
Hacking Brain Disease for a Cure Magali Haas, CEO & Founder #P4C2014 Innovator Presentation 2 Brain Disease is Personal The Reasons We Fail in CNS Major challenges hindering CNS drug development include:
More informationPresentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering
Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen 9-October 2015 Presentation by: Ahmad Alsahaf Research collaborator at the Hydroinformatics lab - Politecnico di
More informationParallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data
Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,
More informationrunl I IUI%I/\L Magnetic Resonance Imaging
runl I IUI%I/\L Magnetic Resonance Imaging SECOND EDITION Scott A. HuetteS Brain Imaging and Analysis Center, Duke University Allen W. Song Brain Imaging and Analysis Center, Duke University Gregory McCarthy
More informationBIG DATA AND HIGH DIMENSIONAL DATA ANALYSIS
BIG DATA AND HIGH DIMENSIONAL DATA ANALYSIS B.L.S. Prakasa Rao CR RAO ADVANCED INSTITUTE OF MATHEMATICS, STATISTICS AND COMPUTER SCIENCE (AIMSCS) University of Hyderabad Campus GACHIBOWLI, HYDERABAD 500046
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationMedical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.
Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.edu Outline Motivation why do we need GPUs? Past - how was GPU programming
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationNeuroimaging module I: Modern neuroimaging methods of investigation of the human brain in health and disease
1 Neuroimaging module I: Modern neuroimaging methods of investigation of the human brain in health and disease The following contains a summary of the content of the neuroimaging module I on the postgraduate
More informationPublication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore
Publication List Chen Zehua Department of Statistics & Applied Probability National University of Singapore Publications Journal Papers 1. Y. He and Z. Chen (2014). A sequential procedure for feature selection
More informationStatistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationServer Load Prediction
Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that
More informationFactors for success in big data science
Factors for success in big data science Damjan Vukcevic Data Science Murdoch Childrens Research Institute 16 October 2014 Big Data Reading Group (Department of Mathematics & Statistics, University of Melbourne)
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationUnsupervised and supervised dimension reduction: Algorithms and connections
Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
More informationFunctional Data Analysis of MALDI TOF Protein Spectra
Functional Data Analysis of MALDI TOF Protein Spectra Dean Billheimer dean.billheimer@vanderbilt.edu. Department of Biostatistics Vanderbilt University Vanderbilt Ingram Cancer Center FDA for MALDI TOF
More informationSUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK
SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK N M Allinson and D Merritt 1 Introduction This contribution has two main sections. The first discusses some aspects of multilayer perceptrons,
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationadvancing magnetic resonance imaging and medical data analysis
advancing magnetic resonance imaging and medical data analysis INFN - CSN5 CdS - Luglio 2015 Sezione di Genova 2015-2017 objectives Image reconstruction, artifacts and noise management O.1 Components RF
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationApplying Statistics Recommended by Regulatory Documents
Applying Statistics Recommended by Regulatory Documents Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 301-325 325-31293129 About the Speaker Mr. Steven
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationLasso on Categorical Data
Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationTOWARD BIG DATA ANALYSIS WORKSHOP
TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)
More informationPharmaSUG2011 Paper HS03
PharmaSUG2011 Paper HS03 Using SAS Predictive Modeling to Investigate the Asthma s Patient Future Hospitalization Risk Yehia H. Khalil, University of Louisville, Louisville, KY, US ABSTRACT The focus of
More informationMorphological analysis on structural MRI for the early diagnosis of neurodegenerative diseases. Marco Aiello On behalf of MAGIC-5 collaboration
Morphological analysis on structural MRI for the early diagnosis of neurodegenerative diseases Marco Aiello On behalf of MAGIC-5 collaboration Index Motivations of morphological analysis Segmentation of
More informationWorkshop on current challenges in statistical learning (11w5051)
Workshop on current challenges in statistical learning (11w5051) Hugh Chipman (Acadia University) Robert Tibshirani (Stanford University) Ji Zhu (University of Michigan) Xiaotong Shen (University of Minnesota)
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
More informationBig Data - Lecture 1 Optimization reminders
Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationData Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
More informationGE Medical Systems Training in Partnership. Module 8: IQ: Acquisition Time
Module 8: IQ: Acquisition Time IQ : Acquisition Time Objectives...Describe types of data acquisition modes....compute acquisition times for 2D and 3D scans. 2D Acquisitions The 2D mode acquires and reconstructs
More informationBig Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
More informationPREDICTIVE ANALYTICS: PROVIDING NOVEL APPROACHES TO ENHANCE OUTCOMES RESEARCH LEVERAGING BIG AND COMPLEX DATA
PREDICTIVE ANALYTICS: PROVIDING NOVEL APPROACHES TO ENHANCE OUTCOMES RESEARCH LEVERAGING BIG AND COMPLEX DATA IMS Symposium at ISPOR at Montreal June 2 nd, 2014 Agenda Topic Presenter Time Introduction:
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationIntroduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationStudy Design and Statistical Analysis
Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing
More informationLocal Clinical Trials
Local Clinical Trials The Alzheimer s Association, Connecticut Chapter does not officially endorse any specific research study. The following information regarding clinical trials is provided as a service
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationMEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS. Biostatistics
MEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS title- course code: Program name: Contingency Tables and Log Linear Models Level Biostatistics Hours/week Ther. Recite. Lab. Others Total Master of Sci.
More informationFitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,
More informationMachine Learning Logistic Regression
Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.
More informationBiomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening
, pp.169-178 http://dx.doi.org/10.14257/ijbsbt.2014.6.2.17 Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening Ki-Seok Cheong 2,3, Hye-Jeong Song 1,3, Chan-Young Park 1,3, Jong-Dae
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationCS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major
More informationPradeep Redddy Raamana
Pradeep Redddy Raamana Research Scientist at Simon Fraser University pkr1@sfu.ca Summary Highlights 8 years of experience in machine learning, data mining and statistical modelling. 6 years of experience
More informationClustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
More informationRetinal Imaging Biomarkers for Early Diagnosis of Alzheimer s Disease
Retinal Imaging Biomarkers for Early Diagnosis of Alzheimer s Disease Eleonora (Nora) Lad, MD, PhD Assistant Professor of Ophthalmology, Vitreoretinal diseases Duke Center for Macular Diseases Duke University
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationAdequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
More informationIntroduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
More informationRecognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang
Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform
More informationA multi-scale approach to InSAR time series analysis
A multi-scale approach to InSAR time series analysis M. Simons, E. Hetland, P. Muse, Y. N. Lin & C. DiCaprio U Interferogram stack time A geophysical perspective on deformation tomography Examples: Long
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationAutomatic Morphological Analysis of the Medial Temporal Lobe
Automatic Morphological Analysis of the Medial Temporal Lobe Neuroimaging analysis applied to the diagnosis and prognosis of the Alzheimer s disease. Andrea Chincarini, INFN Genova Clinical aspects of
More informationGENETIC DATA ANALYSIS
GENETIC DATA ANALYSIS 1 Genetic Data: Future of Personalized Healthcare To achieve personalization in Healthcare, there is a need for more advancements in the field of Genomics. The human genome is made
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationANXIETY & COGNITIVE IMPAIRMENT
ANXIETY & COGNITIVE IMPAIRMENT Dr. Sherri Hayden, Ph.D., R. Psych. Neuropsychologist, UBC Hospital Clinic for Alzheimer Disease & Related Disorders Clinical Assistant Professor, UBC Department of Medicine,
More informationECE 533 Project Report Ashish Dhawan Aditi R. Ganesan
Handwritten Signature Verification ECE 533 Project Report by Ashish Dhawan Aditi R. Ganesan Contents 1. Abstract 3. 2. Introduction 4. 3. Approach 6. 4. Pre-processing 8. 5. Feature Extraction 9. 6. Verification
More informationLasso-based Spam Filtering with Chinese Emails
Journal of Computational Information Systems 8: 8 (2012) 3315 3322 Available at http://www.jofcis.com Lasso-based Spam Filtering with Chinese Emails Zunxiong LIU 1, Xianlong ZHANG 1,, Shujuan ZHENG 2 1
More informationInference from sub-nyquist Samples
Inference from sub-nyquist Samples Alireza Razavi, Mikko Valkama Department of Electronics and Communications Engineering/TUT Characteristics of Big Data (4 V s) Volume: Traditional computing methods are
More informationExploratory data analysis for microarray data
Eploratory data analysis for microarray data Anja von Heydebreck Ma Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Berlin, Germany heydebre@molgen.mpg.de Visualization
More informationAPPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
More informationIdentification algorithms for hybrid systems
Identification algorithms for hybrid systems Giancarlo Ferrari-Trecate Modeling paradigms Chemistry White box Thermodynamics System Mechanics... Drawbacks: Parameter values of components must be known
More informationAssessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall
Automatic Photo Quality Assessment Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall Estimating i the photorealism of images: Distinguishing i i paintings from photographs h Florin
More informationJoint models for classification and comparison of mortality in different countries.
Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute
More informationMachine Learning for Medical Image Analysis. A. Criminisi & the InnerEye team @ MSRC
Machine Learning for Medical Image Analysis A. Criminisi & the InnerEye team @ MSRC Medical image analysis the goal Automatic, semantic analysis and quantification of what observed in medical scans Brain
More informationGE Global Research. The Future of Brain Health
GE Global Research The Future of Brain Health mission statement We will know the brain as well as we know the body. Future generations won t have to face Alzheimer s, TBI and other neurological diseases.
More informationMonitoring Creatures Great and Small: Computer Vision Systems for Looking at Grizzly Bears, Fish, and Grasshoppers
Monitoring Creatures Great and Small: Computer Vision Systems for Looking at Grizzly Bears, Fish, and Grasshoppers Greg Mori, Maryam Moslemi, Andy Rova, Payam Sabzmeydani, Jens Wawerla Simon Fraser University
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationNAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju
NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised
More informationNetwork Analysis I & II
Network Analysis I & II Dani S. Bassett Department of Physics University of California Santa Barbara Outline Lecture One: 1. Complexity in the Human Brain from processes to patterns 2. Graph Theory and
More informationSPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
More informationExamples. David Ruppert. April 25, 2009. Cornell University. Statistics for Financial Engineering: Some R. Examples. David Ruppert.
Cornell University April 25, 2009 Outline 1 2 3 4 A little about myself BA and MA in mathematics PhD in statistics in 1977 taught in the statistics department at North Carolina for 10 years have been in
More informationTwo Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering
Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014
More informationUniversity of California San Francisco, CA, USA 4 Department of Veterans Affairs Medical Center, San Francisco, CA, USA
Disrupted Brain Connectivity in Alzheimer s Disease: Effects of Network Thresholding Madelaine Daianu 1, Emily L. Dennis 1, Neda Jahanshad 1, Talia M. Nir 1, Arthur W. Toga 1, Clifford R. Jack, Jr. 2,
More informationA Proposed Data Mining Model for the Associated Factors of Alzheimer s Disease
A Proposed Data Mining Model for the Associated Factors of Alzheimer s Disease Dr. Nevine Makram Labib and Mohamed Sayed Badawy Department of Computer and Information Systems Faculty of Management Sciences,
More informationVisualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
More informationAdaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement
Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Toshio Sugihara Abstract In this study, an adaptive
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationBig Data Analytics for Healthcare
Big Data Analytics for Healthcare Jimeng Sun Chandan K. Reddy Healthcare Analytics Department IBM TJ Watson Research Center Department of Computer Science Wayne State University 1 Healthcare Analytics
More informationAdvanced MRI methods in diagnostics of spinal cord pathology
Advanced MRI methods in diagnostics of spinal cord pathology Stanisław Kwieciński Department of Magnetic Resonance MR IMAGING LAB MRI /MRS IN BIOMEDICAL RESEARCH ON HUMANS AND ANIMAL MODELS IN VIVO Equipment:
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationSoftware Packages The following data analysis software packages will be showcased:
Analyze This! Practicalities of fmri and Diffusion Data Analysis Data Download Instructions Weekday Educational Course, ISMRM 23 rd Annual Meeting and Exhibition Tuesday 2 nd June 2015, 10:00-12:00, Room
More information