Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey



Similar documents
Analysis of gene expression data. Ulf Leser and Philippe Thomas

Microarray Technology

Measuring gene expression (Microarrays) Ulf Leser

Statistical issues in the analysis of microarray data

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

How many of you have checked out the web site on protein-dna interactions?

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Basic Analysis of Microarray Data

Microarray Data Analysis. A step by step analysis using BRB-Array Tools

Recombinant DNA and Biotechnology

Scottish Qualifications Authority

Row Quantile Normalisation of Microarrays

School of Nursing. Presented by Yvette Conley, PhD

CCR Biology - Chapter 9 Practice Test - Summer 2012

Core Facility Genomics

Long-Term Effects of Drug Addiction

Combinatorial Chemistry and solid phase synthesis seminar and laboratory course

Introduction To Real Time Quantitative PCR (qpcr)

An Introduction to Genomics and SAS Scientific Discovery Solutions

CHROMOSOMES Dr. Fern Tsien, Dept. of Genetics, LSUHSC, NO, LA

Gene expression analysis. Ulf Leser and Karin Zimmermann

To be able to describe polypeptide synthesis including transcription and splicing

ALLEN Mouse Brain Atlas

Real-Time PCR Vs. Traditional PCR

13.4 Gene Regulation and Expression

Lecture 11 Data storage and LIMS solutions. Stéphane LE CROM

Activity 4 Long-Term Effects of Drug Addiction

Biotechnology: DNA Technology & Genomics

Chapter 3 Contd. Western blotting & SDS PAGE

RNA & Protein Synthesis

IKDT Laboratory. IKDT as Service Lab (CRO) for Molecular Diagnostics

Gene Mapping Techniques

Lecture 13: DNA Technology. DNA Sequencing. DNA Sequencing Genetic Markers - RFLPs polymerase chain reaction (PCR) products of biotechnology

Translation Study Guide

Parameter inference of a basic p53 model using ABC

Frequently Asked Questions (FAQ)

DNA and the Cell. Version 2.3. English version. ELLS European Learning Laboratory for the Life Sciences

14.3 Studying the Human Genome

Correlation of microarray and quantitative real-time PCR results. Elisa Wurmbach Mount Sinai School of Medicine New York

GENE REGULATION. Teacher Packet

Pharmaceutical Biotechnology. Recombinant DNA technology Western blotting and SDS-PAGE

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized:

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure enzymes control cell chemistry ( metabolism )

The Open2Dprot Proteomics Project for n-dimensional Protein Expression Data Analysis

Becker Muscular Dystrophy

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis

Real-time PCR: Understanding C t

Structure and Function of DNA

Validation and Replication

Replication Study Guide

Design and Analysis of Comparative Microarray Experiments

Identification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation

Essentials of Real Time PCR. About Sequence Detection Chemistries

The Techniques of Molecular Biology: Forensic DNA Fingerprinting

DNA Sequence Analysis

Regents Biology REGENTS REVIEW: PROTEIN SYNTHESIS

Analysis of Illumina Gene Expression Microarray Data

Final Project Report

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa

Qualitative Simulation and Model Checking in Genetic Regulatory Networks

1 Mutation and Genetic Change

Lab # 12: DNA and RNA

The Future of the Electronic Health Record. Gerry Higgins, Ph.D., Johns Hopkins

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

Chapter 5: Organization and Expression of Immunoglobulin Genes

Six Trends in Robotics in the Life Sciences

Bioinformatics Resources at a Glance

Molecular Genetics. RNA, Transcription, & Protein Synthesis

Consistent Assay Performance Across Universal Arrays and Scanners

Control of Gene Expression

Ms. Campbell Protein Synthesis Practice Questions Regents L.E.

Exiqon Array Software Manual. Quick guide to data extraction from mircury LNA microrna Arrays

Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University

Biology Behind the Crime Scene Week 4: Lab #4 Genetics Exercise (Meiosis) and RFLP Analysis of DNA

Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG)

Gene Expression Analysis

How To Get A Cell Print

HiPer RT-PCR Teaching Kit

G E N OM I C S S E RV I C ES

PRACTICE TEST QUESTIONS

Materials and Methods. Blocking of Globin Reverse Transcription to Enhance Human Whole Blood Gene Expression Profiling

Transcription and Translation of DNA

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

Human Genome Organization: An Update. Genome Organization: An Update

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!

Transcription:

Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions

1. What is a microarray? A microarray looks similar to a computer chip: a glass slide systematically covered with tens or hundreds of thousands of pieces of DNA, called probes, arranged in known positions. Two main kinds of probes: (spotted) cdna and (high density) oligonucleotides. DNA microarrays allow biologists to conduct large-scale experiments yielding massive quantities of data. One microarray experiment produces hundreds of thousands of observations. Experimental size is growing faster than computing power. 1

Roughly, the experimental goals may be classified as identifying new genes, assigning function to genes, determining when genes are expressed, studying interactions among genes, detecting gene mutations, elucidating mechanisms of disease. among others. 2

Most current microarrays are used to study gene expression. This means the translation of DNA code into a protein (or RNA). mrna from an individual to be studied is prepared and then labelled with a fluorescent dye and applied to the microarray. This preparation combines (hybridises) with certain probes on the microarray and the rest is then washed away. The microarray is scanned to make it fluorescent and the intensity of each spot (probe) is recorded. The results are then preprocessed, often with black-box commercial software: filtering, transformation, normalisation. 3

This is a complex, multi-step procedure with variability at every stage: background fluorescent, dye effects, spatial artifacts, etc. yielding multi-level (probe, array, batch...) measurement errors, as well as biological variability. In more complex cases, two or more samples may be applied to the same array. For example, in a two-channel microarray, matched samples labelled with two different dyes are competitively hybridised on the same array. 4

Ultimate goal: detecting real variations in intensities indicating differential gene expression. A simple, often used, criterion is fold change : an arbitrarily chosen difference in level of expression. For example twofold change between experimental and control cases. However, this does not allow for sampling variation. 5

2. Design questions High quality microarrays are expensive, often more expensive than the cost of obtaining the material to be studied. Thus, there will be few individual sampling units with enormous quantities of data on each one. Power and Sample Size Thus, classical power and sample size calculations are difficult because there are many outcomes of interest per slide and they are not independent. New methods, specifically for microarrays, are being developed. 6

Replications Technical replication: same sample on different microarrays. Results from the same RNA sample may differ among identical probes on the same microarray, among (supposedly) identical microarrays, among labs using the same procedures, among different types of arrays. all sources of measurement, and perhaps systematic, error. 7

Whenever possible, arrays should be from a single batch, processed by the same technician on the same day. Differences among labs is of special concern for reproducibility of results, the foundation of science! Some work has been done on standard benchmark slides for interlab comparisons. Biological replication: samples from different individuals. Variation in technical replication complicates comparisons among different individuals. 8

Treatment Differences By the use of more than one dye, certain microarray procedures allow comparisons of two (or more) RNA samples within the same slide: Then, for example, compare matched pairs, each pair of two samples on two chips, each sample to a reference sample, in a chain or loop, with each sample on two different microarrays. As far as possible, comparisons of most interest should be on the same slide. 9

Probe Layout Design of the layout of the probes on the microarray involves: position of control probes, physical separation among replicate probes, accounting for systematic patterns due to the manufacturing process (such as variation in printing pins). 10

3. Modelling Questions Gene expression levels are usually not normally distributed, with different probes often having different distributions. However, there is so much noise in microarrays that, if the design and many nuisance covariates are not accounted for, the normal distribution may be reasonable. A standard method is to fit the same statistical model independently to all probes on a microarray. This involves making thousands of similar inferences and ignores similarities among probes. Especially for simple models, many probes may yield virtually identical results. 11

Consider construction of gene profiles to classify patients into treatment groups. Suppose 150 individuals are used to construct the profiles based on 20,000 genes. A logistic regression is to be developed to predict whether or not to treat new patients, depending on the pattern of gene levels. At most, there only 150 different gene patterns: on average, sets of about 133 genes have the same pattern. How, then, can relevant models be constructed? 12

The number of different types of slides is rapidly growing: promoter chips, protein chips (using antibodies), chip-on-chip, exon chips, etc. Each requires the development of specific analysis techniques. 13

4. Longitudinal Data Samples may be taken at several points in time from the same organism or cell culture. Thousands of genes may be monitored over time. The phenomenon under study may be a trend or periodic. Trends: development of an organism, effect of some treatment such as a drug... Periodic: cell cycles, circadian rhythms... 14

How can reasonable models be fitted when there are few time points: 5-20, many probes: 10-20,000, perhaps also under several biological conditions, with no or very few replications?? Biologists are looking for genes varying over time and, possibly, how they vary. 15

5. Conclusions Microarray experiments provide statistical challenges in a wide variety of areas: design exploratory analysis testing modelling How would that eminent geneticist, R.A. Fisher, have approached all of these problems? 16