The Advantages and Disadvantages of Using Gene Ontology


1 Extracting Biological Information from Gene Lists Simon Andrews, Laura Biggins, Boo Virk v1.0

2 Workflow: Biological material → Sample processing (isolation of DNA, RNA or proteins) → Sample for analysis → Analysis of processed sample: data acquisition (sequencing, microarray analysis, mass spectrometry) → Raw data file(s) → Data analysis: identification of genes, transcripts or proteins (using public databases) → Results table containing hits (genes, transcripts or proteins). Analysis doesn't end here!

3 Why functional analysis? Advantages: biological insight; validation of experiment; generate new hypotheses. Limitations: amount of information depends on the species; will only find known/published links between genes; if working on something novel, the information available may be limited.

4 What this course covers Morning Introduction to Gene Lists Gene List Practical Coffee Presenting results Presenting Results Practical Afternoon Motif Searching Motif Searching Practical Coffee Networks and Interactions Network Practical Commercial tools

5 Gene Lists Types of gene list: Names of genes Names of genes ordered by qualitative value Names of genes ordered by quantitative value Gene lists can be ranked P-value Other Stat Ordered Gene lists can be filtered Cut off point Subset of genes

6 Transforming Gene IDs: Need to use the relevant ID to extract information from databases. The BioMart ID conversion tool allows us to do this easily and quickly online.

9 Download this data, import transformed IDs into table

10 I have my gene list, what next? Hyperlinked table:
Gene UniProt Name Score Reactome UniProt
TP53 P04637 Tumor Suppressor p Reactome UniProt
CDK1 P06493 Cyclin-dependent kinase Reactome UniProt
POLE Q07864 DNA Polymerase Epsilon Reactome UniProt
KPNB1 Q14974 Importin subunit beta Reactome UniProt
CHEK1 O14757 Serine/threonine-protein kinase Chk Reactome UniProt
AURKB Q96GD4 Aurora kinase B Reactome UniProt
RPA2 P15927 Replication protein A 32 kDa subunit Reactome UniProt
CDT1 Q9H211 DNA replication factor Cdt Reactome UniProt
MCMBP Q9BTE3 MCM complex-binding protein Reactome UniProt
TUBG1 P23258 Tubulin gamma-1 chain Reactome UniProt
RAN P62826 GTP-binding nuclear protein Ran Reactome UniProt
RANGRF Q9HD47 Ran guanine nucleotide factor Mog Reactome UniProt
BLM P54132 Bloom syndrome protein Reactome UniProt
PCNA P12004 Proliferating Cell Nuclear Antigen Reactome UniProt
SETD8 Q9NQR1 Pr-Set Reactome UniProt
RCC1 P18754 Regulator of chromosome condensation Reactome UniProt
MCM5 P33992 DNA replication licensing factor MCM Reactome UniProt
CDC25C P30307 M-phase inducer phosphatase Reactome UniProt
PLK1 P53350 Serine/threonine-protein kinase PLK Reactome UniProt
MZT1 Q08AG7 Mitotic-spindle organizing protein Reactome UniProt

11 Hyperlinked tables. Advantages: easy to create, no special data-mining software needed; one-click direct access to relevant pages; reference resource. However: need to become familiar with the resources available (tailor hyperlinks to be specific for your organism and the questions being asked); information on only one gene at a time in your gene set; need to get the relevant resource IDs.

12 I have my gene list, what next? Annotated table:
Gene UniProt Name Score PANTHER GO-Slim BP
CHEK1 O14757 Serine/threonine-protein kinase Chk: apoptotic process; nitrogen compound metabolic process; biosynthetic process; transcription from RNA polymerase II promoter; cellular protein modification process; cell cycle; cell communication; apoptotic process; response to stress; response to abiotic stimulus; regulation of transcription from RNA polymerase II promoter; regulation of cell cycle; chromatin organization
MCM5 P33992 DNA replication licensing factor MCM: cell cycle; cell communication
RCC1 P18754 Regulator of chromosome condensation: cellular component movement; mitosis; chromosome segregation; cellular component morphogenesis; intracellular protein transport; cellular component organization
CDC25C P30307 M-phase inducer phosphatase: DNA replication; cell cycle
PLK1 P53350 Serine/threonine-protein kinase PLK: DNA replication; DNA repair; DNA recombination; cell cycle
TP53 P04637 Tumor Suppressor p: glycogen metabolic process; protein phosphorylation; mitosis; cell communication
CDK1 P06493 Cyclin-dependent kinase: nitrogen compound metabolic process; biosynthetic process; DNA replication; RNA metabolic process; cellular process; regulation of biological process; regulation of catalytic activity
BLM P54132 Bloom syndrome protein: nucleobase-containing compound metabolic process; cell cycle; cell communication; RNA localization; intracellular protein transport; nuclear transport
RPA2 P15927 Replication protein A 32 kDa subunit: nucleobase-containing compound metabolic process; mitosis; nucleobase-containing compound transport; regulation of catalytic activity
TUBG1 P23258 Tubulin gamma-1 chain: phosphate-containing compound metabolic process; cellular protein modification process; cell cycle
KPNB1 Q14974 Importin subunit beta: phosphate-containing compound metabolic process; protein phosphorylation; cytokinesis; cell cycle; regulation of cell cycle; chromatin organization; cytoskeleton organization
MZT1 Q08AG7 Mitotic-spindle organizing protein: protein targeting; nuclear transport
PCNA P12004 Proliferating Cell Nuclear Antigen
RAN P62826 GTP-binding nuclear protein Ran
POLE Q07864 DNA Polymerase Epsilon
AURKB Q96GD4 Aurora kinase B
MCMBP Q9BTE3 MCM complex-binding protein
CDT1 Q9H211 DNA replication factor Cdt
RANGRF Q9HD47 Ran guanine nucleotide factor Mog
SETD8 Q9NQR1 Pr-Set

13 Annotation Sources There are many databases for annotation sources, including (but not limited to): Gene Ontology (GO)(most popular) Pathways and Interactions Protein domains Co-expression Transcription binding sites

15 What is Gene Ontology (GO)? A collaborative effort addressing the need for consistent descriptions of gene products across different databases. The GO project has three structured ontologies describing gene products independent of species: Biological Processes (BP), Cellular Components (CC) and Molecular Functions (MF).

16 GO Structure: 3 GO domains, each with a root ontology term. Terms run from general (parent) to specific (child).

17 Subsets of GO terms. GO slim terms: cut-down versions of the GO ontologies that contain a subset of terms from the GO resource; they give a broad overview of the ontology content without the detail of the specific, fine-grained terms. GO fat terms: a subset comprising more specific terms.
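The parent/child structure and the idea of a slim subset can be sketched in a few lines of Python. This is only an illustration: the hierarchy in `TOY_PARENTS` and the subset in `TOY_SLIM` are small hand-made assumptions, not data loaded from the GO resource, and the function names are hypothetical.

```python
# Toy child -> parents mapping (an assumption for illustration, not real GO data).
TOY_PARENTS = {
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle": ["cellular process"],
    "DNA replication": ["DNA metabolic process"],
    "DNA metabolic process": ["metabolic process"],
    "cellular process": ["biological_process"],
    "metabolic process": ["biological_process"],
    "biological_process": [],
}

# Hypothetical slim: a few broad terms chosen to summarise the toy hierarchy.
TOY_SLIM = {"cell cycle", "metabolic process"}


def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def to_slim(term, parents, slim):
    """Map a specific term to the slim terms at or above it in the hierarchy."""
    return ({term} | ancestors(term, parents)) & slim


print(sorted(ancestors("mitotic cell cycle", TOY_PARENTS)))
print(to_slim("DNA replication", TOY_PARENTS, TOY_SLIM))
```

Mapping a fine-grained term to a slim term is just a walk up the parent links until a term in the chosen subset is reached, which is why slim annotation gives a broader but less detailed picture.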

19 Pathway and Interactions Are specific pathways enriched in my list? What other genes are in this pathway? Which genes/gene products interact with my genes of interest? Databases include:

21 Protein Domain Can I find shared protein domains? What is the function of shared domain? Which other proteins share this domain? Databases include:

23 Co-expression Which genes are co-expressed? Automatic grouping of genes (rather than human curation (GO)) Databases include:

25 Annotated tables Advantages Information on function from larger gene sets Sort groups of genes (GO term, pathway, protein domain) Relatively easy to create Reference resource However Need to become familiar with the resources specific to your research Lots of information can be difficult to sort efficiently

26 What does functional information tell me? Have functional information about a gene set Can verify genes implicated in experiment are functionally relevant, and to discover unexpected shared functions Determine which functions are enriched in gene set How? Compare to a background list of genes

27 What is a background list? In theory, any gene that could have been differentially expressed in your experiment RNA seq all genes apart from those with less than reads Arrays all genes in the array ChipSeq - any gene on the chip. vs

28 Choosing a Background List. Which background list to use? Whole set of genes vs tissue/cell-specific genes. Manually made list, derived from your experiment and analysis?

29 Statistics to test for enrichment. Background: 13,101 genes on chip, of which 3005 are related to disease: 3005/13,101 = 22.9%. Gene list (747 genes): related to disease 260/747 = 34.8%; not related to disease 487/747 = 65.2%. Are these proportions the same?

30 Lots of statistical tests to choose from: Hypergeometric test, Fisher's exact/chi-squared, Binomial, Kolmogorov-Smirnov, Permutation

31 Hypergeometric test. Uses the hypergeometric distribution to measure the probability of having drawn a specific number of successes (out of a total number of draws) from a population. Example: Imagine that there are 4 green and 16 red marbles in a box. You close your eyes and draw 5 marbles without replacement. What is the probability that exactly 2 of the 5 are green?
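The marble question can be answered directly from the hypergeometric formula: the number of ways to pick 2 of the 4 green marbles, times the number of ways to pick 3 of the 16 red ones, divided by the number of ways to pick any 5 of the 20. A minimal sketch using only the Python standard library (the function name `hypergeom_pmf` is an illustrative choice):

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing `draws` items without replacement."""
    return (
        comb(successes, k) * comb(population - successes, draws - k)
        / comb(population, draws)
    )


# 20 marbles in the box, 4 of them green, 5 drawn, exactly 2 green wanted.
p_two_green = hypergeom_pmf(k=2, population=20, successes=4, draws=5)
print(f"P(exactly 2 green) = {p_two_green:.4f}")  # 6 * 560 / 15504, about 0.2167
```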

32 Background: 13,101 genes on chip, of which 3005 map to disease: 3005/13,101 = 22.9%. Gene list (747 genes): map to disease 260/747 = 34.8%; do not map to disease 487/747 = 65.2%. Are these proportions the same? What is the probability (p-value) that exactly 260 genes (out of 747) map to disease, given that there are 3005 of those genes in the background (13,101 genes)?
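The same calculation applies to the gene list. The slide phrases the question as "exactly 260", but an over-representation p-value is normally the probability of seeing 260 or more disease genes, so the sketch below sums the upper tail; that choice is an assumption about what is wanted here, not something the slide states. Exact integer arithmetic is used because the individual terms are far too small or too large for ordinary floats.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for draws without replacement, computed with exact integers."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return float(Fraction(favourable, comb(population, draws)))


# Numbers from the slide: 13,101 genes on the chip, 3005 map to disease,
# 747 genes in the list, 260 of which map to disease.
p_enrich = hypergeom_upper_tail(k=260, population=13101, successes=3005, draws=747)
print(f"P(260 or more disease genes by chance) = {p_enrich:.3e}")
```

The list is expected to contain roughly 171 disease genes by chance (747 × 3005/13,101), so observing 260 gives a very small p-value.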

33 Hypergeometric test. Test: Hypergeometric. Input: unranked/ranked list. Sample size: large (5% of background). Output: p-value. Specifics: finite population, so the probability of success changes with each draw. Limitations: assumes independence of categories; result terms often include directly related terms (is there really evidence for both terms?); works better with larger samples (5% of background).

34 Test: Hypergeometric. Input: unranked/ranked list. Sample size: large (5% of background). Output: p-value. Specifics: finite population, probability of success changes. Based on the hypergeometric test: Test: Fisher's Exact. Input: unranked/ranked list. Sample size: small. Output: p-value. Specifics: can be used to compare 2 conditions as well as gene list to background; one-tailed or two-tailed. Test: Binomial. Input: unranked/ranked list. Sample size: large. Output: p-value. Specifics: does not assume a finite population; probability of success remains the same.
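The difference between the binomial and hypergeometric views is that the binomial test treats every gene in the list as an independent draw with a fixed success probability (here 3005/13,101), as if sampling with replacement. A minimal sketch of the binomial upper tail, reusing the numbers from the disease example; the function name and the use of the upper tail are illustrative assumptions:

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(k, draws, hits, population):
    """P(X >= k) when each draw succeeds with fixed probability hits/population."""
    misses = population - hits
    # Work in integers: each term is C(draws, i) * hits^i * misses^(draws - i).
    numerator = sum(
        comb(draws, i) * hits**i * misses ** (draws - i)
        for i in range(k, draws + 1)
    )
    return float(Fraction(numerator, population**draws))


p_binom = binomial_upper_tail(k=260, draws=747, hits=3005, population=13101)
print(f"Binomial P(260 or more disease genes) = {p_binom:.3e}")
```

When the gene list is small relative to the background, the two models give similar answers; the gap grows as the list becomes a larger fraction of the background.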

35 Limitations of Fisher's Exact and Binomial tests. Neither accounts for variation in the number of genes annotated to the individual terms/functions being tested, or in the number of terms/functions associated with individual genes. Therefore, they tend to over-estimate significance if the gene set has an unusually high number of annotations. Both assume independence of categories.

36 Lots of statistical tests to choose from: Hypergeometric, Fisher's exact/chi-squared, Binomial, Kolmogorov-Smirnov, Permutation. Used for ranked gene lists only. Output: enrichment scores (ES) for functions, which can then be translated into a p-value.
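One way to picture an enrichment score on a ranked list is a Kolmogorov-Smirnov-style running sum: walk down the list, step up when a gene belongs to the function of interest and step down otherwise, and record the largest deviation from zero. The sketch below is a simplified, unweighted version written for illustration; it is not the exact statistic of any particular tool, and the ranking and gene set are made up.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-style running-sum score; positive means hits cluster at the top."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Steps are scaled so the running sum starts and ends at zero.
        running += 1 / n_hits if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking (most significant first) and hypothetical gene set.
ranked = ["CDK1", "PLK1", "AURKB", "TP53", "RAN", "TUBG1"]
example_set = {"CDK1", "PLK1", "AURKB"}
es = enrichment_score(ranked, example_set)
print(f"ES = {es:.2f}")
```

Turning the score into a p-value is usually done by comparing it with scores obtained from shuffled rankings or gene labels, which is where the permutation approach comes in.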

37 Multiple testing correction. Error types in statistics (statistical decision vs true state in gene list): Significant and not overrepresented = Type I error (false positive). Significant and overrepresented = correct. Not significant and not overrepresented = correct. Not significant and overrepresented = Type II error (false negative). Traditionally, a test or a difference is said to be significant if the probability of a type I error is α ≤ 0.05.

38 Example: You want to compare 3 groups and you carry out 3 hypergeometric tests, each with a 5% level of significance (P<0.05). Probability of not making a type I error in one test = 95% = (1 − 0.05). Overall probability of no type I errors is: 0.95 * 0.95 * 0.95 = 0.857. Therefore the probability of at least one type I error is: 1 − 0.857 = 0.143 or 14.3%. The probability of error increases from 5% to 14.3%. If comparing 5 groups instead of 3 (10 pairwise tests), the multiple testing error rate is 40%! (= 1 − (0.95)^n, where n is the number of tests). Solution for multiple comparisons: multiple testing correction.
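The arithmetic on this slide generalises to any number of tests, assuming the tests are independent. A short sketch (the function name is illustrative):

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


# 3 tests for 3 groups; 10 pairwise tests for 5 groups.
for n_tests in (1, 3, 10):
    rate = familywise_error_rate(n_tests)
    print(f"{n_tests:>2} tests: {rate:.1%}")
```

With 1, 3 and 10 tests this gives 5.0%, 14.3% and 40.1%, matching the figures quoted above.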

39 Multiple test corrections. Bonferroni: significance level (e.g. 0.05) / number of tests = new threshold. This is an over-correction if tests are correlated. Benjamini-Hochberg: rank the p-values, then apply the most stringent correction to the most significant p-values and the least stringent to the least significant.
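Both corrections are short enough to write out. The sketch below is a plain-Python illustration rather than a reference implementation: Bonferroni compares every p-value with alpha divided by the number of tests, while Benjamini-Hochberg compares the p-value at rank r (smallest first) with r/m × alpha and accepts everything up to the largest rank that passes. The p-values are made up for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag p-values that pass the single Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag p-values accepted by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        # The threshold loosens as the rank increases.
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for five tested functions.
p_values = [0.001, 0.012, 0.02, 0.045, 0.30]
print(bonferroni(p_values))          # [True, False, False, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, False, False]
```

On the same input Bonferroni keeps one function and Benjamini-Hochberg keeps three, which illustrates why Bonferroni is described as the more conservative choice.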

40 Statistical issues. We want to identify functions of maximal biological significance, BUT this is not perfectly correlated with statistical significance. Use p-values as a tool to rank functions, but don't take them too literally. Need to correct for multiple testing.

41 Tools for functional gene list analysis There are many different tools available, both free and commercial Popular web-based tools include:

42 PANTHER (Protein ANnotation THrough Evolutionary Relationship) One of the most widely used online resources for gene function classification and genome wide data analysis PANTHER users have successfully analysed data from: Gene expression Proteomics Genome-wide association study (GWAS) experiments PANTHER is part of the GO consortium, thus PANTHER annotation = up to date GO curation

43 PANTHER for functional classification

48 Send list to > File: saves the table in a tab-delimited .txt file

49 PANTHER for statistics

51 Annotations from PANTHER include: GO-slim terms, PANTHER protein class, PANTHER Pathway terms. Doesn't cluster together genes with similar GO terms in the table. Statistics: Binomial test with Bonferroni multiple testing correction.

52 DAVID: Gathers data from many different databases (this is customisable). Functional Clustering. Uses many annotations, including GO-Fat terms (a more specific set of GO terms). Statistics: Fisher's Exact Test and multiple testing correction.

53 DAVID for functional classification

62 Functional Clustering

63 Which DAVID tool should I use?

64 GOrilla

65 Which tool to use? Choose a tool that: includes your gene/probe identifiers; includes your species; has up-to-date annotation; lets you define your background (if possible). Try a few different tools. Try gene lists of varying length.

Identification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation

Identification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation Identification of rheumatoid arthritis and osterthritis patients by transcriptome-based rule set generation Bering Limited Report generated on September 19, 2014 Contents 1 Dataset summary 2 1.1 Project

More information

PREDA S4-classes. Francesco Ferrari October 13, 2015

PREDA S4-classes. Francesco Ferrari October 13, 2015 PREDA S4-classes Francesco Ferrari October 13, 2015 Abstract This document provides a description of custom S4 classes used to manage data structures for PREDA: an R package for Position RElated Data Analysis.

More information

Gene Enrichment Analysis

Gene Enrichment Analysis a Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 14a: January 21, 2010 Lecturer: Ron Shamir Scribe: Roye Rozov Gene Enrichment Analysis 14.1 Introduction This lecture introduces

More information

Exercise with Gene Ontology - Cytoscape - BiNGO

Exercise with Gene Ontology - Cytoscape - BiNGO Exercise with Gene Ontology - Cytoscape - BiNGO This practical has material extracted from http://www.cbs.dtu.dk/chipcourse/exercises/ex_go/goexercise11.php In this exercise we will analyze microarray

More information

Analysis of Illumina Gene Expression Microarray Data

Analysis of Illumina Gene Expression Microarray Data Analysis of Illumina Gene Expression Microarray Data Asta Laiho, Msc. Tech. Bioinformatics research engineer The Finnish DNA Microarray Centre Turku Centre for Biotechnology, Finland The Finnish DNA Microarray

More information

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA -

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA - Course on Functional Analysis ::: Madrid, June 31st, 2007. Gonzalo Gómez, PhD. ggomez@cnio.es Bioinformatics Unit CNIO ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Minería de Datos ANALISIS DE UN SET DE DATOS.! Visualization Techniques! Combined Graph! Charts and Pies! Search for specific functions

Minería de Datos ANALISIS DE UN SET DE DATOS.! Visualization Techniques! Combined Graph! Charts and Pies! Search for specific functions Minería de Datos ANALISIS DE UN SET DE DATOS! Visualization Techniques! Combined Graph! Charts and Pies! Search for specific functions Data Mining on the DAG ü When working with large datasets, annotation

More information

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

A Primer of Genome Science THIRD

A Primer of Genome Science THIRD A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:

More information

NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS

NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS Orly Alter (a) *, Gene H. Golub (b), Patrick O. Brown (c)

More information

Comparing Methods for Identifying Transcription Factor Target Genes

Comparing Methods for Identifying Transcription Factor Target Genes Comparing Methods for Identifying Transcription Factor Target Genes Alena van Bömmel (R 3.3.73) Matthew Huska (R 3.3.18) Max Planck Institute for Molecular Genetics Folie 1 Transcriptional Regulation TF

More information

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Xiaohui Xie 1, Jun Lu 1, E. J. Kulbokas 1, Todd R. Golub 1, Vamsi Mootha 1, Kerstin Lindblad-Toh

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers. org.rn.eg.db December 16, 2015 org.rn.egaccnum Map Entrez Gene identifiers to GenBank Accession Numbers org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank

More information

Gene Expression Analysis

Gene Expression Analysis Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

How many of you have checked out the web site on protein-dna interactions?

How many of you have checked out the web site on protein-dna interactions? How many of you have checked out the web site on protein-dna interactions? Example of an approximately 40,000 probe spotted oligo microarray with enlarged inset to show detail. Find and be ready to discuss

More information

InSyBio BioNets: Utmost efficiency in gene expression data and biological networks analysis

InSyBio BioNets: Utmost efficiency in gene expression data and biological networks analysis InSyBio BioNets: Utmost efficiency in gene expression data and biological networks analysis WHITE PAPER By InSyBio Ltd Konstantinos Theofilatos Bioinformatician, PhD InSyBio Technical Sales Manager August

More information

Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat

Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat Bioinformatique et Séquençage Haut Débit, Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat 1 RNA Transcription to RNA and subsequent

More information

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011 Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear

More information

How to create and interpret the predictive analysis of a compound

How to create and interpret the predictive analysis of a compound How to create and interpret the predictive analysis of a compound Platform with suite of tools Predict & understand biological effects of small molecules & compounds Predict targets and metabolites, potential

More information

GenBank, Entrez, & FASTA

GenBank, Entrez, & FASTA GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,

More information

Web-Based Genomic Information Integration with Gene Ontology

Web-Based Genomic Information Integration with Gene Ontology Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, kai.xu@nicta.com.au Abstract. Despite the dramatic growth of online genomic

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Christopher Benner, PhD Director, Integrative Genomics and Bioinformatics Core (IGC) idash Webinar,

More information

Microarray Data Analysis. A step by step analysis using BRB-Array Tools

Microarray Data Analysis. A step by step analysis using BRB-Array Tools Microarray Data Analysis A step by step analysis using BRB-Array Tools 1 EXAMINATION OF DIFFERENTIAL GENE EXPRESSION (1) Objective: to find genes whose expression is changed before and after chemotherapy.

More information

PANTHER User Manual. For PANTHER 9.0. Date: January 7, 2015. The PANTHER Team. Authors:

PANTHER User Manual. For PANTHER 9.0. Date: January 7, 2015. The PANTHER Team. Authors: PANTHER User Manual For PANTHER 9.0 Date: January 7, 2015 Authors: The PANTHER Team Contents 1 Welcome to PANTHER System 1 1.1 About this document........... 1 1.2 How to cite PANTHER.......... 1 1.3 PANTHER

More information

ProteinPilot Report for ProteinPilot Software

ProteinPilot Report for ProteinPilot Software ProteinPilot Report for ProteinPilot Software Detailed Analysis of Protein Identification / Quantitation Results Automatically Sean L Seymour, Christie Hunter SCIEX, USA Pow erful mass spectrometers like

More information

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge

More information

Genomes and SNPs in Malaria and Sickle Cell Anemia

Genomes and SNPs in Malaria and Sickle Cell Anemia Genomes and SNPs in Malaria and Sickle Cell Anemia Introduction to Genome Browsing with Ensembl Ensembl The vast amount of information in biological databases today demands a way of organising and accessing

More information

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland

More information

Network Webinar Series

Network Webinar Series Undergraduate Educator Network Series Sponsored by Undergraduate Education Subcommittee SOT Education Committee June 4, 2015 12:00 Noon ET (c) SOT2015 Welcome Kristine Willett, PhD Co-Chair, C Undergraduate

More information

Pathway Analysis : An Introduction

Pathway Analysis : An Introduction Pathway Analysis : An Introduction Experiments Literature and other KB Data Knowledge Structure in Data through statistics Structure in Knowledge through GO and other Ontologies Gain insight into Data

More information

ProteinQuest user guide

ProteinQuest user guide ProteinQuest user guide 1. Introduction... 3 1.1 With ProteinQuest you can... 3 1.2 ProteinQuest basic version 4 1.3 ProteinQuest extended version... 5 2. ProteinQuest dictionaries... 6 3. Directions for

More information

From Data to Foresight:

From Data to Foresight: Laura Haas, IBM Fellow IBM Research - Almaden From Data to Foresight: Leveraging Data and Analytics for Materials Research 1 2011 IBM Corporation The road from data to foresight is long? Consumer Reports

More information

Activity 7.21 Transcription factors

Activity 7.21 Transcription factors Purpose To consolidate understanding of protein synthesis. To explain the role of transcription factors and hormones in switching genes on and off. Play the transcription initiation complex game Regulation

More information

Frequently Asked Questions Next Generation Sequencing

Frequently Asked Questions Next Generation Sequencing Frequently Asked Questions Next Generation Sequencing Import These Frequently Asked Questions for Next Generation Sequencing are some of the more common questions our customers ask. Questions are divided

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Control of Gene Expression

Control of Gene Expression Control of Gene Expression What is Gene Expression? Gene expression is the process by which informa9on from a gene is used in the synthesis of a func9onal gene product. What is Gene Expression? Figure

More information

Introduction to Genome Annotation

Introduction to Genome Annotation Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT

More information

The EcoCyc Curation Process

The EcoCyc Curation Process The EcoCyc Curation Process Ingrid M. Keseler SRI International 1 HOW OFTEN IS THE GOLDEN GATE BRIDGE PAINTED? Many misconceptions exist about how often the Bridge is painted. Some say once every seven

More information

Understanding West Nile Virus Infection

Understanding West Nile Virus Infection Understanding West Nile Virus Infection The QIAGEN Bioinformatics Solution: Biomedical Genomics Workbench (BXWB) + Ingenuity Pathway Analysis (IPA) Functional Genomics & Predictive Medicine, May 21-22,

More information

What s New in Pathway Studio Web 11.1

What s New in Pathway Studio Web 11.1 1 1 What s New in Pathway Studio Web 11.1 Elseiver is pleased to announce the release of Pathway Studio Web 11.1 for all database subscriptions (Mammal, Mammal+ChemEffect+DiseaseFx, Plant). This release
