The Advantages and Disadvantages of Using Gene Ontology
|
|
- Ruth Wilson
- 3 years ago
- Views:
Transcription
1 Extracting Biological Information from Gene Lists Simon Andrews, Laura Biggins, Boo Virk v1.0
2 Biological material Sample for analysis Analysis of processed sample: Data acquisition sequencing, microarray analysis, mass spectrometry Isolation of DNA, RNA or proteins Sample processing Analysis doesn t end here! Results Table Containing hits genes, transcripts or proteins Public databases Raw data file(s) Data analysis: identification of genes, transcripts or proteins
3 Why functional analysis? Advantages: Biological insight Validation of experiment Generate new hypothesis Limitations: Amount of information depends on the species Will only find known/published links between genes If working on something novel information available may be limited
4 What this course covers Morning Introduction to Gene Lists Gene List Practical Coffee Presenting results Presenting Results Practical Afternoon Motif Searching Motif Searching Practical Coffee Networks and Interactions Network Practical Commercial tools
5 Gene Lists Types of gene list: Names of genes Names of genes ordered by qualitative value Names of genes ordered by quantitative value Gene lists can be ranked P-value Other Stat Ordered Gene lists can be filtered Cut off point Subset of genes
6 Transforming Gene ID s Need to use relevant ID to extract information from databases BioMart ID conversion tool allows us to do this easily and quickly online
7
8
9 Download this data, import transformed ID s into table
10 I have my gene list, what next? Hyperlinked table: Gene UniProt Name Score Reactome UniProt TP53 P04637 Tumor Suppressor p Reactome UniProt CDK1 P06493 Cyclin-dependent kinase Reactome UniProt POLE Q07864 DNA Polymerase Epsilon Reactome UniProt KPNB1 Q14974 Importin subunit beta Reactome UniProt CHEK1 O14757 Serine/threonine-protein kinase Chk Reactome UniProt AURKB Q96GD4 Aurora kinase B Reactome UniProt RPA2 P15927 Replication protein A 32 kda subunit Reactome UniProt CDT1 Q9H211 DNA replication factor Cdt Reactome UniProt MCMBP Q9BTE3 MCM complex-binding protein Reactome UniProt TUBG1 P23258 Tubulin gamma-1 chain Reactome UniProt RAN P62826 GTP-binding nuclear protein Ran Reactome UniProt RANGRF Q9HD47 Ran guanine nucleotide factor Mog Reactome UniProt BLM P54132 Bloom syndrome protein Reactome UniProt PCNA P12004 Proliferating Cell Nuclear Antigen Reactome UniProt SETD8 Q9NQR1 Pr-Set Reactome UniProt RCC1 P18754 Regulator of chromosome condensation Reactome UniProt MCM5 P33992 DNA replication licensing factor MCM Reactome UniProt CDC25C P30307 M-phase inducer phosphatase Reactome UniProt PLK1 P53350 Serine/threonine-protein kinase PLK Reactome UniProt MZT1 Q08AG7 Mitotic-spindle organizing protein Reactome UniProt
11 Hyperlinked tables Advantages Easy to create no special data-mining software needed One-click direct access to relevant pages Reference resource However Need to become familiar with the resources available - tailor hyperlinks to be specific for your organism and questions being asked Information on one gene at a time in your gene set Need to get relevant resource ID s
12 I have my gene list, what next? Annotated table: Gene UniProt Name Score PANTHER GO-Slim BP CHEK1 O14757 Serine/threonine-protein kinase Chk apoptotic process;nitrogen compound metabolic process;biosynthetic process;transcription from RNA polymerase II promoter;cellular protein modification process;cell cycle;cell communication;apoptotic process;response to stress;response to abiotic stimulus;regulation of transcription from RNA polymerase II promoter;regulation of cell cycle;chromatin organization MCM5 P33992 DNA replication licensing factor MCM cell cycle;cell communication RCC1 P18754 Regulator of chromosome condensation cellular component movement;mitosis;chromosome segregation;cellular component morphogenesis;intracellular protein transport;cellular component organization CDC25C P30307 M-phase inducer phosphatase DNA replication;cell cycle PLK1 P53350 Serine/threonine-protein kinase PLK DNA replication;dna repair;dna recombination;cell cycle TP53 P04637 Tumor Suppressor p glycogen metabolic process;protein phosphorylation;mitosis;cell communication CDK1 P06493 Cyclin-dependent kinase nitrogen compound metabolic process;biosynthetic process;dna replication;rna metabolic process;cellular process;regulation of biological process;regulation of catalytic activity BLM P54132 Bloom syndrome protein nucleobase-containing compound metabolic process;cell cycle;cell communication;rna localization;intracellular protein transport;nuclear transport RPA2 P15927 Replication protein A 32 kda subunit nucleobase-containing compound metabolic process;mitosis;nucleobase-containing compound transport;regulation of catalytic activity TUBG1 P23258 Tubulin gamma-1 chain phosphate-containing compound metabolic process;cellular protein modification process;cell cycle KPNB1 Q14974 Importin subunit beta phosphate-containing compound metabolic process;protein phosphorylation;cytokinesis;cell cycle;regulation of cell cycle;chromatin organization;cytoskeleton organization MZT1 Q08AG7 Mitotic-spindle organizing protein protein targeting;nuclear transport PCNA P12004 Proliferating Cell Nuclear Antigen RAN P62826 GTP-binding nuclear protein Ran POLE Q07864 DNA Polymerase Epsilon AURKB Q96GD4 Aurora kinase B MCMBP Q9BTE3 MCM complex-binding protein CDT1 Q9H211 DNA replication factor Cdt RANGRF Q9HD47 Ran guanine nucleotide factor Mog SETD8 Q9NQR1 Pr-Set
13 Annotation Sources There are many databases for annotation sources, including (but not limited to): Gene Ontology (GO)(most popular) Pathways and Interactions Protein domains Co-expression Transcription binding sites
14 Annotation Sources There are many databases for annotation sources, including (but not limited to): Gene Ontology (GO)(most popular) Pathways and Interactions Protein domains Co-expression Transcription binding sites
15 What is Gene Ontology (GO)? Collaborative effort addressing need for consistent descriptions of gene products across different databases GO project has three structured ontologies describing gene products independent of species: Biological Processes (BP), Cellular Components (CC) Molecular Functions (MF)
16 GO Structure 3 GO domains: Root ontology terms general Parent Child specific
17 Subsets of GO terms GO slim terms: Cut-down versions of the GO ontologies that contain a subset of terms from the GO resource Give a broad overview of the ontology content without the detail of the specific, fine-grained terms GO fat terms: subset comprising more specific terms
18 Annotation Sources There are many databases for annotation sources, including (but not limited to): Gene Ontology (GO)(most popular) Pathways and Interactions Protein domains Co-expression Transcription binding sites
19 Pathway and Interactions Are specific pathways enriched in my list? What other genes are in this pathway? Which genes/gene products interact with my genes of interest? Databases include:
20 Annotation Sources There are many databases for annotation sources, including (but not limited to): Gene Ontology (GO)(most popular) Pathways and Interactions Protein domains Co-expression Transcription binding sites
21 Protein Domain Can I find shared protein domains? What is the function of shared domain? Which other proteins share this domain? Databases include:
22 Annotation Sources There are many databases for annotation sources, including (but not limited to): Gene Ontology (GO)(most popular) Pathways and Interactions Protein domains Co-expression Transcription binding sites
23 Co-expression Which genes are co-expressed? Automatic grouping of genes (rather than human curation (GO)) Databases include:
24 Annotation Sources There are many databases for annotation sources, including (but not limited to): Gene Ontology (GO)(most popular) Pathways and Interactions Protein domains Co-expression Transcription binding sites
25 Annotated tables Advantages Information on function from larger gene sets Sort groups of genes (GO term, pathway, protein domain) Relatively easy to create Reference resource However Need to become familiar with the resources specific to your research Lots of information can be difficult to sort efficiently
26 What does functional information tell me? Have functional information about a gene set Can verify genes implicated in experiment are functionally relevant, and to discover unexpected shared functions Determine which functions are enriched in gene set How? Compare to a background list of genes
27 What is a background list? In theory, any gene that could have been differentially expressed in your experiment RNA seq all genes apart from those with less than reads Arrays all genes in the array ChipSeq - any gene on the chip. vs
28 Choosing a Background List Which background list to use? Whole set of genes Tissue/cell specific genes Manually made list, derived from your experiment and analysis? vs
29 Statistics to test for enrichment 13,101 genes on chip Gene List Related to disease 260/747 = 34.8% 3005 genes related to disease 3005/13,101= 23.1% Are these proportions the same? Do not related to disease 487/747 = 65.2%
30 Lots of Statistical tests to choose from Hypergeometric test Fisher s exact/chi-squared Binomial Kolmogorov Smirnov Permutation
31 Hypergeometric test Uses hypergeometric distribution to measure the probability of having drawn a specific number of successes (out of a total number of draws) from a population Example: Imagine that there are 4 green and 16 red marbles in a box. You close your eyes and draw 5 marbles without replacement What is the probability that exactly 2 of the 5 are green?
32 13,101 genes on chip Gene List Map to disease 260/747 = 34.8% 3005 genes map to disease 3005/13,101= 23.1% Are these proportions the same? Do not map to disease 487/747 = 65.2% What is the probability (p-value) that exactly 260 genes (out of 747) map to disease, given that there are 3005 of those genes in the background (13,101 genes)?
33 Hypergeometric test Test Input Sample Size Output Specifics Hypergeometric Unranked/Ranked List Large (5% of background) P- value Finite population probability of success changes Limitations: Assumes independence of categories Result terms often include directly related terms Is there really evidence for both terms? Works better with larger samples (5% of background)
34 Test Input Sample Size Output Specifics Hypergeometric Unranked/Ranked List Large (5% of background) P- value Finite population probability of success changes Based on Hypergeometric test: Test Input Sample Size Output Specifics Fisher s Exact Binomial Unranked/Ranked List Unranked/Ranked List Small P- value Large P- value Can be used to compare 2 conditions as well as gene list to background one-tailed or two-tailed Does not assume finite population probability of success remains the same
35 Limitations of Fisher s Exact and Binomial test Neither account for variation in the number of genes annotated to individual terms/functions being tested or the number of terms/functions associated with individual genes Therefore, tend to over-estimate significance if the gene set has an unusually high number of annotations Assume independence of categories
36 Lots of Statistical tests to choose from Hypergeometric Fisher s exact/chi-squared Binomial Kolmogorov Smirnov Permutation Used for ranked gene lists only Output: enrichment scores (ES) for functions, which can then be translated into a p-value
37 Multiple testing correction Error types in statistics: Statistical Decision: Significant True state in Gene List Not Overrepresented Type I error (False Positive) Overrepresented Correct Not Significant Correct Type II error (False Negative) Traditionally, a test or a difference are said to be significant if the probability of type I error is: α =< 0.05
38 Example: You want to compare 3 groups and you carry out 3 hypergeometric tests, each with a 5% level of significance (P<0.05) Probability of not making type I error = 95% = (1 0.05) Overall probability of no type I errors is: 0.95 * 0.95 * 0.95 = Therefore probability of at least one type I error is: = or 14.3% If comparing 5 groups instead of 3, the multiple testing error rate is 40%! (=1-(0.95) n ) Solution for multiple comparisons: Multiple testing correction Probability of error increases from 5% to 14.3%
39 Multiple test corrections Bonferroni Significant level (e.g. 0.05) /number of tests = new threshold This is an over correction if tests are correlated Benjamini-Hochberg Rank the p-values Apply more stringent correction to the most significant, and least stringent to the least significant p-values
40 Statistical issues We want to Identify functions of maximal biological significance BUT this is not perfectly correlated with statistical significance Use p values as a tool to rank functions but don t take them too literally Need to correct for multiple testing
41 Tools for functional gene list analysis There are many different tools available, both free and commercial Popular web-based tools include:
42 PANTHER (Protein ANnotation THrough Evolutionary Relationship) One of the most widely used online resources for gene function classification and genome wide data analysis PANTHER users have successfully analysed data from: Gene expression Proteomics Genome-wide association study (GWAS) experiments PANTHER is part of the GO consortium, thus PANTHER annotation = up to date GO curation
43 PANTHER for functional classification
44
45
46
47
48 Send list to > File Saves table in a tab delimited.txt file
49 PANTHER for statistics
50
51 Annotations from PANTHER include: GO-slim terms PANTHER protein class PANTHER Pathway terms Doesn t cluster together genes with similar GO terms in table Statistics: Binomial test with Bonferroni multiple testing correction
52 Gathers data from many different databases this is customisable Functional Clustering Uses many annotations, including GO-Fat terms more specific set of GO terms Statistics: Fisher s Exact Test and multiple testing correction
53 DAVID for functional classification
54
55
56
57
58
59
60
61
62 Functional Clustering
63 Which DAVID tool should I use?
64 GOrilla
65 Which tool to use? Choose a tool that: Includes your gene / probe identifiers Includes your species Has up to date annotation Lets you define your background (if possible) Try a few different tools Try gene lists of varying length
Identification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation
Identification of rheumatoid arthritis and osterthritis patients by transcriptome-based rule set generation Bering Limited Report generated on September 19, 2014 Contents 1 Dataset summary 2 1.1 Project
More informationPREDA S4-classes. Francesco Ferrari October 13, 2015
PREDA S4-classes Francesco Ferrari October 13, 2015 Abstract This document provides a description of custom S4 classes used to manage data structures for PREDA: an R package for Position RElated Data Analysis.
More informationGene Enrichment Analysis
a Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 14a: January 21, 2010 Lecturer: Ron Shamir Scribe: Roye Rozov Gene Enrichment Analysis 14.1 Introduction This lecture introduces
More informationExercise with Gene Ontology - Cytoscape - BiNGO
Exercise with Gene Ontology - Cytoscape - BiNGO This practical has material extracted from http://www.cbs.dtu.dk/chipcourse/exercises/ex_go/goexercise11.php In this exercise we will analyze microarray
More informationAnalysis of Illumina Gene Expression Microarray Data
Analysis of Illumina Gene Expression Microarray Data Asta Laiho, Msc. Tech. Bioinformatics research engineer The Finnish DNA Microarray Centre Turku Centre for Biotechnology, Finland The Finnish DNA Microarray
More informationCourse on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA -
Course on Functional Analysis ::: Madrid, June 31st, 2007. Gonzalo Gómez, PhD. ggomez@cnio.es Bioinformatics Unit CNIO ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA
More informationTutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
More informationMinería de Datos ANALISIS DE UN SET DE DATOS.! Visualization Techniques! Combined Graph! Charts and Pies! Search for specific functions
Minería de Datos ANALISIS DE UN SET DE DATOS! Visualization Techniques! Combined Graph! Charts and Pies! Search for specific functions Data Mining on the DAG ü When working with large datasets, annotation
More informationRETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
More informationA Primer of Genome Science THIRD
A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:
More informationNOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS
NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS Orly Alter (a) *, Gene H. Golub (b), Patrick O. Brown (c)
More informationComparing Methods for Identifying Transcription Factor Target Genes
Comparing Methods for Identifying Transcription Factor Target Genes Alena van Bömmel (R 3.3.73) Matthew Huska (R 3.3.18) Max Planck Institute for Molecular Genetics Folie 1 Transcriptional Regulation TF
More informationSystematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals
Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Xiaohui Xie 1, Jun Lu 1, E. J. Kulbokas 1, Todd R. Golub 1, Vamsi Mootha 1, Kerstin Lindblad-Toh
More informationAGILENT S BIOINFORMATICS ANALYSIS SOFTWARE
ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS
More informationorg.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.
org.rn.eg.db December 16, 2015 org.rn.egaccnum Map Entrez Gene identifiers to GenBank Accession Numbers org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank
More informationGene Expression Analysis
Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands
More informationBioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis
More informationHow many of you have checked out the web site on protein-dna interactions?
How many of you have checked out the web site on protein-dna interactions? Example of an approximately 40,000 probe spotted oligo microarray with enlarged inset to show detail. Find and be ready to discuss
More informationInSyBio BioNets: Utmost efficiency in gene expression data and biological networks analysis
InSyBio BioNets: Utmost efficiency in gene expression data and biological networks analysis WHITE PAPER By InSyBio Ltd Konstantinos Theofilatos Bioinformatician, PhD InSyBio Technical Sales Manager August
More informationDiscovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat
Bioinformatique et Séquençage Haut Débit, Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat 1 RNA Transcription to RNA and subsequent
More informationSequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011
Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear
More informationHow to create and interpret the predictive analysis of a compound
How to create and interpret the predictive analysis of a compound Platform with suite of tools Predict & understand biological effects of small molecules & compounds Predict targets and metabolites, potential
More informationGenBank, Entrez, & FASTA
GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,
More informationWeb-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, kai.xu@nicta.com.au Abstract. Despite the dramatic growth of online genomic
More informationNext Generation Sequencing: Technology, Mapping, and Analysis
Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took
More informationAnalysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics
Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Christopher Benner, PhD Director, Integrative Genomics and Bioinformatics Core (IGC) idash Webinar,
More informationMicroarray Data Analysis. A step by step analysis using BRB-Array Tools
Microarray Data Analysis A step by step analysis using BRB-Array Tools 1 EXAMINATION OF DIFFERENTIAL GENE EXPRESSION (1) Objective: to find genes whose expression is changed before and after chemotherapy.
More informationPANTHER User Manual. For PANTHER 9.0. Date: January 7, 2015. The PANTHER Team. Authors:
PANTHER User Manual For PANTHER 9.0 Date: January 7, 2015 Authors: The PANTHER Team Contents 1 Welcome to PANTHER System 1 1.1 About this document........... 1 1.2 How to cite PANTHER.......... 1 1.3 PANTHER
More informationProteinPilot Report for ProteinPilot Software
ProteinPilot Report for ProteinPilot Software Detailed Analysis of Protein Identification / Quantitation Results Automatically Sean L Seymour, Christie Hunter SCIEX, USA Pow erful mass spectrometers like
More informationBBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge
More informationGenomes and SNPs in Malaria and Sickle Cell Anemia
Genomes and SNPs in Malaria and Sickle Cell Anemia Introduction to Genome Browsing with Ensembl Ensembl The vast amount of information in biological databases today demands a way of organising and accessing
More informationIntegrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon
Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland
More informationNetwork Webinar Series
Undergraduate Educator Network Series Sponsored by Undergraduate Education Subcommittee SOT Education Committee June 4, 2015 12:00 Noon ET (c) SOT2015 Welcome Kristine Willett, PhD Co-Chair, C Undergraduate
More informationPathway Analysis : An Introduction
Pathway Analysis : An Introduction Experiments Literature and other KB Data Knowledge Structure in Data through statistics Structure in Knowledge through GO and other Ontologies Gain insight into Data
More informationProteinQuest user guide
ProteinQuest user guide 1. Introduction... 3 1.1 With ProteinQuest you can... 3 1.2 ProteinQuest basic version 4 1.3 ProteinQuest extended version... 5 2. ProteinQuest dictionaries... 6 3. Directions for
More informationFrom Data to Foresight:
Laura Haas, IBM Fellow IBM Research - Almaden From Data to Foresight: Leveraging Data and Analytics for Materials Research 1 2011 IBM Corporation The road from data to foresight is long? Consumer Reports
More informationActivity 7.21 Transcription factors
Purpose To consolidate understanding of protein synthesis. To explain the role of transcription factors and hormones in switching genes on and off. Play the transcription initiation complex game Regulation
More informationFrequently Asked Questions Next Generation Sequencing
Frequently Asked Questions Next Generation Sequencing Import These Frequently Asked Questions for Next Generation Sequencing are some of the more common questions our customers ask. Questions are divided
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationControl of Gene Expression
Control of Gene Expression What is Gene Expression? Gene expression is the process by which informa9on from a gene is used in the synthesis of a func9onal gene product. What is Gene Expression? Figure
More informationIntroduction to Genome Annotation
Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT
More informationThe EcoCyc Curation Process
The EcoCyc Curation Process Ingrid M. Keseler SRI International 1 HOW OFTEN IS THE GOLDEN GATE BRIDGE PAINTED? Many misconceptions exist about how often the Bridge is painted. Some say once every seven
More informationUnderstanding West Nile Virus Infection
Understanding West Nile Virus Infection The QIAGEN Bioinformatics Solution: Biomedical Genomics Workbench (BXWB) + Ingenuity Pathway Analysis (IPA) Functional Genomics & Predictive Medicine, May 21-22,
More informationWhat s New in Pathway Studio Web 11.1
1 1 What s New in Pathway Studio Web 11.1 Elseiver is pleased to announce the release of Pathway Studio Web 11.1 for all database subscriptions (Mammal, Mammal+ChemEffect+DiseaseFx, Plant). This release
More informationMORPHEUS. http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix.
MORPHEUS http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix. Reference: MORPHEUS, a Webtool for Transcripton Factor Binding Analysis Using
More informationMolecular Genetics: Challenges for Statistical Practice. J.K. Lindsey
Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray
More informationJust the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
More informationAnalysis of gene expression data. Ulf Leser and Philippe Thomas
Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:
More informationBIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16
Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems
More informationSAP HANA Enabling Genome Analysis
SAP HANA Enabling Genome Analysis Joanna L. Kelley, PhD Postdoctoral Scholar, Stanford University Enakshi Singh, MSc HANA Product Management, SAP Labs LLC Outline Use cases Genomics review Challenges in
More informationModule 1. Sequence Formats and Retrieval. Charles Steward
The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.
More informationSpecial report. Chronic Lymphocytic Leukemia (CLL) Genomic Biology 3020 April 20, 2006
Special report Chronic Lymphocytic Leukemia (CLL) Genomic Biology 3020 April 20, 2006 Gene And Protein The gene that causes the mutation is CCND1 and the protein NP_444284 The mutation deals with the cell
More informationValidation and Replication
Validation and Replication Overview Definitions of validation and replication Difficulties and limitations Working examples from our group and others Why? False positive results still occur. even after
More informationCore Facility Genomics
Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray
More informationNebula A web-server for advanced ChIP-seq data analysis. Tutorial. by Valentina BOEVA
Nebula A web-server for advanced ChIP-seq data analysis Tutorial by Valentina BOEVA Content Upload data to the history pp. 5-6 Check read number and sequencing quality pp. 7-9 Visualize.BAM files in UCSC
More informationLecture 11 Data storage and LIMS solutions. Stéphane LE CROM lecrom@biologie.ens.fr
Lecture 11 Data storage and LIMS solutions Stéphane LE CROM lecrom@biologie.ens.fr Various steps of a DNA microarray experiment Experimental steps Data analysis Experimental design set up Chips on catalog
More informationShouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center
Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing
More informationControl of Gene Expression
Control of Gene Expression (Learning Objectives) Explain the role of gene expression is differentiation of function of cells which leads to the emergence of different tissues, organs, and organ systems
More informationHow Sequencing Experiments Fail
How Sequencing Experiments Fail v1.0 Simon Andrews simon.andrews@babraham.ac.uk Classes of Failure Technical Tracking Library Contamination Biological Interpretation Something went wrong with a machine
More informationSupplementary Figure 1: Quality Assessment of Mouse Arrays. Supplementary Figure 2: Quality Assessment of Rat Arrays
Supplementary Figure 1: Quality Assessment of Mouse Arrays The mouse microarray data were subjected to an extensive quality-control procedure prior to conducting downstream analyses. We assessed the spread
More informationTutorial for Proteomics Data Submission. Katalin F. Medzihradszky Robert J. Chalkley UCSF
Tutorial for Proteomics Data Submission Katalin F. Medzihradszky Robert J. Chalkley UCSF Why Have Guidelines? Large-scale proteomics studies create huge amounts of data. It is impossible/impractical to
More informationSchool of Nursing. Presented by Yvette Conley, PhD
Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression
More informationIntroduction to SAGEnhaft
Introduction to SAGEnhaft Tim Beissbarth October 13, 2015 1 Overview Serial Analysis of Gene Expression (SAGE) is a gene expression profiling technique that estimates the abundance of thousands of gene
More informationComparative genomic hybridization Because arrays are more than just a tool for expression analysis
Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from
More informationData Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov
Data Integration Lectures 16 & 17 Lectures Outline Goals for Data Integration Homogeneous data integration time series data (Filkov et al. 2002) Heterogeneous data integration microarray + sequence microarray
More informationUniversity Uses Business Intelligence Software to Boost Gene Research
Microsoft SQL Server 2008 R2 Customer Solution Case Study University Uses Business Intelligence Software to Boost Gene Research Overview Country or Region: Scotland Industry: Education Customer Profile
More informationAppendix 2 Molecular Biology Core Curriculum. Websites and Other Resources
Appendix 2 Molecular Biology Core Curriculum Websites and Other Resources Chapter 1 - The Molecular Basis of Cancer 1. Inside Cancer http://www.insidecancer.org/ From the Dolan DNA Learning Center Cold
More informationFACULTY OF MEDICAL SCIENCE
Doctor of Philosophy in Biochemistry FACULTY OF MEDICAL SCIENCE Naresuan University 73 Doctor of Philosophy in Biochemistry The Biochemistry Department at Naresuan University is a leader in lower northern
More informationGene expression analysis. Ulf Leser and Karin Zimmermann
Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a
More informationNext Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013
Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine
More informationt Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon
t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com
More informationSickle cell anemia: Altered beta chain Single AA change (#6 Glu to Val) Consequence: Protein polymerizes Change in RBC shape ---> phenotypes
Protein Structure Polypeptide: Protein: Therefore: Example: Single chain of amino acids 1 or more polypeptide chains All polypeptides are proteins Some proteins contain >1 polypeptide Hemoglobin (O 2 binding
More informationStatistical issues in the analysis of microarray data
Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data
More informationGeneProf and the new GeneProf Web Services
GeneProf and the new GeneProf Web Services Florian Halbritter florian.halbritter@ed.ac.uk Stem Cell Bioinformatics Group (Simon R. Tomlinson) simon.tomlinson@ed.ac.uk December 10, 2012 Florian Halbritter
More informationName Date Period. 2. When a molecule of double-stranded DNA undergoes replication, it results in
DNA, RNA, Protein Synthesis Keystone 1. During the process shown above, the two strands of one DNA molecule are unwound. Then, DNA polymerases add complementary nucleotides to each strand which results
More informationTranslation Study Guide
Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to
More informationBasic Analysis of Microarray Data
Basic Analysis of Microarray Data A User Guide and Tutorial Scott A. Ness, Ph.D. Co-Director, Keck-UNM Genomics Resource and Dept. of Molecular Genetics and Microbiology University of New Mexico HSC Tel.
More informationScottish Qualifications Authority
National Unit specification: general information Unit code: FH2G 12 Superclass: RH Publication date: March 2011 Source: Scottish Qualifications Authority Version: 01 Summary This Unit is a mandatory Unit
More informationNetwork Analysis. BCH 5101: Analysis of -Omics Data 1/34
Network Analysis BCH 5101: Analysis of -Omics Data 1/34 Network Analysis Graphs as a representation of networks Examples of genome-scale graphs Statistical properties of genome-scale graphs The search
More informationQuantitative proteomics background
Proteomics data analysis seminar Quantitative proteomics and transcriptomics of anaerobic and aerobic yeast cultures reveals post transcriptional regulation of key cellular processes de Groot, M., Daran
More informationFrequently Asked Questions (FAQ)
Frequently Asked Questions (FAQ) Why screen your (therapeutic) antibody for cross-reactivity? Cross-reactivity of therapeutic antibodies leads to adverse effects and might render the antibody unsuitable
More informationDr Alexander Henzing
Horizon 2020 Health, Demographic Change & Wellbeing EU funding, research and collaboration opportunities for 2016/17 Innovate UK funding opportunities in omics, bridging health and life sciences Dr Alexander
More informationPairwise Sequence Alignment
Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
More informationGuide for Bioinformatics Project Module 3
Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first
More informationModule 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams.
Module 3 Questions Section 1. Essay and Short Answers. Use diagrams wherever possible 1. With the use of a diagram, provide an overview of the general regulation strategies available to a bacterial cell.
More informationOverview of Eukaryotic Gene Prediction
Overview of Eukaryotic Gene Prediction CBB 231 / COMPSCI 261 W.H. Majoros What is DNA? Nucleus Chromosome Telomere Centromere Cell Telomere base pairs histones DNA (double helix) DNA is a Double Helix
More informationMicroarray Technology
Microarrays And Functional Genomics CPSC265 Matt Hudson Microarray Technology Relatively young technology Usually used like a Northern blot can determine the amount of mrna for a particular gene Except
More informationLecture Outline. Introduction to Databases. Introduction. Data Formats Sample databases How to text search databases. Shifra Ben-Dor Irit Orr
Introduction to Databases Shifra Ben-Dor Irit Orr Lecture Outline Introduction Data and Database types Database components Data Formats Sample databases How to text search databases What units of information
More informationDNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!
DNA Replication & Protein Synthesis This isn t a baaaaaaaddd chapter!!! The Discovery of DNA s Structure Watson and Crick s discovery of DNA s structure was based on almost fifty years of research by other
More informationMeDIP-chip service report
MeDIP-chip service report Wednesday, 20 August, 2008 Sample source: Cells from University of *** Customer: ****** Organization: University of *** Contents of this service report General information and
More informationAnalyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6
Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Overview This tutorial outlines how microrna data can be analyzed within Partek Genomics Suite. Additionally,
More informationBiological Sequence Data Formats
Biological Sequence Data Formats Here we present three standard formats in which biological sequence data (DNA, RNA and protein) can be stored and presented. Raw Sequence: Data without description. FASTA
More informationReplication Study Guide
Replication Study Guide This study guide is a written version of the material you have seen presented in the replication unit. Self-reproduction is a function of life that human-engineered systems have
More informationAnnex to the Accreditation Certificate D-PL-13372-01-00 according to DIN EN ISO/IEC 17025:2005
Deutsche Akkreditierungsstelle GmbH German Accreditation Body Annex to the Accreditation Certificate D-PL-13372-01-00 according to DIN EN ISO/IEC 17025:2005 Period of validity: 26.03.2012 to 25.03.2017
More informationControl of Gene Expression
Home Gene Regulation Is Necessary? Control of Gene Expression By switching genes off when they are not needed, cells can prevent resources from being wasted. There should be natural selection favoring
More informationBioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
More informationFocusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
More informationLecture 8. Protein Trafficking/Targeting. Protein targeting is necessary for proteins that are destined to work outside the cytoplasm.
Protein Trafficking/Targeting (8.1) Lecture 8 Protein Trafficking/Targeting Protein targeting is necessary for proteins that are destined to work outside the cytoplasm. Protein targeting is more complex
More informationBiological Sciences Initiative. Human Genome
Biological Sciences Initiative HHMI Human Genome Introduction In 2000, researchers from around the world published a draft sequence of the entire genome. 20 labs from 6 countries worked on the sequence.
More informationCCR Biology - Chapter 9 Practice Test - Summer 2012
Name: Class: Date: CCR Biology - Chapter 9 Practice Test - Summer 2012 Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Genetic engineering is possible
More information