Processing Genome Data using Scalable Database Technology
Transcription
Johann Christoph Freytag, Ph.D.
Stanford University, February 2004

My Background
- Harvard Univ.
- Starburst project, IBM Almaden Research Center (85-87)
- ECRC (European Computer-Industry Research Centre), München (87-89)
- DEC's Database Technology Center, München (90-93)
- Professor
- Visiting Scientist, Almaden Research Center (97/98)
- Visiting Scientist, IBM SVL (2001)
- Visiting Scientist, Microsoft Res. (2002)

For non-commercial use only

What's the Meaning of Life?
- DNA (genomic transmitter), RNA (messenger), protein (gene product)
- Processes: replication, transcription, translation

Overview
- (Biological) Motivation/Problems
- Using Database Technology
- Gene-EYe Integration-Platform
- Data Cleansing
- BLAST-Integration into GDB
- In-and-Out-the-Database: Using Workflow for dry Experiments
- Summary

Biological Motivation

View of Biological Areas
- Environment, Diseases, Experiments
- Pathways, Life, Evolution
- DNA: Genome
- RNA: Transcriptome
- Amino Acids: Proteome

View of Data (Data Sources)
- Environment, Diseases, Experiments: OMIM, Express, Brenda
- Pathways, Life, Evolution: Gene Ontology, Taxonomy, KEGG
- DNA / Genome: EMBL, RefSeq, LocusLink
- RNA / Transcriptome: EMBL (EST)
- Amino Acids / Proteome: SWISS-PROT, Interpro

Complex Relationships
- A graph depicting the relationships between 400+ biological data sources served by the EBI via SRS
- Database growth of EMBL (# of records)
- More than 400 data sources on the Web

DBIS (our) Approach
- SwissProt, EMBL, ENSEMBL, ..., KABAT are integrated into a Database Model of the Biological World

Overview: (Biological) Motivation/Problems; Using Scalable Database Technology; Gene-EYe Integration-Platform; Data Cleansing; BLAST-Integration into GDB; In-and-Out-the-Database: Using Workflow Concepts for dry Experiments; Summary

Gene-EYe Integration-Platform: Vision
- Provide mechanisms for unified handling of different data sources
  - data source integration
  - change management
  - user defined data preparation
- Provide relevant tools for
  - sequence manipulation and retrieval
  - work flow support for operation and administration

Gene-EYe Integration-Platform: The Big Picture
- Genome Data Warehouse Layer (GDW Schema): KNOWLEDGE. Biological Entities -> Biological Concepts (e.g. Life Cycle)
- Genome DataBase Layer (GDB Schema): CONTENT. Relational Entities -> Biological Entities (e.g. Gene)
- Genome Data Store Layer (GDS Schema): DATA. Flat File Data -> Relational Entities (e.g. EMBL)

Design GDS: From Flat File to Database
- Genome Data Store Layer (GDS Schema): Data Storage, Data Cleansing, Update/Admin
- GDS Load Tools and GDS Admin Tools
- One DDL and one scanner per source: ENSEMBL, InterPro, TAXO, SWALL, EMBL

The Data Import Pipeline - Revisited
- Pipeline: Data File, Scanner, Load Files, Loader, Gene-EYe GDS
- Further artifacts: Summary, CLOB Files
- Specifications: Format Spec., Content Spec., Instance Spec., Load Spec.
- DDL-Gen. produces the DDL Script
- Controller
- Phase 1: property files, Perl scripts, hand crafted
- Phase 2: GEM [1] Repository, de.hui.dbis.geneeye.* (Java), autogenerated from metadata
[1] CWM compliant GeneEYe Metadata Repository

Modeling the Maintenance Process
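
The scanner, load file, and loader stages of the import pipeline can be illustrated with a small sketch. This is not the Gene-EYe code: the two entries are made up, the flat-file layout is a simplified EMBL-style one (line codes ID, AC, DE, SQ and the `//` terminator), the table name `gds_entry` is hypothetical, and SQLite stands in for the actual database.

```python
import csv
import io
import sqlite3

# Two made-up entries in a simplified EMBL-style flat-file layout (assumption).
SAMPLE_FLAT_FILE = """\
ID   TEST0001; SV 1; linear; mRNA; STD; HUM; 12 BP.
AC   T00001;
DE   Hypothetical test entry one
SQ   Sequence 12 BP;
     aaacaaacca aa                                                12
//
ID   TEST0002; SV 1; linear; mRNA; STD; HUM; 8 BP.
AC   T00002;
DE   Hypothetical test entry two
SQ   Sequence 8 BP;
     ggccttaa                                                      8
//
"""

FIELDS = ["id", "accession", "description", "sequence"]


def scan(flat_text):
    """Scanner: turn flat-file text into one dict per entry."""
    entry, in_sequence = {}, False
    for line in flat_text.splitlines():
        code, data = line[:2], line[5:].strip()
        if code == "//":  # end of the current entry
            yield entry
            entry, in_sequence = {}, False
        elif code == "ID":
            entry["id"] = data.split(";")[0]
        elif code == "AC":
            entry["accession"] = data.split(";")[0]
        elif code == "DE":
            entry["description"] = (entry.get("description", "") + " " + data).strip()
        elif code == "SQ":
            in_sequence = True
            entry["sequence"] = ""
        elif in_sequence and line.startswith(" "):
            # Sequence lines: keep the letters, drop blanks and the position count.
            entry["sequence"] += "".join(ch for ch in line if ch.isalpha())


def write_load_file(entries):
    """Load-file stage: serialize scanned entries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def load(conn, load_text):
    """Loader: bulk-insert the load file into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gds_entry "
        "(id TEXT PRIMARY KEY, accession TEXT, description TEXT, sequence TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(load_text)))
    conn.executemany(
        "INSERT INTO gds_entry VALUES (:id, :accession, :description, :sequence)", rows
    )
    return len(rows)


def run_pipeline(flat_text):
    """Run scanner -> load file -> loader against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, write_load_file(scan(flat_text)))
    return conn
```

Keeping the load file as a separate artifact between scanner and loader mirrors the slide's pipeline; a real deployment would presumably hand that file to the database's bulk loader rather than insert row by row.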

GDB-Layer: From Data to Biology
- Genome Database Layer (GDB Schema): Data Integration, Data Cleansing (semantic), Queries
  - Schema: Gene, Protein, Transcript, Tissue, Variant
  - GDB Builder (IBM Clio?) [Data]
  - GDB Mapper (IBM Clio) [Definition]: defined by and in cooperation w/ domain experts
- Genome Data Store Layer (GDS Schema): Data Storage, Data Cleansing (syntactic), Update/Admin
  - Data: EMBL, SWALL, TAXO, InterPro, ENSEMBL

Schema Mapping with Clio
(with permission of Dr. Felix Naumann, IBM Almaden Research Center)
- Inputs: source schema, target schema, user mapping
- Output: SQL or XQuery that transforms the source DB into the target DB
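
The idea of turning a user mapping between a source and a target schema into an executable query can be sketched as follows. This is only an illustration of the principle, not Clio's algorithm: it handles one-to-one attribute correspondences between a single source and a single target table, and the table and column names (`gds_embl`, `gdb_gene`, and so on) are hypothetical.

```python
import sqlite3


def mapping_to_sql(source_table, target_table, correspondences):
    """Build an INSERT ... SELECT from (source_column, target_column) pairs."""
    source_cols = ", ".join(src for src, _ in correspondences)
    target_cols = ", ".join(tgt for _, tgt in correspondences)
    return (
        f"INSERT INTO {target_table} ({target_cols}) "
        f"SELECT {source_cols} FROM {source_table}"
    )


def demo():
    """Apply a generated mapping query to a tiny in-memory example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gds_embl (accession TEXT, description TEXT, sequence TEXT)")
    conn.execute("CREATE TABLE gdb_gene (gene_id TEXT, name TEXT, dna TEXT)")
    conn.execute("INSERT INTO gds_embl VALUES ('T00001', 'test gene', 'acgt')")
    sql = mapping_to_sql(
        "gds_embl",
        "gdb_gene",
        [("accession", "gene_id"), ("description", "name"), ("sequence", "dna")],
    )
    conn.execute(sql)
    return sql, conn.execute("SELECT gene_id, name, dna FROM gdb_gene").fetchall()
```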

Clio Features
(with permission of Dr. Felix Naumann, IBM Almaden Research Center)
- Schema Viewer: visual mapping between schema elements
- Attribute Matcher: intelligent suggestions of likely mappings
- Data Viewer: data examples for mapping queries
- Queries: SQL, XSLT, XQuery
- Uses and adheres to source and target schema constraints

GDW: Providing Facts for Research
- Genome Data Warehouse Layer (GDW Schema): Data Mining, Ontology Mapping, Process Simulation
  - Ontology, GDW Miner, GDB Explorer
- Genome Database Layer (GDB Schema): Data Integration, Data Cleansing (semantic), Queries
  - Entities: Gene, Protein, Transcript, Tissue, Variant

Overview: (Biological) Motivation/Problems; Using Scalable Database Technology; Gene-EYe Integration-Platform; Data Cleansing; BLAST-Integration into GDB; In-and-Out-the-Database: Using Workflow Concepts for dry Experiments; Summary

Errors in Genome Data
Classes of errors in genome data production [Müller, Naumann, Freytag, ICIQ, 2003]:
- Experimental errors
- Analysis errors
- Transformation errors
- Propagated errors
- Stale data

Figure: production steps from genome to annotated protein (DNA sequence determination, DNA feature and gene annotation, mRNA, protein sequence determination, protein function annotation such as SUBUNIT and DISEASE), with example data:
- DNA, RNA sequence: agagattagcgcgctagatcgatatgataga gctatatcatccgagatagcagatagctcta gcacactattacacgagcagcgaccttatat
- Protein sequence: MDDREDLVYQAKLAEQAERYDEMVESMKKVD AGMDVELTVEERNLLSVAYKNVIGARRASWY RIISSIEQKEENKGGEDKLKMIREYRQMVER
- Function annotation (FUNCTION): MAY ACT AS INTRACELLULAR SIGNALING COMPONENT... BINDS DIRECTLY TO ZO-1... INVOLVED IN ACUTE LEUKEMIAS
- Error rates shown for sequences, structure annotation, and function annotation: 0.23%, 2.58%, 5%-30%, 5%-40%
- Only 1.3% difference; the main differences are transcript copy numbers in the brain

Reliability-based Merging (cont.)
- Domain expert identifies reliable parts for merging
- Definition of a set of views for integration
- Current work:
  - Which are the relevant mismatch patterns?
  - How to assess their relevance & importance?
- Figure: example relations r1 and r2 (columns ID, A1, A2, A3) and their merged result, e.g. using MIN(). For ID 1, r1 holds (A, A, B), r2 holds (B, B, B), and the merged row is (A, A, B).

Overview: (Biological) Motivation/Problems; Using Scalable Database Technology; Gene-EYe Integration-Platform; BLAST-Integration into GDB; In-and-Out-the-Database: Using Workflow Concepts for dry Experiments; Summary
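
A minimal sketch of merging two relations attribute by attribute with a resolution function such as MIN() follows. The row for ID 1 is taken from the slide; everything else (the dictionary representation, the handling of rows present in only one relation, the second example row) is an assumption made for illustration, and the expert-defined reliability views mentioned on the slide are not modeled.

```python
def merge_relations(r1, r2, resolve=min):
    """Merge two relations keyed by ID; conflicting attribute values go through `resolve`."""
    merged = {}
    for key in sorted(set(r1) | set(r2)):
        row1, row2 = r1.get(key), r2.get(key)
        if row1 is None or row2 is None:
            # Row exists in only one relation: take it unchanged (assumption).
            merged[key] = row1 if row2 is None else row2
        else:
            merged[key] = tuple(
                a if a == b else resolve(a, b) for a, b in zip(row1, row2)
            )
    return merged


# ID 1 as on the slide; ID 2 is a made-up row that exists only in r2.
r1 = {1: ("A", "A", "B")}
r2 = {1: ("B", "B", "B"), 2: ("C", "D", "D")}
merged = merge_relations(r1, r2)
```

Passing a different `resolve` function (for example one that prefers the source a domain expert marked as reliable for that attribute) would be the natural extension toward the reliability-based variant.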

BLAST: General Introduction
- Algorithm/package for similarity search, developed by Altschul et al. (1990)
- Components shown: DC-File, FormatDB preprocessing, index, sequences, query sequence, BLAST call, BLAST report, features
- Three steps:
  1. Search for word pairs (Iseq, DSeq) of length L on the data collection of sequences, scoring above threshold T
  2. Expansion of each word pair until the value V of their alignment falls away from the local maximum
  3. Output of the complete alignment (High-scoring Segment Pair, HSP) if Value(Alignment) > S
- Output: powerset of alignments

BLAST UDF Implementation
- Goal: using BLAST in SQL statements
- How? BLAST UDF implemented as a table function
- Use in an SQL query:
    SELECT * FROM TABLE( BLAST(<Parameter>, <Query Sequence>, <Comparison Sequence>) )
- Each call returns a set of alignments over sequences in the database
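
The three steps can be sketched in a few lines of Python. This is a strongly simplified seed-and-extend illustration, not NCBI BLAST: seeds are exact word matches (the threshold T for scoring word pairs is not modeled), extension is ungapped with a fixed drop-off, and the scoring values and parameter names are assumptions.

```python
def find_seeds(query, subject, word_len):
    """Step 1: find positions (i, j) where query and subject share a word of length word_len."""
    index = {}
    for j in range(len(subject) - word_len + 1):
        index.setdefault(subject[j:j + word_len], []).append(j)
    seeds = []
    for i in range(len(query) - word_len + 1):
        for j in index.get(query[i:i + word_len], []):
            seeds.append((i, j))
    return seeds


def extend(query, subject, i, j, word_len, match=1, mismatch=-2, x_drop=3):
    """Step 2: extend a seed without gaps until the score drops x_drop below its maximum."""
    best = word_len * match
    # Extend to the right of the seed.
    current, best_right, k = best, 0, 0
    while i + word_len + k < len(query) and j + word_len + k < len(subject):
        same = query[i + word_len + k] == subject[j + word_len + k]
        current += match if same else mismatch
        k += 1
        if current > best:
            best, best_right = current, k
        elif best - current > x_drop:
            break
    # Extend to the left, starting from the best right-extended score.
    current, best_left, k = best, 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        same = query[i - k - 1] == subject[j - k - 1]
        current += match if same else mismatch
        k += 1
        if current > best:
            best, best_left = current, k
        elif best - current > x_drop:
            break
    length = best_left + word_len + best_right
    return best, i - best_left, j - best_left, length


def blast_like(query, subject, word_len=4, min_score=6):
    """Step 3: report segment pairs (score, query_start, subject_start, length) above min_score."""
    hsps = {extend(query, subject, i, j, word_len) for i, j in find_seeds(query, subject, word_len)}
    return sorted(h for h in hsps if h[0] > min_score)
```

For example, `blast_like("ACGTACGT", "TTACGTACGTTT")` finds several seeds that all extend to the same segment pair, so the set removes the duplicates before the score filter is applied.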

Structure of UDB Table Function
- Implementation: mapping of the program into the calling structure for table functions
- Communication between the different calls via the scratchpad
  - Scratchpad: storage area which remains intact and unchanged between UDF calls
  - Stores the data structures of the different steps, especially the output of postprocessing: SeqAlign
- Call types: FIRST, OPEN, FETCH, CLOSE, FINAL
- Processing steps of the BLAST UDF:
  - Initialize
  - For all sequences: alignment without gaps, postprocessing
  - For all alignments: output of the results
  - Release data structures related to sequences
  - Release global data structures

Overview: (Biological) Motivation/Problems; Using Scalable Database Technology; Gene-EYe Integration-Platform; BLAST-Integration into GDB; In-and-Out-the-Database: Using Workflow Concepts for dry Experiments; Summary
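
The calling structure with a scratchpad can be simulated in plain Python. The sketch below is not the DB2 API: the function signature, the dictionary used as scratchpad, the assignment of processing steps to call types, and the exact substring search that stands in for BLAST are all assumptions; only the five call types and the role of the scratchpad come from the slide.

```python
from enum import Enum


class CallType(Enum):
    FIRST = "FIRST"
    OPEN = "OPEN"
    FETCH = "FETCH"
    CLOSE = "CLOSE"
    FINAL = "FINAL"


def blast_table_function(call_type, scratchpad, query=None, sequences=None):
    """One UDF invocation; all state between invocations lives in the scratchpad."""
    if call_type is CallType.FIRST:
        # Initialize global data structures.
        scratchpad["global"] = {"rows_returned": 0}
    elif call_type is CallType.OPEN:
        # For all sequences: compute alignments (here: exact substring hits).
        scratchpad["alignments"] = [
            (name, seq.find(query)) for name, seq in sequences if query in seq
        ]
        scratchpad["cursor"] = 0
    elif call_type is CallType.FETCH:
        # For all alignments: output one result row per call, None at the end.
        cursor = scratchpad["cursor"]
        if cursor >= len(scratchpad["alignments"]):
            return None
        scratchpad["cursor"] = cursor + 1
        scratchpad["global"]["rows_returned"] += 1
        return scratchpad["alignments"][cursor]
    elif call_type is CallType.CLOSE:
        # Release data structures related to sequences.
        del scratchpad["alignments"], scratchpad["cursor"]
    elif call_type is CallType.FINAL:
        # Release global data structures.
        scratchpad.clear()
    return None


def run_query(query, sequences):
    """Play the role of the database engine driving the table function."""
    scratchpad, rows = {}, []
    blast_table_function(CallType.FIRST, scratchpad)
    blast_table_function(CallType.OPEN, scratchpad, query, sequences)
    while True:
        row = blast_table_function(CallType.FETCH, scratchpad)
        if row is None:
            break
        rows.append(row)
    blast_table_function(CallType.CLOSE, scratchpad)
    blast_table_function(CallType.FINAL, scratchpad)
    return rows
```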

The Challenge: Exon Skipping
- Gene -> Protein
- One gene with 100 exons ~ variations
- n exons within one gene are linearly combined (splicing) and used as the pattern for protein generation
- Challenge (exon skipping): do alternative fusion points yield new functional (i.e. biologically meaningful) patterns?

Functional Genomics: Gain of New Insight
- First Horizon: Simple Exon Skipping
- New Functionality!

Flow of Processing Steps
- Generate exon sequence: local database (automatic)
- Similarity search: remote tool (web based), supported by local DB
- Check for biological validity
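
The first processing step, generating candidate exon sequences, might look like the sketch below. It assumes that "simple exon skipping" means joining exon i directly to a later exon j and dropping the contiguous run in between, with the first and last exon always kept; the slides do not spell out the exact definition, and the example exon strings are made up.

```python
def simple_exon_skipping(exons):
    """Return all splice forms that skip one contiguous run of internal exons."""
    forms = []
    n = len(exons)
    for i in range(n - 1):              # last exon kept before the skip
        for j in range(i + 2, n):       # first exon kept after the skip
            kept = exons[: i + 1] + exons[j:]
            forms.append("".join(kept))
    return forms


# Four made-up exons; three splice forms result.
example_forms = simple_exon_skipping(["ATG", "CCA", "GGT", "TAA"])
```

Each generated sequence would then go to the similarity search and the check for biological validity named in the flow above. Under this definition a gene with n exons yields (n-1)(n-2)/2 candidate forms.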

Implementation
- Some facts:
  - 60 days at 100% load
  - One splice form per minute
  - So far: ca splice forms
  - First biologically meaningful results

Cooperations
- Univ. of Jena (Rolf Backofen)
- Berlin Center of Bioinformatics (BCB): Charite, FU, Max-Planck-Institut (M. Vingron)
- Industry: IBM, small companies, Patrick Chappatte, Switzerland

Database Environment
- IBM p-server (sponsored by IBM): 8 CPUs, 2.3 TByte
- DB2

Summary
- Lesson learnt: highly dynamic environment
  - Data: changes frequently
  - User: changes frequently
- Provide a framework for
  - Data integration
  - Data processing
  - Data changes
  - Data dependencies
  - Meta data management
- Future work
  - Query processing: include domain knowledge
  - Data cleansing
  - Set of UDFs for biological data processing
  - Visualization of data
