Processing Genome Data using Scalable Database Technology
Johann Christoph Freytag, Ph.D.
Stanford University, February 2004

My Background
- Harvard Univ.
- Visiting Scientist, Microsoft Res. (2002)
- Professor of ECRC (European Computer Industry Research Centre), München (87-89)
- DEC's Database Technology Center, München (90-93)
- Starburst project, IBM Almaden Research Center (85-87)
- Visiting Scientist, Almaden Research Center (97/98)
- Visiting Scientist, IBM SVL (2001)

For non-commercial use only.
What's the Meaning of Life
[Figure: central-dogma diagram. Labels: DNA, RNA, Protein; Genomic, Transmitter, Messenger, Gene product; Transcription?, Translation?, Replication!?]

Overview
- (Biological) Motivation/Problems
- Using Database Technology
- Gene-EYe Integration-Platform
- Data Cleansing
- BLAST-Integration into GDB
- In-and-Out-the-Database: Using Workflow for dry Experiments
- Summary
Biological Motivation: View of Biological Areas
- Areas: Environment, Diseases, Experiments, Pathways, Life, Evolution
- Molecular levels: DNA / Genome, RNA / Transcriptome, Amino Acids / Proteome

Biological Motivation: View of Data
- The same areas and levels, annotated with data sources.
- Data sources shown: OMIM, Express, Brenda, Gene Ontology, Taxonomy, KEGG, EMBL, RefSeq, LocusLink, EMBL (EST), SWISS-PROT, Interpro
Complex Relationships
- More than 400 data sources on the Web
- [Figure: a graph depicting the relationships between 400+ biological data sources served by the EBI via SRS]
- [Figure: database growth of EMBL (# of records)]
- Source:

DBIS (our) Approach
- Database Model of the Biological World
- Sources shown: SwissProt, EMBL, ENSEMBL, ..., KABAT
Overview
- (Biological) Motivation/Problems
- Using Scalable Database Technology
- Gene-EYe Integration-Platform
- Data Cleansing
- BLAST-Integration into GDB
- In-and-Out-the-Database: Using Workflow Concepts for dry Experiments
- Summary

Gene-EYe Integration-Platform: Vision
- Provide mechanisms for:
  - unified handling of different data sources
  - data source integration
  - change management
  - user-defined data preparation
- Provide relevant tools for:
  - sequence manipulation and retrieval
  - workflow support for operation and administration
Gene-EYe Integration-Platform: The Big Picture
- Genome Data Warehouse Layer (GDW Schema): KNOWLEDGE. Biological Entities -> Biological Concepts (e.g. Life Cycle)
- Genome DataBase Layer (GDB Schema): CONTENT. Relational Entities -> Biological Entities (e.g. Gene)
- Genome Data Store Layer (GDS Schema): DATA. Flat File Data -> Relational Entities (e.g. EMBL)

Design GDS: From Flat File to Database
- Genome Data Store Layer (GDS Schema): Data Storage, Data Cleansing, Update/Admin
- Tools: GDS Load Tools, GDS Admin Tools
- One DDL per source: ENSEMBL DDL, InterPro DDL, TAXO DDL, SWALL DDL, EMBL DDL
- One scanner per source: ENSEMBL scanner, InterPro scanner, TAXO scanner, SWALL scanner, EMBL scanner
The Data Import Pipeline - Revisited
- Pipeline components: Data File, Scanner, Load Files, Loader, Gene-EYe GDS
- Specifications and artifacts: Summary, Instance Spec., CLOB Files, Load Spec., Format Spec., Content Spec., DDL-Gen., DDL Script, Controller
- Phase 1: property files, Perl scripts, hand crafted
- Phase 2: GEM (1) Repository, de.hui.dbis.geneeye.* (Java), autogenerated from metadata
- (1) CWM-compliant GeneEYe Metadata Repository

Modeling the Maintenance Process
[Figure only; no text recovered]
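The scanner step of the import pipeline above (flat file in, delimited load file out) can be illustrated with a minimal sketch. It assumes the publicly documented EMBL flat-file layout (two-letter line codes, a `SQ` header followed by sequence lines, and `//` as the entry terminator). The function names `scan_embl` and `to_load_line`, the chosen columns, and the `|` separator are hypothetical and are not taken from the Gene-EYe code base.

```python
from typing import Dict, Iterable, Iterator, List


def _finish(parts: Dict[str, List[str]]) -> Dict[str, str]:
    """Join the collected line fragments of one entry into flat fields."""
    return {
        key: ("".join(vals) if key == "SEQ" else " ".join(vals))
        for key, vals in parts.items()
    }


def scan_embl(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one flat record per EMBL-style entry (entries end with '//')."""
    parts: Dict[str, List[str]] = {}
    in_sequence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End of entry: emit the record and reset the state.
            if parts:
                yield _finish(parts)
            parts, in_sequence = {}, False
        elif in_sequence:
            # Sequence lines hold residues in blocks plus a running position count.
            residues = "".join(ch for ch in line if ch.isalpha())
            parts.setdefault("SEQ", []).append(residues)
        elif line[:2] == "SQ":
            in_sequence = True
        elif len(line) > 5 and line[:2].isalpha():
            # Regular line: two-letter code, three spaces, then the content.
            parts.setdefault(line[:2], []).append(line[5:].strip())
        # Spacer lines such as 'XX' carry no content and are skipped.


def to_load_line(record: Dict[str, str],
                 columns=("AC", "DE", "SEQ"), sep: str = "|") -> str:
    """Render one record as a delimited line for a bulk loader (hypothetical layout)."""
    return sep.join(record.get(col, "") for col in columns)
```

A loader would then bulk-insert the emitted lines into the GDS tables; generating the matching DDL from metadata (Phase 2 on the slide) is outside this sketch.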
GDB-Layer: From Data to Biology
- Genome Database Layer (GDB Schema): Data Integration, Data Cleansing (Sem.), Queries
- GDB schema entities: Gene, Protein, Transcript, Tissue, Variant
- [Data] GDB Builder (IBM Clio?)
- [Definition] GDB Mapper (IBM Clio); defined by and in cooperation with domain experts
- Sources: EMBL, SWALL, TAXO, InterPro, ENSEMBL
- Genome Data Store Layer (GDS Schema): Data Storage, Data Cleansing (Syn.), Update/Admin

Schema Mapping with Clio
(with permission of Dr. Felix Naumann, IBM Almaden Research Center)
- [Figure: Source Schema, user mapping in Clio, Target Schema; Clio produces SQL or XQuery that moves data from the source DB to the target DB]
Clio Features
(with permission of Dr. Felix Naumann, IBM Almaden Research Center)
- Schema Viewer: visual mapping between schema elements
- Attribute Matcher: intelligent suggestions of likely mappings
- Data Viewer: data examples for mapping queries
- Queries: SQL, XSLT, XQuery
- Use and adhere to source and target schema constraints

GDW: Providing Facts for Research
- Genome Data Warehouse Layer (GDW Schema): Data Mining, Ontology Mapping, Process Simulation
- Components: Ontology, GDW Miner, GDB Explorer
- Entities: Variant, Tissue, Transcript, Protein, Gene
- Genome Database Layer (GDB Schema): Data Integration, Data Cleansing (Sem.), Queries
Overview
- (Biological) Motivation/Problems
- Using Scalable Database Technology
- Gene-EYe Integration-Platform
- Data Cleansing
- BLAST-Integration into GDB
- In-and-Out-the-Database: Using Workflow Concepts for dry Experiments
- Summary

Errors in Genome Data
Classes of errors in genome data production [Müller, Naumann, Freytag, ICIQ, 2003]:
- Experimental errors
- Analysis errors
- Transformation errors
- Propagated errors
- Stale data

[Figure: production stages. Labels: DNA Sequence Determination, Genome, DNA Feature, Gene Annotation, mRNA, Protein Sequence Determination, Protein Function Annotation]

[Figure: examples of data and annotation at each stage]
- Only 1.3% difference; the main differences are transcript copy numbers in the brain.
- DNA, RNA sequence:
  agagattagcgcgctagatcgatatgataga
  gctatatcatccgagatagcagatagctcta
  gcacactattacacgagcagcgaccttatat
- Protein sequence:
  MDDREDLVYQAKLAEQAERYDEMVESMKKVD
  AGMDVELTVEERNLLSVAYKNVIGARRASWY
  RIISSIEQKEENKGGEDKLKMIREYRQMVER
- Stage labels: Structure Annotation, Function Annotation
- Annotation topics: FUNCTION, SUBUNIT, DISEASE
- Example annotation text: MAY ACT AS INTRACELLULAR SIGNALING COMPONENT...; BINDS DIRECTLY TO ZO-1; INVOLVED IN ACUTE LEUKEMIAS
- Percentages shown on the figure: 0.23%, 2.58%, 5% - 30%, 5% - 40%
Reliability-based Merging (cont.)
- Domain expert identifies reliable parts for merging
- Definition of a set of views for integration
- Current work:
  - Which are the relevant mismatch patterns?
  - How to assess their relevance & importance?
- [Figure: relations r1 and r2 with attributes ID, A1, A2, A3 are merged into one relation using a resolution function, e.g. MIN(). Legible rows: r1 (1, A, A, B); r2 (1, B, B, B); merged (1, A, A, B). The other rows are incomplete.]

Overview
- (Biological) Motivation/Problems
- Using Scalable Database Technology
- Gene-EYe Integration-Platform
- BLAST-Integration into GDB
- In-and-Out-the-Database: Using Workflow Concepts for dry Experiments
- Summary
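The merge step on the Reliability-based Merging slide can be sketched as follows. This is a minimal illustration, assuming each relation is keyed by ID and that conflicting cells are resolved attribute by attribute with a function such as MIN(), as the slide suggests. The names `merge_relations` and `mismatches` are hypothetical, and only the one row that is legible on the slide is used as data.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, ...]
Relation = Dict[int, Row]


def merge_relations(r1: Relation, r2: Relation,
                    resolve: Callable[[str, str], str] = min) -> Relation:
    """Merge two relations on ID; resolve conflicting cells with `resolve`."""
    merged: Relation = {}
    for key in sorted(r1.keys() | r2.keys()):
        if key in r1 and key in r2:
            merged[key] = tuple(resolve(a, b) for a, b in zip(r1[key], r2[key]))
        else:
            # Row present in only one source: take it unchanged.
            merged[key] = r1[key] if key in r1 else r2[key]
    return merged


def mismatches(r1: Relation, r2: Relation) -> List[Tuple[int, int, str, str]]:
    """List (id, attribute index, value in r1, value in r2) for conflicting cells."""
    found = []
    for key in sorted(r1.keys() & r2.keys()):
        for idx, (a, b) in enumerate(zip(r1[key], r2[key])):
            if a != b:
                found.append((key, idx, a, b))
    return found


# The one row that is legible on the slide: ID 1 in both relations.
r1 = {1: ("A", "A", "B")}
r2 = {1: ("B", "B", "B")}
print(merge_relations(r1, r2))   # MIN() keeps the smaller value per attribute
print(mismatches(r1, r2))        # conflicting cells, input for mismatch patterns
```

Passing `resolve=lambda a, b: a` instead of `min` would express "always trust r1", which is one way a domain expert's reliability judgment could enter the merge.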
BLAST: General Introduction
- Algorithm/package for similarity search, developed by Altschul et al. (1990)
- Three steps:
  1. Search for word pairs (ISeq, DSeq) of length L on the data collection of sequences that score above threshold T
  2. Extend each word pair until the value V of its alignment falls away from the local maximum
  3. Output the complete alignment (High-scoring Segment Pair, HSP) if Value(Alignment) > S
- Output: powerset of alignments
- [Figure: DC-File, FormatDB preprocessing, index and sequences; query sequence, BLAST call, BLAST report, features]

BLAST UDF Implementation
- Goal: using BLAST in SQL statements
- How? The BLAST UDF is implemented as a table function
- Use in an SQL query:
    SELECT * FROM TABLE( BLAST(<Parameter>, <Query Sequence>, <Comparison Sequence>) )
- Each call returns a set of alignments over sequences in the database
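The three BLAST steps listed above (word search, extension, output of high-scoring segment pairs) can be illustrated with a deliberately simplified sketch. It is not the NCBI implementation: seeds are exact words of length L rather than neighborhood words scored against threshold T, scoring is +1/-1 instead of a substitution matrix, and extension stops once the score falls more than `x_drop` below its best value. All names and default parameters are assumptions made for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class HSP(NamedTuple):
    query_start: int
    db_start: int
    length: int
    score: int


def extend(query: str, db_seq: str, q: int, d: int,
           step: int, x_drop: int) -> Tuple[int, int]:
    """Ungapped extension from (q, d) in direction `step`; returns (best score, length)."""
    best = score = 0
    best_len = length = 0
    while 0 <= q < len(query) and 0 <= d < len(db_seq):
        score += 1 if query[q] == db_seq[d] else -1
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # the score has dropped too far below the local maximum
        q += step
        d += step
    return best, best_len


def find_hsps(query: str, db_seq: str, word_len: int = 3,
              x_drop: int = 2, min_score: int = 4) -> List[HSP]:
    """Step 1: seed with exact words; step 2: extend; step 3: keep HSPs above min_score."""
    index: Dict[str, List[int]] = {}
    for d in range(len(db_seq) - word_len + 1):
        index.setdefault(db_seq[d:d + word_len], []).append(d)

    hsps = set()
    for q in range(len(query) - word_len + 1):
        for d in index.get(query[q:q + word_len], []):
            right, right_len = extend(query, db_seq, q + word_len, d + word_len, 1, x_drop)
            left, left_len = extend(query, db_seq, q - 1, d - 1, -1, x_drop)
            score = word_len + left + right  # the seed itself scores +1 per position
            if score > min_score:
                hsps.add(HSP(q - left_len, d - left_len,
                             left_len + word_len + right_len, score))
    return sorted(hsps, key=lambda h: -h.score)
```

For example, `find_hsps("ACGTACGT", "TTACGTACGTTT")` reports the full-length match starting at database position 2 as its top hit.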
Structure of UDB Table Function
- Implementation: mapping of the program into the calling structure for table functions
- Communication between the different calls via the scratchpad
- Scratchpad: storage area which remains intact and unchanged between UDF calls
- Storage of data structures for the different steps, especially for the output from postprocessing: SeqAlign
- Call types, in order: FIRST, OPEN, FETCH, CLOSE, FINAL
- BLAST UDF steps, in order:
  - Initialize
  - For all SEQUENCES: alignment without gaps, postprocessing
  - For all ALIGNMENTS: output of the results
  - Release data structures related to sequences
  - Release global data structures

Overview
- (Biological) Motivation/Problems
- Using Scalable Database Technology
- Gene-EYe Integration-Platform
- BLAST-Integration into GDB
- In-and-Out-the-Database: Using Workflow Concepts for dry Experiments
- Summary
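The table-function calling structure described above (FIRST, OPEN, FETCH, CLOSE, FINAL, with a scratchpad that survives between calls) can be emulated in a few lines. This is a plain-Python sketch of the protocol, not the DB2 UDF API: the assignment of work to call types, the `END_OF_TABLE` marker, the driver `run_table_function`, and the toy scoring function `toy_align` are all assumptions made for illustration.

```python
from enum import Enum, auto
from typing import Any, Dict


class CallType(Enum):
    FIRST = auto()
    OPEN = auto()
    FETCH = auto()
    CLOSE = auto()
    FINAL = auto()


END_OF_TABLE = object()  # hypothetical marker for "no more rows"


def toy_align(query: str, seq: str) -> int:
    """Stand-in for the real alignment: count positions where both sequences agree."""
    return sum(1 for a, b in zip(query, seq) if a == b)


def blast_udf(call: CallType, scratchpad: Dict[str, Any],
              query: str, sequences: Dict[str, str]) -> Any:
    """One entry point, invoked once per call type; state lives in the scratchpad."""
    if call is CallType.FIRST:
        scratchpad["initialized"] = True               # global set-up
    elif call is CallType.OPEN:
        # For all sequences: align and keep the results for later FETCH calls.
        scratchpad["alignments"] = [(name, toy_align(query, seq))
                                    for name, seq in sequences.items()]
        scratchpad["cursor"] = 0
    elif call is CallType.FETCH:
        pos = scratchpad["cursor"]
        if pos >= len(scratchpad["alignments"]):
            return END_OF_TABLE
        scratchpad["cursor"] = pos + 1
        return scratchpad["alignments"][pos]           # one result row per call
    elif call is CallType.CLOSE:
        scratchpad.pop("alignments", None)             # release per-scan structures
        scratchpad.pop("cursor", None)
    elif call is CallType.FINAL:
        scratchpad.clear()                             # release global structures
    return None


def run_table_function(udf, query: str, sequences: Dict[str, str]) -> list:
    """Drive the UDF the way a database engine would: one call per phase."""
    scratchpad: Dict[str, Any] = {}
    rows = []
    udf(CallType.FIRST, scratchpad, query, sequences)
    udf(CallType.OPEN, scratchpad, query, sequences)
    while True:
        row = udf(CallType.FETCH, scratchpad, query, sequences)
        if row is END_OF_TABLE:
            break
        rows.append(row)
    udf(CallType.CLOSE, scratchpad, query, sequences)
    udf(CallType.FINAL, scratchpad, query, sequences)
    return rows
```

The design point the slide makes is visible here: because each phase is a separate call, everything computed in one phase and needed in the next has to be parked in the scratchpad.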
The Challenge: Exon Skipping
- [Figure: Gene, Protein]
- One gene with 100 exons ~ variations
- n exons within one gene are linearly combined (splicing)
- The result is used as the pattern for protein generation
- Challenge (exon skipping): do alternative fusion points yield new functional (i.e. biologically meaningful) patterns?
Functional Genomics: Gain of New Insight
- First horizon: simple exon skipping
- New functionality!

Flow of Processing Steps
- Steps: generate exon sequence; find similarity (search); check for biological validity
- Resources shown: local database (automatic); remote tool (web based); supported by local DB
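The "generate exon sequence" step for simple exon skipping can be sketched as below. The slides do not define exactly which splice forms were enumerated, so this example makes its own assumption: a splice form joins exon i directly to a later exon j, skipping the contiguous run of exons in between and keeping everything else in order. Under that assumption a gene with n exons has (n-1)(n-2)/2 such forms; this figure belongs to the sketch and is not the number given on the slide. The function name `simple_skip_forms` is hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def simple_skip_forms(exons: List[str]) -> Iterator[Tuple[Tuple[int, int], str]]:
    """Yield ((i, j), sequence) where exon i is fused directly to exon j (j >= i + 2)."""
    for i, j in combinations(range(len(exons)), 2):
        if j - i >= 2:
            # Keep exons 0..i, skip i+1..j-1, continue with j..end.
            yield (i, j), "".join(exons[: i + 1] + exons[j:])


exons = ["AAA", "CCC", "GGG", "TTT"]
for fusion, sequence in simple_skip_forms(exons):
    print(fusion, sequence)
```

Each generated sequence would then go through the similarity search and the check for biological validity listed in the processing flow.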
Implementation
- Some facts:
  - 60 days at 100% load
  - One splice form per minute
  - So far: ca splice forms
  - First biologically meaningful results

Cooperations
- Univ. of Jena (Rolf Backofen)
- Berlin Center of Bioinformatics (BCB): Charité, FU, Max-Planck-Institut (M. Vingron)
- Industry: IBM, small companies
- Patrick Chappatte, Switzerland
Database Environment
- IBM p-server (sponsored by IBM)
- [Figure: 8 CPUs, 2.3 TByte, DB2]

Summary
- Lesson learnt: highly dynamic environment
  - Data: changes frequently
  - User: changes frequently
- Provide a framework for:
  - Data integration
  - Data processing
  - Data changes
  - Data dependencies
  - ...
  - Meta data management
- Future work:
  - Query processing: include domain knowledge
  - Data cleansing
  - Set of UDFs for biological data processing
  - Visualization of data
More informationiway Roadmap Michael Corcoran Sr. VP Corporate Marketing
16.06.2015 iway Roadmap Michael Corcoran Sr. VP Corporate Marketing iway 7 Products 1 iway 7 Products iway 7 Products 360 Viewer Remediation Sentinel Portal Golden Record Search and View Omni Patient Data
More informationAn EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives
An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives Chalapathy Neti, Ph.D. Associate Director, Healthcare Transformation, Shahram Ebadollahi, Ph.D. Research Staff Memeber IBM Research,
More informationDAWIS-M.D.-adata warehouse system for metabolic data
DAWIS-M.D.-adata warehouse system for metabolic data Klaus Hippe, Benjamin Kormeier, Thoralf Töpel, Sebastian Janowski and Ralf Hofestädt Bioinformatics Department Bielefeld University Universitätsstraße
More informationTranslation Study Guide
Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to
More informationSearch and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
More informationOptimization of ETL Work Flow in Data Warehouse
Optimization of ETL Work Flow in Data Warehouse Kommineni Sivaganesh M.Tech Student, CSE Department, Anil Neerukonda Institute of Technology & Science Visakhapatnam, India. Sivaganesh07@gmail.com P Srinivasu
More informationDelivering the power of the world s most successful genomics platform
Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE
More informationIntroduction to Bioinformatics 2. DNA Sequence Retrieval and comparison
Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov
More informationAn agent-based layered middleware as tool integration
An agent-based layered middleware as tool integration Flavio Corradini Leonardo Mariani Emanuela Merelli University of L Aquila University of Milano University of Camerino ITALY ITALY ITALY Helsinki FSE/ESEC
More informationSQL Server Training Course Content
SQL Server Training Course Content SQL Server Training Objectives Installing Microsoft SQL Server Upgrading to SQL Server Management Studio Monitoring the Database Server Database and Index Maintenance
More informationData Integration and Decision-Making For Biomarkers Discovery, Validation and Evaluation. D. POLVERARI, CTO October 06-07 2008
Data Integration and Decision-Making For Biomarkers Discovery, Validation and Evaluation D. POLVERARI, CTO October 06-07 2008 Data integration definition and aims Definition : Data integration consists
More informationBiological Databases and Protein Sequence Analysis
Biological Databases and Protein Sequence Analysis Introduction M. Madan Babu, Center for Biotechnology, Anna University, Chennai 25, India Bioinformatics is the application of Information technology to
More informationDoctor of Philosophy in Computer Science
Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects
More informationHuman Genome Organization: An Update. Genome Organization: An Update
Human Genome Organization: An Update Genome Organization: An Update Highlights of Human Genome Project Timetable Proposed in 1990 as 3 billion dollar joint venture between DOE and NIH with 15 year completion
More informationEFFECTIVE STORAGE OF XBRL DOCUMENTS
EFFECTIVE STORAGE OF XBRL DOCUMENTS An Oracle & UBmatrix Whitepaper June 2007 Page 1 Introduction Today s business world requires the ability to report, validate, and analyze business information efficiently,
More informationExtraction and Visualization of Protein-Protein Interactions from PubMed
Extraction and Visualization of Protein-Protein Interactions from PubMed Ulf Leser Knowledge Management in Bioinformatics Humboldt-Universität Berlin Finding Relevant Knowledge Find information about Much
More information