GnpIS: an information system for plant breeding

1 GnpIS: an information system for plant breeding. 21st October 2010, Thematic day on Integrative genomics, Nantes. Hadi Quesneville, The URGI unit. AGRICULTURE ALIMENTATION ENVIRONNEMENT

2 URGI: Unité de Recherche en Génomique-Info. Research unit: an INRA (French National Institute for Agricultural Research) unit in the Plant Breeding and Genetics Department, with strong connections to other INRA plant departments. Hosts a bioinformatics platform: IBISA grade, member of the French National Network of Bioinformatics Platforms (ReNaBi). Research: data integration; functional and evolutionary genome dynamics. The URGI unit 18/11/2010 Hadi Quesneville

3 Databases. Central role for data analysis: repository, navigation. Link heterogeneous information: experiments (transcriptomes, EST, SNP, chip); genome annotations (genes, repeats); genetic information (markers, recombination rates); lineages (lines, populations, species). Performance issues: schema adapted for queries.

4 Genome data integration

5 Data integration. "Data integration involves combining data residing in different sources and providing users with a unified view of these data." (from Wikipedia)

6 Genome-based data integration. A natural way to integrate «omics» data: map data to the genome sequence, then compare genome coordinates to identify relationships and compute correlations. Few generic operations are needed: generic unary and binary operators; density and counts; statistics; visualisations. 2nd Crossomics meeting 18/11/2010 Hadi Quesneville
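
As an illustration of the "density, counts" operations listed on this slide, here is a minimal Python sketch. It assumes features are reduced to single genome positions and counted in fixed-size windows; the function names, the window size and the SNP positions are hypothetical and are not taken from GnpIS.

```python
from collections import Counter

def window_counts(positions, window_size):
    """Count features per fixed-size window; window k covers [k*size, (k+1)*size)."""
    counts = Counter(pos // window_size for pos in positions)
    return dict(sorted(counts.items()))

def density(counts, window_size):
    """Convert per-window counts into features per base pair."""
    return {window: n / window_size for window, n in counts.items()}

# Hypothetical SNP positions on one chromosome (illustration only).
snp_positions = [120, 950, 1010, 1500, 1999, 4200]
counts = window_counts(snp_positions, 1000)
snp_density = density(counts, 1000)
```

Windows without any feature are simply absent from the result, which keeps the output sparse for long chromosomes.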

7 Unary operators (1 source): range, merge

8 Binary operators (2 sources): overlap, add

9 Binary operators (2 sources): diff, merge
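
The slides name these operators, but the diagrams that defined them are not part of the transcript. The Python sketch below shows one plausible reading on closed integer intervals `(start, end)`: `span` stands in for "range", `merge` fuses overlapping intervals of one source, `overlap` intersects two sources, and `diff` keeps the parts of the first source not covered by the second. These definitions and the sample coordinates are assumptions, not the operators as implemented at URGI.

```python
def merge(intervals):
    """Unary: fuse overlapping closed intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def span(intervals):
    """Unary ("range"): the single interval from the lowest start to the highest end."""
    return (min(s for s, _ in intervals), max(e for _, e in intervals))

def overlap(a, b):
    """Binary: regions covered by both sources."""
    result = []
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            lo, hi = max(s1, s2), min(e1, e2)
            if lo <= hi:
                result.append((lo, hi))
    return result

def diff(a, b):
    """Binary: regions of source a that are not covered by source b."""
    result = []
    covered = merge(b)
    for start, end in merge(a):
        cursor = start
        for s2, e2 in covered:
            if e2 < cursor or s2 > end:
                continue  # this piece of b does not touch the remaining part of a
            if s2 > cursor:
                result.append((cursor, s2 - 1))
            cursor = max(cursor, e2 + 1)
        if cursor <= end:
            result.append((cursor, end))
    return result

# Hypothetical coordinates (illustration only).
genes = [(100, 200), (150, 300), (400, 500)]
repeats = [(180, 420)]
```

With these inputs, `merge(genes)` yields two disjoint regions, and `overlap` and `diff` split them around the repeat.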

10 Coordinate indexes. Inspired by the R-tree (Guttman 1984). Hierarchical set of bins: adjacent bins at a level have adjacent IDs; bins from different levels do not overlap. Assign each segment to the smallest bin that contains it. [Figure: bins at the Mb and kb scales along a chromosome.] Table simple_feature: simple_feature_id int, seq_region_id int, bin float, seq_region_start int, seq_region_end int, seq_region_strand tinyint, analysis_id int, score double.

11 Coordinate indexes. [Figure: bins of 1 Mb, 100 kb, 10 kb and 1 kb along a chromosome; annotations: bin level (x10^l), bin number (/10^l).] Table simple_feature: simple_feature_id int, seq_region_id int, bin float, seq_region_start int, seq_region_end int, seq_region_strand tinyint, analysis_id int, score double. Query (the literal values were lost in transcription): SELECT * FROM simple_feature WHERE seq_region_id = 1 AND (bin = … OR bin = … OR bin BETWEEN … AND … OR bin BETWEEN … AND …) AND seq_region_start <= … AND seq_region_end >= …
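
The binning idea on these two slides can be sketched with Python's built-in `sqlite3` module. The bin sizes (1 kb to 1 Mb) and the column names come from the slides. The bin encoding, however, is hypothetical: the slides store `bin` as a float built from a level and a bin number, and that formula did not survive transcription, so this sketch uses an integer `level * LEVEL_OFFSET + number` instead. The sample features are invented.

```python
import sqlite3

BIN_SIZES = [1_000, 10_000, 100_000, 1_000_000]  # 1 kb, 10 kb, 100 kb, 1 Mb
LEVEL_OFFSET = 10_000_000                        # hypothetical: keeps bin IDs of different levels apart
CHROMOSOME_BIN = len(BIN_SIZES) * LEVEL_OFFSET   # catch-all bin for features that fit no smaller bin

def assign_bin(start, end):
    """Return the ID of the smallest bin that fully contains [start, end]."""
    for level, size in enumerate(BIN_SIZES):
        if start // size == end // size:
            return level * LEVEL_OFFSET + start // size
    return CHROMOSOME_BIN

def candidate_bin_ranges(start, end):
    """Bin ID ranges, one per level, that can hold features overlapping [start, end]."""
    ranges = [(level * LEVEL_OFFSET + start // size, level * LEVEL_OFFSET + end // size)
              for level, size in enumerate(BIN_SIZES)]
    ranges.append((CHROMOSOME_BIN, CHROMOSOME_BIN))
    return ranges

def features_overlapping(conn, seq_region_id, start, end):
    """Filter on indexed bins first, then on exact coordinates, as in the slide's query."""
    ranges = candidate_bin_ranges(start, end)
    bin_clause = " OR ".join("bin BETWEEN ? AND ?" for _ in ranges)
    sql = ("SELECT simple_feature_id FROM simple_feature "
           f"WHERE seq_region_id = ? AND ({bin_clause}) "
           "AND seq_region_start <= ? AND seq_region_end >= ? "
           "ORDER BY simple_feature_id")
    params = [seq_region_id] + [v for r in ranges for v in r] + [end, start]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simple_feature (simple_feature_id INTEGER PRIMARY KEY, "
             "seq_region_id INTEGER, bin INTEGER, "
             "seq_region_start INTEGER, seq_region_end INTEGER)")
conn.execute("CREATE INDEX bin_idx ON simple_feature (seq_region_id, bin)")

# Hypothetical features: (id, seq_region_id, start, end).
features = [(1, 1, 1_200, 1_800), (2, 1, 9_500, 10_500),
            (3, 1, 2_500_000, 2_500_100), (4, 1, 500_000, 1_500_000)]
for fid, region, s, e in features:
    conn.execute("INSERT INTO simple_feature VALUES (?, ?, ?, ?, ?)",
                 (fid, region, assign_bin(s, e), s, e))

hits = features_overlapping(conn, 1, 10_000, 12_000)
```

A feature that overlaps the query lies entirely inside its own bin, so that bin must intersect the query range. This is why one `BETWEEN` range per level, plus the catch-all bin, cannot miss a hit, and the final coordinate test removes the false positives.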

12 S-MART

13 Database data integration

14 Architectures. Data integration is defined as a triple <G,S,M> where G is the global (or mediated) schema, S is the heterogeneous set of source schemas, and M is the mapping that maps queries between the source and the global schemas (from Wikipedia). Two architectures: data warehouse; virtual database.
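
To make the <G,S,M> triple and the two architectures concrete, here is a small Python sketch. All source records, field names and mapping functions are hypothetical. For brevity the mapping M translates rows rather than queries, which simplifies the definition quoted above.

```python
# S: two heterogeneous sources with different field names (hypothetical data).
marker_db = [{"marker": "m1", "chrom": "chr1", "pos_bp": 1200}]
qtl_sheet = [{"Name": "qtl7", "Chromosome": "chr2", "Start": 1000, "End": 5000}]

# G: the global (mediated) schema every record is translated into.
GLOBAL_FIELDS = ("name", "kind", "chromosome", "start", "end")

# M: one mapping per source, from the source schema to the global schema.
def map_marker(row):
    return {"name": row["marker"], "kind": "marker", "chromosome": row["chrom"],
            "start": row["pos_bp"], "end": row["pos_bp"]}

def map_qtl(row):
    return {"name": row["Name"], "kind": "qtl", "chromosome": row["Chromosome"],
            "start": row["Start"], "end": row["End"]}

MAPPINGS = [(marker_db, map_marker), (qtl_sheet, map_qtl)]

def build_warehouse():
    """Data warehouse: translate every source once and store the result."""
    return [mapping(row) for source, mapping in MAPPINGS for row in source]

def virtual_query(predicate):
    """Virtual database: leave data in the sources and translate at query time."""
    return [record for source, mapping in MAPPINGS
            for record in map(mapping, source) if predicate(record)]

def on_chr1(record):
    return record["chromosome"] == "chr1"

warehouse = build_warehouse()
from_warehouse = [record for record in warehouse if on_chr1(record)]
from_virtual = virtual_query(on_chr1)
```

Both routes return the same records here. They differ in when the translation happens, which is the trade-off between query speed (warehouse) and data freshness (virtual database).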

15 GnpIS data warehouse. [Diagram; recoverable labels: GnpMap; EST, mRNA; maps; DNA polymorphisms; genome annotation; GnpGenome; Aster; SIReGal; genetic collections; transcriptome; proteome; GnpProt; phenotype evaluations (P = GxE).]

16 Data consistency. [Diagram labels: GnpIS; foreign keys; Xref; core module; GnpGenome, GnpSNP, GnpArray, GnpProt, GnpMap, SIReGal, GnpSeq, Aster, Ephesis.] A foreign key is a relationship or link between two tables which ensures that the data stored in a database is consistent. Deleting a record that contains a value referred to by a foreign key in another table would break referential integrity. Some relational database management systems (RDBMS) can enforce referential integrity, either by deleting the foreign key rows as well to maintain integrity, or by returning an error and not performing the delete. Architecturally, this offers a tightly coupled approach because the data reside together in a single repository at query time.
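
The two enforcement behaviours described on this slide can be reproduced with Python's built-in `sqlite3` module. The table and column names below are hypothetical and do not reflect the GnpIS schema. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless enabled

conn.executescript("""
CREATE TABLE marker (marker_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Default behaviour: deleting a referenced marker is refused.
CREATE TABLE qtl_strict (
    qtl_id INTEGER PRIMARY KEY,
    marker_id INTEGER NOT NULL REFERENCES marker(marker_id)
);
-- Alternative behaviour: referencing rows are deleted with the marker.
CREATE TABLE qtl_cascade (
    qtl_id INTEGER PRIMARY KEY,
    marker_id INTEGER NOT NULL REFERENCES marker(marker_id) ON DELETE CASCADE
);
INSERT INTO marker VALUES (1, 'm1'), (2, 'm2');
INSERT INTO qtl_strict VALUES (10, 1);
INSERT INTO qtl_cascade VALUES (20, 2);
""")

# Case 1: the RDBMS returns an error and does not perform the delete.
try:
    conn.execute("DELETE FROM marker WHERE marker_id = 1")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

# Case 2: the RDBMS deletes the referencing rows as well.
conn.execute("DELETE FROM marker WHERE marker_id = 2")
remaining_cascade = conn.execute("SELECT COUNT(*) FROM qtl_cascade").fetchone()[0]
remaining_markers = [row[0] for row in conn.execute("SELECT marker_id FROM marker")]
```

Either way, no row is left pointing at a marker that no longer exists.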

17 Interoperability. A property referring to the ability of diverse systems to work together (inter-operate): the capability of different programs to exchange data via a common set of exchange formats, to read and write the same file formats, and to use the same protocols (from Wikipedia). (Loose) links between RDBMS: Xrefs, web services. Tools over the data warehouse: quick search, Biomart, Galaxy.

18 GnpIS interoperability. [Diagram labels: submission; queries; pipelines; DB; data mart; web interfaces; Excel files; complex queries; simple queries.]

19 Quick search

20 Quick search results

21 Biomart: advanced search

22 Get QTL by theme, trait, QTL name, markers.

23 Attributes for results

24 Results and links for details. Delphine Steinbach

25 Interoperability (QTL) with GnpMap. [Screenshot labels: GnpGenome, Poplar, GnpMap.] URGI GnpIS Demo, Dijon, Hadi Quesneville, 20/05/10

26 QTL mapped on GnpGenome (via markers)

27 QTLs found in GnpMap

28 A data integration workbench

29 Galaxy

30 Get data from Biomart

31 Text manipulation (example)

32 Other manipulations

33 Galaxy Workflow

34 URGI - Data Integration Workbench. [Diagram labels: GnpIS; external DB; data marts; data banks; user files; Galaxy; pipelines; developer; user.]

35 Acknowledgments. URGI: M. Alaux, F. Alfama, J. Amselem, N. Choisne, S. Durand, O. Inizan, V. Jamilloux, A. Keliet, E. Kimmel, N. Lapalu, I. Luyten, N. Mohellibi, C. Pommier, H. Quesneville, S. Reboux, D. Steinbach, M. Zytnicki; S. Arnoux (CDD), M. Bras (CDD), B. Brault (CDD), L. Brigitte (CDD), T. Flutre (PhD), C. Hoede (CDD), H. Mors (Master2), E. Permal (Post-doc), D. Verdelet (CDD), D. Valdenaire (CDD). Left URGI: B. Hilseberger (CDD). The URGI unit 09/11/09 Hadi Quesneville

36 Partners. Wheat genomics: P. Leroy, C. Ravel, E. Paux, C. Feuillet. Grape genomics: A.F. Adam-Blondon, M. Moroldo. SNP thematic: D. Brunel, F. Granier, H. McKhann, C. De Poittevin. Genetic resources: J.M. Prosperi, P. Roumet. Fungi genomics: M.H. Lebrun, T. Rouxel. Maize genomics: M. Falque, J. Joets, A. Charcosset. Colot's Lab: V. Colot, I. Ahmed, A. Sarazin. Tree genomics: C. Plomion, C. Poittevin.


More information

Data Integration and Decision-Making For Biomarkers Discovery, Validation and Evaluation. D. POLVERARI, CTO October 06-07 2008

Data Integration and Decision-Making For Biomarkers Discovery, Validation and Evaluation. D. POLVERARI, CTO October 06-07 2008 Data Integration and Decision-Making For Biomarkers Discovery, Validation and Evaluation D. POLVERARI, CTO October 06-07 2008 Data integration definition and aims Definition : Data integration consists

More information

A Strategy for Plant Breeding Data Management in International Agricultural Research

A Strategy for Plant Breeding Data Management in International Agricultural Research A Strategy for Plant Breeding Data Management in International Agricultural Research Introduction Exchange of germplasm boosted crop improvement for subsistence agriculture during the 70s and 80s, and

More information

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov Data Integration Lectures 16 & 17 Lectures Outline Goals for Data Integration Homogeneous data integration time series data (Filkov et al. 2002) Heterogeneous data integration microarray + sequence microarray

More information

The i2b2 Hive and the Clinical Research Chart

The i2b2 Hive and the Clinical Research Chart The i2b2 Hive and the Clinical Research Chart Henry Chueh Shawn Murphy The i2b2 Hive is centered around two concepts. The first concept is the existence of services provided by applications that are wrapped

More information

AN INTEGRATION APPROACH FOR THE STATISTICAL INFORMATION SYSTEM OF ISTAT USING SDMX STANDARDS

AN INTEGRATION APPROACH FOR THE STATISTICAL INFORMATION SYSTEM OF ISTAT USING SDMX STANDARDS Distr. GENERAL Working Paper No.2 26 April 2007 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL

More information

So, we have our map... ...Now, we want to see how it looks compared to someone else's maps. Wheat-CAP. Marcelo Soria (masoria@ucdavis.

So, we have our map... ...Now, we want to see how it looks compared to someone else's maps. Wheat-CAP. Marcelo Soria (masoria@ucdavis. So, we have our map......now, we want to see how it looks compared to someone else's maps. CMap First developed for Gramene, now is part of the GMOD project. CMap is the map visualization and comparison

More information

Module 1. Sequence Formats and Retrieval. Charles Steward

Module 1. Sequence Formats and Retrieval. Charles Steward The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.

More information

Enabling Collaboration Using the Biomedical Informatics Research Network (BIRN):

Enabling Collaboration Using the Biomedical Informatics Research Network (BIRN): Enabling Collaboration Using the Biomedical Informatics Research Network (BIRN): Karl Helmer Ph.D. Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital June 4, 2010 BIRN

More information

Introductory genetics for veterinary students

Introductory genetics for veterinary students Introductory genetics for veterinary students Michel Georges Introduction 1 References Genetics Analysis of Genes and Genomes 7 th edition. Hartl & Jones Molecular Biology of the Cell 5 th edition. Alberts

More information

Data Integration of Bioinformatics and Web-Based Software Development

Data Integration of Bioinformatics and Web-Based Software Development Integration of Biological XML data Ph. D. Lecture Bioinformatics & Software Systems Lab. Woo-Hyuk Jang Information and Communications Univ. Where are we? Client-Side Info. Management Business related Issues

More information

Delivering the power of the world s most successful genomics platform

Delivering the power of the world s most successful genomics platform Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE

More information

Software Description Technology

Software Description Technology Software applications using NCB Technology. Software Description Technology LEX Provide learning management system that is a central resource for online medical education content and computer-based learning

More information

Introduction course leader and module leaders

Introduction course leader and module leaders Introduction course leader and module leaders Plant Breeding 2016 2018 Plant breeding course 2016-2018 The plant breeding course is led by Prof. Richard Visser. He is leading instructor in plant breeding

More information

Usability in bioinformatics mobile applications

Usability in bioinformatics mobile applications Usability in bioinformatics mobile applications what we are working on Noura Chelbah, Sergio Díaz, Óscar Torreño, and myself Juan Falgueras App name Performs Advantajes Dissatvantajes Link The problem

More information

BUILDING OLAP TOOLS OVER LARGE DATABASES

BUILDING OLAP TOOLS OVER LARGE DATABASES BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,

More information

Human Genome and Human Genome Project. Louxin Zhang

Human Genome and Human Genome Project. Louxin Zhang Human Genome and Human Genome Project Louxin Zhang A Primer to Genomics Cells are the fundamental working units of every living systems. DNA is made of 4 nucleotide bases. The DNA sequence is the particular

More information

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28.

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28. Tutorial Module 5 BioMart You will learn about BioMart, a joint project developed and maintained at EBI and OiCR www.biomart.org How to use BioMart to quickly obtain lists of gene information from Ensembl

More information

Analysis of ChIP-seq data in Galaxy

Analysis of ChIP-seq data in Galaxy Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers

More information

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Christopher Benner, PhD Director, Integrative Genomics and Bioinformatics Core (IGC) idash Webinar,

More information

Alison Yao, Ph.D. July 2014

Alison Yao, Ph.D. July 2014 * Alison Yao, Ph.D. Program Officer, Office of Genomics and Advanced Technologies Division of Microbiology and Infectious Diseases National Institute of Allergy and Infectious Diseases National Institutes

More information

The National Plant Genome Initiative

The National Plant Genome Initiative Research Challenges and Resource Needs in Cyberinfrastructure & Bioinformatics: BIG DATA in Plant Genomics The National Plant Genome Initiative Interagency Working Group on Plant Genomics Diane Jofuku

More information

Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company

Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Genetic engineering: humans Gene replacement therapy or gene therapy Many technical and ethical issues implications for gene pool for germ-line gene therapy what traits constitute disease rather than just

More information

Database schema documentation for SNPdbe

Database schema documentation for SNPdbe Database schema documentation for SNPdbe Changes 02/27/12: seqs_containingsnps.taxid removed dbsnp_snp.tax_id renamed to dbsnp_snp.taxid General information: Data in SNPdbe is organized on several levels.

More information

Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze

Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze Data Warehouse and Hive Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze Decision support systems Decision Support Systems allowed managers, supervisors, and executives to once again see the

More information

Management von Forschungsprimärdaten und DOI Registrierung. Dr. Matthias Lange (Bioinformatics & Information Technology) June 19 th, 2013

Management von Forschungsprimärdaten und DOI Registrierung. Dr. Matthias Lange (Bioinformatics & Information Technology) June 19 th, 2013 Management von Forschungsprimärdaten und DOI Registrierung Dr. Matthias Lange (Bioinformatics & Information Technology) June 19 th, 2013 Outline Motivation: IPK data infrastructure LIMS: Integration of

More information