Distributed Data Mining in Discovery Net. Dr. Moustafa Ghanem Department of Computing Imperial College London
1 Distributed Data Mining in Discovery Net Dr. Moustafa Ghanem Department of Computing Imperial College London
2 Outline
1. What is Discovery Net
2. Distributed Data Mining for Compute Intensive Tasks
3. Distributed Data Mining for Sensor Grids
4. Knowledge Discovery from Naturally Distributed Data Sources
5. What Do Scientists Really Want?
3 1. What is Discovery Net
4 What is Discovery Net?
Funding: One of the eight UK national e-Science Pilot Projects funded by EPSRC (£2.2M). Start Oct 2001, End March 2005.
Goal: Construct the world's first infrastructure for Global Knowledge Discovery Services.
Key Technologies:
- Open Service Computing
- High Throughput Devices and Real Time Data Mining
- Real Time Data Integration & Information Structuring
- Cross Domain Knowledge Discovery and Management
- Discovery Workflow and Discovery Planning
5 Discovery Net Applications
- Life Sciences: High throughput genomics and proteomics; Distributed Databases and Applications
- Environmental Modelling: High throughput dispersed air sensing technology; Sensor Grids
- Real time geo-hazard modelling: Earthquake modelling through satellite imagery; High performance Distributed Computation
6 Discovery Net Architecture
Goal: Plug & Play Data Sources, Analysis Components & Knowledge Discovery Processes
- D-Net Clients: End-user applications and user interface allowing scientists to construct and drive knowledge discovery activities
- D-Net Middleware: Provides services and execution logic for distributed knowledge discovery and access to distributed resources and services
- High Performance Communication Protocol (GridFTP, DSTP, ...)
- Grid Infrastructure (GSI)
- Computation & Data Resources: Distributed databases, compute servers and scientific devices
Diagram labels: DPML; Web/Grid Services; OGSA
7 Discovery Net Data Mining Components
- Generic Data Mining: Classification, Clustering, Associations, ...
- Unstructured-Data Mining: Text Mining, Image Mining
- Domain-specific Mining: Bioinformatics, Cheminformatics, ...
8 2. Distribution of Compute Intensive Tasks a. Distributed Data Mining for Geo-hazard Prediction
9 Grid-based Geo-hazard Data Mining
- Grid-based HPC Computation: Automatically co-register a stack of imagery layers at high precision and speed.
- Workflow to Coordinate Grid Computation: Data Warehousing & Modelling; Co-registration & geo-rectification; Image feature extraction; Clustering & classification
- Grid-based Data Access and Integration
10 Normalised cross-correlation (NCC) template algorithm
Inputs: image before, image after.
Flowchart steps: Reading data set; Setting search window; Setting comparing window; Significant correlation coefficient? (Y/N)
Plot labels: Delta X; Correlation coefficient
Operates on a remotely accessed MPI UNIX parallel computer through a fast network with the DNet interface. Slow but highly accurate: 24 processors take 10 hours for one scene of Landsat-7 ETM+ Pan imagery data. The algorithm also runs on the Grid.
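The flowchart steps on this slide (search window, comparing window, correlation coefficient) can be illustrated with a minimal single-machine sketch. It is not the Discovery Net implementation: the function names `ncc` and `best_shift`, the plain nested-list image representation, and the exhaustive integer-pixel search are all assumptions made for illustration, and the MPI/Grid parallelisation mentioned on the slide is omitted.

```python
import math


def ncc(window, template):
    """Normalised cross-correlation coefficient of two equal-sized 2-D patches."""
    w = [v for row in window for v in row]
    t = [v for row in template for v in row]
    mean_w = sum(w) / len(w)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_w) * (b - mean_t) for a, b in zip(w, t))
    den = math.sqrt(sum((a - mean_w) ** 2 for a in w) *
                    sum((b - mean_t) ** 2 for b in t))
    # A flat patch has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def best_shift(before, after, top, left, size, radius):
    """Find the offset of a template from `before` inside `after`.

    The comparing window is the size x size patch of `before` at (top, left).
    The search window covers offsets from -radius to +radius in both axes.
    Returns (delta_y, delta_x, coefficient) for the best-scoring offset.
    """
    template = [row[left:left + size] for row in before[top:top + size]]
    best = (0, 0, -1.0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = top + dy, left + dx
            # Skip offsets whose window would fall outside the image.
            if r < 0 or c < 0 or r + size > len(after) or c + size > len(after[0]):
                continue
            window = [row[c:c + size] for row in after[r:r + size]]
            score = ncc(window, template)
            if score > best[2]:
                best = (dy, dx, score)
    return best
```

A caller would typically accept the returned offset only when the coefficient exceeds a chosen significance threshold, which corresponds to the Y/N decision in the flowchart. Each template position is independent of the others, which is what makes the method easy to spread across many processors.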
12 2. Distribution of Compute Intensive Tasks b. Distributed Clustering
13 Workflows for Distributed Data Clustering
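The transcript preserves only the title of this slide, so the following is a generic illustration rather than the workflow shown in the deck. It sketches one common way to cluster partitioned data, assuming k-means: each site computes per-cluster sums and counts over its own partition, and a coordinator merges them into new centroids. The names `local_stats`, `merge` and `distributed_kmeans` are hypothetical, and the remote calls are simulated by an ordinary loop.

```python
def local_stats(points, centroids):
    """Run at each data site: per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Assign the point to its nearest centroid (squared Euclidean distance).
        j = min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge(stats, centroids):
    """Run at the coordinator: combine partial statistics into new centroids."""
    k, dim = len(centroids), len(centroids[0])
    merged = []
    for j in range(k):
        n = sum(counts[j] for _, counts in stats)
        if n == 0:
            # No point was assigned to this cluster; keep the old centroid.
            merged.append(list(centroids[j]))
        else:
            merged.append([sum(sums[j][d] for sums, _ in stats) / n
                           for d in range(dim)])
    return merged


def distributed_kmeans(partitions, centroids, iterations=10):
    """Iterate local statistics and global merge for a fixed number of rounds."""
    for _ in range(iterations):
        # In a real deployment each call would execute where the partition lives.
        stats = [local_stats(part, centroids) for part in partitions]
        centroids = merge(stats, centroids)
    return centroids
```

Only the small per-cluster summaries travel between sites, never the raw points, which is the usual motivation for this design when data is large or cannot be moved.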
14 3. Distributed Mining over Sensor Grid Data Distributed Spatial Data Mining for Air Pollution Modelling
15 Sensor Specification: The GUSTO Project - Update (Generic UV Sensors Technologies & Observations)
- High throughput open path spectrometer system
- Robust algorithm for pollutant concentration retrievals
- Measures SO2, NO, NO2, O3 & Benzene to ppb levels every few seconds
- Geared for networking of multiple GUSTO units within a Grid infrastructure
- Can support remote sensing data for (contour) mapping of pollutants
16 Networking of Multiple GUSTO Units
- Sensors: GUSTO units 1 to 4, with wireless connectivity
- Transport: HTTP, SOAP, GSI
- Grid infrastructure services: Data upload service; Sensor registry & control service (SensorML); Data access service; Warehouse; Archived weather data; Archived health data
- Clients (over HTTP, SOAP, GSI): Monitoring and control software; Public access Web visualizer; Visualisation and Data Mining
17 Pollution analysis
21 4. Knowledge Discovery from Naturally Distributed Data Sources Distributed Data Mining in Life Sciences
22 Distributed Data Mining for Life Sciences
Data types shown: secondary structure; tertiary structure; polymorphism; patient records; epidemiology; expression patterns; physiology; sequences (e.g. ATGCAAGTCCCT AAGATTGCATAA GCTCGCTCAGTT); alignments; receptors; signals; pathways; linkage maps; cytogenetic maps; physical maps
23 Information Integration: Gene Expression Warehouse
Sources and entities shown: OMIM, ExPASy SwissProt, PDB, ExPASy Enzyme, Disease, Protein, Enzyme, Affy Fragment, LocusLink, Known Gene, MGD, Sequence, Metabolite, Sequence Cluster, SNP, Pathway, SPAD, GenBank, NCBI dbSNP, KEGG, NMR, UniGene
Given a collection of microarray-generated gene expression data, what kinds of questions do users wish to pose? How should an integration schema be designed?
24 From Data Integration to Knowledge Unification: In Silico Experiment (D-World, I-World, K-World)
25 Life Science Application: SC2002 HPC Challenge
D-Net based Global Collaborative Real-Time Genome Annotation (15 DBs, 21 Applications)
- Input: High Throughput Sequencers; Identify Organism Chromosomes; Identify Organism's DNA
- Nucleotide-level Annotation: Genes; Gene markers; Regulatory Regions; Segmental Duplication; Literature References; tRNAs, rRNAs; Non-translated RNAs; Repetitive Elements; SNP Variations; ...
  Resources: EMBL, TIGR, NCBI, SNP, genscan, grail, E-PCR, blast, Repeat Masker
- Protein-level Annotation: Identify Proteins; Functional Characterisation; Domain; Classify into Protein Families; Homologues; 3-D Structure; Motif Search; Genome Annotation; Fold Prediction; Literature References; Secondary structure; ...
  Resources: InterPro, SMART, SWISS-PROT, blast, PFAM, 3D-PSSM, predator, DSC
- Process-level Annotation: Relate; Cell Cycle; Drugs; Cell death; Literature References; Metabolism; Biological Process; Embryogenesis; ...
  Resources: GO, KEGG, CSNDB, GK, Pathway Maps, AmiGO, virtual chip, Ontologies, GeneMaps, GenNav
26 HPC Challenge SC2002
- Steps: Real-time sequencing in London; Download sequence from Reference Server; Execute distributed annotation workflow (Nucleotide Annotation Workflows); Interactive Editor & Visualisation; Save to Distributed Annotation Server
- Distributed data and computation: InterPro, EMBL, SMART, NCBI, KEGG, SWISS-PROT, TIGR, SNP, GO
- Manual effort replaced: 1800 clicks, 500 Web accesses, 200 copy/paste operations. 3 weeks of work captured in 1 workflow that executes in a few seconds.
27 Discovery Net in Action: China SARS Virtual Lab
D-Net: Integration, interpretation, and discovery
Diagram nodes: Homology search against viral genome DB; Homology search against protein DB; Genbank; Annotation using Artemis and GenSense; Gene prediction; Predicted genes; Homology search against motif DB; Key word search; GeneSense Ontology; Exon prediction; Splice site prediction; Multiple sequence alignment; Phylogenetic analysis; Immunogenetics; Relationship between SARS and other viruses; Mutual regions identification; Protein localization site prediction; Protein interaction prediction; Relationship between SARS virus and human receptors prediction; Microarray analysis; Epidemiological analysis; SARS patients diagnosis; Classification and secondary structure prediction; Bibliographic databases
28 Discovery Net in Action: SARS Virus Mutation Analysis
29 5. What Do Scientists Really Want? Does it really work?
30 Towards Compositional Grid Services: A compositional Grid
- Resources and services: Native MPI; OGSA-service; Condor-G; Web Service; Sun Grid Engine; Oracle 10g; Unicore; Web Wrapper
- Workflow functions: Service Browsing; Service Abstraction; Composing services; Resource Mapping; Workflow Authoring; Workflow Execution; Workflow Warehousing; Workflow Management; Collaborative Knowledge Management
- Workflow Deployment: Grid Service and Portal
31 Discovery Net Service Composition
32 Full Workflow
33 Executing Protein Annotation Workflow
34 Deployment of Node
35 Deploying Protein Annotation Workflow
36 Executing Deployed Service
37 Locating & Executing Deployed Service from Discovery Net
38 Workflow Provenance
39 Workflow Warehousing
40 Discovery Net Snapshot: Scientific Information; In Real Time; Scientific Discovery; Literature; Real Time Data Integration; Discovery Services; Service Workflow; Databases; Operational Data; Dynamic Application Integration; Integrative Knowledge Management; Using Distributed Resources; Images; Instrument Data
overview Pipeline Pilot Enterprise Server Pipeline Pilot Enterprise Server (PPES) is a powerful client-server platform that streamlines the integration and analysis of the vast quantities of data flooding
More informationBig Data Mining Services and Knowledge Discovery Applications on Clouds
Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades
More informationSchool of Nursing. Presented by Yvette Conley, PhD
Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression
More informationBig Data in Drug Discovery
Big Data in Drug Discovery David J. Wild Assistant Professor & Director, Cheminformatics Program Indiana University School of Informatics and Computing djwild@indiana.edu - http://djwild.info Epochs in
More informationPreparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo
Preparing the scenario for the use of patient s genome sequences in clinic Joaquín Dopazo Computational Medicine Institute, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB),
More informationGenome Explorer For Comparative Genome Analysis
Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence
More informationData mining with Mascot Integra ASMS 2005
Data mining with Mascot Integra 1 What is Mascot Integra? Fully functional out-the-box solution for proteomics workflow and data management Support for all the major mass-spectrometry data systems Powered
More informationCheck Your Data Freedom: A Taxonomy to Assess Life Science Database Openness
Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness Melanie Dulong de Rosnay Fellow, Science Commons and Berkman Center for Internet & Society at Harvard University This article
More informationCloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research
Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research Trends: Data on an Exponential Scale Scientific data doubles every year Combination of inexpensive sensors + exponentially
More information<Insert Picture Here> The Evolution Of Clinical Data Warehousing
The Evolution Of Clinical Data Warehousing Srinivas Karri Principal Consultant Agenda Value of Clinical Data Clinical Data warehousing & The Big Data Challenge
More informationInformatics and Knowledge Management at the Novartis Institutes for BioMedical Research (NIBR)
Informatics and Knowledge Management at the Novartis Institutes for BioMedical Research (NIBR) Enable Science in silico & Provide the Right Knowledge to the Right People at the Right Time to enable the
More informationHigh Performance Compu2ng Facility
High Performance Compu2ng Facility Center for Health Informa2cs and Bioinforma2cs Accelera2ng Scien2fic Discovery and Innova2on in Biomedical Research at NYULMC through Advanced Compu2ng Efstra'os Efstathiadis,
More information