GMQL Functional Comparison with BEDTools and BEDOPS
Genomic Computing Group
Dipartimento di Elettronica, Informazione e Bioingegneria
Politecnico di Milano

This document presents a functional comparison of GMQL with the NGS tools BEDTools and BEDOPS, which are popular among biologists for manipulating genomic regions. The comparative example below highlights a series of differences between GMQL and these two tools, summarized here.

Experimental data selection in BEDTools and BEDOPS must be done manually, as these tools do not handle metadata and thus cannot support any selection over datasets. Conversely, GMQL supports selections using metadata.

Annotation retrieval must also be done manually in BEDTools and BEDOPS, through the use of genome browsers (e.g. UCSC and Ensembl). In GMQL, annotations are selected in the same way as experimental data, by using selections.

Unlike BEDTools and BEDOPS, GMQL does not require sorted inputs, since it sorts them automatically before processing.

To perform the same operation on different samples, BEDTools and BEDOPS require loops and control structures written in other languages. Conversely, GMQL allows multiple samples to be grouped in the same dataset, and GMQL operations are applied in batch, i.e. to every sample in the dataset.

BEDTools and BEDOPS do not have an internal standard data format, so different scripts may produce different output formats. Outputs must therefore be manipulated with external languages (e.g. AWK or SED), which lengthens the scripts and requires knowledge of several scripting languages. Conversely, GMQL has a single internal data format for every region operation, and the output is always produced in GTF format.

Next, we present a relevant biological example which is fully developed using the three approaches.

Enhancer-gene distal binding relationship

Description: In this example we want to find distal regulatory elements (i.e. enhancers) that may interact with a gene of interest (the proximal element) through the CTCF transcription factor (TF). The distance of a distal element from a gene is measured from the gene transcription start site (TSS). First, we set the TF-gene distance; then we check for the presence of a putative enhancer region using its typical histone modifications. This example is also included in our submitted paper (Example 2); here, we provide a simplified version applied to single samples of a single cell line, so that it can also be performed directly in BEDTools and BEDOPS.
Procedure: Find all enriched regions (peaks) in the CTCF transcription factor (TF) ChIP-seq sample from the K562 human cell line which are the nearest regions farther than 100 kb from a transcription start site (TSS). For the same cell line, also find all peaks for the H3K4me1 histone modification (HM) which are the nearest regions farther than 100 kb from a TSS. Then, out of the TF and HM peaks found in the same cell line, return all TF peaks that overlap with both HM peaks and known enhancer (EN) regions.

1. GMQL

# Select and retrieve ChIP-seq, TSS and enhancer data
# One sample selected for each TF, HM, TSS and EN dataset
TF = SELECT(dataType == 'ChipSeq' AND view == 'Peaks' AND cell == 'K562' AND antibody_target == 'CTCF' AND lab == 'Broad') HG19_PEAK;
HM = SELECT(dataType == 'ChipSeq' AND view == 'Peaks' AND cell == 'K562' AND antibody_target == 'H3K4me1' AND lab == 'Broad') HG19_PEAK;
TSS = SELECT(ann_type == 'TSS' AND provider == 'UCSC') HG19_ANN;
EN = SELECT(ann_type == 'enhancer' AND provider == 'UCSC') HG19_ANN;

# Merge overlapping regions
# Although GMQL manages overlapping regions correctly in each operation, including in the following JOINs, this
# step is required for comparison with BEDTools and BEDOPS, since both these tools merge such regions
# internally before performing the closestBed or closest-features operations, respectively
TF1 = COVER(1, ANY) TF;
HM1 = COVER(1, ANY) HM;
TSS1 = COVER(1, ANY) TSS;
EN1 = COVER(1, ANY) EN;

# Extract only TF and HM peaks farther than 100 kb from a TSS
TF2 = JOIN(first(1) after distance 100000, right_distinct) TSS1 TF1;
HM2 = JOIN(first(1) after distance 100000, right_distinct) TSS1 HM1;
HM3 = JOIN(distance < 0, int) EN1 HM2;
TF_res = JOIN(distance < 0, right) HM3 TF2;

# Store results
MATERIALIZE TF_res;
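The two distance JOINs above can be read as "for each TSS, keep the nearest peak lying more than 100 kb away, and report each peak once". The following Python sketch illustrates that reading only; it is not GMQL's implementation. The names `gap` and `nearest_beyond`, the half-open `(chrom, start, end)` tuples and the strict `>` comparison are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical region type: (chromosome, start, end), end exclusive.
Region = Tuple[str, int, int]


def gap(a: Region, b: Region) -> int:
    """Distance between two regions on one chromosome; negative if they overlap."""
    return max(a[1], b[1]) - min(a[2], b[2])


def nearest_beyond(anchors: List[Region], peaks: List[Region],
                   min_dist: int = 100_000) -> List[Region]:
    """For each anchor, keep the nearest peak(s) farther than min_dist.

    Equidistant peaks are all kept, and every peak is reported once
    even when it is the nearest one for several anchors.
    """
    found: List[Region] = []
    seen = set()
    for anchor in anchors:
        candidates = [p for p in peaks
                      if p[0] == anchor[0] and gap(anchor, p) > min_dist]
        if not candidates:
            continue
        best = min(gap(anchor, p) for p in candidates)
        for p in candidates:
            if gap(anchor, p) == best and p not in seen:
                seen.add(p)
                found.append(p)
    return found


tss = [("chr1", 1_000_000, 1_000_001), ("chr1", 1_050_000, 1_050_001)]
peaks = [("chr1", 1_020_000, 1_020_500),   # too close to both TSSs
         ("chr1", 1_200_000, 1_200_500),   # nearest peak beyond 100 kb for both
         ("chr1", 500_000, 500_400),       # beyond 100 kb, but farther away
         ("chr2", 1_200_000, 1_200_500)]   # different chromosome
result = nearest_beyond(tss, peaks)
print(result)
```

In this toy input the same peak is the nearest distant peak for both TSSs, and the `seen` set plays the role that the `right_distinct` option plays in the JOIN.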
2. BEDTools

# SELECT is not available in BEDTools; data selection must be done manually and data must be present locally
# Input data saved locally:
# CTCF peaks: wgencodebroadhistonek562ctcfstdpk.broadpeak
# H3K4me1 peaks: wgencodebroadhistonek562h3k4me1stdpk.broadpeak
# TSS: TSS_hg19.bed
# Enhancers: VistaEnhancers_hg19.bed

# BEDTools requires sorted input data
sortBed -i wgencodebroadhistonek562ctcfstdpk.broadpeak > CTCF_sorted.bed;
sortBed -i wgencodebroadhistonek562h3k4me1stdpk.broadpeak > H3K4me1_sorted.bed;
sortBed -i TSS_hg19.bed > TSS_sorted.bed;
sortBed -i VistaEnhancers_hg19.bed > VistaEnhancers_sorted.bed;

# BEDTools requires this operation to flatten all the overlapping regions
bedtools merge -i CTCF_sorted.bed > CTCF_merged.bed;
bedtools merge -i H3K4me1_sorted.bed > H3K4me1_merged.bed;
bedtools merge -i TSS_sorted.bed > TSS_merged.bed;
bedtools merge -i VistaEnhancers_sorted.bed > VistaEnhancers_merged.bed;

# Extend TSS regions
# AWK is needed to set 100 kb distance from each TSS, by extending the coordinates of each original TSS
# If the start position of a TSS is less than or equal to 100000, the start coordinate of its extended region
# is set to 1
awk -v OFS="\t" '{ if ($2-100000 <= 0) {print $1,1,$3+100000,$4,$5,$6} else {print $1,$2-100000,$3+100000,$4,$5,$6} }' TSS_merged.bed > TSS_extended.bed;

# Extract only TF and HM peaks farther than 100 kb from a TSS, by searching for the closest CTCF or
# H3K4me1 binding site to each extended TSS region
# BEDTools closestBed command produces a NA label if the reference file has no related closest peaks.
# To remove NA labels, a supplementary AWK operation is required
closestBed -t all -a TSS_extended.bed -b CTCF_merged.bed | awk -v OFS="\t" '{ if ($5 > 0) {print $4,$5,$6} }' > CTCF_closest.bed;
closestBed -t all -a TSS_extended.bed -b H3K4me1_merged.bed | awk -v OFS="\t" '{ if ($5 > 0) {print $4,$5,$6} }' > H3K4me1_closest.bed;

# CTCF_closest.bed and H3K4me1_closest.bed files contain duplicate regions, because a peak that is the
# closest to multiple TSSs is reported multiple times
# BEDTools lacks a "distinct" operator; AWK must be used to remove duplicates
awk '!x[$0]++' CTCF_closest.bed > CTCF_closest_distinct.bed;
awk '!x[$0]++' H3K4me1_closest.bed > H3K4me1_closest_distinct.bed;

bedtools intersect -a VistaEnhancers_merged.bed -b H3K4me1_closest_distinct.bed > H3K4me1_enhancers.bed;
bedtools intersect -a H3K4me1_enhancers.bed -b CTCF_closest_distinct.bed > CTCF_H3K4me1_enhancers.bed;
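The "Extend TSS regions" step in the script above widens each TSS and resets the start coordinate to 1 when the widened start would fall at or below zero. A minimal Python sketch of that rule follows, assuming a flank of 100 kb on both sides; `extend_region` and its parameters are hypothetical names, not part of BEDTools.

```python
from typing import Tuple


def extend_region(chrom: str, start: int, end: int,
                  flank: int = 100_000) -> Tuple[str, int, int]:
    """Widen a region by `flank` bases on each side.

    The new start is set to 1 whenever start - flank would be <= 0,
    mirroring the rule stated in the script comments.
    """
    new_start = start - flank
    if new_start <= 0:
        new_start = 1
    return chrom, new_start, end + flank


near_chrom_start = extend_region("chr1", 50_000, 50_001)
inner = extend_region("chr1", 250_000, 250_001)
print(near_chrom_start, inner)
```

The sketch does not clip the extended end at the chromosome length, and neither does the rule described in the script comments.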
3. BEDOPS

# SELECT is not available in BEDOPS; data selection must be done manually and data must be present locally
# Input data saved locally:
# CTCF peaks: wgencodebroadhistonek562ctcfstdpk.broadpeak
# H3K4me1 peaks: wgencodebroadhistonek562h3k4me1stdpk.broadpeak
# TSS: TSS_hg19.bed
# Enhancers: VistaEnhancers_hg19.bed

# BEDOPS requires sorted input data
sort-bed wgencodebroadhistonek562ctcfstdpk.broadpeak > CTCF_sorted.bed;
sort-bed wgencodebroadhistonek562h3k4me1stdpk.broadpeak > H3K4me1_sorted.bed;
sort-bed TSS_hg19.bed > TSS_sorted.bed;
sort-bed VistaEnhancers_hg19.bed > VistaEnhancers_sorted.bed;

# BEDOPS requires this operation to flatten all the overlapping regions
bedops --merge CTCF_sorted.bed > CTCF_merged.bed;
bedops --merge H3K4me1_sorted.bed > H3K4me1_merged.bed;
bedops --merge TSS_sorted.bed > TSS_merged.bed;
bedops --merge VistaEnhancers_sorted.bed > VistaEnhancers_merged.bed;

# Extend TSS regions
# AWK is needed to set 100 kb distance from each TSS, by extending the coordinates of each original TSS
# If the start position of a TSS is less than or equal to 100000, the start coordinate of its extended region
# is set to 1
awk -v OFS="\t" '{ if ($2-100000 <= 0) {print $1,1,$3+100000,$4,$5,$6} else {print $1,$2-100000,$3+100000,$4,$5,$6} }' TSS_merged.bed > TSS_extended.bed;

# Extract only TF and HM peaks farther than 100 kb from a TSS, by searching for the closest CTCF or
# H3K4me1 binding site to each extended TSS region
# BEDOPS closest-features command produces a NA label if the reference file has no related closest peaks.
# To remove NA labels, a supplementary AWK operation is required
closest-features --closest --no-overlaps --no-ref TSS_extended.bed CTCF_merged.bed | awk -v OFS="\t" '{ if ($0 != "NA") {print $0} }' > CTCF_closest.bed;
closest-features --closest --no-overlaps --no-ref TSS_extended.bed H3K4me1_merged.bed | awk -v OFS="\t" '{ if ($0 != "NA") {print $0} }' > H3K4me1_closest.bed;

# CTCF_closest.bed and H3K4me1_closest.bed files contain duplicate regions, because a peak that is the
# closest to multiple TSSs is reported multiple times
# BEDOPS lacks a "distinct" operator; AWK must be used to remove duplicates
awk '!x[$0]++' CTCF_closest.bed > CTCF_closest_distinct.bed;
awk '!x[$0]++' H3K4me1_closest.bed > H3K4me1_closest_distinct.bed;

bedops --intersect VistaEnhancers_merged.bed H3K4me1_closest_distinct.bed > H3K4me1_enhancers.bed;
bedops --intersect H3K4me1_enhancers.bed CTCF_closest_distinct.bed > CTCF_H3K4me1_enhancers.bed;
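The AWK idiom `'!x[$0]++'` used above prints a line only the first time it is seen, so duplicates are dropped while the original order is kept. For readers less familiar with AWK, the same behaviour can be sketched in Python; `distinct_lines` is a hypothetical helper name, and the input lines are made up for illustration.

```python
from typing import Iterable, List


def distinct_lines(lines: Iterable[str]) -> List[str]:
    """Keep the first occurrence of every line, preserving input order."""
    seen = set()
    kept: List[str] = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


closest = [
    "chr1\t1200000\t1200500",
    "chr1\t1200000\t1200500",   # same peak, closest to a second TSS
    "chr2\t300000\t300400",
    "chr1\t1200000\t1200500",
]
unique = distinct_lines(closest)
print("\n".join(unique))
```

Like the AWK one-liner, this compares whole lines, so two records describing the same region with different trailing fields would both be kept.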
Discussion: We progressively compared the data processing steps and the results produced by the three systems when the programs above are applied to CTCF and H3K4me1 signal enrichment called regions from NGS human samples aligned to human genome assembly 19 (hg19), as provided by ENCODE, and to transcription start site (TSS) and enhancer annotations, as provided by the UCSC database from SwitchGear Genomics and Vista Enhancer, respectively. Every input dataset contains a single sample (80,538 CTCF, 125,713 H3K4me1, 131,780 SwitchGear TSS and 1,339 Vista enhancer regions, respectively); each sample contains overlapping regions. For each step, we tested the concordance between the datasets produced by GMQL, BEDTools and BEDOPS.

In GMQL, overlapping regions are processed independently, while in BEDTools and BEDOPS they are, in most cases, implicitly treated as a single merged region, which can lead to a loss of data. To compare the behavior of the three systems without differences caused by their strategies for managing overlapping regions, we performed a first step in all programs to merge overlapping regions.

Using the merged input, the first two JOIN operations in GMQL, which correspond to the closestBed and closest-features operations in BEDTools and BEDOPS respectively, gave the same results for GMQL and BEDOPS: 40,687 CTCF and 41,007 H3K4me1 distinct peaks. BEDTools returned different results, since the closestBed function cannot restrict the search to regions that do not overlap the reference regions; it reports overlapping regions (i.e. features) as the closest (i.e. at distance 0).

Another minor source of differences among the three systems is the management of ties, i.e. features that are equidistant from a reference region. When ties occur, GMQL retains both features, whereas in BEDTools the -t option lets the user specify which feature is retained: left, right, all. Conversely, BEDOPS always assigns the preference to the left feature only, reducing the number of returned regions compared with the other two systems. Although ties are not present in our comparative dataset, they may represent a potential issue for other comparisons.

When searching for regions at a given distance from (or intersecting with) a reference region, the same region can be correctly returned several times when it satisfies the search constraints with respect to multiple reference regions. These duplicated regions can cause issues when the results are used as input for a subsequent operation. In the GMQL JOIN, the _distinct option (e.g. right_distinct) avoids them; this is not possible in either BEDTools or BEDOPS, which require external languages (e.g. AWK) to remove them.

The last two JOIN (intersect) operations returned different final results in the three systems. The BEDTools differences are due to the aforementioned issue in the closestBed function, leading to a final result of 67 CTCF peaks. On the other hand, BEDOPS seems to miss some features while doing the intersection, finally reporting 14 CTCF peaks versus 23 in GMQL. The small number of CTCF peaks is mainly due to the very limited number of available Vista enhancers.
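The two closest-search behaviours discussed above (overlapping features reported as closest at distance 0, versus overlapping features excluded from the search) can be contrasted with a small Python sketch. It is a toy model of the semantics described in this discussion, not the code of any of the three tools; `gap`, `closest` and the flag `overlaps_count_as_closest` are hypothetical names, and ties are all retained, as described for GMQL.

```python
from typing import List, Tuple

# Hypothetical region type: (chromosome, start, end), end exclusive.
Region = Tuple[str, int, int]


def gap(a: Region, b: Region) -> int:
    """Distance between two regions on one chromosome; negative if they overlap."""
    return max(a[1], b[1]) - min(a[2], b[2])


def closest(anchor: Region, features: List[Region],
            overlaps_count_as_closest: bool) -> List[Region]:
    """Return the closest feature(s) to the anchor, keeping all ties.

    If overlaps_count_as_closest is True, an overlapping feature is treated
    as being at distance 0; otherwise overlapping features are skipped.
    """
    scored = []
    for f in features:
        if f[0] != anchor[0]:
            continue  # only compare regions on the same chromosome
        d = gap(anchor, f)
        if d < 0:
            if not overlaps_count_as_closest:
                continue
            d = 0
        scored.append((d, f))
    if not scored:
        return []
    best = min(d for d, _ in scored)
    return [f for d, f in scored if d == best]


anchor = ("chr1", 100, 500)
features = [("chr1", 450, 600), ("chr1", 700, 800), ("chr2", 480, 520)]
with_overlaps = closest(anchor, features, overlaps_count_as_closest=True)
without_overlaps = closest(anchor, features, overlaps_count_as_closest=False)
print(with_overlaps, without_overlaps)
```

With the same input the two modes return different features, which is the kind of divergence that then propagates to the final intersect step.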