Databases indexation
Transcription
Databases indexation
Laurent Falquet, Basel, October 2006
Swiss Institute of Bioinformatics, Swiss EMBnet node

Overview
  Data access concept
    sequential
    direct
  Indexing
    EMBOSS
    Fetch
    Other
  BLAST
    Why indexing?
    formatdb
    Parsing output
  Excel import/export
    Tab delimited
    Comma delimited
Why indexing?
  Human tendency to classify and group
  Examples:
    Dictionary
    Book
    Library
    DVD chapters
    iPod playlists
  Advantages:
    Fast access
    Easy data finding
  Disadvantages:
    Time to prepare indices

Data access: sequential vs direct
  Sequential access: access time varies from very short to very long
  Direct access: very small variations in access time
  [Figure: disk with track, sector and head labelled]
Similar concept for databases
  Flat files = sequential
  Indexing = simulated direct

  Flat file:
    >seq1
    cgatgtcatgtg
    >seq2
    cgatcgtagctgtagctgtag
    >seq3
    catgtgcatgcgacgt

  Index: one row per entry (seq1, seq2, seq3) with the columns
    ID    Position (byte)    Length (byte)

Tools
  EMBOSS
    dbxflat
    dbxfasta
    dbiblast
    seqret
    seqretsplit
    entret
  Other examples
    SRS (Icarus language)
    indexer & fetch (warning: local SIB tool)
    Relational (MySQL, Oracle)
    Web (Google!!)
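The byte-position index described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how dbxfasta or indexer store their indices: the function names `build_index` and `fetch` are hypothetical, and the flat file is held in memory so the example runs without any external file.

```python
import io

def build_index(handle):
    """Map each sequence ID to (byte position of its '>' line, record length in bytes).

    `handle` must be a binary file-like object positioned at its start.
    """
    index = {}
    current_id = None
    start = 0
    pos = 0
    for line in handle:
        if line.startswith(b">"):
            if current_id is not None:
                index[current_id] = (start, pos - start)
            # The ID is the first word after '>'.
            current_id = line[1:].split()[0].decode("ascii")
            start = pos
        pos += len(line)
    if current_id is not None:
        index[current_id] = (start, pos - start)
    return index

def fetch(handle, index, seq_id):
    """Jump straight to one record (simulated direct access) and return it as text."""
    position, length = index[seq_id]
    handle.seek(position)
    return handle.read(length).decode("ascii")

# The three-entry flat file from the slide, kept in memory for the example.
flat_file = io.BytesIO(
    b">seq1\ncgatgtcatgtg\n"
    b">seq2\ncgatcgtagctgtagctgtag\n"
    b">seq3\ncatgtgcatgcgacgt\n"
)
index = build_index(flat_file)
record = fetch(flat_file, index, "seq2")
print(index)
print(record)
```

Building the index still reads the file sequentially once; the gain comes afterwards, when each lookup is a single seek instead of a scan.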
EMBOSS: how to index?
  Where is your file?
  What is the format?
  Where should the indices be?
  Where is the emboss.default file? (.embossrc)
  Other EMBOSS tools
    textsearch
    whichdb
  More details

EMBOSS example
  Input file and directory
    ~/embossidx/ecoli.dat
    cd embossidx
  Index creation
    dbxflat -idformat swiss -dbname ecoli -filenames '*.dat' -dbresource swiss -directory . -release 1.0 -date 26/09/06 -fields id,acc
  Generates 5 files (default)
    ECOLI.ent
    ECOLI.pxac
    ECOLI.pxid
    ECOLI.xac
    ECOLI.xid
  Don't forget to modify ~/.embossrc
.embossrc
    set emboss_filter 1
    # Ecoli
    DB ecoli [
      type: P
      comment: "e.coli proteome"
      method: emboss
      format: swiss
      dir: "{path}/embossidx"
      file: "ecoli.dat"
      release: "1.0"
      indexdir: "{path}/embossidx"
    ]
  Where {path} is the path to your home directory
  Example of queries
    seqret ecoli:thio_ecoli
    seqret ecoli:p00274
    entret ecoli:thio_ecoli
  and even
    seqret ecoli:*_ecoli

Indexer & fetch
  Warning: this is a local SIB tool!!
  Input file and directory
    ~/embossidx/ecoli.dat
    cd embossidx
  Index creation
    indexer -h '^ID' -t '^//' -i -p '^ID\s+(\S+)' ECOLI.dat ecoli.idx
  Generates 1 file
    ecoli.idx
  Don't forget to modify the config file
Config file: fetch.conf
  fetch.conf
    #dbkey  format  indexfile               datafile
    ecoli   sp      ~/embossidx/ecoli.idx   ~/embossidx/ecoli.dat
  Example of queries
    fetch -c fetch.conf ecoli:thio_ecoli
    fetch -c fetch.conf -f ecoli:thio_ecoli[20..50]

BLAST
  Maintained at NCBI
  Source distributed freely with several accessory tools
    ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/ncbi.tar.gz
  May require compilation to install on your local computer
  blastall contains
    blastp
    blastn
    blastx
    tblastn
    tblastx
  Other tools
    blastpgp
    megablast
    formatdb
Available Blast programs
  Program   Query                     VS   Database
  blastp    protein                   VS   protein
  blastn    nucleotide                VS   nucleotide
  blastx    nucleotide (translated)   VS   protein
  tblastn   protein                   VS   nucleotide (translated)
  tblastx   nucleotide (translated)   VS   nucleotide (translated)

What makes BLAST so fast?
  Indexing all words of 3 aa or 11 bp in the sequence database
  Searching the query for all words with a score > T
  Searching the indexed database for all perfect matches
  Trying to align matches that are on the same diagonal
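The first and third steps above (index every database word, then look up perfect matches) can be illustrated with a minimal Python sketch. It is not BLAST's actual data structure: the names `index_words` and `find_hits` and the two toy database sequences are assumptions made for the example, and the query words are used as they are, without the score > T expansion.

```python
from collections import defaultdict

def index_words(database, word_size=3):
    """Map every word of `word_size` letters to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - word_size + 1):
            index[seq[pos:pos + word_size]].append((seq_id, pos))
    return index

def find_hits(query, index, word_size=3):
    """Return (query position, sequence id, database position) for each perfect word match."""
    hits = []
    for qpos in range(len(query) - word_size + 1):
        word = query[qpos:qpos + word_size]
        for seq_id, dpos in index.get(word, []):
            hits.append((qpos, seq_id, dpos))
    return hits

# Toy protein database (hypothetical sequences, for illustration only).
database = {"db1": "MKACTLLTVF", "db2": "GGRELKPA"}
word_index = index_words(database)
hits = find_hits("RELKP", word_index)
print(hits)
```

Each lookup is a dictionary access, so the cost of a search depends on the number of query words and hits rather than on the size of the database.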
Indexing for Blast (1)
  A substitution matrix is used to compute the word scores
  Query words: REL, LKP
  List of all possible words with 3 amino acid residues (8000): AAA, AAC, AAD, ..., YYY
  Each word is scored against the query: score > T is kept, score < T is discarded
  List of words matching the query with a score > T: LKP, ACT, TVF

Indexing for Blast (2)
  Database sequences (containing ACT, TVF)
  Search for exact matches with the list of words matching the query with a score > T
  Result: list of sequences containing words similar to the query (hits)
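Building the list of words that score above T can be sketched as follows. The scoring here is a deliberately tiny toy scheme (identical residues +5, a few hand-picked similar pairs +2, everything else -2), not a real substitution matrix such as BLOSUM62; the names `residue_score` and `neighborhood` and the threshold value are likewise assumptions for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy scoring, NOT a real substitution matrix: a few pairs treated as "similar".
SIMILAR = {frozenset("IL"), frozenset("KR"), frozenset("DE"), frozenset("ST")}

def residue_score(a, b):
    """Score one residue pair under the toy scheme."""
    if a == b:
        return 5
    if frozenset((a, b)) in SIMILAR:
        return 2
    return -2

def neighborhood(query_word, threshold):
    """Enumerate all words of the same length and keep those scoring > threshold."""
    words = []
    for candidate in product(AMINO_ACIDS, repeat=len(query_word)):
        score = sum(residue_score(q, c) for q, c in zip(query_word, candidate))
        if score > threshold:
            words.append("".join(candidate))
    return words

# All 20**3 = 8000 three-letter words are scored against the query word LKP.
neighbors = neighborhood("LKP", threshold=10)
print(neighbors)
```

With a real matrix the principle is the same: the threshold T controls how many near-matches of each query word are looked up in the database index.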
Indexing for Blast (3)
  [Figure: query vs database sequence, two hits on one diagonal separated by a distance A]
  Ungapped extension if:
    2 "hits" are on the same diagonal but at a distance less than A
  [Figure: query vs database sequence, extended region]
  Extension using dynamic programming
    limited to a restricted region
    limited through a score drop-off threshold

BLAST indexing with formatdb
  formatdb
    mydb.seq must contain sequences in FASTA format
    formatdb -i mydb.seq -p T -n mydb
  Generates 3 files
    mydb.psq
    mydb.pin
    mydb.phr
  Then start a Blast:
    blastall -p blastp -d mydb -i myseq (-optional parameters)
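The two-hit condition from "Indexing for Blast (3)" (two hits on the same diagonal, less than A apart) can be sketched like this. The function name `two_hit_seeds` and the sample hit coordinates are hypothetical, and the sketch ignores refinements of the real program such as requiring non-overlapping hits.

```python
from collections import defaultdict

def two_hit_seeds(hits, max_distance):
    """Return (diagonal, first query position, second query position) for each
    pair of neighbouring hits on the same diagonal less than max_distance apart.

    Each hit is (query position, database position); its diagonal is the
    difference database position - query position.
    """
    by_diagonal = defaultdict(list)
    for qpos, dpos in hits:
        by_diagonal[dpos - qpos].append(qpos)
    seeds = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        for first, second in zip(positions, positions[1:]):
            if 0 < second - first < max_distance:
                seeds.append((diagonal, first, second))
    return seeds

# Hypothetical hits: two share diagonal 10, the others are isolated.
hits = [(2, 12), (10, 20), (30, 33), (50, 90)]
seeds = two_hit_seeds(hits, max_distance=40)
print(seeds)
```

Only the seeds returned here would go on to the ungapped extension; isolated hits are dropped, which removes most chance matches cheaply.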
Blast local vs remote
  blastall
    Executed locally
    Slow
    No need to transfer the db
  blastall.remote
    Executed remotely
    Fast
    Requires special privileges and db transfer
  Using BioPerl (remoteblast.pm)
    Blast at NCBI
    No user db

Multiple Blasts?
  1 seq vs db seq
    1 FASTA seq as input
  db seq vs db seq
    Several single FASTA seq files as input, or
    1 multiple FASTA seq file as input
  Possibility to export results as XML
  Use Perl to automate the queries and parse the output
Parsing Blast output
  BLASTP [Oct ]

  Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
  Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
  "Gapped BLAST and PSI-BLAST: a new generation of protein database search
  programs", Nucleic Acids Res. 25:

  Query= ACCA_BACSU O34847 Acetyl-coenzyme A carboxylase carboxyl
  transferase subunit alpha (EC ).
  (325 letters)

  Database: ecoli_blast
  4339 sequences; 1,373,039 total letters

  Searching...done

                                                           Score    E
  Sequences producing significant alignments:              (bits)  Value
  ACCA_ECOLI P30867 Acetyl-coenzyme A carboxylase carboxyl transfe  266  e-72

Parsing Blast output (2)
  >ACCA_ECOLI P30867 Acetyl-coenzyme A carboxylase carboxyl
  transferase subunit alpha (EC ).
  Length = 318

  Score = 266 bits (681), Expect = 1e-72
  Identities = 143/312 (45%), Positives = 188/312 (60%), Gaps = 3/312 (0%)

  Query: 5   LEFEKPVIELQTKIAELKKFTQDS---DMDLSAEIERLEDRLAKLQDDIYKNLKPWDRVQ 61
             L+FE+P+ EL+ KI L ++ D+++ E+ RL ++ +L I+ +L W Q
  Sbjct: 5   LDFEQPIAELEAKIDSLTAVSRQDEKLDINIDEEVHRLREKSVELTRKIFADLGAWQIAQ 64

  Query: 62  IARLADRPTTLDYIEHLFTDFFECHGDRAYGDDEAIVGGIAKFHGLPVTVIGHQRGKDTK 121
             +AR RP TLDY+ F+F E GDRAY DD+AIVGGIA+ G PV +IGHQ+G++TK
  Sbjct: 65  LARHPQRPYTLDYVRLAFDEFDELAGDRAYADDKAIVGGIARLDGRPVMIIGHQKGRETK 124

  Query: 122 ENLVRNFGMPHPEGYRKALRLMKQADKFNRPIICFIDTKGAYPGRAAEERGQSEAIAKNL 181
             E + RNFGMP PEGYRKALRLM+ A++F PII FIDT GAYPG AEERGQSEAIA+NL
  Sbjct: 125 EKIRRNFGMPAPEGYRKALRLMQMAERFKMPIITFIDTPGAYPGVGAEERGQSEAIARNL 184

  Query: 182 FEMAGLRVPXXXXXXXXXXXXXXXXXXXXXXXHMLENSTYSVISPEGAAALLWKDSSLAK 241
             EM+ L VP +ML+ STYSVISPEG A++L WK + A
  Sbjct: 185 REMSRLGVPVVCTVIGEGGSGGALAIGVGDKVNMLQYSTYSVISPEGCASILWKSADKAP 244

  Query: 242 KAAETMKITAPDLKELGIIDHMIKEVKGGAHHDVKLQASYMDXXXXXXXXXXXXXXXXXX 301
             AAE M I AP LKEL +ID +I E GGAH + + A+ +
  Sbjct: 245 LAAEAMGIIAPRLKELKLIDSIIPEPLGGAHRNPEAMAASLKAQLLADLADLDVLSTEDL 304

  Query: 302 VQQRYEKYKAIG 313
             +RY++ + G
  Sbjct: 305 KNRRYQRLMSYG 316
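As a rough illustration of what parsing such a report involves, the sketch below pulls the score, E-value and identity counts out of the two statistics lines of one alignment with a regular expression. It is written in Python for compactness, it is not a full BLAST parser, and the names `HSP_PATTERN` and `parse_hsp` are hypothetical; the pattern assumes the E-value is printed in a form Python's `float` accepts (for example `1e-72`).

```python
import re

# Matches the "Score = ... Expect = ..." line followed by the "Identities = ..." line.
HSP_PATTERN = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect\s*=\s*(\S+)\s+"
    r"Identities\s*=\s*(\d+)/(\d+)"
)

def parse_hsp(text):
    """Return the statistics of the first alignment found in `text`, or None."""
    match = HSP_PATTERN.search(text)
    if match is None:
        return None
    bits, raw, evalue, identical, aligned = match.groups()
    return {
        "bits": float(bits),
        "raw_score": int(raw),
        "evalue": float(evalue),
        "identities": int(identical),
        "aligned_length": int(aligned),
    }

report = """Score = 266 bits (681), Expect = 1e-72
Identities = 143/312 (45%), Positives = 188/312 (60%), Gaps = 3/312 (0%)"""
hsp = parse_hsp(report)
print(hsp)
```

Hand-written patterns like this break easily when the report layout changes between BLAST versions, which is one reason to prefer a maintained parser or the XML export for real work.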
Parsing Blast output (3)
  With BioPerl:

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Bio::SearchIO;

    # Open the BLAST report named on the command line.
    my $blast_report = Bio::SearchIO->new(-format => 'blast', -file => $ARGV[0]);

    print "Query name:\tquery description:\thitname:\thitdescription:\te-value\tscore\n";
    # One result per query, one hit per database sequence, one HSP per alignment.
    while (my $result = $blast_report->next_result) {
        print $result->query_name, "\t", $result->query_description, "\n";
        while (my $hit = $result->next_hit) {
            print "\t\t", $hit->name, "\t", $hit->description;
            while (my $hsp = $hit->next_hsp) {
                print "\t", $hsp->evalue, "\t", $hsp->score;
            }
            print "\n";
        }
    }
    exit 0;

MS-Excel import/export
  Excel can import
    Tab delimited
    Comma delimited
  Excel can export
    Tab delimited
    Space delimited

  AC/ID        desc                           score   e-value
  THIO_ECOLI   thioredoxin Escherichia coli   234     2.1e-5
  THIO_HUMAN   thioredoxin Homo sapiens       120     0.001
MS-Excel import/export
  Tab delimited file:
    \t delimits the columns
    \n delimits the lines
    Optional first line contains the column titles
  Example:
    AC/ID\tdesc\tscore\te-value\n
    THIO_ECOLI\tthioredoxin Escherichia coli\t234\t2.1e-5\n
    THIO_HUMAN\tthioredoxin Homo sapiens\t120\t0.001\n

MS-Excel import/export
  Comma delimited file:
    , delimits the columns, each value is surrounded by "
    \n delimits the lines
    Optional first line contains the column titles
  Example:
    "AC/ID", "desc", "score", "e-value" \n
    "THIO_ECOLI", "thioredoxin Escherichia coli", "234", "2.1e-5" \n
    "THIO_HUMAN", "thioredoxin Homo sapiens", "120", "0.001" \n
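Both delimited formats can be produced with Python's standard `csv` module, as in this minimal sketch. The helper name `to_delimited` is hypothetical, the rows are the example values from the slides, and the output is written to an in-memory buffer so nothing touches the disk. Note that with its default settings `csv.writer` only quotes a value when it has to (for instance when the value contains the delimiter).

```python
import csv
import io

rows = [
    ("AC/ID", "desc", "score", "e-value"),
    ("THIO_ECOLI", "thioredoxin Escherichia coli", 234, "2.1e-5"),
    ("THIO_HUMAN", "thioredoxin Homo sapiens", 120, "0.001"),
]

def to_delimited(rows, delimiter):
    """Serialize rows with the given column delimiter and \\n as line delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

tab_text = to_delimited(rows, "\t")
comma_text = to_delimited(rows, ",")

# Reading the tab-delimited text back gives the columns as strings.
parsed = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
print(tab_text)
print(comma_text)
```

Saving `tab_text` or `comma_text` to a file gives something Excel can import; the tab-delimited form avoids trouble with commas inside descriptions.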
