Implementation Workbook


Reference: cahier_realisation_mini_projet-sepia-theba-1.0, page 1/8

Implementation Workbook: SEPIA THEBA

WRITTEN BY
Name: Gildas Le Corguillé, unit: ABiMS
Name: Erwan Corre, unit: ABiMS

DOCUMENT HISTORY
Version 1.0, 10-07-13: creation

A. Nature of the request
< Brief description >
Build a reference transcriptome of the species Theba pisana, and compare this transcriptome with the transcriptome of Sepia officinalis.
< Top-level list of the project's expected outcomes, in decreasing order of priority. >
Needs (in decreasing order of priority) | Formulated at iteration no.
1 Build a reference transcriptome of T. pisana | 1
2 Compare the S. officinalis and T. pisana transcriptomes | 2

B. Project limits
< Description of requests related to those of the project but not covered by the deliverables. >

C. Project management web addresses
Main page
Sources
Dashboard

D. Deliverables
< Table listing each project deliverable, with an indication of its nature (database, web application, document, service, ...), the need(s) it addresses, and a short description. >
No. | Name | Nature | Need(s) addressed | Description
1 | Theba transcriptome | Fastq files | 1 | Theba de novo transcriptome assembly
2 | | | |

E. Iterations

1. Iteration 1

A. Description
< One description per iteration. >
Start date: 08-07-13
Planned end date: 12-07-13
Actual end date:
Deliverable(s) concerned: 1

B. Steps

1 Quality check
Command line:
fastqc -t 8 -o /projet/sbr/tara/work/tmp/raw/theba/raw/ /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1.fq
fastqc -t 8 -o /projet/sbr/tara/work/tmp/raw/theba/raw/ /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2.fq
Result: no adapter sequences in the raw files, and few sequences corresponding to mitochondrial 16S rRNA.

2 Cleaning
2.1 First PRINSEQ-lite step
Parameters:
- Trim poly-N tails with a minimum length of 1 at the 5'-end (-trim_ns_left).
- Trim poly-N tails with a minimum length of 1 at the 3'-end (-trim_ns_right).
- Filter out sequences with more than 0 Ns (-ns_max_n).
- Trim sequences by quality score from the 3'-end, with a threshold score of 20 (-trim_qual_right).
- Filter out sequences with a mean quality score below 30 (-min_qual_mean).
- Filter out sequences shorter than 50 bp (-min_len).
- Filter out sequences containing characters other than A, C, G, T or N (-noniupac).
prinseq-lite.pl -trim_ns_right 1 -trim_ns_left 1 -ns_max_n 0 -trim_qual_right 20 -min_qual_mean 30 -min_len 50 -noniupac -fastq /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1.fq -out_good /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_good.fq -out_bad /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_bad.fq
prinseq-lite.pl -trim_ns_right 1 -trim_ns_left 1 -ns_max_n 0 -trim_qual_right 20 -min_qual_mean 30 -min_len 50 -noniupac -fastq /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2.fq -out_good /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_good.fq -out_bad /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_bad.fq
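The per-read logic of step 2.1 can be sketched in Python as follows. This is not PRINSEQ-lite itself. It assumes Phred+33 quality encoding, which the workbook does not state, and it applies the checks in an order chosen for illustration, which PRINSEQ-lite may not follow. The function names (`clean_read`, `trim_n_tails`, `trim_quality_3prime`, `phred_scores`) are hypothetical. The sketch mirrors the parameter values used above: terminal N runs of length 1 or more are trimmed at both ends, no remaining N is allowed, 3' bases below Q20 are trimmed, and reads must keep a mean quality of at least 30 and a length of at least 50.

```python
# Minimal sketch of the per-read logic of the first PRINSEQ-lite step (2.1).
# Assumptions (not from the workbook): Phred+33 qualities and the order in
# which the checks are applied; PRINSEQ-lite may order them differently.
from statistics import mean
from typing import List, Optional, Tuple

ALLOWED = set("ACGTN")  # -noniupac: any other character rejects the read


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def trim_n_tails(seq: str, qual: str, min_run: int = 1) -> Tuple[str, str]:
    """-trim_ns_left / -trim_ns_right: remove terminal N runs of length >= min_run."""
    left = len(seq) - len(seq.lstrip("N"))
    if left >= min_run:
        seq, qual = seq[left:], qual[left:]
    right = len(seq) - len(seq.rstrip("N"))
    if right >= min_run:
        seq, qual = seq[: len(seq) - right], qual[: len(qual) - right]
    return seq, qual


def trim_quality_3prime(seq: str, qual: str, threshold: int = 20) -> Tuple[str, str]:
    """-trim_qual_right: drop 3' bases while their score is below the threshold."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Return the trimmed read, or None if one of the filters rejects it."""
    if set(seq) - ALLOWED:                          # -noniupac
        return None
    seq, qual = trim_n_tails(seq, qual, min_run=1)  # -trim_ns_left 1 -trim_ns_right 1
    if seq.count("N") > 0:                          # -ns_max_n 0
        return None
    seq, qual = trim_quality_3prime(seq, qual, 20)  # -trim_qual_right 20
    if len(seq) < 50:                               # -min_len 50
        return None
    if mean(phred_scores(qual)) < 30:               # -min_qual_mean 30
        return None
    return seq, qual


if __name__ == "__main__":
    # Toy reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(clean_read("N" + "A" * 60 + "N", "I" * 62))       # kept, N tails trimmed
    print(clean_read("A" * 30 + "N" + "A" * 30, "I" * 61))  # rejected: internal N
```

In the sketch, N-tail trimming runs before the `-ns_max_n 0` check, so only reads with an internal N are rejected. If the real tool applies the N filter first, reads with terminal Ns would be rejected rather than trimmed.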

Input and filter stats for MsPp1_l1_1.fq:
Input: 27,184,390 sequences, 2,446,595,100 bases, mean length 90.00
Good: 27,148,012 sequences (99.87%), 2,443,270,870 bases, mean length 90.00
Bad: 36,378 sequences (0.13%), 3,274,020 bases, mean length 90.00
Sequences filtered by specified parameters: ns_max_n: 36378

Input and filter stats for MsPp1_l1_2.fq:
Input: 27,184,390 sequences, 2,446,595,100 bases, mean length 90.00
Good: 27,176,020 sequences (99.97%), 2,445,841,610 bases, mean length 90.00
Bad: 8,370 sequences (0.03%), 753,300 bases, mean length 90.00
Sequences filtered by specified parameters: ns_max_n: 8370

2.2 Second PRINSEQ-lite step
Parameters:
- Trim poly-A/T tails with a minimum length of 5 at the 5'-end (-trim_tail_left).
- Trim poly-A/T tails with a minimum length of 5 at the 3'-end (-trim_tail_right).
- Method used to filter low-complexity sequences: entropy (-lc_method; the available options are dust and entropy).
- Threshold value (between 0 and 100) used to filter sequences by complexity: 70 (-lc_threshold). The dust method uses this value as the maximum allowed score, and the entropy method as the minimum allowed value.
- Filter out sequences shorter than 50 bp (-min_len).
prinseq-lite.pl -trim_tail_left 5 -trim_tail_right 5 -lc_method entropy -lc_threshold 70 -min_len 50 -fastq /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_good.fq -out_good /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_good2.fq -out_bad /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_bad2.fq
prinseq-lite.pl -trim_tail_left 5 -trim_tail_right 5 -lc_method entropy -lc_threshold 70 -min_len 50 -fastq /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_good.fq -out_good /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_good2.fq -out_bad /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_bad2.fq

Input and filter stats for MsPp1_l1_1.fq (input: the good reads from step 2.1):
Input: 27,148,012 sequences, 2,443,270,870 bases, mean length 90.00
Good: 25,387,568 sequences (93.52%), 2,281,657,234 bases, mean length 89.87
Bad: 1,760,444 sequences (6.48%), 158,436,820 bases, mean length 90.00
Sequences filtered by specified parameters: trim_tail_left: 2715, trim_tail_right: 116, min_len: 13344, lc_method: 1744269

Input and filter stats for MsPp1_l1_2.fq (input: the good reads from step 2.1):
Input: 27,176,020 sequences, 2,445,841,610 bases, mean length 90.00
Good: 25,422,144 sequences (93.55%), 2,284,841,088 bases, mean length 89.88
Bad: 1,753,876 sequences (6.45%), 157,848,827 bases, mean length 90.00
Sequences filtered by specified parameters: trim_tail_left: 2055, trim_tail_right: 94, min_len: 12213, lc_method: 1739514

2.3 Ribosomal sequence cleaning

ribopicker.pl -f /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_good2.fq -dbs rrnadb -out_dir /projet/sbr/tara/work/tmp/raw/theba/raw/
ribopicker.pl -f /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_good2.fq -dbs rrnadb -out_dir /projet/sbr/tara/work/tmp/raw/theba/raw/
MsPp1_l1_1_nonrrna.fq: 23,473,103 reads (86.35% of the raw reads)
MsPp1_l1_1_rrna.fq: 1,914,465 reads (7.54% of the reads kept after step 2.2)
MsPp1_l1_2_nonrrna.fq: 23,510,076 reads (86.48% of the raw reads)
MsPp1_l1_2_rrna.fq: 1,912,068 reads (7.52% of the reads kept after step 2.2)

2.4 Pairing the resulting reads (for the k-mer-normalized Trinity assembly and for expression estimation)
get_pairs.py /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_nonrrna.fq /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_nonrrna.fq
MsPp1_l1_1_nonrrna.paired.fq: 22,752,577 reads
MsPp1_l1_2_nonrrna.paired.fq: 22,752,577 reads
MsPp1_l1_1_nonrrna.unpaired.fq: 720,526 reads
MsPp1_l1_2_nonrrna.unpaired.fq: 757,499 reads

3 Assembly
3.1 Paired de novo assembly
/usr/local/genome2/trinityrnaseq/trinity.pl --seqtype fq --output /projet/sbr/tara/work/tmp/raw/theba/trinity_assembly/ --left /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_nonrrna.fq --right /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_nonrrna.fq --CPU 10 --JM 100G
Trinity.fasta: 66,206 transcripts, 23,580,832 bp in total, longest transcript 5,007 bp, mean length 356.17 bp
For comparison, a de novo assembly of the raw reads (without cleaning) produced:
Trinity_raw_reads.fasta: 68,264 transcripts, 24,357,398 bp in total, longest transcript 4,559 bp, mean length 356.81 bp

3.2 K-mer-normalized assembly
/usr/local/genome2/trinityrnaseq_r3-02-25/util/normalize_by_kmer_coverage.pl --seqtype fq --JM 100G --max_cov 30 --left /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_nonrrna.paired.fq --right /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_nonrrna.paired.fq --output /projet/sbr/tara/work/tmp/raw/theba/trinity_assembly_norm/ --pairs_together --PARALLEL_STATS --JELLY_CPU 10
Normalized paired reads:
MsPp1_l1_1_nonrrna.paired.fq.normalized_K25_C30_pctSD100.fq
MsPp1_l1_2_nonrrna.paired.fq.normalized_K25_C30_pctSD100.fq
/usr/local/genome2/trinityrnaseq/trinity.pl --seqtype fq --output /projet/sbr/tara/work/tmp/raw/theba/trinity_assembly_norm/ --left /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_1_nonrrna.paired.fq.normalized_K25_C30_pctSD100.fq --right /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_2_nonrrna.paired.fq.normalized_K25_C30_pctSD100.fq --CPU 10 --JM 100G
Trinity_norm.fasta: 64,144 transcripts, 22,459,171 bp in total, longest transcript 4,584 bp, mean length 350.14 bp

4 Annotation
Trinotate annotation process, including blastp against UniProt, hmmsearch against Pfam, TMHMM search and SignalP search.
/usr/local/genome2/scripts/trinotatewrapper/trinotatewrapper
Output: 2 Excel files, Trinity_annotation_report.xls and Trinity_norm_annotation_report.xls

5 Expression
5.1 Concatenation of single reads
cat MsPp1_l1_1_nonrrna.paired.fq MsPp1_l1_1_nonrrna.unpaired.fq MsPp1_l1_2_nonrrna.unpaired.fq > MsPp1_l1_12_nonrrna.single.fq

5.2 Remapping and counting
Run for both Trinity assemblies (normalized and non-normalized):
/usr/local/genome2/trinityrnaseq/util/rsem_util/run_rsem_align_n_estimate.pl --transcripts Trinity.fasta --seqtype fq --single /projet/sbr/tara/work/tmp/raw/theba/raw/mspp1_l1_12_nonrrna.single.fq
Output: 2 count files (isoforms and genes) for each Trinity assembly.
Column descriptions:
- 'TPM' stands for Transcripts Per Million. It is a relative measure of transcript abundance; the TPM values of all transcripts sum to 1 million.

- 'FPKM' stands for Fragments Per Kilobase of transcript per Million mapped reads. It is another relative measure of transcript abundance.
- 'IsoPct' stands for isoform percentage. It is the percentage of this transcript's abundance over its parent gene's abundance. If the parent gene has only one isoform, or if gene information is not provided, this field is set to 100.
Files: RSEM.isoforms.results, RSEM.genes.results

5.3 Filtering of lowly expressed transcripts and rare isoforms
To filter out likely transcript artifacts and lowly expressed transcripts, we retain, for both assemblies, only the transcripts that represent at least 1% of the per-component expression level (IsoPct) and have an FPKM value > 1.
/usr/local/genome2/trinityrnaseq/util/filter_fasta_by_rsem_values.pl -r RSEM.isoforms.results -f Trinity.fasta -o Trinity_filtered.fasta --fpkm_cutoff=1 --isopct_cutoff=1.00
Output: 4 files
- Trinity_filtered.fasta.rsem: assembly expression values
- Trinity_filtered.fasta: filtered assembly
- Trinity_norm_filtered.fasta.rsem: normalized assembly expression values
- Trinity_norm_filtered.fasta: filtered normalized assembly
Trinity_filtered.fasta: 63,082 transcripts (vs. 66,206 initially), 22,274,567 bp in total, longest transcript 5,007 bp, mean length 353.10 bp
Trinity_norm_filtered.fasta: 60,599 transcripts (vs. 64,144 initially), 21,034,300 bp in total, longest transcript 4,584 bp, mean length 347.11 bp

Overall results of iteration 1
The data are available on this web page: http://application.sb-roscoff.fr/download/fr2424/abims/corre/theba/
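The abundance measures of step 5.2 and the cut-offs of step 5.3 can be illustrated with a small Python sketch. It follows the standard definitions: TPM rescales length-normalized counts so they sum to one million, FPKM divides counts by transcript length in kilobases and by millions of mapped fragments, and IsoPct is an isoform's share of its gene's abundance. It is not RSEM, though. RSEM estimates expected counts and uses effective lengths, whereas this sketch takes plain counts and raw lengths. The `Transcript` record and the function names are hypothetical, and the toy numbers are made up for illustration rather than taken from the Theba data. Within one sample, TPM and FPKM are both proportional to count/length, so IsoPct comes out the same whichever one is used.

```python
# Sketch of the abundance measures of section 5.2 and the filter of 5.3.
# Simplifications (assumptions): raw lengths stand in for RSEM's effective
# lengths, plain counts stand in for RSEM's expected counts, and every mapped
# fragment is assumed to be assigned to one of the listed transcripts.
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Transcript(NamedTuple):  # hypothetical record, not an RSEM type
    name: str
    gene: str
    length: int    # bp
    count: float   # fragments assigned to this transcript


def tpm(ts: List[Transcript]) -> Dict[str, float]:
    """Length-normalized rates rescaled so that all values sum to 1e6."""
    rates = {t.name: t.count / t.length for t in ts}
    total = sum(rates.values())
    return {name: rate / total * 1e6 for name, rate in rates.items()}


def fpkm(ts: List[Transcript]) -> Dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    millions = sum(t.count for t in ts) / 1e6
    return {t.name: t.count / (t.length / 1e3) / millions for t in ts}


def isopct(ts: List[Transcript], abundance: Dict[str, float]) -> Dict[str, float]:
    """Percentage of each isoform's abundance within its parent gene."""
    by_gene: Dict[str, List[str]] = defaultdict(list)
    for t in ts:
        by_gene[t.gene].append(t.name)
    pct: Dict[str, float] = {}
    for names in by_gene.values():
        gene_total = sum(abundance[n] for n in names)
        for n in names:
            if len(names) == 1:
                pct[n] = 100.0  # single-isoform gene, as in the column description
            elif gene_total == 0:
                pct[n] = 0.0
            else:
                pct[n] = 100.0 * abundance[n] / gene_total
    return pct


def keep(ts: List[Transcript], fpkm_cutoff: float = 1.0,
         isopct_cutoff: float = 1.0) -> List[str]:
    """Names with FPKM > cutoff and IsoPct >= cutoff (wording of step 5.3)."""
    f, p = fpkm(ts), isopct(ts, tpm(ts))
    return [t.name for t in ts
            if f[t.name] > fpkm_cutoff and p[t.name] >= isopct_cutoff]


if __name__ == "__main__":
    demo = [  # toy numbers, 1,000,000 fragments in total
        Transcript("A", "g1", 1000, 900_000),
        Transcript("B", "g1", 1000, 99_000),
        Transcript("C", "g1", 1000, 500),   # rare isoform, IsoPct about 0.05
        Transcript("D", "g2", 2000, 499),
        Transcript("E", "g3", 2000, 1),     # FPKM 0.5
    ]
    print(keep(demo))  # ['A', 'B', 'D']
```

With these toy values, C is dropped for its IsoPct (about 0.05%) and E for its FPKM (0.5). The strict "> 1" for FPKM and "at least 1%" for IsoPct follow the wording of step 5.3; whether filter_fasta_by_rsem_values.pl compares strictly is not stated in the workbook.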