Galaxy4Bioinformatics
Application development and integration in Galaxy
Gildas Le Corguillé, Gwendoline Andres, Loraine Guéguen
IFB-GT Galaxy Devteam
March 4, 2015, 9:00 to 18:00

TOOL INTEGRATION, Part I
CONTEXT

Layers
  bowtie2-wrapper.xml (<command>, <inputs>, <outputs>)
  bowtie2
  OS
Prerequisites
  Tools installed on the server: bowtie2
  A wrapper?
    - To manage input arguments
    - To format output names
    - To package outputs in a zip or an HTML page
    - For other tricky situations
  bowtie2-wrapper.py calls bowtie2
Wrapper XML

  <tool id="hello" name="hello" version="0.01">
    <description>world</description>
    <command>
      /bin/echo "Hello World ${mystring}" > ${output1}
    </command>
    <inputs>
      <param name="mystring" type="text" label="say something interesting"/>
    </inputs>
    <outputs>
      <data format="tabular" name="output1" label="hello_world"/>
    </outputs>
    <help>
  **What it does**

  Says hello
    </help>
  </tool>

https://wiki.galaxyproject.org/admin/tools/toolconfigsyntax
Directory tree

  .
    config/
      data_manager_conf.xml  datatypes_conf.xml  galaxy.ini  job_conf.xml
      shed_data_manager_conf  shed_tool_conf.xml  shed_tool_data_table_conf.xml
      tool_conf.xml  tool_data_table_conf.xml  tool_shed.ini  tool_sheds_conf.xml
    database/
      files/
        000/  dataset_10.dat  dataset_13.dat
        001/  dataset_1033.dat  dataset_1034.dat
      job_working_directory/
        000/  1  2
        001/  1000  1001
      pbs/  galaxy_1000.sh  galaxy_1001.sh
      tmp/  tmp00dgyq  tmp01bliz  upload_file_data_xoizxk  upload_file_data_xtjjoo
    integrated_tool_panel.xml
    paster.log  paster.pid
    run_reports.sh  run.sh  run_tests.sh  run_tool_shed.sh
    static/
      welcome.html
    tool-data/
      add_scores.loc  alignseq.loc  all_fasta.loc  bfast_indexes.loc  blastdb_p.loc
      shared/  ensembl  gbrowse  igv  rviewer  ucsc
    tools/
      data_source/  genbank.py  genbank.xml  import.py  import.xml
                    ucsc_proxy.py  ucsc_proxy.xml  upload.py  upload.xml
      filters/  grep.py  grep.xml  sorter.py  sorter.xml  trimmer.py  trimmer.xml
      next_gen_conversion/  bwa_solid2fastq_modified.pl  fastq_conversions.py
                            fastq_conversions.xml  fastq_gen_conv.py  fastq_gen_conv.xml
Directory tree
  database/files: history datasets
  database/job_working_directory: temporary files
  Only the files described in <outputs> will be kept.
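The listing suggests that history datasets are grouped into numbered subdirectories of database/files (dataset_10.dat under 000, dataset_1033.dat under 001). A minimal sketch of that layout, assuming the subdirectory is the dataset id divided by 1000 and zero-padded to three digits. This rule is only inferred from the listing and the function name is hypothetical:

```python
def dataset_path(dataset_id, files_root="database/files"):
    """Guess the on-disk path of a history dataset from its numeric id.

    Assumption (inferred from the directory listing only): datasets are
    grouped by thousands into zero-padded three-digit subdirectories.
    """
    subdir = "%03d" % (dataset_id // 1000)
    return "%s/%s/dataset_%d.dat" % (files_root, subdir, dataset_id)


if __name__ == "__main__":
    print(dataset_path(10))
    print(dataset_path(1033))
```

Tools should never build these paths themselves; Galaxy passes the real paths to the command line through variables such as $input and $output.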
BASIC

BASIC: TOOL SYNTAX
Tag sets of the tool config syntax:
  <tool>, <description>, <version_command>, <command>
  <inputs>: <repeat>, <conditional>, <when>, <param>, <validator>, <option>, <options>, <column>, <filter>
  <request_param_translation>: <request_param>, <append_param>, <value>, <value_translation>, <value>
  <sanitizer>: <valid>, <add> and <remove>, <mapping>, <add> and <remove>
  <configfiles>: <configfile>
  <outputs>: <data>, <change_format>, <when> (change_format), <actions>
  <tests>: <test>, <param> (functional tests), <output> (functional tests), <assert_contents> (functional tests)
  <page>, <code>
  <requirements>: <requirement>
  <stdio>, <regex>, and <exit_code>: <exit_code>, <regex>
  <help>, <citations>

https://wiki.galaxyproject.org/admin/tools/toolconfigsyntax
XML Tags

Running the hello tool with "You are amazing" entered in the mystring field:

  Job command line:
    /bin/echo "Hello World You are amazing" > /usr/local/galaxy-dist/database/files/000/dataset_9.dat

  Content of the output dataset:
    Hello World You are amazing

https://wiki.galaxyproject.org/admin/tools/toolconfigsyntax
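To make the substitution step concrete, here is a rough analogy in plain Python. Galaxy fills the <command> template with Cheetah; the sketch below swaps in the standard library's string.Template, which happens to accept the same ${name} placeholders. The function name render_command is hypothetical.

```python
from string import Template

# The <command> text of the hello tool, copied from the wrapper.
COMMAND = '/bin/echo "Hello World ${mystring}" > ${output1}'


def render_command(mystring, output1):
    """Fill the placeholders the way the job command line is produced.

    Stand-in only: Galaxy uses Cheetah, not string.Template.
    """
    return Template(COMMAND).substitute(mystring=mystring, output1=output1)


if __name__ == "__main__":
    print(render_command(
        "You are amazing",
        "/usr/local/galaxy-dist/database/files/000/dataset_9.dat",
    ))
```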
XML Tags: XML syntax errors
  Check your XML with Firefox / Chrome.
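The same check can be scripted. A minimal sketch using Python's standard xml.etree.ElementTree parser; it only tests that the file is well-formed XML, not that it is a valid Galaxy tool. The function name xml_syntax_error is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def xml_syntax_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Usage: python check_xml.py tools/hello/hello.xml
    with open(sys.argv[1]) as handle:
        problem = xml_syntax_error(handle.read())
    print(problem or "well-formed")
```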
XML Tags: <command>

  <command>: if your script/binary is in the $PATH
    Ex: $ which trinity
        /usr/local/bin/trinity

    <command>
      trinity $input $out
    </command>

  <command interpreter="python">: if your script is in the same directory as the XML (the interpreter must match the script language)
    Ex: $ ls tools/abims/prinseq/
        prinseq.xml  prinseq_wrapper.py

    <command interpreter="python">
      prinseq_wrapper.py $input $out
    </command>
XML Tags: <inputs>

  <param format="sam,bam" name="bamorsamfile" type="data"
         label="alignments in BAM or SAM format"
         help="--input: The set of aligned reads." />

  <param name="size" size="4" type="integer" value="1" label="size (-S)" />

  <param name="xlab" size="30" type="text" value="v1" label="label for x axis (-xlab)">
    <validator type="empty_field"/>
  </param>

  <param name="foo" type="text" area="true" size="5x25" />

GREP

Recommendations:
  - Keep the names of the original parameters (ex: --mode R)
XML Tags: <outputs>

  Ex: <data [...] label="">

  label="${tool.name} on ${on_string} (mapped reads in BAM format)"

  label="${input.name}_good.${input.ext}"

  label="${tool.name} on ${on_string} (#if str( $input_control_file ) != 'None' then ''.join( map( str, [ 'test-w', $window_size, '-G', $gap_size, '-FDR', $error_cut_off, '-islandfiltered-normalized.wig' ] ) ) else ''.join( map( str, [ 'test-w', $window_size, '-G', $gap_size, '-E', $error_cut_off, '-islandfiltered-normalized.wig' ] ) ) #)"
XML Tags: <help> in reStructuredText markup

  <help>
  =======================
  Help!!! PLEEASSEEEEEEE
  =======================

  +------------+------------+-------------+-----+
  | column1    | column 2   | col3        | etc |
  +============+============+=============+=====+
  | element1.1 | element1.2 | el1.3       | ... |
  +------------+------------+-------------+-----+
  | element2.1 | elem 2.2   | el1.3       | ... |
  +------------+------------+-------------+-----+
  | elem3.1    | 3.2        | el1.3       | ... |
  +------------+------------+-------------+-----+

  .. image:: help_me.png
  </help>

http://wiki.sb-roscoff.fr/ifb/index.php/xml_help_tag
http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html
BASIC: CONFIG FILES.XML
tool_conf.xml

  <toolbox>
    <!-- COMMON TOOLS -->
    <label text="common TOOLS" id="common" />
    [...]
    <section id="filter" name="filter and Sort">
      <tool file="stats/filtering.xml" />
      <tool file="filters/sorter.xml" />
      <tool file="filters/grep.xml" />
      <label id="gff" text="gff" />
      <tool file="filters/gff/extract_gff_features.xml" />
      <tool file="filters/gff/gff_filter_by_attribute.xml" />
      <tool file="filters/gff/gff_filter_by_feature_count.xml" />
      <tool file="filters/gff/gtf_filter_by_attribute_values_list.xml" />
    </section>
  </toolbox>

integrated_tool_panel.xml
BASIC: INTEGRATION
Steps to integrate a tool (1/2)
  1. Install the software (binaries or scripts) on the cluster
  2. Write the wrapper XML in the tools/ directory
  3. Declare your wrapper XML to Galaxy in tool_conf.xml
  4. Restart Galaxy
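Step 3 is usually a one-line manual edit of tool_conf.xml, but it can be sketched programmatically to show what the declaration amounts to. The snippet below uses only the standard library; add_tool and the path hello/hello.xml are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def add_tool(tool_conf_text, section_id, tool_file):
    """Return tool_conf.xml text with a <tool file="..."/> added to a section.

    tool_file is relative to the tools/ directory, as in the entries
    already present in tool_conf.xml.
    """
    root = ET.fromstring(tool_conf_text)
    for section in root.iter("section"):
        if section.get("id") == section_id:
            ET.SubElement(section, "tool", {"file": tool_file})
            return ET.tostring(root, encoding="unicode")
    raise ValueError("no section with id %r" % section_id)


if __name__ == "__main__":
    conf = ('<toolbox><section id="filter" name="filter and Sort">'
            '<tool file="filters/sorter.xml" /></section></toolbox>')
    print(add_tool(conf, "filter", "hello/hello.xml"))
```

Note that ElementTree drops XML comments when parsing, so on a real tool_conf.xml a manual edit preserves more of the file.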
ADVANCED

Layers
  bowtie2-wrapper.xml (<command>, <inputs>, <outputs>)
  bowtie2-wrapper.py
  bowtie2
  OS
ADVANCED: TOOL SYNTAX
XML Tags - advanced: Cheetah

  <command> cheetah code </command>

Cheetah is an open source template engine and code generation tool, written in Python.

https://wiki.galaxyproject.org/admin/tools/toolconfigsyntax#reserved_variables
http://www.cheetahtemplate.org/
XML Tags - advanced: #if / #else

  <command>
    ## check for single/pair-end
    #if str( $type ) == "single"
      -U $input_1
    #else
      -1 $input_1 -2 $input_2
      #if $unaligned_file
        --un-conc $output_unaligned_reads_l
      #end if
    #end if
  </command>
  <inputs>
    <param name="type" type="select" label="is this library mate-paired?">
      <option value="single">single-end</option>
      <option value="paired">paired-end</option>
    </param>
    <param name="unaligned_file" type="boolean" truevalue="true" falsevalue="false"
           checked="false" label="write unaligned reads to separate file(s)" />
  </inputs>

http://www.cheetahtemplate.org/
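For readers more at home in Python than in Cheetah, the branching above can be read as the following function. It is an illustrative translation only, not how Galaxy evaluates the template; the function and argument names are hypothetical.

```python
def bowtie2_input_args(lib_type, input_1, input_2=None,
                       unaligned_file=False, unaligned_out=None):
    """Mirror the #if / #else logic of the <command> template."""
    if lib_type == "single":
        # single-end: one reads file
        return ["-U", input_1]
    # paired-end: two mate files
    args = ["-1", input_1, "-2", input_2]
    if unaligned_file:
        # only offered in the paired-end branch, as in the template
        args += ["--un-conc", unaligned_out]
    return args


if __name__ == "__main__":
    print(" ".join(bowtie2_input_args("paired", "r1.fq", "r2.fq", True, "un.fq")))
```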
XML Tags - advanced: #for over a <repeat>

  <command>
    #for $i, $s in enumerate( $series )
      rank_of_series=$i
      input_path=${s.input}
      x_colom=${s.xcol}
      y_colom=${s.ycol}
    #end for
  </command>
  <inputs>
    <repeat name="series" title="series">
      <param name="input" type="data" format="tabular" label="dataset"/>
      <param name="xcol" type="data_column" data_ref="input" label="column for x axis"/>
      <param name="ycol" type="data_column" data_ref="input" label="column for y axis"/>
    </repeat>
  </inputs>

http://www.cheetahtemplate.org/
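The #for loop behaves like Python's enumerate over the repeat entries. A small sketch of the text it would produce, assuming each entry of the repeat is represented here as a plain dict (in Galaxy the entries are parameter objects, and series_args is a hypothetical name):

```python
def series_args(series):
    """Emit one block of key=value text per <repeat> entry."""
    chunks = []
    for i, s in enumerate(series):
        chunks.append(
            "rank_of_series=%d input_path=%s x_colom=%s y_colom=%s"
            % (i, s["input"], s["xcol"], s["ycol"])
        )
    return chunks


if __name__ == "__main__":
    demo = [
        {"input": "dataset_10.dat", "xcol": 1, "ycol": 2},
        {"input": "dataset_13.dat", "xcol": 1, "ycol": 3},
    ]
    for chunk in series_args(demo):
        print(chunk)
```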
XML Tags - advanced: <conditional>

  <command interpreter="bash">
    #if $input_format.format == "fasta":
      phyml_fasta_id_encode_wrapper.sh $input_format.input $output_tree
    #end if
    #if $input_format.format == "phylip":
      phyml.sh
    #end if
  </command>
  <inputs>
    <conditional name="input_format">
      <param type="select" name="format" label="input format">
        <option value="phylip">phylip</option>
        <option value="fasta">fasta</option>
      </param>
      <when value="phylip">
        <param format="phylip" name="input" type="data" label="alignment in phylip format" />
      </when>
      <when value="fasta">
        <param format="fasta" name="input" type="data" label="alignment in fasta format" />
      </when>
    </conditional>
  </inputs>

http://www.cheetahtemplate.org/
XML Tags - advanced: other Cheetah tricks

  Setting variables:
    <command>
      #set $rod_binding_names = dict()
      #for $rod_binding in $rod_bind:
        #set $rod_binding_names[$rod_bind_name] = $rod_binding_names.get( $rod_bind_name, -1 ) + 1
      #end for
    </command>

  Chaining shell commands (&& must be escaped in XML):
    <command interpreter="bash">
      pgm $input &amp;&amp; mv outfile $output_tranches_pdf
    </command>

  Getting the extension of an input dataset:
    ${variant.input_variants.ext}

http://www.cheetahtemplate.org/
XML Tags - advanced: <stdio>

  <stdio>
    <exit_code range=":-1" level="warning" />
    <exit_code range="1:" level="fatal" />
    <regex match="error:" level="fatal" />
    <regex match="exception:" level="fatal" />
    <regex match="traceback" level="fatal" />
  </stdio>

  Exit codes in the shell:
    $ ls foo
    foo
    $ echo $?
    0
    $ ls bar
    ls: cannot access bar: No such file or directory
    $ echo $?
    2

https://wiki.galaxyproject.org/admin/tools/toolconfigsyntax
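The effect of these rules can be pictured with a small Python function. This is a sketch of the intent of the <stdio> block above, not Galaxy's implementation: the precedence between rules and the case-insensitive matching are assumptions, and job_level is a hypothetical name.

```python
import re

# Patterns copied from the <regex> elements above; all are level="fatal".
FATAL_PATTERNS = ["error:", "exception:", "traceback"]


def job_level(exit_code, output_text):
    """Classify a finished job as 'fatal', 'warning' or 'ok'."""
    if exit_code >= 1:  # <exit_code range="1:" level="fatal" />
        return "fatal"
    for pattern in FATAL_PATTERNS:
        # assumption: matching ignores case
        if re.search(pattern, output_text, re.IGNORECASE):
            return "fatal"
    if exit_code <= -1:  # <exit_code range=":-1" level="warning" />
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(job_level(2, "ls: cannot access bar: No such file or directory"))
```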
XML Tags - advanced: <configfiles>
  Like <command>, it allows Cheetah code.
  Can be used to:
    - Create a config file
    - Create a whole script or a module

https://wiki.galaxyproject.org/admin/tools/toolconfigsyntax
XML Tags - advanced: <configfiles>
Ex: MrBayes

  <command>mb $script_nexus</command>
  <configfiles>
    <configfile name="script_nexus">
  set autoclose = yes;
  execute $input_data;
  #if str( $data_type.type ) == "nuc"
    #if str( $data_type.lset_params.lset_use ) == "yes"
  lset nucmodel=$data_type.lset_params.lset_nucmodel nst=$data_type.lset_params.lset_nst;
    #end if
  #end if
  mcmcp ngen=$mcmcp_ngen samplefreq=$mcmcp_samplefreq [...];
  mcmc;
  quit
    </configfile>
  </configfiles>

  Generated script:
    set autoclose = yes;
    execute dataset_42.dat;
    lset nucmodel=4by4 nst=2;
    mcmcp ngen=100000 samplefreq=500;
    mcmc;
    quit

https://wiki.galaxyproject.org/admin/tools/toolconfigsyntax
XML Tags - advanced: <parallelism> (in beta since 02/2012)

  <parallelism method="basic"></parallelism>

  <parallelism method="multi" split_inputs="query" split_mode="to_size"
               split_size="1000" merge_outputs="output1"></parallelism>

https://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
XML Tags - advanced: <requirements>

  <requirements>
    <requirement type="set_environment">r_script_path</requirement>
    <requirement type="package">gatk2</requirement>
    <requirement type="package" version="0.1.19">samtools</requirement>
    <requirement type="package" version="1.56.0">picard</requirement>
    <requirement type="set_environment">gatk2_path</requirement>
    <requirement type="set_environment">gatk2_site_options</requirement>
    <requirement type="package" version="1.7.1">numpy</requirement>
  </requirements>

  <version_command>bowtie2 --version</version_command>

https://wiki.galaxyproject.org/admin/tools/toolconfigsyntax
XML Tags - advanced: <tests>
  Over to Loraine.
XML Tags - advanced
  TODO: Data collection

https://docs.google.com/presentation/d/1d4pbulfe3ibsrvf3xswvp9gyact4snfhhufipysxya/edit#slide=id.g2d1564370_212
XML Tags - advanced: <macros>

  blast.xml holds the shared macros; each kind is used differently in the wrappers:
    <token>:    @JAR_PATH@, @THREADS@
    <template>: #include source=$standard_blast_options#
    <xml>:      <expand macro="input_out_format" />

  blastn.xml, Blastp.xml and Blastx.xml each contain <macros>, <command>, <inputs> and <outputs>.

https://toolshed.g2.bx.psu.edu/view/iuc/gatk2/2553f84b8174
TIPS & TRICKS
Deal with a lot of outputs (over to Gwen)
  - The tool produces a results file plus many info files.
  - Galaxy interface / XML wrapper: passes the path of the output directory to the shell wrapper through the attribute $out_html.files_path.
  - Shell wrapper: moves the info files into the output directory and creates the HTML file with links to these files.
  - Galaxy then shows two outputs: the results file and the HTML file.
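The wrapper step can be sketched as follows. The deck describes a shell wrapper; this is the same idea written in Python, under the assumption that links relative to the files directory resolve when the HTML dataset is viewed. package_outputs and its arguments are hypothetical names: files_path stands for the value of $out_html.files_path and html_path for $out_html.

```python
import os
import shutil
from html import escape


def package_outputs(info_files, files_path, html_path):
    """Move info files next to the HTML dataset and write an index page."""
    os.makedirs(files_path, exist_ok=True)
    links = []
    for src in info_files:
        name = os.path.basename(src)
        shutil.move(src, os.path.join(files_path, name))
        # relative link: assumes the viewer resolves it inside files_path
        links.append('<li><a href="%s">%s</a></li>'
                     % (escape(name, quote=True), escape(name)))
    with open(html_path, "w") as handle:
        handle.write("<html><body><ul>\n%s\n</ul></body></html>\n"
                     % "\n".join(links))


if __name__ == "__main__":
    import sys
    # Usage: wrapper.py OUT_HTML FILES_PATH INFO_FILE [INFO_FILE ...]
    package_outputs(sys.argv[3:], sys.argv[2], sys.argv[1])
```

With this layout the XML wrapper only has to declare one HTML output, however many info files the tool writes.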
ADVANCED: CONFIG FILES.XML
tool_data_table_conf.xml

  tool_data_table_conf.xml:
    <tables>
      <!-- Locations of indexes in the Bowtie2 mapper format -->
      <table name="bowtie2_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bowtie2_indices.loc" />
      </table>
    </tables>

  tool-data/bowtie2_indices.loc:
    # containing hg19 genome and hg19.*.bt2 files
    # <unique_build_id>  <dbkey>  <display_name>  <file_base_path>
    hg19  hg19  Human (hg19)  /db/hg19/bowtie2/hg19

  bowtie2_wrapper.xml:
    <param name="index" type="select" label="select a reference genome">
      <options from_data_table="bowtie2_indexes">
        <filter type="sort_by" column="2"/>
        <validator type="no_options" message="no indexes are available for the selected input dataset"/>
      </options>
    </param>
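To see how the .loc file maps onto the declared columns, here is a minimal parser sketch. It assumes the fields are tab-separated and that lines starting with the comment character are skipped, as the comment_char attribute suggests; parse_loc is a hypothetical helper, not part of Galaxy.

```python
def parse_loc(text, columns=("value", "dbkey", "name", "path"),
              comment_char="#"):
    """Turn the content of a .loc file into a list of dicts keyed by column."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(comment_char):
            continue  # skip blank lines and comments
        fields = line.split("\t")  # assumption: tab-separated fields
        rows.append(dict(zip(columns, fields)))
    return rows


if __name__ == "__main__":
    sample = ("# <unique_build_id>\t<dbkey>\t<display_name>\t<file_base_path>\n"
              "hg19\thg19\tHuman (hg19)\t/db/hg19/bowtie2/hg19\n")
    for row in parse_loc(sample):
        print(row["name"], "->", row["path"])
```

In the wrapper, the select parameter displays the name column and passes the chosen row's value; the path column is what the command line ultimately needs.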
job_conf.xml

  job_conf.xml:
    <job_conf>
      [...]
      <destinations default="sge_default">
        <destination id="thread4-men_free10" runner="sge">
          <param id="nativeSpecification">-V -w n -q galaxy.q -R y -pe thread 4 -l mem_free=10g</param>
        </destination>
      </destinations>
      <tools>
        <tool id="ncbi_blastn_wrapper" destination="thread4-men_free10"/>
      </tools>
    </job_conf>

  ncbi_blastn_wrapper.xml:
    <tool id="ncbi_blastn_wrapper" [...]>
      <command>
        blastn -query "$query" [...] -num_threads "\${GALAXY_SLOTS:-8}" [...]
      </command>
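The shell expression ${GALAXY_SLOTS:-8} means "use GALAXY_SLOTS if it is set and non-empty, otherwise 8". A Python wrapper script can apply the same fallback; the sketch below uses a hypothetical helper name, thread_count.

```python
import os


def thread_count(default=8, environ=None):
    """Python equivalent of the shell expansion ${GALAXY_SLOTS:-8}."""
    env = os.environ if environ is None else environ
    # like ':-', fall back when the variable is unset OR empty
    value = env.get("GALAXY_SLOTS") or default
    return int(value)


if __name__ == "__main__":
    print("blastn would run with %d threads" % thread_count())
```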
datatypes_conf.xml
  Over to Misharl.
CONCLUSION

Conclusion
  Tag sets of the tool config syntax:
    <tool>, <description>, <version_command>, <command>
    <inputs>: <repeat>, <conditional>, <when>, <param>, <validator>, <option>, <options>, <column>, <filter>
    <request_param_translation>: <request_param>, <append_param>, <value>, <value_translation>, <value>
    <sanitizer>: <valid>, <add> and <remove>, <mapping>, <add> and <remove>
    <configfiles>: <configfile>
    <outputs>: <data>, <change_format>, <when> (change_format), <actions>
    <tests>: <test>, <param> (functional tests), <output> (functional tests), <assert_contents> (functional tests)
    <page>, <code>
    <requirements>: <requirement>
    <stdio>, <regex>, and <exit_code>: <exit_code>, <regex>
    <help>, <citations>

Conclusion
  If you find one