Building Galaxy Tools Using Planemo

This tutorial is a gentle introduction to writing Galaxy tools using Planemo. Please read the installation instructions for Planemo if you have not already installed it.

The Basics

This guide is going to demonstrate building up tools for commands from Heng Li’s Seqtk package - a package for processing sequence data in FASTA and FASTQ files.

To get started let’s install Seqtk. Here we are going to use conda to install Seqtk - but however you obtain it should be fine.

$ conda install --force --yes -c conda-forge -c bioconda seqtk=1.2
    ... seqtk installation ...
$ seqtk seq
        Usage:   seqtk seq [options] <in.fq>|<in.fa>
        Options: -q INT    mask bases with quality lower than INT [0]
                 -X INT    mask bases with quality higher than INT [255]
                 -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
                 -l INT    number of residues per line; 0 for 2^32-1 [0]
                 -Q INT    quality shift: ASCII-INT gives base quality [33]
                 -s INT    random seed (effective with -f) [11]
                 -f FLOAT  sample FLOAT fraction of sequences [1]
                 -M FILE   mask regions in BED or name list FILE [null]
                 -L INT    drop sequences with length shorter than INT [0]
                 -c        mask complement region (effective with -M)
                 -r        reverse complement
                 -A        force FASTA output (discard quality)
                 -C        drop comments at the header lines
                 -N        drop sequences containing ambiguous bases
                 -1        output the 2n-1 reads only
                 -2        output the 2n reads only
                 -V        shift quality by '(-Q) - 33'

Next we will download an example FASTQ file and test out the a simple Seqtk command - seq which converts FASTQ files into FASTA.

$ wget https://raw.githubusercontent.com/galaxyproject/galaxy-test-data/master/2.fastq
$ seqtk seq -A 2.fastq > 2.fasta
$ cat 2.fasta
>EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
>EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
>EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG

For fully featured Seqtk wrappers check out Helena Rasche’s wrappers on GitHub.

Galaxy tool files are just XML files, so at this point one could open a text editor and start writing the tool. Planemo has a command tool_init to quickly generate some of the boilerplate XML, so let’s start by doing that.

$ planemo tool_init --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'

The tool_init command can take various complex arguments - but the two most basic ones are shown above --id and --name. Every Galaxy tool needs an id (this is a short identifier used by Galaxy itself to identify the tool) and a name (this is displayed to the Galaxy user and should be a short description of the tool). A tool’s name can have whitespace but its id must not.

The above command will generate the file seqtk_seq.xml - which looks like this.

<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
    <requirements>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        TODO: Fill in command template.
    ]]></command>
    <inputs>
    </inputs>
    <outputs>
    </outputs>
    <help><![CDATA[
        TODO: Fill in help.
    ]]></help>
</tool>

This tool file has the common sections required for a Galaxy tool but you will still need to open up the editor and fill out the command template, describe input parameters, tool outputs, write a help section, etc.

The tool_init command can do a little bit better than this as well. We can use the test command we tried above seqtk seq -A 2.fastq > 2.fasta as an example to generate a command block by specifing the inputs and the outputs as follows.

$ planemo tool_init --force \
                    --id 'seqtk_seq' \
                    --name 'Convert to FASTA (seqtk)' \
                    --requirement seqtk@1.2 \
                    --example_command 'seqtk seq -A 2.fastq > 2.fasta' \
                    --example_input 2.fastq \
                    --example_output 2.fasta

This will generate the following XML file - which now has correct definitions for the input and output as well as an actual command template.

<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        seqtk seq -A '$input1' > '$output1'
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="fastq" />
    </inputs>
    <outputs>
        <data name="output1" format="fasta" />
    </outputs>
    <help><![CDATA[
        TODO: Fill in help.
    ]]></help>
</tool>

As shown at the beginning of this section, the command seqtk seq generates a help message for the seq command. tool_init can take that help message and stick it right in the generated tool file using the help_from_command option.

Generally command help messages aren’t exactly appropriate for tools since they mention argument names and simillar details that are abstracted away by the tool - but they can be an excellent place to start.

The following Planemo’s tool_init call has been enhanced to use --help_from_command.

$ planemo tool_init --force \
                    --id 'seqtk_seq' \
                    --name 'Convert to FASTA (seqtk)' \
                    --requirement seqtk@1.2 \
                    --example_command 'seqtk seq -A 2.fastq > 2.fasta' \
                    --example_input 2.fastq \
                    --example_output 2.fasta \
                    --test_case \
                    --cite_url 'https://github.com/lh3/seqtk' \
                    --help_from_command 'seqtk seq'

In addition to demonstrating --help_from_command, this demonstrates generating a test case from our example with --test_case and adding a citation for the underlying tool. The resulting tool XML file is:

<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        seqtk seq -A '$input1' > '$output1'
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="fastq" />
    </inputs>
    <outputs>
        <data name="output1" format="fasta" />
    </outputs>
    <tests>
        <test>
            <param name="input1" value="2.fastq"/>
            <output name="output1" file="2.fasta"/>
        </test>
    </tests>
    <help><![CDATA[
        
Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases
         -S        strip of white spaces in sequences


    ]]></help>
    <citations>
        <citation type="bibtex">
@misc{githubseqtk,
  author = {LastTODO, FirstTODO},
  year = {TODO},
  title = {seqtk},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/lh3/seqtk},
}</citation>
    </citations>
</tool>

At this point we have a fairly a functional Galaxy tool with test and help. This was a pretty simple example - usually you will need to put more work into the tool to get to this point - tool_init is really just designed to get you started.

Now lets lint and test the tool we have developed. The Planemo’s lint (or just l) command will review tool for XML validity, obvious mistakes, and compliance with IUC best practices.

$ planemo l
Linting tool /opt/galaxy/tools/seqtk_seq.xml
Applying linter tests... CHECK
.. CHECK: 1 test(s) found.
Applying linter output... CHECK
.. INFO: 1 outputs found.
Applying linter inputs... CHECK
.. INFO: Found 1 input parameters.
Applying linter help... CHECK
.. CHECK: Tool contains help section.
.. CHECK: Help contains valid reStructuredText.
Applying linter general... CHECK
.. CHECK: Tool defines a version [0.1.0].
.. CHECK: Tool defines a name [Convert to FASTA (seqtk)].
.. CHECK: Tool defines an id [seqtk_seq].
Applying linter command... CHECK
.. INFO: Tool contains a command.
Applying linter citations... CHECK
.. CHECK: Found 1 likely valid citations.

By default lint will find all the tools in your current working directory, but we could have specified a particular tool with planemo lint seqtk_seq.xml.

Next we can run our tool’s functional test with the test (or just t) command. This will print a lot of output (as it starts a Galaxy instance) but should ultimately reveal our one test passed.

$ planemo t
...
All 1 test(s) executed passed.
seqtk_seq[0]: passed

In addition to the in console display of test results as red (failing) or green (passing), Planemo also creates an HTML report for the test results by default. Many more test report options are available such --test_output_xunit which is useful in certain continuous integration environments. See planemo test --help for more options, as well as the test_reports command.

Now we can open Galaxy with the serve (or just s).

$ planemo s
...
serving on http://127.0.0.1:9090

Open up http://127.0.0.1:9090 in a web browser to view your new tool.

Serve and test can be passed various command line arguments such as --galaxy_root to specify a Galaxy instance to use (by default planemo will download and manage a instance just for planemo).

Simple Parameters

We have built a tool wrapper for the seqtk seq command - but this tool actually has additional options that we may wish to expose the Galaxy user.

Lets take a few of the parameters from the help command and build Galaxy param blocks to stick in the tool’s inputs block.

-V        shift quality by '(-Q) - 33'

In the previous section we saw param block of type data for input files, but there are many different kinds of parameters one can use. Flag parameters such as the above -V parameter are frequently represented by boolean parameters in Galaxy tool XML.

<param name="shift_quality" type="boolean" label="Shift quality"
       truevalue="-V" falsevalue=""
       help="shift quality by '(-Q) - 33' (-V)" />

We can then stick $shift_quality in our command block and if the user has selected this option it will be expanded as -V (since we have defined this as the truevalue). If the user hasn’t selected this option $shift_quality will just expand as an empty string and not affect the generated command line.

Now consider the following seqtk seq parameters:

-q INT    mask bases with quality lower than INT [0]
-X INT    mask bases with quality higher than INT [255]

These can be translated into Galaxy parameters as:

<param name="quality_min" type="integer" label="Mask bases with quality lower than"
       value="0" min="0" max="255" help="(-q)" />
<param name="quality_max" type="integer" label="Mask bases with quality higher than"
       value="255" min="0" max="255" help="(-X)" />

These can be add to the command tag as -q $quality_min -X $quality_max.

At this point the tool would look like:

<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        seqtk seq
              $shift_quality
              -q $quality_min
              -X $quality_max
              -a '$input1' > '$output1'
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="fastq" />
        <param name="shift_quality" type="boolean" label="Shift quality" 
               truevalue="-V" falsevalue=""
               help="shift quality by '(-Q) - 33' (-V)" />
        <param name="quality_min" type="integer" label="Mask bases with quality lower than" 
               value="0" min="0" max="255" help="(-q)" />
        <param name="quality_max" type="integer" label="Mask bases with quality higher than" 
               value="255" min="0" max="255" help="(-X)" />
    </inputs>
    <outputs>
        <data name="output1" format="fasta" />
    </outputs>
    <tests>
        <test>
            <param name="input1" value="2.fastq"/>
            <output name="output1" file="2.fasta"/>
        </test>
    </tests>
    <help><![CDATA[
        
Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases


    ]]></help>
    <citations>
        <citation type="bibtex">
@misc{githubseqtk,
  author = {LastTODO, FirstTODO},
  year = {TODO},
  title = {seqtk},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/lh3/seqtk},
}</citation>
    </citations>
</tool>

Conditional Parameters

The previous parameters were simple because they always appeared, now consider.

-M FILE   mask regions in BED or name list FILE [null]

We can mark this data type param as optional by adding the attribute optional="true".

<param name="mask_regions" type="data" label="Mask regions in BED"
       format="bed" help="(-M)" optional="true" />

Then instead of just using $mask_regions directly in the command block, one can wrap it in an if statement (because tool XML files support Cheetah).

#if $mask_regions
-M '$mask_regions'
#end if

Next consider the parameters:

-s INT    random seed (effective with -f) [11]
-f FLOAT  sample FLOAT fraction of sequences [1]

In this case, the -s random seed parameter should only be seen or used if the sample parameter is set. We can express this using a conditional block.

<conditional name="sample">
    <param name="sample_selector" type="boolean" label="Sample fraction of sequences" />
    <when value="true">
        <param name="fraction" label="Fraction" type="float" value="1.0"
               help="(-f)" />
        <param name="seed" label="Random seed" type="integer" value="11"
               help="(-s)" />
    </when>
    <when value="false">
    </when>
</conditional>

In our command block, we can again use an if statement to include these parameters.

#if $sample.sample_selector
-f $sample.fraction -s $sample.seed
#end if

Notice we must reference the parameters using the sample. prefix since they are defined within the sample conditional block.

The newest version of this tool is now

<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        seqtk seq
              $shift_quality
              -q $quality_min
              -X $quality_max
              #if $mask_regions
                  -M '$mask_regions'
              #end if
              #if $sample.sample
                  -f $sample.fraction
                  -s $sample.seed
              #end if
              -a '$input1' > '$output1'
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="fastq" />
        <param name="shift_quality" type="boolean" label="Shift quality" 
               truevalue="-V" falsevalue=""
               help="shift quality by '(-Q) - 33' (-V)" />
        <param name="quality_min" type="integer" label="Mask bases with quality lower than" 
               value="0" min="0" max="255" help="(-q)" />
        <param name="quality_max" type="integer" label="Mask bases with quality higher than" 
               value="255" min="0" max="255" help="(-X)" />
        <param name="mask_regions" type="data" label="Mask regions in BED" 
               format="bed" help="(-M)" optional="true" />
        <conditional name="sample">
            <param name="sample" type="boolean" label="Sample fraction of sequences" />
            <when value="true">
                <param name="fraction" label="Fraction" type="float" value="1.0"
                       help="(-f)" />
                <param name="seed" label="Random seed" type="integer" value="11"
                       help="(-s)" />
            </when>
            <when value="false">
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data name="output1" format="fasta" />
    </outputs>
    <tests>
        <test>
            <param name="input1" value="2.fastq"/>
            <output name="output1" file="2.fasta"/>
        </test>
    </tests>
    <help><![CDATA[
        
Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases


    ]]></help>
    <citations>
        <citation type="bibtex">
@misc{githubseqtk,
  author = {LastTODO, FirstTODO},
  year = {TODO},
  title = {seqtk},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/lh3/seqtk},
}</citation>
    </citations>
</tool>

For tools like this where there are many options but in the most uses the defaults are preferred - a common idiom is to break the parameters into simple and advanced sections using a conditional.

Updating this tool to use that idiom might look as follows.

<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        seqtk seq
              #if $settings.advanced == "advanced"
                  $settings.shift_quality
                  -q $settings.quality_min
                  -X $settings.quality_max
                  #if $settings.mask_regions
                      -M '$settings.mask_regions'
                  #end if
                  #if $settings.sample.sample
                      -f $settings.sample.fraction
                      -s $settings.sample.seed
                  #end if
              #end if
              -a '$input1' > '$output1'
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="fastq" />
        <conditional name="settings">
            <param name="advanced" type="select" label="Specify advanced parameters">
                <option value="simple" selected="true">No, use program defaults.</option>
                <option value="advanced">Yes, see full parameter list.</option>
            </param>
            <when value="simple">
            </when>
            <when value="advanced">
                <param name="shift_quality" type="boolean" label="Shift quality" 
                       truevalue="-V" falsevalue=""
                       help="shift quality by '(-Q) - 33' (-V)" />
                <param name="quality_min" type="integer" label="Mask bases with quality lower than" 
                       value="0" min="0" max="255" help="(-q)" />
                <param name="quality_max" type="integer" label="Mask bases with quality higher than" 
                       value="255" min="0" max="255" help="(-X)" />
                <param name="mask_regions" type="data" label="Mask regions in BED" 
                       format="bed" help="(-M)" optional="true" />
                <conditional name="sample">
                    <param name="sample" type="boolean" label="Sample fraction of sequences" />
                    <when value="true">
                        <param name="fraction" label="Fraction" type="float" value="1.0"
                               help="(-f)" />
                        <param name="seed" label="Random seed" type="integer" value="11"
                               help="(-s)" />
                    </when>
                    <when value="false">
                    </when>
                </conditional>
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data name="output1" format="fasta" />
    </outputs>
    <tests>
        <test>
            <param name="input1" value="2.fastq"/>
            <output name="output1" file="2.fasta"/>
        </test>
    </tests>
    <help><![CDATA[
        
Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases


    ]]></help>
    <citations>
        <citation type="bibtex">
@misc{githubseqtk,
  author = {LastTODO, FirstTODO},
  year = {TODO},
  title = {seqtk},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/lh3/seqtk},
}</citation>
    </citations>
</tool>

Wrapping a Script

Many common bioinformatics applications are available on the Tool Shed already and so a common development task is to integrate scripts of various complexity into Galaxy.

Consider the following small Perl script.

#!/usr/bin/perl -w

# usage : perl toolExample.pl <FASTA file> <output file>

open (IN, "<$ARGV[0]");
open (OUT, ">$ARGV[1]");
while (<IN>) {
    chop;
    if (m/^>/) {
        s/^>//;
        if ($. > 1) {
            print OUT sprintf("%.3f", $gc/$length) . "\n";
        }
        $gc = 0;
        $length = 0;
    } else {
        ++$gc while m/[gc]/ig;
        $length += length $_;
    }
}
print OUT sprintf("%.3f", $gc/$length) . "\n";
close( IN );
close( OUT );

One can build a tool for this script as follows and place the script in the same directory as the tool XML file itself. The special value $__tool_directory__ here refers to the directory your tool lives in.

<tool id="gc_content" name="Compute GC content">
  <description>for each sequence in a file</description>
  <command>perl '$__tool_directory__/gc_content.pl' '$input' output.tsv</command>
  <inputs>
    <param name="input" type="data" format="fasta" label="Source file"/>
  </inputs>
  <outputs>
    <data name="output" format="tabular" from_work_dir="output.tsv" />
  </outputs>
  <help>
This tool computes GC content from a FASTA file.
  </help>
</tool>

Macros

If your desire is to write a tool for a single relatively simple application or script - this section should be skipped. If you hope to maintain a collection of related tools - experience suggests you will realize there is a lot of duplicated XML to do this well. Galaxy tool XML macros can help reduce this duplication.

Planemo’s tool_init command can be used to generate a macro file appropriate for suites of tools by using the --macros flag. Consider the following variant of the previous tool_init command (the only difference is now we are adding the --macros flag).

$ planemo tool_init --force \
                    --macros \
                    --id 'seqtk_seq' \
                    --name 'Convert to FASTA (seqtk)' \
                    --requirement seqtk@1.2 \
                    --example_command 'seqtk seq -A 2.fastq > 2.fasta' \
                    --example_input 2.fastq \
                    --example_output 2.fasta \
                    --test_case \
                    --help_from_command 'seqtk seq'

This will produce the two files in your current directory instead of just one - seqtk_seq.xml and macros.xml.

<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code"><![CDATA[
        seqtk seq -A '$input1' > '$output1'
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="fastq" />
    </inputs>
    <outputs>
        <data name="output1" format="fasta" />
    </outputs>
    <tests>
        <test>
            <param name="input1" value="2.fastq"/>
            <output name="output1" file="2.fasta"/>
        </test>
    </tests>
    <help><![CDATA[
        
Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'


    ]]></help>
    <expand macro="citations" />
</tool>
<macros>
    <xml name="requirements">
        <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
            <yield/>
        </requirements>
    </xml>
    <xml name="citations">
        <citations>
            <yield />
        </citations>
    </xml>
</macros>

As you can see in the above code macros are reusable chunks of XML that make it easier to avoid duplication and keep your XML concise.

Further reading:

Creating Galaxy tools from python scripts using argparse

If a python script should be wrapped that creates its command line interface using the argparse library, then planemo can generate a good starting point.

All you need to do is to pass the path to the python source file containing the argparse definition. Lets use tests/data/autopygen/autopygen_end_to_end_sub.py from the planemo tests as an example:

$ planemo tool_init -i e2e -n e2e
                    --autopygen tests/data/autopygen/autopygen_end_to_end_sub.py
                    -- tool tool.xml

Here we only provided the id and name in addition to the python sources and planemo creates a pretty tool xml files that can serve as a good starting point to create a functional Galaxy tool in tool.xml. Additional information, e.g. requirements, help, etc, can and should be given with additional parameters to planemo.

There are a few points that need to be edited in any case. Most importantly since in most cases argparse does not distinguish between input and output files all file parameters will be rendered as input parameters of the tool.

Please open an issue if you have ideas on how to improve the generated tools: https://github.com/galaxyproject/planemo/issues

More Information