How do I…

This section contains a number of smaller topics with links and examples meant to provide relatively concrete answers for specific tool development scenarios.

... deal with index/reference data?

Galaxy’s concept of data tables is meant to provide tools with access to reference datasets or index data not tied to particular histories or users. A common example would be FASTA files for various genomes or mapper-specific indices of those files (e.g. a BWA index for the hg19 genome).

Galaxy data managers are specialized tools designed to populate tool data tables.
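
A tool typically consumes such a data table through a select parameter. A minimal sketch, assuming a data table named all_fasta (a common convention; the tables available on your instance may differ):

<param name="reference" type="select" label="Reference genome">
  <options from_data_table="all_fasta" />
</param>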

... cite tools without an obvious DOI?

In the absence of an obvious DOI, tools may contain embedded BibTeX directly.
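
For example, a sketch of a citations block with an embedded BibTeX entry (the entry itself is purely illustrative):

<citations>
  <citation type="bibtex">
@misc{mytool,
  author = {Doe, Jane},
  title = {mytool: an example analysis tool},
  year = {2015},
  url = {https://example.org/mytool}
}
  </citation>
</citations>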

Further reading:

  • bibtex.xml (test tool with a bunch of random examples)

  • bwa-mem.xml (BWA-MEM tool by Anton Nekrutenko demonstrating citation of an arXiv article)

  • macros.xml (Macros for vcflib tool demonstrating citing a github repository)

... declare a Docker container for my tool?

Galaxy tools can be decorated with container tags indicating Docker container identifiers that the tools can run inside of.
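
A minimal sketch (the image name here is illustrative, not a published container):

<requirements>
  <container type="docker">quay.io/example/mytool:1.0</container>
</requirements>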

The longer term plan for the Tool Shed ecosystem is to be able to automatically build Docker containers for tool dependency descriptions and thereby obtain this Docker functionality for free and in a way that is completely backward compatible with non-Docker deployments.

... do extra validation of parameters?

Tool parameters support a validator element (syntax) to perform validation of a single parameter. More complex validation across parameters can be performed with arbitrary Python functions via the code file syntax, but this feature should be used sparingly.
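
For instance, a simple regular expression validator might look like the following (the parameter name and pattern are illustrative):

<param name="sequence" type="text" label="DNA sequence">
  <validator type="regex" message="Sequence may contain only A, C, G and T.">^[ACGT]+$</validator>
</param>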

Further reading:

  • validator XML tag syntax on the Galaxy wiki.

  • fastq_filter.xml (a FASTQ filtering tool demonstrating validator constructs)

  • gffread.xml (a tool by Jim Johnson demonstrating the use of regular expressions with validator tags)

  • code_file.xml, code_file.py (test files demonstrating defining a simple constraint in Python across two parameters)

... check input type in command blocks?

Input data parameters may specify multiple formats. For example

<param name="input" type="data" format="fastq,fasta" label="Input" />

If the command line under construction doesn’t require changes based on the input type, the parameter may simply be referenced as $input. However, if the command line uses different argument names depending on the input type, it becomes important to dispatch on the underlying datatype.

In this example, $input.ext returns the short code for the actual datatype of the supplied input; for the definition above, the strings fasta or fastqsanger would be valid values for this parameter.

While .ext may sometimes be useful, there are many cases where it is inappropriate because of subtypes: checking whether .ext equals fastq in the above example would not catch fastqsanger inputs, for instance. To check if an input matches a type or any subtype thereof, the is_of_type method can be used. For instance

$input.is_of_type('fastq')

would check if the input is of type fastq or any derivative types such as fastqsanger.
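
Putting this together, a command block might dispatch on the input type like so (mytool and its flags are hypothetical):

<command><![CDATA[
#if $input.is_of_type('fastq')
  mytool --fastq '$input' > '$output'
#else
  mytool --fasta '$input' > '$output'
#end if
]]></command>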

... handle arbitrary output data formats?

If the format of a tool’s output cannot be known ahead of time, Galaxy can be instructed to “sniff” the output and determine the data type using the same method used for uploads. Adding the auto_format="true" attribute to a tool’s output enables this.

<output name="out1" auto_format="true" label="Auto Output" />

... determine the user submitting a job?

The variable $__user_email__ (as well as $__user_name__ and $__user_id__) is available when building up your command in the tool’s <command> block. The following tool demonstrates the use of this and a few other special parameters available to all tools.
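
For instance, a command block might record the submitting user (a purely illustrative sketch):

<command><![CDATA[
echo "Job submitted by $__user_email__ (user id: $__user_id__)" > '$output'
]]></command>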

... test with multiple value inputs?

To write tests that supply multiple values to a multiple="true" select or data parameter, simply specify the multiple values as a comma-separated list.

Here are examples of each:
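
For instance, a single test might exercise both kinds of parameters in one block (parameter names and files are illustrative; categories would be a multiple="true" select parameter and inputs a multiple="true" data parameter):

<test>
  <param name="categories" value="a,b,c" />
  <param name="inputs" value="1.txt,2.txt" />
  <output name="out_file" file="merged.txt" />
</test>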

... test dataset collections?

Here are some examples of testing tools that consume collections with type="data_collection" parameters.

Here are some examples of testing tools that produce collections with output_collection elements.
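
As a sketch, a test consuming a paired collection and checking a paired output collection might look like this (names and files are illustrative):

<test>
  <param name="input_collection">
    <collection type="paired">
      <element name="forward" value="forward.fastq" />
      <element name="reverse" value="reverse.fastq" />
    </collection>
  </param>
  <output_collection name="out_collection" type="paired">
    <element name="forward" file="expected_forward.fastq" />
    <element name="reverse" file="expected_reverse.fastq" />
  </output_collection>
</test>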

... test discovered datasets?

Tools which dynamically discover datasets after the job is complete, either using the <discover_datasets> element, the older default pattern approach (e.g. finding files with names like primary_DATASET_ID_sample1_true_bam_hg18), or the undocumented galaxy.json approach, can be tested by placing discovered_dataset elements beneath the corresponding output element, with the designation corresponding to the file to test.

<test>
  <param name="input" value="7" />
  <output name="report" file="example_output.html">
    <discovered_dataset designation="world1" file="world1.txt" />
    <discovered_dataset designation="world2">
      <assert_contents>
        <has_line line="World Contents" />
      </assert_contents>
    </discovered_dataset>
  </output>
</test>
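
For reference, the tool side of such a test might declare dynamic discovery along these lines (the pattern, names, and format are illustrative):

<outputs>
  <data name="report" format="html">
    <discover_datasets pattern="(?P&lt;designation&gt;.+)\.txt" format="txt" visible="true" />
  </data>
</outputs>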

The test tools distributed with Galaxy include examples demonstrating dynamic discovery and the testing thereof.

... test composite dataset contents?

Tools which consume Galaxy composite datatypes can generate test inputs using the composite_data element demonstrated by the following tool.

Tools which produce Galaxy composite datatypes can specify tests for the individual output files using the extra_files element demonstrated by the following tool.
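
As a sketch of both (all file names here are illustrative):

<test>
  <param name="input">
    <composite_data value="sequences_file" />
    <composite_data value="roadmaps_file" />
  </param>
  <output name="html_report" file="expected_index.html">
    <extra_files type="file" name="summary.txt" value="expected_summary.txt" />
  </output>
</test>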

... test index (.loc) data?

There is an idiom for supplying test index data during tests with Planemo.

To create this kind of test, provide a tool_data_table_conf.xml.test beside your tool’s tool_data_table_conf.xml.sample file that specifies paths to test .loc files, which in turn define paths to the test index data. Both the .loc files and the tool_data_table_conf.xml.test can use the value ${__HERE__}, which will be replaced with the path to the directory the file lives in. This allows using relative-like paths in these files, which is needed for portable tests.
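
A minimal tool_data_table_conf.xml.test might look like this (the table and file names are illustrative):

<tables>
  <table name="all_fasta" comment_char="#">
    <columns>value, dbkey, name, path</columns>
    <file path="${__HERE__}/test-data/all_fasta.loc" />
  </table>
</tables>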

An example commit demonstrating the application of this approach to a Picard tool can be found here.

These tests can then be run with the Planemo test command.

... test exit codes?

A test element can check the exit code of the underlying job using the expect_exit_code="n" attribute.
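
For example (the exit code and input are illustrative):

<test expect_exit_code="42">
  <param name="input" value="1.txt" />
</test>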

... test failure states?

Normally, all tool test cases described by a test element are expected to pass, but one can assert that a job should fail by adding expect_failure="true" to the test element.
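
For example (the input is illustrative):

<test expect_failure="true">
  <param name="input" value="malformed.txt" />
</test>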

... test that output filters work?

If your tool contains filter elements, you can’t verify properties of outputs that are filtered out and therefore do not exist. The test element may contain an expect_num_outputs attribute specifying the expected number of outputs; this can be used to verify that outputs not listed are indeed filtered out during tool execution.
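
For example, a sketch of a test asserting that only one output survives filtering (parameter and output names are illustrative):

<test expect_num_outputs="1">
  <param name="produce_report" value="false" />
  <output name="out_file" file="expected.txt" />
</test>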

... test metadata?

Output metadata can be checked using metadata elements in the XML description of the output.
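
For example (names and the metadata value are illustrative):

<test>
  <param name="input" value="1.bam" />
  <output name="out_file" file="expected.bam">
    <metadata name="dbkey" value="hg19" />
  </output>
</test>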

... test tools installed in an existing Galaxy instance?

Do not use Planemo for this; Galaxy should be used to test its tools directly. The following two commands can be used to test Galaxy tools in an existing instance.

$ sh run_tests.sh --report_file tool_tests_shed.html --installed

The above command passes the --installed flag to run_tests.sh, which tells the test framework to test Tool Shed installed tools, and only those tools.

$ GALAXY_TEST_TOOL_CONF=config/tool_conf.xml sh run_tests.sh --report_file tool_tests_tool_conf.html functional.test_toolbox

The second command sets the GALAXY_TEST_TOOL_CONF environment variable, which restricts the testing framework to a single tool conf file (for example config/tool_conf.xml.sample, describing the default tools that ship with Galaxy, whose dependencies must be set up manually). The last argument to run_tests.sh, functional.test_toolbox, tells the test framework to run all the tool tests in the configured tool conf file.

Note: To speed up tests you can use a pre-migrated database file the way Planemo does, by setting the following environment variable before running run_tests.sh.

$ export GALAXY_TEST_DB_TEMPLATE="https://github.com/jmchilton/galaxy-downloads/raw/master/db_gx_rev_0127.sqlite"

... test tools against a package or container in a bioconda pull request?

First, obtain the artifacts of the PR by adding this comment: @BiocondaBot please fetch artifacts. In the reply one finds a link to a zip file containing the built package and Docker image. Download this zip file and extract it. For the following, let PACKAGES_DIR be the absolute path to the packages directory in the resulting unzipped directory and IMAGE_ZIP be the absolute path to the tar.gz file in the images directory of the unzipped directory.

In order to test the tool with the package add the following to the planemo call:

$ planemo test ... --conda_channels file://PACKAGES_DIR,conda-forge,bioconda,defaults ...

For containerized testing we need to differentiate two cases:

  1. the tool has a single requirement (that is fulfilled by the container)

  2. the tool has multiple requirements (in this case a Docker image will be built on the fly using the package)

For the former case the docker image that has been created by the bioconda CI needs to be loaded:

$ gzip -dc IMAGE_ZIP | docker load

and a planemo test can then simply use this image:

$ planemo test ... --biocontainers --no_dependency_resolution --no_conda_auto_init ...

For the latter case it suffices to call planemo as follows:

$ planemo test ... --biocontainers --no_dependency_resolution --no_conda_auto_init --conda_channels file://PACKAGES_DIR,conda-forge,bioconda,defaults ...

... interactively debug tool tests?

It can be desirable to interactively debug a tool test. To do so, start planemo test with the --no_cleanup option and inspect the output: after Galaxy starts up, the tests commence. At the start of each test one finds a message: ( <TOOL_ID> ) > Test-N. After some upload jobs, the actual tool job is started (it is the last job before the next test is executed). There you will find a message like Built script [/tmp/tmp1zixgse3/job_working_directory/000/3/tool_script.sh].

In this case /tmp/tmp1zixgse3/job_working_directory/000/3/ is the job dir. It contains some files and directories of interest:

  • tool_script.sh: the bash script generated from the tool’s command and version_command tags plus some boilerplate code

  • galaxy_3.sh (note that the number may be different): a shell script setting up the environment (e.g. paths and environment variables), starting the tool_script.sh, and postprocessing (e.g. error handling and setting metadata)

  • working: the job working directory

  • outputs: a directory containing the job stderr and stdout

For a tool test that uses a conda environment to resolve the requirements, one can simply change into working and execute ../tool_script.sh (this works as long as no special environment variables are used; otherwise ../galaxy_3.sh needs to be executed after cleaning the job dir). By editing the tool script one may understand and fix problems in the command block faster than by rerunning planemo test over and over again.

Alternatively one can change into the working dir and load the conda environment (the code to do so can be found in tool_script.sh: . PATH_TO_CONDA_ENV activate). Afterwards one can execute individual commands, e.g. those found in tool_script.sh or variants.

For a tool test that uses Docker to resolve the requirements one needs to execute ../galaxy_3.sh, because it executes docker run ... tool_script.sh in order to rerun the job (with a possibly edited version of the tool script). In order to run the Docker container interactively, execute the docker run ... /bin/bash command found in ../galaxy_3.sh (i.e. omitting the call of tool_script.sh) with the added parameter -it. Note that the docker run command contains some shell variables (-v "$_GALAXY_JOB_TMP_DIR:$_GALAXY_JOB_TMP_DIR:rw" -v "$_GALAXY_JOB_HOME_DIR:$_GALAXY_JOB_HOME_DIR:rw") which ensure that the job’s temporary and home directories are available within Docker. Ideally these shell variables are set to the same values as in ../galaxy_3.sh, but often it is sufficient to remove this part from the docker run call.
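
As a rough sketch of such an interactive call (the paths and image name are illustrative; copy the exact flags from your own galaxy_3.sh):

$ cd /tmp/tmp1zixgse3/job_working_directory/000/3/working
$ docker run -it -v /tmp/tmp1zixgse3:/tmp/tmp1zixgse3:rw quay.io/example/mytool:1.0 /bin/bash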