I was using the provenance plugin to create a Workflow Run RO-Crate that stores the provenance information of a Nextflow workflow. I tried the SPARQL queries that are relevant for me on the result, but ran into some issues. I'm not sure whether they come from outdated queries on my side, from missing features in the plugin (or a wrong plugin configuration on my side), or from remaining compatibility issues.
cq4: What is the environment/container file used in a specific workflow execution step?
The query essentially searches for MediaObjects or buildInstructions, neither of which is present in the RO-Crate. I've added a conda directive to each process, so I expected this to be documented somewhere (see also cq10). I also searched the provenance plugin for where this information is stored, but without success.
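To double-check outside of SPARQL, the metadata file can be scanned directly for the property. This is only a minimal sketch: the `SAMPLE` crate and the helper names are made up for illustration, and the real input would be the attached ro-crate-metadata.json.

```python
import json

# Hypothetical, heavily trimmed crate for illustration only.
SAMPLE = {
    "@graph": [
        {"@id": "#process-1", "@type": "SoftwareApplication", "name": "process-1"},
    ]
}

def load_crate(path):
    """Read an ro-crate-metadata.json file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def ref_id(ref):
    """References are usually {"@id": ...}; accept a bare string as well."""
    return ref["@id"] if isinstance(ref, dict) else ref

def build_instruction_links(crate):
    """Return (entity id, target id) pairs for every buildInstructions property."""
    links = []
    for entity in crate["@graph"]:
        for ref in as_list(entity.get("buildInstructions")):
            links.append((entity["@id"], ref_id(ref)))
    return links

print(build_instruction_links(SAMPLE))  # [] for the sample above
```

An empty list for the real file would confirm that the property is missing from the crate itself and that the query is not at fault.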
cq5: How long does this workflow component take to run?
The query searches for an action that is executed by a tool (main.nf). In the query the tool is a http://schema.org/SoftwareApplication, whereas in my RO-Crate it is a https://bioschemas.org/ComputationalWorkflow. Which one is correct? Is one of them maybe outdated?
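The type mismatch can be made visible by listing the `@type` of every entity that an action references as `instrument`. A minimal sketch, assuming a trimmed, hypothetical crate in which main.nf is typed the way I see it in my file; the helper names are my own:

```python
import json

# Hypothetical, heavily trimmed crate; the real file is ro-crate-metadata.json.
SAMPLE = json.loads("""
{
  "@graph": [
    {"@id": "main.nf",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    {"@id": "#run", "@type": "CreateAction", "instrument": {"@id": "main.nf"}}
  ]
}
""")

def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def instrument_types(crate):
    """Map each action @id to the @type values of the instruments it references."""
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    found = {}
    for entity in crate["@graph"]:
        for ref in as_list(entity.get("instrument")):
            target_id = ref["@id"] if isinstance(ref, dict) else ref
            target = by_id.get(target_id, {})
            found.setdefault(entity["@id"], []).extend(as_list(target.get("@type")))
    return found

print(instrument_types(SAMPLE))
# {'#run': ['File', 'SoftwareSourceCode', 'ComputationalWorkflow']}
```

If `SoftwareApplication` never shows up in that list, a query that filters the tool on that type cannot match the workflow-level action.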
cq8: What are the inputs and outputs of the overall workflow?
For the outputs, the query tries to find the CreateActions whose instrument is a ComputationalWorkflow (main.nf). In our case this CreateAction has no results; the results are only stored at the level of the individual processes, and those actions have no instrument, so the query returns nothing.
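The situation described above can be summarised per CreateAction: does it have an `instrument`, and how many `result` entries does it carry? A minimal sketch on a hypothetical, trimmed crate that mirrors what I see (the ids and helper names are made up):

```python
import json

# Hypothetical, trimmed crate: one workflow-level action without results and
# one process-level action with results but without an instrument.
SAMPLE = {
    "@graph": [
        {"@id": "main.nf", "@type": ["File", "ComputationalWorkflow"]},
        {"@id": "#wf-run", "@type": "CreateAction", "instrument": {"@id": "main.nf"}},
        {"@id": "#task-1", "@type": "CreateAction",
         "result": [{"@id": "out/a.txt"}, {"@id": "out/b.txt"}]},
    ]
}

def load_crate(path):
    """Read an ro-crate-metadata.json file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def create_action_summary(crate):
    """Return (action id, has instrument, number of results) per CreateAction."""
    rows = []
    for entity in crate["@graph"]:
        if "CreateAction" in as_list(entity.get("@type")):
            rows.append((
                entity["@id"],
                bool(as_list(entity.get("instrument"))),
                len(as_list(entity.get("result"))),
            ))
    return rows

for row in create_action_summary(SAMPLE):
    print(row)
# ('#wf-run', True, 0)
# ('#task-1', False, 2)
```

A query that requires both an instrument and a result on the same action matches neither row, which is consistent with the empty output.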
cq10: What is the script used to wrap up a software component?
The example query searches for softwareRequirements, which is not present in the RO-Crate. In our case we are interested in the actual script (with the complete list of arguments) that was executed in a workflow run, and in the conda environment/Docker image this script was executed with.
The current setup is work in progress on a more complex project. A Nextflow workflow (under development) is here, and the provenance file is attached. If it helps, we could set up a GitHub Action that creates this provenance file, so that all information is available.
ro-crate-metadata.json