See the building workflows page if you have not already. If you are looking for the documentation for a specific field, find its name in the sidebar on the right. This page documents all of the schema fields of the workflow YAML, with the exception of input fields (under on.execute.inputs), which are documented on a separate page.
jobs
The most essential part of the YAML configuration is jobs, which defines the jobs this workflow will run. Each job must define a list of steps to be executed using the steps field. By default, all jobs run in parallel. If a job needs to depend on the completion of another job, specify the dependency using the needs field. A workflow must contain at least one job with at least one step to be valid. Job names must be unique within a workflow.
jobs takes the following shape in the YAML file:
jobs:
  job-name:
    steps:
      - name: Step 1
        run: echo step 1
      - name: Step 2
        run: echo step 2
jobs.<job>.needs
needs defines a list of jobs that must complete before this job can start. If a job fails, all jobs that depend on it will not run.
In the following example, the job first will run, and only after it successfully completes will the job second run.
jobs:
  first:
    steps:
      - run: echo runs first
  second:
    needs:
      - first
    steps:
      - run: echo runs second
jobs.<job>.steps
A list of steps to be executed. Steps are executed in order, and if any step fails, the job fails. Each step requires either a command to execute using the run field, or another workflow to execute using the uses field. All other attributes are optional.
The cleanup field at the job level is equivalent to the steps field, except that cleanups will always run irrespective of whether a previous cleanup or step failed. Everything in cleanup runs after steps.
You can define steps and/or cleanup with the attributes below.
jobs.<job>.steps[*].name
The name of the step.
Step names are useful when viewing a workflow's progress in the Runs tab. If a job fails at a certain step, you can quickly identify its name on the Runs graph.
If you don’t provide a name, the step's command will be shown instead.
jobs.<job>.steps[*].run
Defines the command to be executed, e.g. echo hello world. This cannot be used in conjunction with uses. It is possible to combine bash commands into a single step:
jobs:
  main:
    steps:
      - run: |
          echo part 1
          echo part 2
jobs.<job>.steps[*].uses
Runs another workflow as a step. This cannot be used in conjunction with run. uses calls an existing workflow, either one of your own or one from the Marketplace. To reference one of your personal workflows, prefix the workflow name with workflow/, i.e. workflow/<workflow name>. To reference a Marketplace workflow, prefix the marketplace workflow slug with marketplace/, i.e. marketplace/<marketplace slug>. When using marketplace/<marketplace slug>, a version may be specified after another slash, like so: marketplace/<marketplace slug>/<version>, where the version looks something like v1.0.0. If a version is not specified, the latest version published on the marketplace is used.
If the workflow requires inputs to run, use with to provide those inputs. The fields defined inside with use the same names as defined in the workflow's inputs section.
There are also two rules that must be followed when running workflows within workflows:
- A workflow may never call itself, even indirectly through another workflow.
- There is a limit to how deep the workflow call stack can get. Only up to 5 layers of workflows calling workflows are supported.
Attempts to run the workflow will fail immediately if these rules are violated.
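For example, a step that runs a specific published version of a Marketplace workflow might look like the following sketch (the slug and version are placeholders):
jobs:
  main:
    steps:
      - name: Run a published workflow
        uses: marketplace/my-workflow/v1.0.0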
jobs.<job>.steps[*].with
Defines the inputs passed to the workflow defined by uses.
If you include uses in the YAML file to call a workflow, it may have inputs that must be defined in order to run. In that case, with is used to define those inputs. In the example below, we use the inputs from the default YAML configuration.
jobs:
  job-name:
    steps:
      - name: Sample Step
        uses: marketplace/default-local-workflow
        with:
          resource: sample-cluster
jobs.<job>.steps[*].ssh
Step-level SSH configuration. Takes the same shape as the job-level ssh field (see jobs.<job>.ssh) and overwrites it for this step.
jobs.<job>.steps[*].cleanup
Defines a cleanup command to be run after step execution finishes, e.g. rm -rf /tmp/myapp.
This can be used to delete temporary files, terminate connections, remove credentials, or perform any other clean up necessary when a job is shutting down. cleanup always runs at the end of a job, in reverse order of definition. Clean up steps run regardless of whether the job succeeded, failed, or was canceled. If a step did not run, its cleanup step will not run either.
For example, if you have a job with three steps that all have cleanup attributes, each cleanup adds a clean up step to the end of the job, and the job ultimately executes steps in this order:
- step 1 run
- step 2 run
- step 3 run
- step 3 cleanup
- step 2 cleanup
- step 1 cleanup
jobs.<job>.steps[*].if
Prevents the step from running unless a conditional evaluates to true. See jobs.<job>.if for details and an example.
jobs.<job>.steps[*].retry
If a step has a certain chance of failure but is necessary for your workflow, using the retry field is recommended. retry is an object with the following properties:
- max-retries: The number of retries before giving up. Defaults to 0 without a retry object, and 10 with a retry object but no max-retries field.
- interval: The amount of time to wait between retries. Defaults to 5s.
- timeout: The amount of time to wait before giving up on an attempt. Defaults to 30s.
The supported units for interval and timeout are:
- n (nanoseconds)
- s (seconds)
- m (minutes)
- h (hours)
- d (days)
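As a sketch, a step that retries a flaky command up to 3 times, waiting 10 seconds between retries and giving up on any single attempt after 1 minute, could look like this (the command and values are illustrative):
jobs:
  main:
    steps:
      - name: Flaky step
        run: ./flaky-command.sh
        retry:
          max-retries: 3
          interval: 10s
          timeout: 1m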
jobs.<job>.steps[*].env
Environment variables for this step. Variables set at the step level overwrite job-level and global env variables; see env.
jobs.<job>.steps[*].ignore-errors
If true, non-zero exit codes will not be counted as failures. By default, this attribute is set to false.
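For instance, a step whose failure should not fail the job can be marked like this (the commands are illustrative):
jobs:
  main:
    steps:
      - name: Best-effort step
        run: exit 1
        ignore-errors: true
      - run: echo this still runs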
jobs.<job>.steps[*].working-directory
The directory this step's run command executes in. Step-level working-directory overwrites the job-level setting; see jobs.<job>.working-directory.
jobs.<job>.steps[*].early-cancel
Conditions for early cancellation of the step. Currently, only early-cancel: any-job-failed is supported, which cancels the step if any job fails before the step finishes running.
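For example, to stop a long-running step as soon as any other job fails (the command is illustrative):
jobs:
  main:
    steps:
      - name: Long-running step
        run: sleep 3600
        early-cancel: any-job-failed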
jobs.<job>.steps[*].timeout
The maximum amount of time this step may run. See timeout.
jobs.<job>.ssh
If you want the workflow to execute commands on a remote host, using the ssh field is recommended. ssh is an object with the following properties:
- remoteHost: The IP address of the remote host.
- remoteUser: The username to use when attempting to ssh to the remoteHost.
- jumpNodeHost: The IP address of the (optional) jump node.
- jumpNodeUser: The username to use when attempting to ssh to the jump node.
However, you generally will not need to use the remoteUser and jumpNodeUser fields, as they are populated automatically. The step-level ssh overwrites the job-level ssh, and you may also pass ssh: null (or just ssh: with nothing else) at the step level to execute on the user workspace (the default behavior when ssh is not set).
Example:
jobs:
  echo_on_remote:
    ssh:
      remoteHost: ${{ inputs.resource1.ip }}
      jumpNodeHost: ${{ inputs.resource2.ip }}
    steps:
      - run: echo This is executed on the remote host!
      - run: echo This is executed on the jump node!
        ssh:
          remoteHost: ${{ inputs.resource2.ip }}
      - run: echo This is executed in the user workspace!
        ssh: null
on:
  execute:
    inputs:
      resource1:
        label: Resource 1
        type: compute-clusters
        optional: false
      resource2:
        label: Resource 2
        type: compute-clusters
        optional: false
jobs.<job>.cleanup
A list of clean up steps for the job, taking the same shape as steps. Cleanups always run regardless of whether a previous cleanup or step failed, and everything in cleanup runs after steps.
jobs.<job>.if
if prevents a job/step from running unless a conditional evaluates to true.
Example:
jobs:
  main:
    steps:
      - run: echo hello world
      - run: echo This ran because of an input!
        if: ${{ inputs.should-run }}
  extra:
    if: ${{ inputs.should-run }}
    steps:
      - run: echo This ran because of an input!
on:
  execute:
    inputs:
      should-run:
        type: boolean
        label: Run extra step + job?
In the above example, the second step and second job will only run if the user selects Yes on "Run extra step + job?".
The if field also accepts ${{ always }}, which for jobs ensures that the job runs even when one of its dependencies failed, and for steps ensures that the step runs even when a previous step in the job failed. Unlike cleanup, steps with if: ${{ always }} can be cancelled and run in order rather than in reverse order. So in this example:
jobs:
  main:
    steps:
      - run: fail
        cleanup: echo fifth
      - run: echo first
        if: ${{ always }}
        cleanup: echo fourth
      - run: echo second
        if: ${{ always }}
        cleanup: echo third
    cleanup:
      - run: echo sixth
The commands will be executed in this order:
1. fail
2. echo first
3. echo second
4. echo third
5. echo fourth
6. echo fifth
7. echo sixth
jobs.<job>.env
Environment variables for all steps in the job. Job-level variables overwrite global env variables and are overwritten by step-level ones; see env.
jobs.<job>.working-directory
Defining this field changes the directory that run commands are run in from the default, which is the job directory (~/pw/jobs/workflow-name/job-number/). If the path does not exist before the command is run, it will be created. Step-level working-directory overwrites job-level.
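For example, with a job-level working-directory and a step-level override (the directory names are illustrative):
jobs:
  main:
    working-directory: my-app
    steps:
      - run: pwd   # uses the job-level setting, my-app
      - run: pwd   # step-level setting overrides it, other-dir
        working-directory: other-dir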
jobs.<job>.timeout
The maximum amount of time the job may run. See timeout.
app
Including app: null in the workflow yml marks the workflow as an app; if app is not included in the yml, the workflow is not an app. On execution, this field is filled in with app cluster information. Here is a barebones example:
app:
jobs:
  job1:
    ssh:
      remoteHost: ${{ app.target.ip }}
    steps:
      - name: echo on remote host
        run: echo hello
As shown in the example, app.target can be used to fetch the cluster information, which we can use to connect easily over ssh. This workflow does have one weakness, however: it cannot be run as a normal workflow, because app.target only works when running it as an app. We can fix this:
app:
jobs:
  job1:
    ssh:
      remoteHost: ${{ inputs.remote_host.ip }}
    steps:
      - name: fake output
        run: echo hello
'on':
  execute:
    inputs:
      remote_host:
        type: compute-clusters
        default: ${{ app.target }}
By passing app.target indirectly as an input, we can also handle the case where the workflow is not run as an app (through the Run Workflow tab on the workflow page instead of from the apps page). In general, it is recommended to use an indirect pass, as shown in the second example.
sessions
This field is used to define the sessions that will be used when running the workflow. For an example usage, see Building Sessions.
sessions.<session>.type
What type of session to create (tunnel by default). The only possible options are tunnel and link.
sessions.<session>.openAI
If true, will mark the session as providing an OpenAI API, and connect it to the built-in chat interface.
sessions.<session>.prompt-for-name
If prompt-for-name is null, the user will be prompted to name the session before workflow execution. If prompt-for-name.default is defined, the passed default will be used.
sessions.<session>.redirect
If true, the user will be redirected to this session once the workflow is executed. Only one session is allowed to have this set to true.
sessions.<session>.useTLS
If true, will use HTTPS to connect to the session. This should only be enabled if the app requires it.
sessions.<session>.useCustomDomain
If true, will use a custom domain to connect to the session. This should only be enabled if the app requires it.
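Putting the session fields together, a sketch of a tunnel session that the user is redirected to after execution might look like this (the session name and default are placeholders):
sessions:
  my-session:
    type: tunnel
    redirect: true
    prompt-for-name:
      default: my-session-name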
github
If you want to copy a few files from GitHub to your user workspace before running anything, the github field is useful. It is an array of objects, with one object for each repository you need files from. Here is an example usage:
github:
  - repo: https://github.com/parallelworks/interactive_session/
    branch: main
    path: checked_out
    sparse_checkout:
      - main.sh
      - workflow/yamls/turbovnc
This copies the main.sh file and the workflow/yamls/turbovnc directory from the main branch of the interactive_session repo to the path /job/directory/path/here/checked_out. Note that the job directory will look something like this:
job directory:
  ... other files ...
  checked_out
    main.sh
    workflow
      yamls
        turbovnc
          ... turbovnc files ...
So the sparse checkout preserves the directory path of the copied files. You can also copy over all of the files in the repo by not providing sparse_checkout:
github:
  - repo: https://github.com/parallelworks/interactive_session/
    branch: main
Note that at this time, the github field only copies files over to the usercontainer, and cannot copy any GitHub files that require authentication.
github[*].repo
The repository to copy files from. Must be public.
github[*].branch
The branch of the repository to use when copying files.
github[*].path
A path in the job directory to place the checked out repository files. Defaults to ".".
github[*].sparse_checkout
Files or subdirectories in the repository to be checked out instead of the entire repo. If this is not defined then the whole repository is copied.
permissions
Workflow runs can be granted additional access via permissions. Adding the * permission allows the workflow to do anything the user would be able to. Without the * permission, the workflow will only be able to update any sessions it creates.
permissions: ["*"]
jobs:
  main:
    steps:
      - name: Print buckets
        run: pw buckets ls
Without the permission, pw buckets ls would not allow the workflow to see all of the buckets.
configurations
Configurations are saved inputs that can be used when running a workflow, as opposed to manually filling out the form, to save time and ensure consistency. Configurations can be saved by users of a workflow, but can also be defined in the workflow yaml file. Here is a basic example of a definition in yaml:
configurations:
  config_1:
    variables:
      input_1: hello!
jobs:
  echo:
    steps:
      - run: echo ${{ inputs.input_1 }}
'on':
  execute:
    inputs:
      input_1:
        type: string
You can save a config for personal use by pressing the Save Config button after filling out the workflow form with the inputs you wish to save, for example with input_1 = "goodbye!".
Both built-in and personal configurations can be viewed in the configuration tab.
If you wish to execute using a configuration, you can select a configuration from the dropdown in the top right on the Run Workflow tab, which will fill the inputs with the saved configuration values, then press Execute as usual.
env
Defines environment variables to be set. Note that variables set at the step level overwrite job level env variables, which overwrite global env variables.
Example:
env:
  foo: a
jobs:
  job-name:
    env:
      foo: b
    steps:
      - name: Print an environment variable
        run: echo $foo
        env:
          foo: c
The above example will print c.
timeout
Defines a maximum amount of time for a workflow/job/step to run.
Supported units are:
- n (nanoseconds)
- s (seconds)
- m (minutes)
- h (hours)
- d (days)
Example:
timeout: 1d
jobs:
  main:
    timeout: 1m
    steps:
      - name: Sample Step
        run: sleep 30
        timeout: 10s
In the example above, the sleep command would take 30 seconds to finish, but the step's timeout attribute is set to 10 seconds. Since the step has not completed within 10 seconds, the job will fail. Without a timeout value, the step would run until it finishes. Note that timeouts at all levels are applied rather than overwritten, so if the job timeout were 5s, the command would be cancelled after 5 seconds instead of 10.
on
Use the on field to define the event that triggers the workflow. In a future release, the on field will support additional events. At this time, workflows only support the execute event, which is triggered when the workflow is manually executed via the UI or the REST API. When the workflow is manually triggered, the inputs context is populated with values from the input form.
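A minimal on section with a single execute input, consumed through the inputs context, looks like this (the input name and label are illustrative):
on:
  execute:
    inputs:
      message:
        type: string
        label: Message to print
jobs:
  main:
    steps:
      - run: echo ${{ inputs.message }}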