See the building workflows page if you have not already. If you are looking for the documentation for a specific field, find its name in the sidebar on the right. This page documents all of the schema fields of the workflow YAML, with the exception of input fields (under on.execute.inputs), which are documented on a separate page.
jobs
The most essential part of the YAML configuration is jobs, which defines the jobs this workflow will run. Each job must define a list of steps to be executed using the steps field. By default, all jobs run in parallel. If a job needs to depend on the completion of another job, specify the dependency using the needs field. A workflow must contain at least one job with at least one step to be valid. Job names must be unique within a workflow.
jobs takes the following shape in the YAML file:
jobs:
  job-name:
    steps:
      - name: Step 1
        run: echo step 1
      - name: Step 2
        run: echo step 2
jobs.<job>.needs
needs defines a list of jobs that must complete before this job can start. If a job fails, all jobs that depend on it will not run.
In the following example, the job first will run, and only after it successfully completes will the job second run.
jobs:
  first:
    steps:
      - run: echo runs first
  second:
    needs:
      - first
    steps:
      - run: echo runs second
jobs.<job>.steps
A list of steps to be executed. Steps are executed in order, and if any step fails, the job fails. Each step requires either a command to execute using the run field, or another workflow to execute using the uses field. All other attributes are optional.
The cleanup field at the job level is equivalent to the steps field, except that cleanups will always run irrespective of whether a previous cleanup or step failed. Everything in cleanup runs after steps.
You can define steps and/or cleanup with the attributes below.
jobs.<job>.steps[*].name
The name of the step.
Step names are useful when viewing a workflow's progress in the Runs tab. If a job fails at a certain step, you can quickly identify its name on the Runs graph.
If you don’t provide a name, the step's command will be shown instead.
jobs.<job>.steps[*].run
Defines the command to be executed, e.g. echo hello world. This cannot be used in conjunction with uses. It is possible to combine bash commands into a single step:
jobs:
  main:
    steps:
      - run: |
          echo part 1
          echo part 2
jobs.<job>.steps[*].uses
Runs another workflow as a step. This cannot be used in conjunction with run. uses calls an existing workflow, either one of your own or one from the Marketplace. To reference one of your personal workflows, prefix the workflow name with workflow/, i.e. workflow/<workflow name>. To reference a Marketplace workflow, prefix the marketplace workflow slug with marketplace/, i.e. marketplace/<marketplace slug>. When using marketplace/<marketplace slug>, a version may be specified after another slash, like so: marketplace/<marketplace slug>/<version>, where the version looks something like v1.0.0. If a version is not specified, the latest version published on the marketplace is used.
If the workflow requires inputs to run, use with to provide those inputs. The fields defined inside with use the same names as defined in the workflow's inputs section.
There are also two rules that must be followed when running workflows within workflows:
- A workflow may never call itself, even indirectly through another workflow.
- There is a limit to how deep the workflow call stack can get. Only up to 5 layers of workflows calling workflows are supported.
Attempts to run the workflow will fail immediately if these rules are violated.
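For example, a step that runs a specific published version of a Marketplace workflow might look like the following sketch (the slug and version are placeholders):
jobs:
  main:
    steps:
      - name: Run a published workflow
        uses: marketplace/my-workflow/v1.0.0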
jobs.<job>.steps[*].with
Defines the inputs passed to the workflow defined by uses.
If you include uses in the YAML file to call a workflow, it may have inputs that must be defined in order to run. In that case, with is used to define those inputs. In the example below, we use the inputs from the default YAML configuration.
jobs:
  job-name:
    steps:
      - name: Sample Step
        uses: marketplace/default-local-workflow
        with:
          resource: sample-cluster
jobs.<job>.steps[*].ssh
Step-level SSH configuration. Takes the same shape as the job-level ssh field (see jobs.<job>.ssh) and overwrites it for this step.
jobs.<job>.steps[*].cleanup
Defines a cleanup command to be run after step execution finishes, e.g. rm -rf /tmp/myapp.
This can be used to delete temporary files, terminate connections, remove credentials, or perform any other clean up necessary when a job is shutting down. cleanup always runs at the end of a job, in reverse order of definition. Clean up steps run regardless of whether the job succeeded, failed, or was canceled. If a step did not run, its cleanup step will not run either.
For example, if you have a job with three steps that all have cleanup attributes, each cleanup adds a clean up step to the end of the job, and the job ultimately executes steps in this order:
- step 1 run
- step 2 run
- step 3 run
- step 3 cleanup
- step 2 cleanup
- step 1 cleanup
jobs.<job>.steps[*].if
Prevents the step from running unless a conditional evaluates to true. See jobs.<job>.if for details and an example.
jobs.<job>.steps[*].retry
If a step has a certain chance of failure but is necessary for your workflow, using the retry field is recommended. retry is an object with the following properties:
- max-retries: The number of retries before giving up. Defaults to 0 without a retry object, and 10 with a retry object but no max-retries field.
- interval: The amount of time to wait between retries. Defaults to 5s.
- timeout: The amount of time to wait before giving up on an attempt. Defaults to 30s.
The supported units for interval and timeout are:
- n (nanoseconds)
- s (seconds)
- m (minutes)
- h (hours)
- d (days)
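As a sketch, a step that retries a flaky command up to 3 times, waiting 10 seconds between retries and giving up on any single attempt after 1 minute, could look like this (the command and values are illustrative):
jobs:
  main:
    steps:
      - name: Flaky step
        run: ./flaky-command.sh
        retry:
          max-retries: 3
          interval: 10s
          timeout: 1m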
jobs.<job>.steps[*].env
Environment variables for this step. Variables set at the step level overwrite job-level and global env variables; see env.
jobs.<job>.steps[*].ignore-errors
If true, non-zero exit codes will not be counted as failures. By default, this attribute is set to false.
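For instance, a step whose failure should not fail the job can be marked like this (the commands are illustrative):
jobs:
  main:
    steps:
      - name: Best-effort step
        run: exit 1
        ignore-errors: true
      - run: echo this still runs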
jobs.<job>.steps[*].working-directory
The directory this step's run command executes in. Step-level working-directory overwrites the job-level setting; see jobs.<job>.working-directory.
jobs.<job>.steps[*].early-cancel
Conditions for early cancellation of the step. Currently, only early-cancel: any-job-failed is supported, which cancels the step if any job fails before the step finishes running.
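For example, to stop a long-running step as soon as any other job fails (the command is illustrative):
jobs:
  main:
    steps:
      - name: Long-running step
        run: sleep 3600
        early-cancel: any-job-failed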
jobs.<job>.steps[*].timeout
The maximum amount of time this step may run. See timeout.
jobs.<job>.ssh
If you want the workflow to execute commands on a remote host, using the ssh field is recommended. ssh is an object with the following properties:
- remoteHost: The IP address of the remote host.
- remoteUser: The username to use when attempting to ssh to the remoteHost.
- jumpNodeHost: The IP address of the (optional) jump node.
- jumpNodeUser: The username to use when attempting to ssh to the jump node.
However, you generally will not need to use the remoteUser and jumpNodeUser fields, as they are populated automatically. The step-level ssh overwrites the job-level ssh, and you may also pass ssh: null (or just ssh: with nothing else) at the step level to execute on the user workspace (the default behavior when ssh is not set).
Example:
jobs:
  echo_on_remote:
    ssh:
      remoteHost: ${{ inputs.resource1.ip }}
      jumpNodeHost: ${{ inputs.resource2.ip }}
    steps:
      - run: echo This is executed on the remote host!
      - run: echo This is executed on the jump node!
        ssh:
          remoteHost: ${{ inputs.resource2.ip }}
      - run: echo This is executed in the user workspace!
        ssh: null
on:
  execute:
    inputs:
      resource1:
        label: Resource 1
        type: compute-clusters
        optional: false
      resource2:
        label: Resource 2
        type: compute-clusters
        optional: false
jobs.<job>.cleanup
A list of clean up steps for the job, taking the same shape as steps. Cleanups always run regardless of whether a previous cleanup or step failed, and everything in cleanup runs after steps.
jobs.<job>.if
if prevents a job/step from running unless a conditional evaluates to true.
Example:
jobs:
  main:
    steps:
      - run: echo hello world
      - run: echo This ran because of an input!
        if: ${{ inputs.should-run }}
  extra:
    if: ${{ inputs.should-run }}
    steps:
      - run: echo This ran because of an input!
on:
  execute:
    inputs:
      should-run:
        type: boolean
        label: Run extra step + job?
In the above example, the second step and second job will only run if the user selects Yes on "Run extra step + job?".
The if field also accepts ${{ always }}, which for jobs ensures that the job runs even when one of its dependencies failed, and for steps ensures that the step runs even when a previous step in the job failed. Unlike cleanup, steps with if: ${{ always }} can be cancelled and run in order rather than in reverse order. So in this example:
jobs:
  main:
    steps:
      - run: fail
        cleanup: echo fifth
      - run: echo first
        if: ${{ always }}
        cleanup: echo fourth
      - run: echo second
        if: ${{ always }}
        cleanup: echo third
    cleanup:
      - run: echo sixth
The commands will be executed in this order:
1. fail
2. echo first
3. echo second
4. echo third
5. echo fourth
6. echo fifth
7. echo sixth
jobs.<job>.env
Environment variables for all steps in the job. Job-level variables overwrite global env variables and are overwritten by step-level ones; see env.
jobs.<job>.working-directory
Defining this field changes the directory that run commands are run in from the default, which is the job directory (~/pw/jobs/workflow-name/job-number/). If the path does not exist before the command is run, it will be created. Step-level working-directory overwrites job-level.
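For example, with a job-level working-directory and a step-level override (the directory names are illustrative):
jobs:
  main:
    working-directory: my-app
    steps:
      - run: pwd   # uses the job-level setting, my-app
      - run: pwd   # step-level setting overrides it, other-dir
        working-directory: other-dir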
jobs.<job>.timeout
The maximum amount of time the job may run. See timeout.
app
Including app: null in the workflow yml marks the workflow as an app; if app is not included in the yml, the workflow is not an app. On execution, this field is filled in with app cluster information. Here is a barebones example:
app:
jobs:
  job1:
    ssh:
      remoteHost: ${{ app.target.ip }}
    steps:
      - name: echo on remote host
        run: echo hello
As shown in the example, app.target can be used to fetch the cluster information, which we can use to connect easily over ssh. This workflow does have one weakness, however: it cannot be run as a normal workflow, because app.target only works when running it as an app. We can fix this:
app:
jobs:
  job1:
    ssh:
      remoteHost: ${{ inputs.remote_host.ip }}
    steps:
      - name: fake output
        run: echo hello
'on':
  execute:
    inputs:
      remote_host:
        type: compute-clusters
        default: ${{ app.target }}
By passing app.target indirectly as an input, we can also handle the case where the workflow is not run as an app (through the Run Workflow tab on the workflow page instead of from the apps page). In general, it is recommended to use an indirect pass, as shown in the second example.
sessions
This field is used to define the sessions that will be used when running the workflow. For an example usage, see Building Sessions.
sessions.<session>.type
What type of session to create (tunnel by default). The only possible options are tunnel and link.
sessions.<session>.openAI
If true, will mark the session as providing an OpenAI API, and connect it to the built-in chat interface.
sessions.<session>.prompt-for-name
If prompt-for-name is null, the user will be prompted to name the session before workflow execution. If prompt-for-name.default is defined, the passed default will be used.
sessions.<session>.redirect
If true, the user will be redirected to this session once the workflow is executed. Only one session is allowed to have this set to true.
sessions.<session>.useTLS
If true, will use HTTPS to connect to the session. This should only be enabled if the app requires it.
sessions.<session>.useCustomDomain
If true, will use a custom domain to connect to the session. This should only be enabled if the app requires it.
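Putting the session fields together, a sketch of a tunnel session that the user is redirected to after execution might look like this (the session name and default are placeholders):
sessions:
  my-session:
    type: tunnel
    redirect: true
    prompt-for-name:
      default: my-session-name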
github
If you want to copy a few files from GitHub to your user workspace before running anything, the github field is useful. It is an array of objects, with one object for each repository you need files from. Here is an example usage:
github:
  - repo: https://github.com/parallelworks/interactive_session/
    branch: main
    path: checked_out
    sparse_checkout:
      - main.sh
      - workflow/yamls/turbovnc
This copies the main.sh file and the workflow/yamls/turbovnc directory from the main branch of the interactive_session repo to the path /job/directory/path/here/checked_out. Note that the job directory will look something like this:
job directory:
  ... other files ...
  checked_out
    main.sh
    workflow
      yamls
        turbovnc
          ... turbovnc files ...
So the sparse checkout preserves the directory path of the copied files. You can also copy over all of the files in the repo by not providing sparse_checkout:
github:
  - repo: https://github.com/parallelworks/interactive_session/
    branch: main
Note that at this time, the github field only copies files over to the usercontainer, and cannot copy any GitHub files that require authentication.
github[*].repo
The repository to copy files from. Must be public.
github[*].branch
The branch of the repository to use when copying files.
github[*].path
A path in the job directory to place the checked out repository files. Defaults to ".".
github[*].sparse_checkout
Files or subdirectories in the repository to be checked out instead of the entire repo. If this is not defined then the whole repository is copied.
permissions
Workflow runs can be granted additional access via permissions. Adding the * permission allows the workflow to do anything the user would be able to. Without the * permission, the workflow will only be able to update any sessions it creates.
permissions: ["*"]
jobs:
  main:
    steps:
      - name: Print buckets
        run: pw buckets ls
Without the permission, pw buckets ls would not allow the workflow to see all of the buckets.
configurations
Configurations are saved inputs that can be used when running a workflow, as opposed to manually filling out the form, to save time and ensure consistency. Configurations can be saved by users of a workflow, but can also be defined in the workflow yaml file. Here is a basic example of a definition in yaml:
configurations:
  config_1:
    variables:
      input_1: hello!
jobs:
  echo:
    steps:
      - run: echo ${{ inputs.input_1 }}
'on':
  execute:
    inputs:
      input_1:
        type: string
You can save a config for personal use by pressing the Save Config button after filling out the workflow form with the inputs you wish to save, for example with input_1 = "goodbye!".
Both built-in and personal configurations can be viewed in the configuration tab.
If you wish to execute using a configuration, you can select a configuration from the dropdown in the top right on the Run Workflow tab, which will fill the inputs with the saved configuration values, then press Execute as usual.
env
Defines environment variables to be set. Note that variables set at the step level overwrite job level env variables, which overwrite global env variables.
Example:
env:
  foo: a
jobs:
  job-name:
    env:
      foo: b
    steps:
      - name: Print an environment variable
        run: echo $foo
        env:
          foo: c
The above example will print c.
timeout
Defines a maximum amount of time for a workflow/job/step to run.
Supported units are:
- n (nanoseconds)
- s (seconds)
- m (minutes)
- h (hours)
- d (days)
Example:
timeout: 1d
jobs:
  main:
    timeout: 1m
    steps:
      - name: Sample Step
        run: sleep 30
        timeout: 10s
In the example above, the sleep command would take 30 seconds to finish, but the step's timeout attribute is set to 10 seconds. Since the step has not completed within 10 seconds, the job will fail. Without a timeout value, the step would run until it finishes. Note that timeouts at all levels are applied rather than overwritten, so if the job timeout were 5s, the command would be cancelled after 5 seconds instead of 10.
on
Use the on field to define the event that triggers the workflow. In a future release, the on field will support additional events. At this time, workflows only support the execute event, which is triggered when the workflow is manually executed via the UI or the REST API. When the workflow is manually triggered, the inputs context is populated with values from the input form.
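A minimal on section with a single execute input, consumed through the inputs context, looks like this (the input name and label are illustrative):
on:
  execute:
    inputs:
      message:
        type: string
        label: Message to print
jobs:
  main:
    steps:
      - run: echo ${{ inputs.message }}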