Monitoring Your Work

The PW platform features several data monitoring modules as well as a monitor dashboard to track your work.

Home

The Home page displays important data at a glance.

Workflow Monitor

This module shows a snapshot of your most recently run workflows.

Screenshot of the Workflow Monitor module.

Data columns for your workflows include:

  • ID
  • Workflow
  • Status
  • Submitted
  • Runtime (Minutes)

If you click a workflow’s ID number or its name in the Workflow column, you’ll be taken to a more detailed view on the Workflows page.

A workflow’s Status can be Started, Running, Completed, Canceled, or Error.
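
If you track workflow runs in your own scripts, the five statuses above can be modeled as a small enumeration. This is only an illustrative sketch: the names WorkflowStatus and is_finished are hypothetical and not part of the PW platform, and treating Completed, Canceled, and Error as finished states is an assumption.

```python
from enum import Enum


class WorkflowStatus(Enum):
    """Statuses listed for the Workflow Monitor (hypothetical helper type)."""
    STARTED = "Started"
    RUNNING = "Running"
    COMPLETED = "Completed"
    CANCELED = "Canceled"
    ERROR = "Error"


# Assumption: a run in one of these three statuses will not change any further.
FINISHED = {WorkflowStatus.COMPLETED, WorkflowStatus.CANCELED, WorkflowStatus.ERROR}


def is_finished(status_text: str) -> bool:
    """Return True if the text shown in the Status column is a finished state."""
    return WorkflowStatus(status_text) in FINISHED
```

Passing a status that is not in the list raises a ValueError, which makes unexpected values easy to spot.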

The Workflow Monitor also includes three important action buttons:

  • Cancel Run
  • Run Workflow Again
  • View Active Workflow

If you click Run Workflow Again, you’ll be taken to the workflow’s configuration form on the Workflows page.

Resource Monitor

This module shows a summary of your node usage across your resources.

Screenshot of the Resource Monitor module.

Different resources are shown as white, green, and blue lines. You can mouse over data points to see corresponding resource names.

Storage Resources

This module shows your favorited storage resources and their status (active, starting, or stopped).

Screenshot of the Storage Resources module.

If you click the information icon, you’ll be taken to the storage’s configuration form on the Storage page.

My Compute Resources

This module shows your favorited compute resources and their status. An active resource shows its active and requested nodes; an inactive resource shows as stopped.

Screenshot of the My Compute Resources module.

The navy bar reflects the maximum number of nodes a resource can have. In the screenshot above, the resource has the controller node and a partition that’s configured for 10 maximum nodes, for a total of 11 possible nodes.

The green bar reflects the number of active nodes on a compute resource. In the screenshot above, the resource is active but not running any jobs, so there is 1 active node (the controller).
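
The arithmetic behind the two bars can be sketched as follows. This is a minimal illustration, assuming one controller node per resource as in the screenshot; the function names are hypothetical and not part of the platform.

```python
def max_possible_nodes(partition_max_nodes: list[int], controller_nodes: int = 1) -> int:
    """Navy bar: the controller plus the maximum nodes of every partition."""
    return controller_nodes + sum(partition_max_nodes)


def active_nodes(running_worker_nodes: int, controller_nodes: int = 1) -> int:
    """Green bar: the controller plus any worker nodes currently running."""
    return controller_nodes + running_worker_nodes


# The example from the screenshot: one partition capped at 10 nodes, no jobs running.
print(max_possible_nodes([10]))  # 11
print(active_nodes(0))           # 1
```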

If you click the gear icon, you’ll be taken to the resource’s Definition tab on the Clusters page.

If you click the information icon, you’ll be taken to the resource’s Sessions tab on the Clusters page.

Workflows

On the Workflows page, click a workflow. You’ll be taken to the workflow’s Jobs tab.

Screenshot of the Jobs tab on the Workflows page after clicking a workflow.

The Workflow Monitor here mirrors the Workflow Monitor from the Home page.

The Job logs module shows details about specific workflow sessions. When you navigate to this page, this module shows No log found until you click a job number in the ID column.

If you navigate to this page after clicking the eye icon on a running workflow, the Job logs module will show details for that active session.

You can save workflow logs by clicking the Download button.

Clusters

When you click on a cluster, you’ll be taken to the resource’s Sessions tab.

Screenshot of the Sessions tab on the Clusters page after clicking a resource.

The monitor at the top of the page mirrors the My Compute Resources module from the Home page.

Active Nodes

This module shows details about nodes that are currently running on your resource, including:

  • Node ID
  • Public IP Address
  • Private IP Address
  • Node Runtime

Sessions

This module shows details about all your sessions with this resource, including:

  • Session
  • Status
  • Health Check
  • Creation Time
  • Deletion Time
  • Dashboards

If you click a number in the Session column, the Logs module will change to reflect information from that session.

The Dashboards column contains two icons. One takes you to the cost dashboard, and the other takes you to the monitor dashboard.

Cost

This module displays cost information about your running cluster, including Cost By Type, Daily Cost, and Filtered Cost.

For detailed information about cost types, please see Monitoring Costs.

Logs

This module features five tabs of detailed information:

  • Provision
  • Storages
  • Deletion
  • Scheduler
  • Health Check

The Provision log shows your resource’s provisioning process. If a resource has been provisioned successfully, you’ll see the message Tunnel established successfully, Controller IP is 12.345.678.90.

If a resource fails to provision, you’ll see an Error message. In that case, we suggest adjusting your resource’s configuration, then starting the resource again.
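
If you download the Provision log, a short script can check it for the outcome described above. The sketch below assumes the success line matches the example message and that a failure contains the word "error" in some capitalization; both are assumptions, and provision_result is a hypothetical helper, not a platform tool.

```python
import re

# Assumption: the success line looks like the example message. The pattern accepts
# any dotted group of digits because the sample address is only a placeholder.
SUCCESS = re.compile(r"Tunnel established successfully, Controller IP is ([\d.]+\d)")


def provision_result(log_text: str) -> str:
    """Classify a downloaded Provision log as success, error, or unknown."""
    match = SUCCESS.search(log_text)
    if match:
        return f"success (controller IP {match.group(1)})"
    # Assumption: failed provisioning logs mention "error" somewhere in the text.
    if "error" in log_text.lower():
        return "error"
    return "unknown"
```

A result of unknown usually means the resource is still provisioning or the log uses wording this sketch does not expect.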

The Storages log shows the provisioning process for attached ephemeral storage resources, if any.

The Deletion log shows your resource’s deletion process after you turn it off. If a resource has been deleted successfully, you’ll see a message like 2023-07-10T17:15:34.721Z-INFO: delete() finished.
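
A downloaded Deletion log can be checked the same way. The sketch below assumes every line follows the format of the example message (a UTC timestamp, a hyphen, a level, a colon, then the message); the helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumption: each line is "<ISO timestamp>Z-<LEVEL>: <message>", as in the example.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z-(\w+): (.*)$")


def parse_log_line(line: str):
    """Split one log line into (UTC timestamp, level, message), or None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return stamp.replace(tzinfo=timezone.utc), match.group(2), match.group(3)


def deletion_finished(log_text: str) -> bool:
    """Return True if any line reports that delete() finished."""
    parsed = (parse_log_line(line) for line in log_text.splitlines())
    return any(p is not None and p[2].startswith("delete() finished") for p in parsed)
```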

The Scheduler log shows your resource’s scheduled and completed Slurm jobs, if any.

The Health Check log shows details for an automated script that checks for provisioning and connection errors.

You can save any of these logs by clicking the Download button.

Storage

When you click on a persistent storage resource, you’ll be taken to the storage’s Sessions tab.

Screenshot of the Sessions tab on the Storage page after clicking a storage resource.

The Sessions module shows details about all your sessions with this storage resource, including:

  • Session
  • Status
  • Creation Time
  • Deletion Time

The Logs module shows details about specific storage sessions. When you navigate to this page, the Provision and Deletion tabs here show Log not found until you click a number in the Session column.

You can save storage logs by clicking the Download button.

Ephemeral Storage

Please note that ephemeral storage resources don’t have this page because they’re created and destroyed along with the compute resource they’re attached to.

You can see more details about ephemeral storage resources by navigating to their attached resource and clicking on the Storages tab of the Logs module.

Monitor

In the Monitor category, there are three pages: Dashboard, Instances, and Cost.

Dashboard

The Dashboard page provides an overview of your work on the platform.

Screenshot of overview for the Dashboard tab on the Monitor page.

By default, the monitor dashboard displays the following data modules. You can change the view at any time; for more information, please see Filters below.

Note

Please note that you must use the Pool filter to select a resource before the monitor dashboard will display data.

Graphs

Average Lustre Filesystem GB

This graph shows how many gigabytes (GB) a resource has used for Lustre storage.

Please note that Lustre is optional, so this graph will not display any data if you haven’t used a Lustre storage resource (as seen in the screenshot above).

Load and Utilization

This graph shows a summary of a resource’s processing, memory, and disk usage.

You can mouse over the lines to see detailed information for:

  • Average CPU User
  • Average CPU System
  • Average Memory Used
  • Average Disk Used

This graph is most useful when looking at specific sessions for a resource.

Memory GB

This graph shows how many gigabytes (GB) a resource has used for memory. This type of memory is similar to RAM on a personal computer.

IO KB/s

This graph shows a resource’s data input and output in kilobytes per second (KB/s).

Nodes

This graph shows how many nodes a resource has used.

The Nodes graph is especially useful when paired with the other data modules. For example, you could adjust the number of nodes on your resource and check the IO KB/s graph to maximize your nodes’ efficiency for your workload.
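
One way to make that comparison concrete is to divide the IO KB/s reading by the node count for each configuration you try. The sketch below assumes you have noted both values by hand from the graphs; the function names are hypothetical, the readings are made up for illustration, and throughput per node is only one possible measure of efficiency.

```python
def io_per_node(readings: dict[int, float]) -> dict[int, float]:
    """Map each node count to its IO KB/s divided by the number of nodes."""
    return {nodes: kbps / nodes for nodes, kbps in readings.items() if nodes > 0}


def most_efficient(readings: dict[int, float]) -> int:
    """Return the node count with the highest IO KB/s per node."""
    per_node = io_per_node(readings)
    return max(per_node, key=per_node.get)


# Made-up readings for illustration only: {node count: IO KB/s read off the graph}.
example = {2: 800.0, 4: 1500.0, 8: 2000.0}
print(io_per_node(example))     # {2: 400.0, 4: 375.0, 8: 250.0}
print(most_efficient(example))  # 2
```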

Root Filesystem GB

This graph shows a resource’s root filesystem in gigabytes (GB). This type of storage is similar to a hard drive on a personal computer.

Tables

Worker Table

This table shows a summary of a resource’s history. You can click any of the fields at the top of the table to sort the data. Fields include:

  • Hostname
  • Project
  • Username
  • Private IP Address
  • Session Number
  • Public IP Address
  • Pool Name
  • Last Active
  • CPU User
  • Disk Used (Percentage)
  • Memory Used (Percentage)
  • Created At

Slurm Jobs

This table shows a summary of a resource’s jobs that have been submitted via Slurm. You can click any of the fields at the top of the table to sort the data. Fields include:

  • State
  • Job ID
  • Project
  • Job Name
  • Username
  • Start Time
  • Cluster
  • End Time
  • Elapsed Seconds
  • Nodes
  • CPUs

Filters

The monitor dashboard includes the following options for filtering data:

  • Time
  • User
  • Pool
  • Session

Two filters must have options selected: Pool and Time. These filters are pinned to the top of the Monitor page. You can click either of these filters to change them.

To add more filters, click Filter Options and select any filter from the list. Next, use the dropdown menu to select the filter parameter. All filter dropdown menus include a search bar for quickly finding parameters.

Please note that some filters are conditional. For example, you must select a Pool before you can select a Session.
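
The filter rules described in this section can be summarized in a short validation sketch. It is illustrative only: filter_problems is a hypothetical helper, and it assumes Session is the only conditional filter, since that is the one example given here.

```python
KNOWN = {"Time", "User", "Pool", "Session"}
REQUIRED = {"Pool", "Time"}
# Assumption: Session is the only conditional filter; it is the one example given.
DEPENDS_ON = {"Session": "Pool"}


def filter_problems(selected: dict[str, str]) -> list[str]:
    """Return the problems with a filter selection; an empty list means it is valid."""
    names = set(selected)
    problems = [f"unknown filter: {name}" for name in sorted(names - KNOWN)]
    problems += [f"missing required filter: {name}" for name in sorted(REQUIRED - names)]
    for name, parent in DEPENDS_ON.items():
        if name in names and parent not in names:
            problems.append(f"{name} requires {parent} to be selected first")
    return problems
```

For example, a selection with only Time and Session reports that Pool is missing and that Session depends on it.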

Printing Data

You can save the data from the monitor dashboard by downloading a copy.

Click Options, then Print.

A Print window will appear. Select the option for Save as PDF. Click Save.

The monitor dashboard page will be downloaded as a PDF.

Instances

The Instances page shows clusters that have been active or deleted within the last hour.

Each instance listed here includes the following data:

  • Pool
  • Session
  • Region
  • Number of Running Instances
  • Started
  • Deleted
  • State

If a cluster is active, its State will show Running in yellow.

If a cluster has recently been shut down, its State will show Deleted in blue.

Screenshot of a running and a deleted cluster on the Instances page.

If you haven’t started or stopped a cluster within the last hour, the Instances tab will show the message No instances found.

Screenshot of a blank Instances tab on the Monitor page.

Cost

The Cost page shows your cost data across the platform. For more information about this page, please see Monitoring Costs.