Monitoring Your Work
The PW platform features several data monitoring modules as well as a monitor dashboard to track your work.
The Home Page
The Home page displays important data at a glance.
This module shows a snapshot of your most recently run workflows.
Data columns for your workflows include:
- Runtime (Minutes)
If you click a workflow’s ID number or its name in the Workflow column, you’ll be taken to a more detailed view on the Workflows page.
A workflow’s Status can be Started, Running, Completed, Canceled, or Error.
The Workflow Monitor also includes three important action buttons:
- Cancel Run
- Run Workflow Again
- View Active Workflow
If you click the icon to re-run a workflow, you’ll be taken to the workflow’s configuration form on the Workflows page.
This module shows a summary of your node usage across your resources.
Different resources are shown as white, green, and blue lines. You can mouse over data points to see corresponding resource names.
This module shows your favorited storage resources and their status (active, starting, or stopped).
If you click the information icon, you’ll be taken to the storage’s configuration form on the Storage page.
My Computing Resources
This module shows your favorited resources and their status: active and requested nodes if a resource is active, or stopped if it is inactive.
The navy bar reflects the maximum number of nodes a resource can have. In the screenshot above, the resource has the controller node and a partition that’s configured for 10 maximum nodes, for a total of 11 possible nodes.
The green bar reflects the number of active nodes on a compute resource. In the screenshot above, the resource is active but not running any jobs, so there is 1 active node (the controller).
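The arithmetic behind the two bars can be sketched in a few lines of Python. The function names are hypothetical, and the assumption of exactly one controller node per resource comes from the example in this section rather than from the platform itself.

```python
def max_possible_nodes(partition_max_nodes, controller_nodes=1):
    """Total the navy bar represents: controller node(s) plus each partition's maximum.

    Assumes one controller per resource, as in the example above.
    """
    return controller_nodes + sum(partition_max_nodes)


def active_fraction(active_nodes, max_nodes):
    """Share of the navy bar that the green bar would fill."""
    return active_nodes / max_nodes


# Example from this section: one partition configured for 10 maximum nodes,
# with only the controller currently active.
total = max_possible_nodes([10])
print(total)                                # 11
print(round(active_fraction(1, total), 3))  # 0.091
```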
If you click the gear icon, you’ll be taken to the resource’s configuration form on the Resources page.
If you click the information icon, you’ll be taken to the resource’s Sessions tab on the Resources page.
The Compute Page
When you click on a compute resource here, you’ll be taken to the resource’s Sessions tab.
The monitor at the top of the page mirrors the My Computing Resources module from the Home page.
This module shows details about nodes that are currently running on your resource, including:
- Node ID
- Public IP Address
- Private IP Address
- Node Runtime
This module shows details about all your sessions with this resource, including:
- Health Check
- Creation Time
- Deletion Time
If you click a number in the Session column, the Logs module will change to reflect information from that session.
This module features five tabs of detailed information:
- Provision
- Storages
- Deletion
- Scheduler
- Health Check
The Provision log shows your resource’s provisioning process. If a resource has been provisioned successfully, you’ll see a message like “Tunnel established successfully, Controller IP is 12.345.678.90.”
If a resource fails to provision, you’ll see an Error message. In that case, we suggest adjusting your resource’s configuration, then starting the resource again.
The Storages log shows the provisioning process for attached ephemeral storage resources, if any.
The Deletion log shows your resource’s deletion process after you turn it off. If a resource has been deleted successfully, you’ll see a message like “2023-07-10T17:15:34.721Z-INFO: delete() finished.”
The Scheduler log shows your resource’s scheduled and completed Slurm jobs, if any.
The Health Check log shows details for an automated script that checks for provisioning and connection errors.
You can save any of these logs by clicking the Download button.
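Once a log has been downloaded, you could check it for the success messages with a short script. The following is a minimal sketch, assuming the downloaded logs are plain text and that the messages match the two samples quoted in this section; the function names and patterns are hypothetical and are not part of the platform.

```python
import re
from datetime import datetime, timezone

# Patterns are based only on the two sample messages quoted in this section;
# real logs may word these lines differently.
PROVISION_OK = re.compile(
    r"Tunnel established successfully, Controller IP is (\S+?)\.?$"
)
DELETE_OK = re.compile(r"^(\S+Z)-INFO: delete\(\) finished")


def controller_ip(provision_log_text):
    """Return the controller IP if the success message is present, else None."""
    for line in provision_log_text.splitlines():
        match = PROVISION_OK.search(line.strip())
        if match:
            return match.group(1)
    return None


def deletion_finished_at(deletion_log_text):
    """Return the UTC time of the 'delete() finished' line, else None."""
    for line in deletion_log_text.splitlines():
        match = DELETE_OK.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
            return stamp.replace(tzinfo=timezone.utc)
    return None


print(controller_ip(
    "Tunnel established successfully, Controller IP is 12.345.678.90."
))  # 12.345.678.90
print(deletion_finished_at(
    "2023-07-10T17:15:34.721Z-INFO: delete() finished."
))  # 2023-07-10 17:15:34.721000+00:00
```

Both functions return None when the expected line is missing, which would be a cue to open the log and look for an Error message.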
The Workflows Page
When you click on a workflow here, you’ll be taken to the workflow’s Jobs tab.
The Workflow Monitor here mirrors the Workflow Monitor from the Home page.
The Job logs module shows details about specific workflow sessions. When you navigate to this page, this module shows “No log found” until you click a job number in the ID column.
If you navigate to this page after clicking the eye icon on a running workflow, the Job logs module will show details for that active session.
You can save workflow logs by clicking the Download button.
The Storage Page
When you click on a persistent storage resource here, you’ll be taken to the storage’s Sessions tab.
The Sessions module shows details about all your sessions with this storage resource, including:
- Creation Time
- Deletion Time
The Logs module shows details about specific storage sessions. When you navigate to this page, the Provision and Deletion tabs here show “Log not found” until you click a number in the Session column.
You can save storage logs by clicking the Download button.
Please note that ephemeral storage resources don’t have this page because they’re created and destroyed with a resource.
You can see more details about ephemeral storage resources by navigating to their attached resource and clicking on the Storages tab of the Logs module.
The Monitor Page
When you navigate to the Monitor page, there are two tabs: Instances and Dashboard.
The Instances tab is the landing page of the Monitor page. It lists your clusters that have been active or deleted within the last hour.
Each instance listed here includes the following data:
- Number of Running Instances
If a cluster is active, its State will show Running in yellow.
If a cluster has recently been shut down, its State will show Deleted in blue.
If you haven’t started or stopped a cluster within the last hour, the Instances tab will show the message No instances found.
The monitor dashboard provides an overview of your work on the platform.
Please note that you must use the Pool filter to select a resource before the monitor dashboard will display data.
Average Lustre Filesystem GB
This graph shows how many gigabytes (GB) a resource has used for Lustre storage.
Please note that Lustre is optional, so this graph will not display any data if you haven’t used a Lustre storage resource (as seen in the screenshot above).
Load and Utilization
This graph shows a summary of a resource’s processing and memory usage.
You can mouse over the lines to see detailed information for:
- Average CPU User
- Average CPU System
- Average Memory Used
- Average Disk Used
This graph is most useful when looking at specific sessions for a resource.
This graph shows how many gigabytes (GB) a resource has used for memory. This type of memory is similar to RAM on a personal computer.
This graph shows a resource’s data input and output in kilobytes per second (KB/s).
This graph shows how many nodes a resource has used.
The Nodes graph is especially useful when paired with the other data modules. For example, you could adjust the number of nodes on your resource and check the IO KB/s graph to maximize your nodes’ efficiency for your workload.
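One way to make that comparison concrete is to divide the IO KB/s reading by the node count for each configuration you try. The sketch below assumes per-node throughput is a reasonable stand-in for efficiency, which may not hold for every workload; the function names are hypothetical and the readings are placeholder numbers for illustration only, not measurements from the platform.

```python
def io_per_node(readings):
    """Map each node count to its average IO KB/s per node."""
    return {nodes: kbps / nodes for nodes, kbps in readings.items()}


def most_efficient(readings):
    """Return the node count with the highest IO KB/s per node."""
    per_node = io_per_node(readings)
    return max(per_node, key=per_node.get)


# Placeholder values: node count -> IO KB/s read off the dashboard graphs.
readings = {2: 8000.0, 4: 15000.0, 8: 20000.0}
print(io_per_node(readings))     # {2: 4000.0, 4: 3750.0, 8: 2500.0}
print(most_efficient(readings))  # 2
```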
Root Filesystem GB
This graph shows a resource’s root filesystem in gigabytes (GB). This type of storage is similar to a hard drive on a personal computer.
This table shows a summary of a resource’s history. You can click any of the fields at the top of the table to sort the data. Fields include:
- Private IP Address
- Session Number
- Public IP Address
- Pool Name
- Last Active
- CPU User
- Disk Used (Percentage)
- Memory Used (Percentage)
- Created At
This table shows a summary of a resource’s jobs that have been submitted via Slurm. You can click any of the fields at the top of the table to sort the data. Fields include:
- Job ID
- Job Name
- Start Time
- End Time
- Elapsed Seconds
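The Elapsed Seconds field relates to Start Time and End Time by simple subtraction. The sketch below assumes timestamps in the YYYY-MM-DDTHH:MM:SS form; the format the table actually displays may differ, and the function name is hypothetical.

```python
from datetime import datetime


def elapsed_seconds(start_time, end_time, fmt="%Y-%m-%dT%H:%M:%S"):
    """Whole seconds between a job's start and end timestamps.

    The default format is an assumption; pass fmt to match your data.
    """
    start = datetime.strptime(start_time, fmt)
    end = datetime.strptime(end_time, fmt)
    return int((end - start).total_seconds())


print(elapsed_seconds("2023-07-10T17:00:00", "2023-07-10T17:15:34"))  # 934
```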
Customizing the Monitor Dashboard
You can customize the monitor dashboard by using different filters or changing the layout of the page.
The monitor dashboard includes the following options for filtering data:
Two filters must have options selected: Pool and Time. These filters are pinned to the top of the Monitor page. You can click either of these filters to change them.
To add more filters, click Filter Options and select any filter from the list. Next, use the dropdown menu to select the filter parameter. All filter dropdown menus include a search bar for quickly finding parameters.
Please note that some filters are conditional. For example, you must select a Pool before you can select a Session.
You can change the layout of the monitor dashboard at any time. Your changes will not affect other users in your organization.
Click Options, then Unlock Layout.
When the monitor dashboard is in editing mode, a bracket will appear in the bottom-right corner for each data module.
Drag and drop modules to change their positions on the page.
To resize a module, click the bracket in the bottom-right corner and drag vertically or horizontally.
Click the delete icon to remove a module from the page.
When you’re done making changes, click Options > Save Layout, then Lock Layout. Your monitor dashboard’s layout will remain in this state until you make further changes.
Click Options > Reset Layout, then Save Layout to revert the page to its default state.
You can save the data from the monitor dashboard by downloading a copy.
Click Options, then Print.
A Print window will appear. Select the option for Save as PDF. Click Save.
The monitor dashboard page will be downloaded as a PDF.