Monitoring Your Work
ACTIVATE features several data monitoring modules as well as a monitor dashboard to track your work.
Home
The Home page displays important data at a glance.
Workflow Monitor
This module shows a snapshot of your most recently run workflows.
Data columns for your workflows include:
- ID
- Workflow
- Status
- Submitted
- Runtime (Minutes)
If you click a workflow’s ID number or its name in the Workflow column, you’ll be taken to a more detailed view on the Workflows page.
A workflow’s Status can be Started, Running, Completed, Canceled, or Error.
The Workflow Monitor also includes three important action buttons:
- Cancel Run
- Run Workflow Again
- View Active Workflow
If you click the icon to re-run a workflow, you’ll be taken to the workflow’s configuration form on the Workflows page.
Resource Monitor
This module shows a summary of your node usage across your resources.
Different resources are shown as white, green, and blue lines. You can mouse over data points to see corresponding resource names.
Storage Resources
This module shows your favorited storage resources and their status (active, starting, or stopped).
If you click the information icon, you’ll be taken to the storage’s configuration form on the Storage page.
My Compute Resources
This module shows your favorited compute resources and their status: the number of active and requested nodes if a resource is active, or stopped if it is inactive.
The navy bar reflects the maximum number of nodes a resource can have. In the screenshot above, the resource has the controller node and a partition that’s configured for 10 maximum nodes, for a total of 11 possible nodes.
The green bar reflects the number of active nodes on a compute resource. In the screenshot above, the resource is active but not running any jobs, so there is 1 active node (the controller).
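The arithmetic behind the two bars can be sketched in a few lines of Python. This is only an illustration of the counting described above, not part of ACTIVATE: the function names are hypothetical, and summing over several partitions is an assumption, since the example above has a single partition.

```python
def max_possible_nodes(partition_max_nodes, controller_nodes=1):
    """Navy bar: the controller plus the maximum nodes of each partition.

    Summing across several partitions is an assumption; the example in
    the text has one partition configured for 10 maximum nodes.
    """
    return controller_nodes + sum(partition_max_nodes)


def active_node_count(running_compute_nodes, controller_nodes=1):
    """Green bar: the controller plus any compute nodes currently running."""
    return controller_nodes + running_compute_nodes


# The resource from the example: one partition with 10 maximum nodes,
# active but not running any jobs.
total = max_possible_nodes([10])
active = active_node_count(0)
print(f"{active} of {total} nodes active")
```

For the resource described above, this reports 1 active node out of 11 possible nodes.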
If you click the gear icon, you’ll be taken to the resource’s Definition tab on the Clusters page.
If you click the information icon, you’ll be taken to the resource’s Sessions tab on the Clusters page.
Workflows
On the Workflows page, click a workflow. You’ll be taken to the workflow’s Jobs tab.
The Workflow Monitor here mirrors the Workflow Monitor from the Home page.
The Job logs module shows details about specific workflow sessions. When you navigate to this page, this module shows No log found until you click a job number in the ID column.
If you navigate to this page after clicking the eye icon on a running workflow, the Job logs module will show details for that active session.
You can save workflow logs by clicking the Download button.
Clusters
When you click on a cluster, you’ll be taken to the resource’s Sessions tab.
The monitor at the top of the page mirrors the My Compute Resources module from the Home page.
Active Nodes
This module shows details about nodes that are currently running on your resource, including:
- Node ID
- Public IP Address
- Private IP Address
- Node Runtime
Sessions
This module shows details about all your sessions with this resource, including:
- Session
- Status
- Health Check
- Creation Time
- Deletion Time
- Dashboards
If you click a number in the Session column, the Logs module will change to reflect information from that session.
The Dashboards column contains two icons. One takes you to the cost dashboard, and the other takes you to the monitor dashboard.
Cost
This module displays cost information about your running cluster, including Cost By Type, Daily Cost, and Filtered Cost.
For detailed information about cost types, please see the relevant section of our Monitoring Costs page.
Logs
This module features five tabs of detailed information, including:
- Provision
- Storages
- Deletion
- Scheduler
- Health Check
The Provision log shows your resource’s provisioning process. If a resource has been provisioned successfully, you’ll see the message Tunnel established successfully, Controller IP is 12.345.678.90.
If a resource fails to provision, you’ll see an Error message. In that case, we suggest adjusting your resource’s configuration, then starting the resource again.
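If you download a Provision log, a short script can check it for the two outcomes described above. The following is a minimal sketch, assuming the success message reads exactly as quoted and that a failed run contains the word Error somewhere in the log; the function name and return values are hypothetical.

```python
import re

# Matches the success message quoted above and captures the controller IP.
SUCCESS_PATTERN = re.compile(
    r"Tunnel established successfully, Controller IP is (?P<ip>\S+)"
)


def provision_result(log_text):
    """Classify a Provision log as provisioned, failed, or unknown."""
    match = SUCCESS_PATTERN.search(log_text)
    if match:
        # Drop a trailing period in case the message ends a sentence.
        return "provisioned", match.group("ip").rstrip(".")
    if "Error" in log_text:
        # Assumption: failures are marked only by the word "Error".
        return "failed", None
    return "unknown", None


sample = "Tunnel established successfully, Controller IP is 12.345.678.90"
print(provision_result(sample))
```

A result of unknown usually just means the resource is still provisioning, so it is worth checking the log again before changing any configuration.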
The Storages log shows the provisioning process for attached ephemeral storage resources, if any.
The Deletion log shows your resource’s deletion process after you turn it off. If a resource has been deleted successfully, you’ll see a message like 2023-07-10T17:15:34.721Z-INFO: delete() finished.
The Scheduler log shows your resource’s scheduled and completed Slurm jobs, if any.
The Health Check log shows details for an automated script that checks for provisioning and connection errors.
You can save any of these logs by clicking the Download button.
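Once a log is downloaded, its lines can be split into a timestamp, a level, and a message for further analysis. The sketch below assumes every line follows the same layout as the Deletion log sample shown above (a UTC timestamp, a hyphen, a level, a colon, then the message); other logs may be formatted differently, and the function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Layout taken from the sample line: 2023-07-10T17:15:34.721Z-INFO: delete() finished
LINE_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z-"
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def parse_log_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(
        match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f"
    ).replace(tzinfo=timezone.utc)  # the trailing Z marks UTC
    return timestamp, match.group("level"), match.group("msg")


parsed = parse_log_line("2023-07-10T17:15:34.721Z-INFO: delete() finished")
print(parsed)
```

Lines that do not match the assumed layout return None, so they can be skipped or inspected by hand.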
Storage
When you click on a persistent storage resource, you’ll be taken to the storage’s Sessions tab.
The Sessions module shows details about all your sessions with this storage resource, including:
- Session
- Status
- Creation Time
- Deletion Time
The Logs module shows details about specific storage sessions. When you navigate to this page, the Provision and Deletion tabs here show Log not found until you click a number in the Session column.
You can save storage logs by clicking the Download button.
Please note that ephemeral storage resources don’t have this page because they’re created and destroyed with a resource.
You can see more details about ephemeral storage resources by navigating to their attached resource and clicking on the Storages tab of the Logs module.
Monitor
In the Monitor category, there are three pages: Dashboard, Instances, and Cost.
Dashboard
The Dashboard page provides an overview of your work on ACTIVATE.
By default, the monitor dashboard displays the following data modules. You can change the view at any time; for more information, please see Filters below.
Please note that you must use the Pool filter to select a resource before the monitor dashboard will display data.
Graphs
Average Lustre Filesystem GB
This graph shows how many gigabytes (GB) a resource has used for Lustre storage.
Please note that Lustre is optional, so this graph will not display any data if you haven’t used a Lustre storage resource (as seen in the screenshot above).
Load and Utilization
This graph shows a summary of a resource’s processing and memory usage.
You can mouse over the lines to see detailed information for:
- Average CPU User
- Average CPU System
- Average Memory Used
- Average Disk Used
This graph is most useful when looking at specific sessions for a resource.
Memory GB
This graph shows how many gigabytes (GB) a resource has used for memory. This type of memory is similar to RAM on a personal computer.
IO KB/s
This graph shows a resource’s data input and output in kilobytes per second (KB/s).
Nodes
This graph shows how many nodes a resource has used.
The Nodes graph is especially useful when paired with the other data modules. For example, you could adjust the number of nodes on your resource and check the IO KB/s graph to maximize your nodes’ efficiency for your workload.
Root Filesystem GB
This graph shows a resource’s root filesystem usage in gigabytes (GB). This type of storage is similar to a hard drive on a personal computer.
Tables
Worker Table
This table shows a summary of a resource’s history. You can click any of the fields at the top of the table to sort the data. Fields include:
- Hostname
- Project
- Username
- Private IP Address
- Session Number
- Public IP Address
- Pool Name
- Last Active
- CPU User
- Disk Used (Percentage)
- Memory Used (Percentage)
- Created At
Slurm Jobs
This table shows a summary of a resource’s jobs that have been submitted via Slurm. You can click any of the fields at the top of the table to sort the data. Fields include:
- State
- Job ID
- Project
- Job Name
- Username
- Start Time
- Cluster
- End Time
- Elapsed Seconds
- Nodes
- CPUs
Filters
The monitor dashboard includes the following options for filtering data:
- Time
- User
- Pool
- Session
Two filters must have options selected: Pool and Time. These filters are pinned to the top of the Monitor page. You can click either of these filters to change them.
To add more filters, click Filter Options and select any filter from the list. Next, use the dropdown menu to select the filter parameter. All filter dropdown menus include a search bar for quickly finding parameters.
Please note that some filters are conditional. For example, you must select a Pool before you can select a Session.
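The filter rules above can be summarized as a small check. This is purely illustrative and not an ACTIVATE API: the dictionary layout, the function name, and the sample values are hypothetical, and only the Pool, Time, and Session rules stated above are encoded.

```python
# Filters that must always have an option selected.
REQUIRED_FILTERS = ("Pool", "Time")
# Conditional filters and the filter each one depends on.
DEPENDS_ON = {"Session": "Pool"}


def filter_problems(selection):
    """Return a list of problems with a filter selection (empty if valid)."""
    problems = []
    for name in REQUIRED_FILTERS:
        if not selection.get(name):
            problems.append(f"{name} must have an option selected")
    for name, prerequisite in DEPENDS_ON.items():
        if selection.get(name) and not selection.get(prerequisite):
            problems.append(f"select a {prerequisite} before selecting a {name}")
    return problems


# Hypothetical selection: a Session was chosen without a Pool.
print(filter_problems({"Time": "Last hour", "Session": "3"}))
```

An empty list means the selection satisfies both rules, so the dashboard would have what it needs to display data.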
Printing Data
You can save the data from the monitor dashboard by downloading a copy.
Click Options, then Print.
A Print window will appear. Select the option for Save as PDF. Click Save.
The monitor dashboard page will be downloaded as a PDF.
Instances
The Instances page shows clusters that have been active or deleted within the last hour.
Each instance listed here includes the following data:
- Pool
- Session
- Region
- Number of Running Instances
- Started
- Deleted
- State
If a cluster is active, its State will show Running in yellow.
If a cluster has recently been shut down, its State will show Deleted in blue.
If you haven’t started or stopped a cluster within the last hour, the Instances page will show the message No instances found.
Cost
The Cost page shows your cost data across ACTIVATE. For more information about this page, please see Monitoring Costs.