# Nodes & GPUs

> Source: https://parallelworks.com/docs/kubernetes/nodes-and-gpus

ACTIVATE provides visibility into the nodes running across your Kubernetes clusters and tools for managing NVIDIA GPU configurations, including Multi-Instance GPU (MIG) partitioning.

:::info Admin Only
The Nodes view is available to organization admins and platform admins only.
:::

## Viewing Cluster Nodes

Navigate to **Kubernetes > Nodes** in the sidebar to view all nodes across your connected clusters.

### Node Table Columns

| Column | Description |
|--------|-------------|
| **Name** | The node hostname. Click to open the node detail page. |
| **Cluster** | The cluster the node belongs to (hidden when filtering by a single cluster). |
| **Kubernetes Version** | The kubelet version running on the node (e.g., `v1.28.4`). |
| **Container Runtime** | The container runtime and version (e.g., `containerd://1.7.2`). |
| **Internal IP** | The node's internal network IP address. |
| **Architecture** | The CPU architecture (e.g., `amd64`, `arm64`). |

### Filtering Nodes

Use the filter bar to narrow results:

- **Clusters** — Show nodes from specific clusters only
- **Search** — Free-text search across node name and cluster name

## Node Detail Page

Click a node name to open its detail page. The page shows the node's system information, resource capacity, and labels, and for GPU nodes it also provides the GPU management actions described below.

### System Information

The detail page shows the following system-level properties:

| Property | Description |
|----------|-------------|
| **OS Image** | The operating system image (e.g., `Ubuntu 22.04.3 LTS`). |
| **Kernel Version** | The Linux kernel version. |
| **Operating System** | The OS type (e.g., `linux`). |
| **Architecture** | The CPU architecture. |
| **Container Runtime Version** | The container runtime and version. |
| **Kubernetes Version** | The kubelet version. |
| **Internal IP** | The node's internal IP address. |

### Capacity and Allocatable Resources

Each node reports two sets of resource quantities:

- **Capacity** — The total physical resources available on the node
- **Allocatable** — The resources available for pod scheduling (capacity minus system-reserved resources)

Both sets include:

| Resource | Format | Example |
|----------|--------|---------|
| **CPU** | Number of cores | `8` |
| **Memory** | Gibibytes (`Gi`) | `32Gi` |
| **Ephemeral Storage** | Gibibytes (`Gi`) | `100Gi` |
| **Pods** | Maximum pod count | `110` |
| **NVIDIA GPUs** | GPU count (if present) | `4` |

:::tip Resource Overhead
Compare the capacity and allocatable values to understand how much overhead is reserved for system components like the kubelet and OS processes.
:::
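
To make the comparison concrete, the following is a minimal sketch of how the reserved overhead could be computed from the two sets of values. It assumes the quantities use standard Kubernetes suffixes (such as `Gi` or `m` for millicores). The function names and the sample allocatable values are illustrative and are not taken from ACTIVATE.

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (binary, decimal, milli).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal(1) / 1000,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '32Gi', '7910m', or '8' to a number."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)  # plain number, no suffix


def reserved(capacity: dict, allocatable: dict) -> dict:
    """Return capacity minus allocatable for every resource present in both."""
    return {
        name: parse_quantity(capacity[name]) - parse_quantity(allocatable[name])
        for name in capacity
        if name in allocatable
    }


# Illustrative values only: a node with 8 cores and 32Gi of memory.
capacity = {"cpu": "8", "memory": "32Gi", "pods": "110"}
allocatable = {"cpu": "7910m", "memory": "31Gi", "pods": "110"}

for name, amount in reserved(capacity, allocatable).items():
    print(f"{name}: {amount} reserved")
```

CPU results are in cores and memory results are in bytes, so divide the memory figure by `2**30` to read it in gibibytes.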

### Node Labels

The detail page displays all labels assigned to the node. Labels commonly include:

- `kubernetes.io/hostname` — The node hostname
- `kubernetes.io/arch` — CPU architecture
- `kubernetes.io/os` — Operating system
- `node.kubernetes.io/instance-type` — Instance type (on cloud providers)
- `nvidia.com/gpu.product` — GPU model name (on GPU nodes)
- `nvidia.com/mig.config` — Current MIG configuration label
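
As a rough illustration of how these labels can be used, the sketch below summarizes a node from its label map. The label keys come from the list above. The helper name, the output field names, and the sample label values are hypothetical.

```python
def describe_gpu_node(labels: dict) -> dict:
    """Summarize a node from its labels; missing labels are reported as None."""
    return {
        "hostname": labels.get("kubernetes.io/hostname"),
        "arch": labels.get("kubernetes.io/arch"),
        "gpu_product": labels.get("nvidia.com/gpu.product"),
        "mig_config": labels.get("nvidia.com/mig.config"),
        # Assumption: a node counts as a GPU node when the product label exists.
        "is_gpu_node": "nvidia.com/gpu.product" in labels,
    }


# Sample labels for illustration; real values depend on your cluster.
labels = {
    "kubernetes.io/hostname": "gpu-node-1",
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
    "nvidia.com/gpu.product": "NVIDIA-A30",
    "nvidia.com/mig.config": "all-1g.6gb",
}

print(describe_gpu_node(labels))
```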

## GPU Management

For nodes equipped with NVIDIA GPUs, ACTIVATE provides tools to install and manage the NVIDIA GPU Operator and configure MIG partitioning directly from the node detail page.

### NVIDIA GPU Operator

The GPU Operator automates the management of GPU drivers, container toolkits, and device plugins on Kubernetes. From the node detail page, you can install, upgrade, or roll back the GPU Operator Helm chart.

#### Installing the GPU Operator

1. Navigate to the node detail page for a GPU-equipped node
2. Click the **GPU Operator** button in the action bar
3. Fill in the installation form:

| Field | Description | Default |
|-------|-------------|---------|
| **Helm Chart Version** | The GPU Operator chart version to install | `v25.3.0` |
| **Namespace** | The namespace for the GPU Operator deployment | `gpu-operator` |
| **Create Namespace** | Whether to create the namespace if it does not exist | `true` |
| **Containerd Config** | Path to the containerd configuration file (optional) | — |
| **Containerd Socket** | Path to the containerd socket (optional) | — |

4. Click **Install NVIDIA GPU Operator**

The operator is installed from the `https://helm.ngc.nvidia.com/nvidia` Helm repository using the `nvidia/gpu-operator` chart.

#### Upgrading the GPU Operator

If the GPU Operator is already installed, the same form appears with an **Upgrade NVIDIA GPU Operator** button instead. The upgrade uses the same Helm chart configuration.

#### Rolling Back

When the GPU Operator is installed, the GPU Operator drawer also shows a **Release History** table listing all previous revisions. Click the rollback button next to a revision to revert the release to that revision.

### MIG Configuration

NVIDIA Multi-Instance GPU (MIG) allows a single physical GPU to be partitioned into multiple isolated GPU instances, each with dedicated compute, memory, and memory bandwidth.

#### MIG Strategies

ACTIVATE supports two MIG strategies:

| Strategy | Description |
|----------|-------------|
| **Single** | All GPU instances on the node use the same MIG profile. Use this when all workloads on the node have identical GPU requirements. |
| **Mixed** | Different MIG profiles can coexist on the same GPU. Use this for heterogeneous workloads with varying GPU requirements. |
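
The strategy also affects how workloads request a GPU slice. The sketch below reflects the documented behavior of the NVIDIA device plugin, not anything specific to ACTIVATE: under `single`, MIG devices are advertised under the generic `nvidia.com/gpu` resource, while under `mixed` each profile gets its own `nvidia.com/mig-<profile>` resource. The helper name is hypothetical.

```python
def mig_resource_name(strategy: str, profile: str) -> str:
    """Return the resource name a pod would request for one MIG instance."""
    if strategy == "single":
        # All instances share one profile, so the generic GPU resource is used.
        return "nvidia.com/gpu"
    if strategy == "mixed":
        # Each profile is advertised as its own resource.
        return f"nvidia.com/mig-{profile}"
    raise ValueError(f"unknown MIG strategy: {strategy!r}")


def gpu_limits(strategy: str, profile: str, count: int = 1) -> dict:
    """Build the `resources.limits` fragment of a container spec."""
    return {mig_resource_name(strategy, profile): count}


print(gpu_limits("single", "1g.6gb"))
print(gpu_limits("mixed", "1g.6gb"))
```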

#### Configuring MIG

1. Navigate to the node detail page for a GPU node that has the GPU Operator installed
2. Click the **NVIDIA MIG** button in the action bar
3. Select a **MIG Strategy** (`single` or `mixed`)
4. Enter a **MIG Strategy Config** value that specifies the MIG profile to apply

   Common configuration values include:
   - `all-1g.6gb` — All instances configured as 1g.6gb (smallest slice)
   - `all-2g.12gb` — All instances configured as 2g.12gb
   - `all-3g.24gb` — All instances configured as 3g.24gb
   - `all-balanced` — A balanced mix of MIG instance sizes (for mixed strategy)

5. Click **Configure MIG**

:::info GPU-Dependent Profiles
The available MIG profiles depend on the GPU model. The configuration drawer displays the default MIG partitioning options for the detected GPU type, loaded from the `default-mig-parted-config` ConfigMap managed by the GPU Operator.
:::

#### What Happens During MIG Configuration

When you apply a MIG configuration, ACTIVATE performs two operations:

1. **Patches the cluster policy**: updates the `clusterpolicies.nvidia.com/cluster-policy` custom resource to set the MIG strategy (e.g., `mixed` or `single`) at `/spec/mig/strategy`
2. **Labels the node**: applies the `nvidia.com/mig.config` label to the target node with the specified configuration value (e.g., `all-balanced`)

The GPU Operator detects these changes and automatically reconfigures the GPU partitioning on the node.

:::warning Workload Disruption
Changing MIG configuration may temporarily disrupt GPU workloads running on the node. Plan MIG reconfiguration during maintenance windows when possible.
:::
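
The two operations described in this section can be sketched as request payloads. This is an assumption about their shape, not ACTIVATE's actual implementation: the cluster policy change is shown as a JSON Patch `replace` operation on `/spec/mig/strategy`, and the node label as a merge patch on the node's metadata. The function name is hypothetical.

```python
import json

VALID_STRATEGIES = ("single", "mixed")


def mig_patches(strategy: str, mig_config: str) -> tuple[list, dict]:
    """Return (cluster policy JSON Patch, node label merge patch)."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"strategy must be one of {VALID_STRATEGIES}")

    # Assumed JSON Patch against the cluster-policy custom resource.
    cluster_policy_patch = [
        {"op": "replace", "path": "/spec/mig/strategy", "value": strategy}
    ]

    # Assumed merge patch that sets the MIG label on the target node.
    node_label_patch = {
        "metadata": {"labels": {"nvidia.com/mig.config": mig_config}}
    }
    return cluster_policy_patch, node_label_patch


policy_patch, label_patch = mig_patches("mixed", "all-balanced")
print(json.dumps(policy_patch, indent=2))
print(json.dumps(label_patch, indent=2))
```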

## Cross-Cluster Queries

The nodes list view aggregates data from all connected clusters by default. The response metadata includes:

- **Total clusters queried** — How many clusters were contacted
- **Successful clusters** — How many responded successfully
- **Total nodes** — The combined count of nodes returned

If a cluster is unreachable, the remaining clusters still return their results.
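
The partial-failure behavior can be illustrated with a small aggregation sketch. The metadata field names and the per-cluster fetch callables are hypothetical stand-ins, since this page does not specify the response format.

```python
from typing import Callable


def list_nodes_across_clusters(fetchers: dict[str, Callable[[], list]]) -> dict:
    """Query every cluster and keep results from those that respond."""
    nodes = []
    successful = 0
    for cluster, fetch in fetchers.items():
        try:
            cluster_nodes = fetch()
        except Exception:
            # An unreachable cluster is skipped; the others still contribute.
            continue
        successful += 1
        nodes.extend({"cluster": cluster, **node} for node in cluster_nodes)

    return {
        "nodes": nodes,
        "meta": {
            "total_clusters_queried": len(fetchers),
            "successful_clusters": successful,
            "total_nodes": len(nodes),
        },
    }


def unreachable() -> list:
    """Simulate a cluster that cannot be contacted."""
    raise ConnectionError("cluster unreachable")


result = list_nodes_across_clusters({
    "cluster-a": lambda: [{"name": "node-1"}, {"name": "node-2"}],
    "cluster-b": unreachable,
})
print(result["meta"])
```

Comparing the queried and successful counts in the metadata is a simple way to detect that a result set is incomplete.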

## See Also

- [Resource Quotas](/docs/kubernetes/resource-quotas) — Set GPU limits per namespace
- [Helm Charts](/docs/kubernetes/helm-charts) — Manage Helm releases including the GPU Operator
- [Managing Workloads](/docs/kubernetes/managing-workloads) — View GPU workloads running across your clusters
