
Using GPU infrastructure monitoring

GPUs cost far more than other infrastructure resources such as CPU and memory, and they directly affect the performance of AI/ML, LLM, and HPC workloads. To protect that investment, you need to look beyond "is the GPU alive" and track utilization, the workloads occupying each GPU, anomalies, and placement suitability. WhaTap GPU monitoring covers all of these in a unified view across server and Kubernetes environments.

4 questions GPU monitoring must answer

Table | Core questions for GPU monitoring
Question | What happens if unanswered
How heavily is it used? | Expensive GPUs sit idle undetected → over-investment
Who (which Pod/workload) occupies it? | A single Pod monopolizes the GPU → other teams' workloads are delayed
Are there any anomalies? | Temperature or power issues start degrading performance → later escalate into hardware lifetime problems
Is the placement appropriate? | A mix of overloaded and idle GPUs → reallocation decisions are delayed

Supported environments

WhaTap provides two views of GPUs. Choose the one that matches where your workloads are deployed.

GPU monitoring in server environments

Track GPUs installed directly on bare metal or VM servers.

Details: Server GPU monitoring

GPU monitoring in Kubernetes environments

Track Node ↔ GPU (MIG) ↔ Pod mapping across the Kubernetes cluster.

Details: K8s GPU monitoring

MIG (Multi-Instance GPU) support: Monitor NVIDIA GPUs at both the physical (P) and MIG instance (M) level. This is essential for clusters that partition GPUs into MIG instances.

Prerequisites

  • Kubernetes GPU dashboard: Kubernetes agent 1.8.7 or later + OpenAgent installed
  • Server GPU: GPU module enabled in the server agent (agent-gpu)
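The version requirement above can be checked with a small helper. This is a minimal sketch, assuming the installed agent version is available as a plain dotted string such as "1.8.7"; how you obtain that string is outside its scope, and the function names are hypothetical.

```python
# Minimum Kubernetes agent version stated in the prerequisites above.
MIN_K8S_AGENT = (1, 8, 7)


def parse_version(text):
    """Turn '1.8.7' into (1, 8, 7) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum=MIN_K8S_AGENT):
    """Return True if the installed version is at or above the minimum."""
    return parse_version(installed) >= minimum
```

Comparing tuples of integers avoids the string-comparison pitfall where "1.10.0" would sort before "1.8.7".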

Usage scenarios

① Get GPU asset status at a glance

When adopting or migrating infrastructure, you first need to answer "where are our GPUs and how many?" Use the Server GPU inventory to check models, quantities, and allocation status in one place.

② Track workload bottlenecks

When LLM or ML inference slows down:

  1. Check the Top 5 by utilization, temperature, and memory on the GPU dashboard.
  2. When an overloaded GPU is found, trace its node-Pod mapping.
  3. In a MIG environment, drill down to identify which instance is saturated.
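The first two steps can be sketched in code to make the logic concrete. This is an illustration only: the `GpuSample` record, its field names, and the 90% threshold are assumptions made for the example, not WhaTap's data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GpuSample:
    # Hypothetical record shape; field names are illustrative only.
    node: str
    gpu_id: str
    pod: Optional[str]    # Pod occupying the GPU, or None if unallocated
    utilization: float    # percent, 0-100
    temperature: float    # degrees Celsius
    memory_used: float    # percent, 0-100


def top_n(samples, metric, n=5):
    """Return the n samples with the highest value of the given metric."""
    return sorted(samples, key=lambda s: getattr(s, metric), reverse=True)[:n]


def trace_overloaded(samples, threshold=90.0):
    """Map each (node, gpu_id) at or above the utilization threshold to its Pod."""
    return {
        (s.node, s.gpu_id): s.pod
        for s in samples
        if s.utilization >= threshold
    }
```

Calling `top_n(samples, "temperature")` or `top_n(samples, "memory_used")` gives the other two rankings; in a MIG environment the same functions would apply if each MIG instance is represented as its own sample.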

In an LLM context, cross-reference with LLM Observability metrics to distinguish between "GPU utilization saturation" and "model choice or prompt length" as the root cause.

③ Anomaly detection

  • Temperature or power anomalies: Early warning on hardware issues → prevent outages
  • GPU in Pending state: Detect missing allocations
  • Unused GPUs: Catch budget waste early
  • Utilization skew: One node saturated while others are idle → reallocation signal

Link to alert rules: Add GPU metrics to event rules so that exceeding a threshold triggers an automatic notification. See Attach your first alert for setup details.
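The four anomaly types listed above can be expressed as simple checks. The sketch below is a rough illustration under stated assumptions: the dictionary keys, the "Pending" state label, and every threshold value are hypothetical and should be tuned to your hardware; it does not represent how WhaTap event rules are evaluated.

```python
from statistics import mean


def detect_anomalies(gpus, temp_limit=85.0, power_limit=300.0,
                     idle_below=5.0, skew_gap=60.0):
    """Return (subject, finding) pairs for a list of GPU dicts.

    Each dict is assumed to have the keys: node, gpu_id, state,
    utilization (percent), temperature (Celsius), power (watts).
    All limits are illustrative defaults.
    """
    findings = []
    per_node = {}
    for g in gpus:
        if g["state"] == "Pending":
            findings.append((g["gpu_id"], "pending"))
            continue  # no live metrics to evaluate for a pending GPU
        if g["temperature"] > temp_limit:
            findings.append((g["gpu_id"], "temperature"))
        if g["power"] > power_limit:
            findings.append((g["gpu_id"], "power"))
        if g["utilization"] < idle_below:
            findings.append((g["gpu_id"], "unused"))
        per_node.setdefault(g["node"], []).append(g["utilization"])

    # Utilization skew: compare the busiest and idlest node averages.
    node_means = [mean(values) for values in per_node.values()]
    if node_means and max(node_means) - min(node_means) > skew_gap:
        findings.append(("cluster", "utilization_skew"))
    return findings
```

A single reading above a limit is noisy, so in practice a rule would usually require the condition to hold over a time window before notifying.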

④ Optimize resource placement

  • Check long-term usage patterns with GPU trends.
  • If only specific time windows are saturated, adjust scheduling or placement.
  • Per-team usage → grounds for internal chargeback or quota policy
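As a rough illustration of the last two points, the sketch below finds the hours of the day that are saturated and totals GPU hours per team. The input shapes (an hour-to-utilization mapping and `(team, gpu_hours)` tuples) and the 90% threshold are assumptions for the example; exporting such data from your monitoring system is not shown.

```python
from collections import defaultdict


def saturated_hours(hourly_util, threshold=90.0):
    """Return the hours of day (0-23) whose mean utilization meets the threshold.

    hourly_util is assumed to map hour of day -> mean utilization percent.
    """
    return sorted(h for h, util in hourly_util.items() if util >= threshold)


def gpu_hours_by_team(records):
    """Sum GPU hours per team from (team, gpu_hours) tuples."""
    totals = defaultdict(float)
    for team, hours in records:
        totals[team] += hours
    return dict(totals)
```

If `saturated_hours` returns only a narrow window, shifting batch jobs out of that window may help before adding hardware; the per-team totals can feed a chargeback report or quota review.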

⑤ Capacity planning

Monthly and quarterly GPU usage trends provide grounds for scale-up or scale-down decisions. Include them in the quarterly retrospective of the Performance reporting scenario.
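One way to turn usage trends into a scale-up or scale-down signal is sketched below. It assumes daily utilization samples as `(date, percent)` pairs and uses hypothetical 80% and 30% cutoffs; real capacity decisions would weigh more factors, such as peak load and planned workloads.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_mean_utilization(samples):
    """Average (date, utilization percent) samples per (year, month), in time order."""
    buckets = defaultdict(list)
    for day, util in samples:
        buckets[(day.year, day.month)].append(util)
    return {month: mean(values) for month, values in sorted(buckets.items())}


def capacity_hint(monthly, high=80.0, low=30.0):
    """Suggest a direction based on the most recent month; cutoffs are illustrative."""
    if not monthly:
        return "insufficient data"
    latest = list(monthly.values())[-1]
    if latest >= high:
        return "consider scale-up"
    if latest <= low:
        return "consider scale-down"
    return "hold"
```

Grouping the same monthly averages by quarter would give the figures for a quarterly retrospective.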

Dashboard structure highlights

Kubernetes GPU dashboard

  • GPU resource status summary (top four widgets): Counts of allocated nodes, Pods, and GPUs by status
  • GPU Map: Device map chart (P = physical, M = MIG)
    • Grouped by node or physical device
    • Color-coded by status and utilization
  • Top 5 trends: Time series of GPUs ranked by utilization, temperature, and memory

Details: GPU dashboard

Server GPU performance summary

  • Real-time utilization, temperature, power, and memory per installed GPU
  • Per-node summary with drill-down to individual GPUs

Details: GPU performance summary

Next steps