Kubernetes metrics

Some metrics collected by Kubernetes are the same as those of server and application monitoring.

Container (container) metric

The container category collects all custom labels set on the container's pods as tags.

  • Target: Cluster project, Namespace project
  • Collection interval: 5 seconds
  • Statistical data: 5 minutes

Tags

| Tag name | Description | Remarks |
| --- | --- | --- |
| agentOid | Node agent's unique ID | Unique value |
| agentPcode | pcode | Unique value |
| command | Execution command | - |
| containerId | Container ID | Unique value |
| containerKey | Container key | - |
| created | Timestamp of when the container was created | - |
| image | Container image name | - |
| imageHash | Image hash value | - |
| imageId | Image ID | - |
| k8s-app | Value of the pod's k8s-app label | - |
| microOid | Unique ID of the WhaTap APM agent installed in the container | - |
| name | Container name | - |
| namespace | Namespace to which the container belongs | - |
| namespaceHash | Hash value of the namespace to which the container belongs | - |
| okind | Unique ID of OKIND specified in the WhaTap APM agent installed in the container | - |
| okindName | Name of OKIND specified in the WhaTap APM agent installed in the container | - |
| oname | Name of the WhaTap APM agent installed in the container | - |
| onode | Unique ID of the node agent on which the container is running | - |
| onodeName | Name of the node on which the container is running | - |
| podHash | Hash value of the container's Pod | - |
| podName | Name of the container's Pod | - |
| replicaSetHash | Hash value of the container's ReplicaSet | - |
| replicaSetName | Name of the container's ReplicaSet | - |
| whatap_project | Name of the WhaTap project to which the container belongs | - |

Fields

| Field name | Unit | Shortname | Name | Description |
| --- | --- | --- | --- | --- |
| blkio_rbps | Byte | IoReadBytes | Container Block I/O Read Byte | Sum of bytes read per second across all block devices in the container |
| blkio_riops | Count | IoReadIops | Container Block I/O Read IOPS | Sum of read operations per second across all block devices in the container |
| blkio_wbps | Byte | IoWriteBytes | Container Block I/O Write Byte | Sum of bytes written per second across all block devices in the container |
| blkio_wiops | Count | IoWriteIops | Container Block I/O Write IOPS | Sum of write operations per second across all block devices in the container |
| cpu_per_quota | Percent | CpuByLimit | Container CPU Usage by Limit (%) | Container CPU utilization by limit |
| cpu_quota | Millicore | CpuLimit | Container CPU Limit (core) | Container CPU limit quota. If the limit is not set, the total CPU cores of the node where the container is running appear in millicores. |
| cpu_quota_percent | Percent | CpuLimitByNode | Container CPU Limit by Node (%) | Container CPU limit quota against node CPU. If the limit is not set, the total CPU cores of the node where the container is running appear as a percentage. |
| cpu_sys | Percent | CpuSysByNode | Container CPU Sys Usage by Node (%) | Container CPU system utilization against node CPU |
| cpu_throttledperiods | Count | CpuThrottledCnt | Container CPU Throttling Count | Container CPU throttled count |
| cpu_throttledtime | Nanosecond | CpuThrottledTime | Container CPU Throttling Time | Container CPU throttled time |
| cpu_total | Percent | CpuByNode | Container CPU Usage by Node (%) | Container CPU utilization against node CPU |
| cpu_total_milli | Millicore | CpuTotUsage | Container CPU Usage (millicore) | Container CPU usage |
| cpu_user | Percent | CpuUserByNode | Container CPU User Usage by Node (%) | Container CPU user utilization against node CPU |
| cpu_request | Millicore | CpuRequest | Container CPU Request (core) | Container CPU request |
| cpu_per_request | Percent | CpuByRequest | Container CPU Usage by Request (%) | Utilization against container CPU request = cpu_total_milli / cpu_request * 100 |
| mem_failcnt | Count | MemFailCnt | Container Memory Failure Count | Number of times the container memory limit was reached |
| mem_limit | Byte | MemLimit | Container Memory Limit (byte) | Container memory limit size |
| mem_maxusage | Byte | MemMaxUsage | Container Memory Max Usage (byte) | Recorded maximum container memory usage |
| mem_percent | Percent | MemWsByLimit | Container Memory Working Set by Limit (%) | Working set usage based on container memory limit = mem_usage / mem_limit * 100 |
| mem_totalcache | Byte | MemTotCache | Container Memory Total Cache (byte) | Container's total cache size |
| mem_totalpgfault | Count | MemTotPageFaultCnt | Container Memory Total Page Fault Count | Container's page fault count |
| mem_totalrss | Byte | MemTotRss | Container Memory Total RSS (byte) | Container's total RSS memory size |
| mem_totalrss_percent | Percent | MemTotRssByLimit | Container Memory Total RSS By Limit (%) | Container's total RSS memory utilization |
| mem_totalunevictable | Byte | MemTotUnevictable | Container Memory Total Unevictable (byte) | Container's total unevictable memory size |
| mem_usage | Byte | MemUsage | Container Memory Usage (byte) | Container memory usage |
| mem_working_set | Byte | MemWs | Container Memory Working Set (byte) | Container memory working set = mem_usage - inactive file |
| mem_working_set_percent | Percent | MemWsByLimit | Container Memory Working Set by Limit (%) | Working set usage based on container memory limit = mem_usage / mem_limit * 100 |
| mem_request | Byte | MemRequest | Container Memory Request (byte) | Container memory request size |
| mem_per_request | Percent | MemWsByRequest | Container Memory Working Set by Request (%) | Working set usage based on container memory request = mem_working_set / mem_request * 100 |
| network_rbps | Byte | NetRxBytes | Container Network Receive Byte | Container network receive data size |
| network_rdropped | Count | NetRxDropped | Container Network Receive Dropped | Container network receive dropped count |
| network_rerror | Count | NetRxError | Container Network Receive Error | Container network receive error count |
| network_riops | Count | NetRxIops | Container Network Receive IOPS | Container network receive count |
| network_wbps | Byte | NetTxByes | Container Network Transmit Byte | Container network transmit data size |
| network_wdropped | Count | NetTxDropped | Container Network Transmit Dropped | Container network transmit dropped count |
| network_werror | Count | NetTxError | Container Network Transmit Error | Container network transmit error count |
| network_wiops | Count | NetTxIops | Container Network Transmit IOPS | Container network transmit count |
| node_cpu | Percent | ConNodeCpu | Container Work Node CPU Usage (%) | CPU usage of the node where the container is running |
| node_mem | Percent | ConNodeMem | Container Work Node Memory Usage (%) | Memory usage of the node where the container is running |
| phase | String | - | - | Pod lifecycle: ① PENDING ② RUNNING ③ SUCCEEDED ④ FAILED ⑤ UNKNOWN |
| restart_count | Positive number | ConRestartCnt | Container Restart Count | Container restart count |
| state | Positive number | ConState | Container Current State | Container state code: ① RUNNING = 114 ② PAUSE = 112 ③ RESTARTING = 101 ④ OOMKILLED = 111 ⑤ DEAD = 100 ⑥ WAITING = 119 |
| status | String | ConStatus | Container Current Status | Container state information: ① running: displays the uptime information ② waiting/terminated: displays the reason for the state |
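
The numeric values of the state field can be translated back into names with a small lookup. The following is a minimal sketch that uses only the codes listed for state above; the function name container_state_name and the fallback text for unlisted codes are illustrative assumptions, not part of WhaTap.

```python
# State codes as listed for the container `state` field.
CONTAINER_STATES = {
    114: "RUNNING",
    112: "PAUSE",
    101: "RESTARTING",
    111: "OOMKILLED",
    100: "DEAD",
    119: "WAITING",
}


def container_state_name(code: int) -> str:
    """Return the state name for a numeric code (hypothetical helper).

    Codes that are not in the table are reported as UNLISTED(<code>).
    """
    return CONTAINER_STATES.get(code, f"UNLISTED({code})")


if __name__ == "__main__":
    for code in (114, 111, 42):
        print(code, container_state_name(code))
```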

Kubernetes node (kube_node) metric

The kube_node category collects all custom labels set on the node as tags.

  • Target: Cluster project, Namespace project
  • Collection interval: 5 seconds
  • Statistical data: 5 minutes, 1 hour

Tags

| Tag name | Description | Remarks |
| --- | --- | --- |
| nodeName | Node name | - |

Fields

| Field name | Unit | Description | Remarks |
| --- | --- | --- | --- |
| allocatable_cpu | Millicore | CPU size that can be assigned to node | - |
| allocatable_memory | Byte | Memory size that can be assigned to node | - |
| allocatable_pods | Positive number | Number of Pods that can be assigned to node | - |
| limit_cpu | Millicore | Sum of node CPU limits | - |
| limit_memory | Byte | Sum of node memory limits | - |
| pods | Positive number | Total number of node Pods | - |
| request_cpu | Millicore | Sum of node CPU requests | - |
| request_memory | Byte | Sum of node memory requests | - |
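
As an illustration of how these fields relate, the sketch below derives how much of a node's allocatable CPU is already requested or limited. The function name node_cpu_summary, the dict-shaped sample, and the derived key names are assumptions made for this example; only the three input field names come from the table above.

```python
def node_cpu_summary(sample: dict) -> dict:
    """Summarize CPU commitment for one kube_node sample (values in millicores).

    `sample` is a hypothetical dict keyed by the kube_node field names.
    """
    allocatable = sample["allocatable_cpu"]
    requested = sample["request_cpu"]
    limited = sample["limit_cpu"]
    return {
        # CPU that no Pod has requested yet.
        "unrequested_cpu": allocatable - requested,
        # Share of allocatable CPU covered by requests; None if allocatable is 0.
        "request_percent": requested / allocatable * 100 if allocatable else None,
        # Values above 100 mean the limits overcommit the node.
        "limit_percent": limited / allocatable * 100 if allocatable else None,
    }


if __name__ == "__main__":
    sample = {"allocatable_cpu": 4000, "request_cpu": 1000, "limit_cpu": 6000}
    print(node_cpu_summary(sample))
```

A limit_percent above 100 is not an error in itself: Kubernetes schedules on requests, so limits may legitimately add up to more than the node can provide.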

Kubernetes event (kube_event) metric

The kube_event category collects cluster-wide data for cluster projects, and collects data only for events that occurred in the namespace for namespace projects.

  • Target: Cluster project, Namespace project
  • Collection interval: 5 seconds
  • Statistical data: 5 minutes, 1 hour

Tags

| Tag name | Description | Remarks |
| --- | --- | --- |
| field_path | Field path | - |
| kind | Type | Object type on which the event occurred |
| name | Object name | Kubernetes object name on which the event occurred |
| namespace | Namespace name | Namespace in which the event occurred |
| reason | Event occurrence cause | - |
| type | Event type | Warning or Normal |
| uid | UID | Object on which the event occurred |

Fields

| Field name | Unit | Description | Remarks |
| --- | --- | --- | --- |
| action | String | Action name | - |
| count | Count | Event occurrence count | - |
| event_time | Positive number | Timestamp of the first event | - |
| first_timestamp | Positive number | First event occurrence time | - |
| last_timestamp | Positive number | Last event occurrence time | - |
| message | String | Event message | - |
| reasonFiled | String | Event reason | - |
| reporting_component | String | Component that reports the current event | - |
| reporting_instance | String | Instance that reports the current event | - |
| series_last_observed_time | Positive number | Series last observed time | - |
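
A common use of this category is to total warning events per cause. The sketch below assumes each record has already been flattened into a dict that carries the type and reason tags together with the count field; that record shape and the function name warning_counts_by_reason are assumptions for illustration, not a WhaTap API.

```python
def warning_counts_by_reason(events: list) -> dict:
    """Sum the `count` field per `reason` for events whose `type` is Warning."""
    totals: dict = {}
    for event in events:
        # Compare case-insensitively so "Warning" and "warning" both match.
        if str(event.get("type", "")).lower() != "warning":
            continue
        reason = event.get("reason", "unknown")
        totals[reason] = totals.get(reason, 0) + int(event.get("count", 1))
    return totals


if __name__ == "__main__":
    events = [
        {"type": "Warning", "reason": "BackOff", "count": 3},
        {"type": "Normal", "reason": "Pulled", "count": 1},
        {"type": "Warning", "reason": "BackOff", "count": 2},
        {"type": "Warning", "reason": "FailedScheduling", "count": 1},
    ]
    print(warning_counts_by_reason(events))
```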

Kubernetes Cluster (kube_stat) metric

For cluster projects, the kube_stat category collects data for the entire cluster. For namespace projects, it collects only the objects associated with the namespace.

  • Target: Cluster project, Namespace project
  • Collection interval: 5 seconds
  • Statistical data: 5 minutes, 1 hour

Tags

| Tag name | Description | Remarks |
| --- | --- | --- |
| name | kube_stat | Fixed value |

Fields

| Field name | Unit | Description | Remarks |
| --- | --- | --- | --- |
| alloctable_cpu | Millicore | Number of cluster cores | Cluster project only |
| alloctable_ephemeral-storage | Byte | Ephemeral storage that can be allocated across the cluster | Cluster project only |
| alloctable_hugepages-1gi | Byte | Hugepages-1Gi that can be allocated across the cluster | Cluster project only |
| alloctable_hugepages-2mi | Byte | Hugepages-2Mi that can be allocated across the cluster | Cluster project only |
| alloctable_memory | Byte | Memory that can be allocated across the cluster | Cluster project only |
| alloctable_pods | Positive number | Number of pods that can be allocated | - |
| available_pod | Positive number | Number of pods whose phase is Running | - |
| desired_pod | Positive number | Sum of the number of pods deployed without metadata.ownerReferences and the number of desired pods defined in Kubernetes objects (ReplicaSet, DaemonSet, StatefulSet). Same as the number of pods retrieved by kubectl get pods -A | - |
| nodes | Positive number | Number of nodes | - |
| pod_phase_Pending | Positive number | Number of pending pods | - |
| pod_phase_Running | Positive number | Number of running pods | - |
| running_containers | Positive number | Number of running containers | - |
| stopped_containers | Positive number | Number of stopped containers | - |
| total_available_cpu | Positive number | Total allocatable CPU | - |
| total_available_memory | Positive number | Total allocatable memory | - |
| total_limit_cpu | Millicore | Sum of CPU limits | - |
| total_limit_memory | Byte | Sum of memory limits | - |
| total_request_cpu | Millicore | Sum of CPU requests | - |
| total_request_memory | Byte | Sum of memory requests | - |
| unavailable_pod | Positive number | Number of pods whose phase is not Running (Pending, Failed, Succeeded) | - |
| waiting_containers | Positive number | Number of waiting containers | - |

Pod (kube_pod) metric

The kube_pod category collects all custom labels set on the Pod as tags.

  • Target: Master (cluster) project, Namespace project
  • Collection interval: 5 seconds
  • Statistical data: 5 minutes

Tags

| Tag name | Description | Remarks |
| --- | --- | --- |
| agentOid | Node agent's unique ID | Unique value |
| agentPcode | pcode | Unique value |
| command | Execution command | - |
| containerIds | IDs of the containers that belong to the pod | - |
| containerIdsCount | Number of containerIds | - |
| containerKeys | Hash values of the IDs of the containers that belong to the pod | - |
| containerKeysCount | Number of containerKeys | - |
| DaemonSet | DaemonSet name of the pod | - |
| Deployment | Deployment name of the pod | - |
| k8s-app | Value of the pod's k8s-app label | - |
| microOid | ID of the agent running on the application inside the pod's container | - |
| microOids | IDs of the agents running on applications inside multiple containers in the pod | - |
| microOidsCount | Number of microOids | - |
| name | Pod name | - |
| onames | Names of the agents running on the applications inside the pod's containers | - |
| onamesCount | Number of onames | - |
| podName | Pod name | - |
| namespace | Namespace to which the Pod belongs | - |
| namespaceHash | Hash value of the namespace to which the Pod belongs | - |
| replicaSetHash | Hash value of the Pod's ReplicaSet | - |
| replicaSetName | Name of the Pod's ReplicaSet | - |
| whatap_project | Name of the WhaTap project to which the Pod belongs | - |

Fields

| Field name | Unit | Shortname | Name | Description |
| --- | --- | --- | --- | --- |
| blkio_rbps | Byte | IoReadBytes | Pod Block I/O Read Byte | Sum of bytes read per second across all block devices in the Pod |
| blkio_riops | Count | IoReadIops | Pod Block I/O Read IOPS | Sum of read operations per second across all block devices in the Pod |
| blkio_wbps | Byte | IoWriteBytes | Pod Block I/O Write Byte | Sum of bytes written per second across all block devices in the Pod |
| blkio_wiops | Count | IoWriteIops | Pod Block I/O Write IOPS | Sum of write operations per second across all block devices in the Pod |
| cpu_per_limit | Percent | CpuByLimit | Pod CPU Usage by Limit (%) | Pod CPU utilization by limit |
| cpu_quota_percent | Percent | CpuLimitByNode | Pod CPU Limit by Node (%) | Pod CPU limit quota against node CPU. If the limit is not set, the total CPU cores of the node where the Pod is running appear as a percentage. |
| cpu_sys | Percent | CpuSysByNode | Pod CPU Sys Usage by Node (%) | Pod CPU system utilization against node CPU |
| cpu_throttledperiods | Count | CpuThrottledCnt | Pod CPU Throttling Count | Pod CPU throttled count |
| cpu_throttledtime | Nanosecond | CpuThrottledTime | Pod CPU Throttling Time | Pod CPU throttled time |
| cpu_total | Percent | CpuByNode | Pod CPU Usage by Node (%) | Pod CPU utilization against node CPU |
| cpu_total_milli | Millicore | CpuTotUsage | Pod CPU Usage (millicore) | Pod CPU usage |
| cpu_user | Percent | CpuUserByNode | Pod CPU User Usage by Node (%) | Pod CPU user utilization against node CPU |
| cpu_request | Millicore | CpuRequest | Pod CPU Request (core) | Pod CPU request |
| cpu_per_request | Percent | CpuByRequest | Pod CPU Usage by Request (%) | Total CPU utilization against the Pod CPU request = cpu_total_milli / cpu_request * 100 |
| mem_totalcache | Byte | MemTotCache | Pod Memory Total Cache (byte) | Pod's total cache size |
| mem_totalpgfault | Count | MemTotPageFaultCnt | Pod Memory Total Page Fault Count | Pod's page fault count |
| mem_totalrss | Byte | MemTotRss | Pod Memory Total RSS (byte) | Pod's total RSS memory size |
| mem_totalrss_percent | Percent | MemTotRssByLimit | Pod Memory Total RSS by Limit (%) | Pod's total RSS memory utilization |
| mem_totalunevictable | Byte | MemTotUnevictable | Pod Memory Total Unevictable (byte) | Pod's total unevictable memory size |
| mem_usage | Byte | MemUsage | Pod Memory Usage (byte) | Pod memory usage |
| mem_working_set | Byte | MemWs | Pod Memory Working Set (byte) | Pod memory working set = mem_usage - inactive file |
| memory_request | Byte | MemRequest | Pod Memory Request (byte) | Pod memory request |
| memory_limit | Byte | MemLimit | Pod Memory Limit (byte) | Pod memory limit quota |
| memory_per_request | Percent | MemByRequest | Pod Memory Working Set by Request (%) | Working set usage based on the Pod memory request |
| memory_per_limit | Percent | MemByLimit | Pod Memory Working Set by Limit (%) | Working set usage based on the Pod memory limit |
| network_rbps | Byte | NetRxBytes | Pod Network Receive Byte | Pod network receive data size |
| network_rdropped | Count | NetRxDropped | Pod Network Receive Dropped | Pod network receive dropped count |
| network_rerror | Count | NetRxError | Pod Network Receive Error | Pod network receive error count |
| network_riops | Count | NetRxIops | Pod Network Receive IOPS | Pod network receive count |
| network_wbps | Byte | NetTxByes | Pod Network Transmit Byte | Pod network transmit data size |
| network_wdropped | Count | NetTxDropped | Pod Network Transmit Dropped | Pod network transmit dropped count |
| network_werror | Count | NetTxError | Pod Network Transmit Error | Pod network transmit error count |
| network_wiops | Count | NetTxIops | Pod Network Transmit IOPS | Pod network transmit count |
| phase | String | Phase | Pod Current Phase | Pod lifecycle: ① PENDING ② RUNNING ③ SUCCEEDED ④ FAILED ⑤ UNKNOWN |
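
The two formulas stated for cpu_per_request and mem_working_set can be worked through as follows. This is a sketch of the arithmetic only: the function names mirror the field names for readability, the inactive_file argument stands for the "inactive file" term in the formula, and both the None result for a missing request and the clamp at zero are assumptions of this example rather than documented agent behavior.

```python
from typing import Optional


def cpu_per_request(cpu_total_milli: float, cpu_request: float) -> Optional[float]:
    """cpu_total_milli / cpu_request * 100; None when no request is set (assumption)."""
    if cpu_request <= 0:
        return None
    return cpu_total_milli / cpu_request * 100


def mem_working_set(mem_usage: int, inactive_file: int) -> int:
    """mem_usage - inactive file, clamped at zero (the clamp is an assumption)."""
    return max(mem_usage - inactive_file, 0)


if __name__ == "__main__":
    # A Pod using 250 millicores against a 500 millicore request.
    print(cpu_per_request(250, 500))
    # 800 bytes of usage, of which 300 bytes are inactive file cache.
    print(mem_working_set(800, 300))
```

Because the percentage is relative to the request and not the limit, values above 100 are possible whenever a Pod uses more CPU than it requested.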

The following fields are reserved for internal use.

| Field name | Description | Remarks |
| --- | --- | --- |
| kube_sless_normal | Number of Kubernetes informative events | - |
| kube_sless_warning | Number of Kubernetes warning events | - |
| micro_sful_critical | Number of APM events that are critical | - |
| micro_sful_info | APM informative event count | - |
| micro_sful_warning | APM warning event count | - |
| micro_sless_critical | Number of APM events that are not critical | - |
| micro_sless_info | Number of APM events that are not informative | - |
| micro_sless_warning | Number of APM events that are not for warning | - |
| sful_critical | Number of events that are critical in the metric | - |
| sful_info | Number of events that are informative in the metric | - |
| sful_warning | Number of events that are for warning in the metric | - |
| sless_critical | Number of events that are not critical in the metric | - |
| sless_info | Number of events that are not informative in the metric | - |
| sless_warning | Number of events that are not for warning in the metric | - |

Kubernetes Pod Statistics (kube_pod_stat) metric

For cluster projects, the kube_pod_stat category collects data for the entire cluster. For namespace projects, it collects data only for pods that belong to the namespace.

  • Target: Cluster project, Namespace project
  • Collection interval: 5 seconds
  • Statistical data: 5 minutes, 1 hour

Tags

| Tag name | Description | Remarks |
| --- | --- | --- |
| kind | Type | The cluster project has cluster as the fixed value, and the namespace projects collect only for Deployment or ReplicaSet. |
| name | Kubernetes resource name | The cluster project has no name value, and the namespace projects have the name of the Deployment or ReplicaSet. |

Fields

| Field name | Unit | Description | Remarks |
| --- | --- | --- | --- |
| available_pod | Positive number | Number of pods whose phase is Running | - |
| desired_pod | Positive number | Sum of the number of pods deployed without metadata.ownerReferences and the number of desired pods defined in Kubernetes objects (ReplicaSet, DaemonSet, StatefulSet). Same as the number of pods retrieved by kubectl get pods -A | - |
| limit_cpu | Millicore | CPU limit usage | - |
| limit_memory | Byte | Memory limit usage | - |
| request_cpu | Millicore | CPU request usage | - |
| request_memory | Byte | Memory request usage | - |
| running_container | Positive number | Running container count | - |
| stopped_container | Positive number | Stopped container count | - |
| waiting_container | Positive number | Waiting container count | - |

Kubernetes Horizontal Pod Autoscaler (HPA) (kube_hpa_stat) metric

Metric collection starts only when HPA is added to the ClusterRole used by WhaTap.

  • Target: Cluster project
  • Collection interval: 5 seconds
  • Statistical data: 5 minutes, 1 hour

Tags

| Tag name | Description | Remarks |
| --- | --- | --- |
| name | HPA name | - |

Fields

| Field name | Unit | Description | Remarks |
| --- | --- | --- | --- |
| currentReplicas | Positive number | Current replica count | - |
| desiredReplicas | Positive number | Desired replica count | - |
| lastScaleTime | Positive number | Timestamp of the last scaling | - |
| maxReplicas | Positive number | Maximum replica count | - |
| minReplicas | Positive number | Minimum replica count | - |
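
The replica fields are typically read together, for example to spot an HPA that is pinned at its maximum. The sketch below classifies one sample; the function name hpa_state, the dict-shaped sample, and the four labels it returns are assumptions for illustration.

```python
def hpa_state(sample: dict) -> str:
    """Classify one kube_hpa_stat sample (hypothetical dict keyed by field name)."""
    current = sample["currentReplicas"]
    desired = sample["desiredReplicas"]
    if current != desired:
        # The autoscaler has asked for a different replica count.
        return "scaling"
    if current >= sample["maxReplicas"]:
        # No headroom left: further load cannot add replicas.
        return "at_max"
    if current <= sample["minReplicas"]:
        return "at_min"
    return "steady"


if __name__ == "__main__":
    sample = {"currentReplicas": 5, "desiredReplicas": 5, "minReplicas": 1, "maxReplicas": 5}
    print(hpa_state(sample))
```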

Process (kube_process) metrics

Note

Kubernetes agent 1.7.12 or later is required. For more information about agent updates, see the following.

Kubernetes-related processes that exist in the node are collected during monitoring.

  • Target: Cluster project, Namespace project

  • Collection interval: 5 seconds

  • Statistical data: 5 minutes

Tags

| Tag name | Description | Type | Remarks |
| --- | --- | --- | --- |
| ppid | Parent process ID | String | /proc/[pid]/status::PPid |
| pid | Process ID | String | /proc/[pid]/status::Pid |
| cmd1 | Command name | String | /proc/[pid]/status::Name |
| cmd2 | Command line (full command and arguments) | String | /proc/[pid]/cmdline |
| user | User ID or username | String | /proc/[pid]/status::Uid |
| onodeName | Node name of the process | String | Environment variable of the container system (NODE_IP) |
| createTime | Process start time | Timestamp | Calculated from /proc/uptime |

Fields

| Field name | Description | Unit | Type | Remarks |
| --- | --- | --- | --- | --- |
| cpu | CPU utilization | Percent (%) | float | Calculated from /proc/[pid]/stat |
| memory | Memory utilization | Percent (%) | float | Calculated from /proc/[pid]/statm |
| rss | Actual memory usage (Resident Set Size) | Byte (B) | long | /proc/[pid]/status::VmRSS |
| uid | User ID or username | - | String | /proc/[pid]/status::Uid |
| state | Process status | - | String | /proc/[pid]/status::State |
| SharedMemory | Shared memory size | Byte (B) | long | Calculated from /proc/[pid]/statm |
| openFileDescriptors | Number of file descriptors opened by the process | - | int | Calculated from /proc/[pid]/fd |
| vmSize | Virtual memory size | Byte (B) | long | /proc/[pid]/status::VmSize |
| threads | Number of threads created by the process | - | int | /proc/[pid]/status::Threads |
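
To make the /proc/[pid]/status sources concrete, the sketch below parses a sample status text and maps its keys onto the tag and field names above. It is not the agent's implementation: the helper names, the sample values, and the choice to keep the raw State string are assumptions. The kB values reported by the kernel are multiplied by 1024 to obtain bytes.

```python
# Hypothetical excerpt of /proc/[pid]/status for illustration.
SAMPLE_STATUS = (
    "Name:\tnginx\n"
    "State:\tS (sleeping)\n"
    "Pid:\t1234\n"
    "PPid:\t1\n"
    "Uid:\t101\t101\t101\t101\n"
    "VmSize:\t   10240 kB\n"
    "VmRSS:\t    2048 kB\n"
    "Threads:\t4\n"
)


def parse_status(text: str) -> dict:
    """Split 'Key:<tab>value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def kib_to_bytes(value: str) -> int:
    """Convert a value such as '2048 kB' to bytes (the kernel's kB is 1024 bytes)."""
    return int(value.split()[0]) * 1024


def to_record(fields: dict) -> dict:
    """Map status keys onto the kube_process tag and field names."""
    return {
        "pid": fields["Pid"],
        "ppid": fields["PPid"],
        "cmd1": fields["Name"],
        "uid": fields["Uid"].split()[0],  # first value is the real UID
        "state": fields["State"],
        "rss": kib_to_bytes(fields["VmRSS"]),
        "vmSize": kib_to_bytes(fields["VmSize"]),
        "threads": int(fields["Threads"]),
    }


if __name__ == "__main__":
    print(to_record(parse_status(SAMPLE_STATUS)))
```

Kernel threads have no VmRSS or VmSize lines, so a production parser would need to treat those keys as optional.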

Linux process status in the Kubernetes environment

On Linux, the State field in the /proc/[pid]/status file displays the current state of the process. The meanings of each status are as follows:

| Code | Description | Description detail |
| --- | --- | --- |
| R (Running) | Running | The process is running or ready to run. |
| S (Sleeping) | Waiting | Interruptible sleep state, waiting for an event. |
| D (Disk Sleep) | Disk sleeping | Non-interruptible sleep state, waiting for an I/O operation. |
| Z (Zombie) | Zombie state | The process has been terminated, but the parent process has not yet collected its termination status. |
| T (Stopped) | Stopped | The process is stopped by a job control signal (such as SIGSTOP) or debugger. |
| t (Tracing stop) | Tracing stopped | The process is being traced by a debugger (indicated by a lowercase t). |
| X (Dead) | Dead state | The process is dead (usually invisible). |
| x (Dead) | Dead state | The kernel thread is dead (usually invisible). |
| K (WakeKill) | Forcibly terminated | Wake-up signal is ignored and the process is terminated immediately. |
| W (Waking) | Waking | The process is being woken up after receiving a wake-up signal. |
| I (Idle) | Idle state | Kernel thread is idle (usually invisible to user space processes). |

Note

Because Kubernetes manages the resources of containers and nodes efficiently, many processes running in containers are actually in waiting state. As a result, most processes may be in Sleeping state.
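
To check how process states are distributed, the state codes can be tallied. The sketch below is illustrative only: the names PROCESS_STATES and summarize_states are assumptions, and it accepts either the bare letter or the full "S (sleeping)" form because the exact string stored in the state field is not specified here.

```python
from collections import Counter

# First-letter state codes from /proc/[pid]/status::State.
PROCESS_STATES = {
    "R": "Running",
    "S": "Sleeping",
    "D": "Disk sleep",
    "Z": "Zombie",
    "T": "Stopped",
    "t": "Tracing stop",
    "X": "Dead",
    "x": "Dead",
    "K": "WakeKill",
    "W": "Waking",
    "I": "Idle",
}


def summarize_states(state_values: list) -> dict:
    """Count processes per state name; unrecognized codes are grouped as 'Other'."""
    counts: Counter = Counter()
    for value in state_values:
        code = value.strip()[:1]  # the state letter is the first character
        counts[PROCESS_STATES.get(code, "Other")] += 1
    return dict(counts)


if __name__ == "__main__":
    print(summarize_states(["S (sleeping)", "S", "R (running)", "Z (zombie)"]))
```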

Agent status (agent_status_summary) metrics

This category collects metrics related to agent status every 10 seconds.

Fields

| Field name | Unit | Description | Remarks |
| --- | --- | --- | --- |
| inActTime | Millisecond (ms) | Amount of time the agent remains inactive | - |
| isActive | - | Whether the agent is active | true / false |
| isRestart | - | Whether the agent has been restarted | true / false |
| lastActTime | Millisecond (ms) | Time when the agent was last activated | 0 if disabled |
| oid | - | Unique ID of each agent in the project | - |
| oType | - | Agent type | 1: Application agent / 2: See subType |
| startTime | Millisecond (ms) | Timestamp of when the agent was started | - |
| subType | - | Agent type | 9: Node agent / 10: Master agent |
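
Because oType defers to subType when its value is 2, resolving the agent type takes two steps. A minimal sketch using only the values listed above follows; the function name agent_kind and the text returned for unlisted values are assumptions.

```python
from typing import Optional

# subType values listed in the table; consulted only when oType is 2.
SUB_TYPES = {9: "Node agent", 10: "Master agent"}


def agent_kind(o_type: int, sub_type: Optional[int] = None) -> str:
    """Resolve a readable agent type from oType and subType."""
    if o_type == 1:
        return "Application agent"
    if o_type == 2:
        return SUB_TYPES.get(sub_type, f"Unlisted subType {sub_type}")
    return f"Unlisted oType {o_type}"


if __name__ == "__main__":
    print(agent_kind(1))
    print(agent_kind(2, 9))
    print(agent_kind(2, 10))
```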

Ingress (kube_ingress) metric

Note

Kubernetes agent 1.7.13 or later is required. For more information about agent updates, see the following.

This category collects metadata and related information for Ingress resources.

  • Target: Cluster project, Namespace project

  • Collection interval: 30 seconds

  • Statistical data: 5 minutes

Tags

| Tag name | Description | Unit | Type |
| --- | --- | --- | --- |
| ingressUid | Unique ID of the Ingress resource | - | String |
| ingressName | Name of the Ingress resource | - | String |
| ingressNamespace | Namespace of the Ingress resource | - | String |
| creationTimeMillis | Creation time of the Ingress resource | Millisecond (ms) | Long |
| ingressClassName | Name of the Ingress class | - | String |
| ingressLoadBalancerIps | IPs of the Ingress load balancer | - | List |

Fields

| Field name | Description | Unit | Type |
| --- | --- | --- | --- |
| host | Host name that the Ingress resource listens to (if *, it applies to all hosts) | - | List |
| path | Request path under a specific host | - | List |
| backendServiceName | Name of the backend service that receives the request | - | List |
| backendServicePort | Port number of the backend service | - | List |
| backendServiceUid | UID of the backend service | - | List |
| pathType | Path matching method (e.g. Prefix, Exact) | - | List |