Skip to main content

PostgreSQL metrics

Database metrics are a comprehensive collection of database workloads and resource usages. The WhaTap monitoring service continuously collects metrics to identify bottlenecks in the database and provides data to help identify the root causes of performance issues.

For more information about other metrics, see the link.

agent_status_summary

This category collects metrics related to agent status every 10 seconds.

Fields

FieldTypeUnitDescription
inActTime-Millisecond(ms)Amount of time the agent remains inactive
isActiveBoolean-Whether the agent is active or not
isRestartBoolean-Whether the agent was restarted recently (true / false)
lastActTime-Millisecond(ms)Last time when the agent was activated (0: Disabled)
oid--Unique IDs for each agent in the project
startTime-Millisecond(ms)Timestamp when the agent was started.

db_postgresql_counter

Tags

The following lists the tags that are collected in the db_postgresql_counter category.

Tag nameDescriptionRemarks
agentIpIP address per agent.-
dbIsMultiWhether to support the multitenant or not.-
dbNameDatabase name.-
dbTypeDatabase type.-
dbVersionVersion of the database in use.-
onameAgent nameUnique value

Field

The following lists the fields that are collected in the db_postgresql_counter category.

MetricsDescriptionCategory
active_sessionsNumber of active sessions.Connection
CPUUtilizationPercentage of CPU resources being used by the database instances.Performance
FreeStorageSpaceAvailable storage space for the instance.I/O
FreeableMemoryAmount of memory that can be used immediately by the system.I/O
NetworkReceiveThroughputSpeed of network data received by the instanceTroughput
NetworkTransmitThroughputSpeed of network data transmitted by the instanceTroughput
blk_read_timeTime taken to read a data block in the backend (in milliseconds).I/O
blk_write_timeTime taken to write a data block in the backend (milliseconds).I/O
blks_hitTotal count where a disk block was found in the buffer cache and did not require a read.I/O
blks_hit_ratioPercentage of queries retrieved from the cache out of the total queries.
Calculation method: sum(blks_hit)/sum(blks_hit + blks_read) * 100
I/O
blks_readNumber of data blocks read directly from disk.I/O
buffers_checkpointNumber of buffers flushed during checkpoint.Info
check_pointNumber of cases a database checkpoint occurs.Troughput
checkpoint_write_timeTime (milliseconds) taken to write data during checkpoint.Troughput
checkpoints_timedNumber of checkpoints occurred according to the checkpoint_timeout setting.Troughput
commit_countNumber of transaction commits.Troughput
confl_bufferpinNumber of session conflicts due to buffer pin contention.Troughput
confl_deadlockNumber of deadlock conflicts.Troughput
confl_lockNumber of session conflicts due to lock contention.Troughput
confl_snapshotNumber of session conflicts due to snapshot contention.Troughput
confl_tablespaceNumber of session conflicts due to tablespace contention.Troughput
conflictsNumber of queries that were aborted because they conflicted with database recovery operations.Troughput
connect_errorNumber of database connection failures.Connection
deadlock_countTotal number of deadlocks occurred.Performance
dml_countTotal number of DML (Data Manipulation Language) operations.Troughput
fatal_countNumber of fatal errors.Info
idle_trxNumber of idle transactions.Connection
idle_trx_timeCumulative time of idle transactions.Connection
idx_scanWith this method, table records are accessed using the keys stored in index leaf blocks. This metric has the values queried every minute.Performance
index_hit_ratioPercentage of index queries retrieved from the cache.Performance
lock_wait_sessionsNumber of pending sessions.Performance
long_running_sessionsNumber of long running sessions.Performance
max_connectionsMaximum number of connections allowed. Value set in pg_settings. If this value is insufficient, the 'FATAL ERROR: too many clients already.' Errors may occur.Info
max_runtimeExecution time of the longest running query.Performance
node_stateThis database consists of replication relationships. 1: Master, 2: Slave.Info
numbackendsNumber of backend processes connected to the database.Info
oldest_cur_xidThe oldest active transaction ID.Info
per_to_emergency_autovacNumber of emergency automatic vacuum occurrences.Performance
per_to_wraparoundNumber of vacuum operations to prevent transaction ID wraparound.Performance
replication_countNumber of active replication databases (Slave).Performance
replication_delayReplication delay time. Delay time reflected in the replication database for the replication structure, this is a metric measured in the slave DB.Performance
replication_lagExtent to which replicated data lags behind the original data.Performance
rollback_countNumber of transaction rollbacks.Performance
seq_scanMethod to read records by performing a full scan on the table. Select this when there is no index or when the range to be read is wide even if an index exists. This metric has the values queried every minute.Performance
table_hit_ratioPercentage of queries retrieved from the cache.Performance
temp_bytesTotal amount of bytes written to temporary files by queries in the database.Performance
temp_filesNumber of temporary files created by queries in the database.Performance
total_runtimeTotal query execution time.Performance
total_sessionsTotal number of connected sessions.Connection
tpsNumber of processed transactions per second.Performance
tup_deletedNumber of tuples deleted by queries in the database.Troughput
tup_fetchedNumber of tuples retrieved from the database by queries.Troughput
tup_insertedNumber of tuples inserted by queries in the database.Troughput
tup_returnedNumber of tuples returned by queries in the database.Troughput
tup_updatedNumber of tuples updated by queries in the database.Troughput
uptimeDatabase uptime.Info
vacuum_running_countNumber of vacuum operations running.Performance
wait_sessionsNumber of waiting sessions.Performance
wal_countNumber of Write-Ahead Logging (WAL) records written. Spikes in the WAL file counts are usually caused by the large amount of changes in a short period or vacuum-like maintenance operations that generate temporary tables and objects. They should slowly return to normal levels. This usually results in lots of disk I/O and CPU activity, slowing the application query's execution until things get back to normal. This metric has the values queried every minute.Performance
warning_countNumber of warnings occurred.Info

XOS-related metrics

The following describes the metrics collected from the XOS agent.

MetricUnitDescription
cpu(xos)pctCPU utilization
cpu_user + cpu_sys + cpu_iowait
cpu_idle(xos)pctCPU idle time
cpu_iowait(xos)pctCPU's I/O waiting time
cpu_sys(xos)pctCPU's kernel code execution time
cpu_user(xos)pctCPU's user code execution time
disk_read_bytes(xos)byteNumber of data blocks read from disk
disk_reads(xos)blockNumber of buffers read from disk
disk_write_bytes(xos)byteNumber of pages written to disk
disk_writes(xos)blockNumber of buffers written to disk
mem(xos)pctMemory utilization
mem_available(xos)MBShared memory size
mem_buffcache(xos)MBAmount of memory used by the buff/cache
mem_free(xos)MBMemory not in use
mem_swapfree(xos)MBUnused swap space size
Because swap uses disk space, it may cause performance degradation.
mem_swaptotal(xos)MBTotal swap size
It is used when physical memory is insufficient.
mem_total(xos)MBTotal memory size
mem_used(xos)MBMemory size in use
net_recv_bytes(xos)byteAmount of data transmitted over the network
net_recv_packets(xos)countNumber of data packets transmitted over the network
net_send_bytes(xos)byteAmount of data transmitted over the network
net_send_packets(xos)countNumber of data packets transmitted over the network