KAFKA Monitoring

Note

Feature Project

WhaTap has introduced Feature Projects (Features) so that users can easily check whether the software they want to monitor is supported and start monitoring quickly. A Feature Project repackages existing monitoring products into a monitoring solution optimized for the target software. Use WhaTap's Feature Projects to manage the stability and performance of your services more effectively.

KAFKA is a distributed streaming platform optimized for processing, storing, and delivering real-time data feeds. Monitoring KAFKA allows you to trace anomalies in data streams. Because KAFKA is a distributed system, you need to monitor the status of each broker, topic, partition, producer, and consumer in detail.

WhaTap provides KAFKA monitoring as a new Feature Project. It tracks the overall performance and health of the KAFKA cluster to help you find and resolve issues early.

Metrics Monitoring: It monitors various metrics of Kafka brokers, topics, partitions, producers, and consumers in real time.
Custom Dashboard: You can view the metrics efficiently through this dashboard.
Alert Settings: Alerts are sent in real time when the configured conditions are met, so that users can respond to problems quickly.

Effectively manage KAFKA’s complex operating environments through WhaTap's KAFKA monitoring.

Installation

The following describes the basic installation procedure for the WhaTap KAFKA monitoring service.

To use the WhaTap monitoring service, sign up, create a project, and then install the agent on the target server. For more information about membership registration, see the following.

Note

Supported environment

Before installing the WhaTap KAFKA monitoring agent, check the supported environment.

Kafka: Apache Kafka 3.x or later
Ubuntu: Ubuntu 12.04 or later
OS: Red Hat 6 or equivalent (CentOS, Rocky Linux, Amazon Linux)
OS Architecture: Amd64/X86_64, Arm64/Aarch64
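
As a quick pre-check against the architecture list above, the following minimal sketch compares the output of `uname -m` with the listed architectures. The function name `is_supported_arch` is a hypothetical helper for illustration and is not part of the WhaTap installer.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given architecture string is one of
# the architectures listed in the supported environment above.
is_supported_arch() {
    case "$1" in
        x86_64|amd64|aarch64|arm64) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: check the architecture reported by the target server.
if is_supported_arch "$(uname -m)"; then
    echo "architecture supported: $(uname -m)"
else
    echo "architecture not listed as supported: $(uname -m)"
fi
```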

Creating a Feature Project

Create a feature project (Feature) before installing the agent.

Log in to the WhaTap monitoring service.
To create a project, on the left of the screen, select All Projects > + Project.
On the Select product screen, select the feature product to install in the project.
Configure the settings for Project name, Data server region, and Time zone.
In Notification language setting, select the language for alert messages.
After all settings are finished, select Creating a project.

Installing the KAFKA agent

After creating a Feature Project (Features), the KAFKA Agent Installation screen appears. Proceed with installation according to the following instructions.

Check the project access key.

The project access key is the unique ID for activating WhaTap services. Select Getting the access key.

Create the installation script.

Execute the following command to automatically check the user environment and create an installation script on the server where KAFKA has been installed.

curl http://repo.whatap.io/telegraf/feature/kafka/install_kafka_monitoring.sh -o install_kafka_monitoring.sh

Execute the following command to install the KAFKA agent.

chmod +x install_kafka_monitoring.sh
sudo ./install_kafka_monitoring.sh "x604pf485d1kk-z6q14nuc509pk3-x39moealrfodum" "13.124.11.223/13.209.172.35"

After configuring and restarting the agent, start monitoring.

To configure the Jolokia agent in KAFKA, run the following commands. They add the agent to kafka-server-start.sh and then restart KAFKA.

#cd {kafka home directory}/bin
sed -i '/^#!/a export KAFKA_OPTS='\''-javaagent:/usr/whatap/infra/feature/jolokia-agent-jvm-2.0.1-javaagent.jar=port=8778,host=127.0.0.1'\''' kafka-server-start.sh
./kafka-server-stop.sh
./kafka-server-start.sh
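
After the restart, you may want to confirm that the Jolokia agent answers on the host and port configured above (127.0.0.1:8778). The following is a minimal sketch, assuming the agent serves its default `/jolokia` context and that `curl` is available on the broker host. The function names `jolokia_version_url` and `check_jolokia` are hypothetical helpers, not WhaTap or Jolokia commands.

```shell
#!/bin/sh
# Hypothetical helper: build the Jolokia version URL for a host and port.
jolokia_version_url() {
    printf 'http://%s:%s/jolokia/version\n' "$1" "$2"
}

# Hypothetical helper: succeed if the agent answers on that URL.
# -f makes curl fail on HTTP errors, -s silences progress output.
check_jolokia() {
    curl -fs --max-time 5 "$(jolokia_version_url "$1" "$2")" >/dev/null
}

# Usage (run on the broker host after restarting KAFKA):
#   check_jolokia 127.0.0.1 8778 && echo "Jolokia agent is responding"
```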

Learn about the main features

Custom dashboard

Home > Select Feature Project (Feature) > KAFKA > KAFKA Dashboard

WhaTap's KAFKA Monitoring monitors various metrics for KAFKA brokers, topics, partitions, producers, and consumers in real time so that you can see the overall status of the KAFKA cluster at a glance. You can check each metric regularly to keep your cluster in optimal health.

KAFKA Custom Dashboard provided by WhaTap consists of the following three presets:

Kafka Default

This basic dashboard allows you to check whether the cluster features are working properly. It monitors the overall status of the KAFKA cluster in real time and helps detect performance degradation or any failures in advance.
Guide to dashboard metrics
Kafka Overview
- Active Controller Count: Number of controllers that are active. The controllers manage the partition leaders and replication in the KAFKA cluster. Generally there is one active controller.
- Brokers Online: It is the number of brokers that are online. A broker stores messages and processes client requests.
- Online Partitions: It is the number of partitions that are online. Each topic is divided into multiple partitions, and the partitions are distributed and replicated across brokers.
- Preferred Replica Imbalance: It indicates the imbalance in preferred replicas. A high value indicates that replicas are concentrated in specific brokers.
- Under Replicated Partitions: It is the number of under-replicated partitions, that is, partitions with fewer in-sync replicas than the configured replication factor. If this value is not zero, there is a risk of data loss.
- Leader Count per Broker: It indicates the number of leader partitions for each broker. The leader partition directly handles client requests.
- Unclean Leader Elections: It is the number of unclean leader elections, in which a replica that is not in sync is elected leader. A nonzero value indicates cluster stability issues and possible data loss.
- Under min ISR Partitions: It is the number of partitions whose in-sync replica (ISR) count is below the configured minimum. The ISR ensures data durability.
- Offline Partitions Count: It is the number of partitions that are offline. When a partition is offline, you cannot access the data on that partition.
- Broker Network Throughput BytesOutPerSec/BytesInPerSec: It indicates the broker's network throughput. It graphs the numbers of bytes sent/received per second.
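
To spot-check one of these values outside the dashboard, you can read the corresponding broker MBean through the Jolokia agent configured during installation. The following is a minimal sketch, assuming the agent listens on 127.0.0.1:8778 and that the metric comes from Kafka's standard MBean `kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions`. Whether the WhaTap agent reads exactly this MBean is an assumption, and `jolokia_read_url` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: build a Jolokia "read" URL.
# $1 = host:port of the Jolokia agent, $2 = MBean name, $3 = attribute name
jolokia_read_url() {
    printf 'http://%s/jolokia/read/%s/%s\n' "$1" "$2" "$3"
}

# Standard Kafka MBean for the under-replicated partition count.
URP_MBEAN='kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions'

# Usage (run on the broker host; the response is a JSON document):
#   curl -fs "$(jolokia_read_url 127.0.0.1:8778 "$URP_MBEAN" Value)"
```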
Java Virtual Machine
- CPU Usage: It displays the CPU utilization of the JVM. A high CPU utilization may affect the performance.
- JVM Memory Used: It indicates how much memory the JVM is using. It displays a pattern of periodic memory increase and decrease after Garbage Collection (GC).
- Time Spent in GC: It indicates the time spent in Garbage Collection (GC). Increased GC time can lead to poor performance.
Linux Disk I/O
- Linux Disk Read Bytes: It indicates the number of read bytes on the disk. A high disk read/write volume can cause I/O bottleneck.
- Linux Disk Write Bytes: It indicates the number of disk write bytes. As in reads, a high write volume can affect the performance.
Kafka Request

With this dashboard, you can check the throughput related to consumers and producers. It monitors various performance metrics of the KAFKA cluster in real time to check the system health and help detect potential problems as early as possible.
Guide to dashboard metrics
Processing Performance
- Messages In: It indicates the number of messages received per second across the entire cluster.
- Messages In per Broker: It indicates the number of messages received per second for each broker.
- Bytes In: It indicates the number of bytes received per second across the entire cluster.
- Bytes In per Broker: It indicates the number of bytes received per second for each broker.
- Request Queue Size: It indicates the size of the queue where unprocessed requests are queued. A larger queue size can lead to a longer response time.
- Response Queue Size: It indicates the size of the queue where responses wait to be sent to clients. A large queue size can cause a delay.
- Network Processor Avg Usage Percent: It indicates the average utilization of the network processor. It displays the load of network tasks.
- Request Handler Avg Percent: It indicates the average utilization of the request handler. It displays the load for request processing.
Requests
- Produce Request Per Sec: It indicates the number of producer requests generated per second. A high value indicates a lot of data is being sent to the broker.
- Consume Fetch Request Per Sec: It indicates the number of consumer fetch requests generated per second. A high value indicates a lot of data is being sent to the consumer.
- Broker Fetch Request Per Sec: It indicates the number of fetch requests between brokers. It displays internal traffic for data replication.
- All Request Per Sec Across All Brokers: It indicates the number of requests per second processed by all brokers. It displays the request load across the entire cluster.
Errors
- Errors Per Sec: It indicates the number of errors occurring per second. If there are many errors, it indicates that there may be system stability issues.
Offset Commit
- Offset Commit Request Per Sec: It indicates the number of offset commit requests per second. An offset commit records the position up to which a consumer group has processed messages.
Metadata
- Metadata Request Per Sec: It indicates the number of metadata requests per second. A metadata request indicates a request for the broker and partition information.
Topic
- IsrShrinks per Sec: It indicates the number of in-sync replica (ISR) shrink events per second. As the ISR shrinks, there may be data durability issues.
- IsrExpands per Sec: It indicates the number of ISR expansion events per second. Data durability can be improved as the ISR expands.
- Log size per Topic: It indicates the log size for each topic. It displays the size of topic data.
- Log size per Broker: It indicates the log size for each broker. It displays the size of data stored in the broker.
Consume Lag
- Lag: It indicates a delay in message consumption. This occurs when the consumer reads messages slower than the producer writes.
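
Lag can also be cross-checked from the command line. The following is a minimal sketch that sums the LAG column printed by Kafka's `kafka-consumer-groups.sh --describe` tool. The function name `sum_lag` is a hypothetical helper, and the bootstrap server and group name in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: sum the LAG column of
# `kafka-consumer-groups.sh --describe` output read from stdin.
sum_lag() {
    awk '
        # Until the header row is found, look for the column named LAG.
        lag_col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i; next }
        # Data rows: add numeric lag values; "-" (no committed offset) is skipped.
        $lag_col ~ /^[0-9]+$/ { total += $lag_col }
        END { print total + 0 }
    '
}

# Usage (placeholders: localhost:9092 and my-group):
#   kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#       --describe --group my-group | sum_lag
```

The column is located by its header name rather than by position, so the sketch does not depend on the exact column order of a given Kafka version.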
Kafka Broker

Using this dashboard, you can check the broker performance. It monitors the performance of the KAFKA cluster in real time and helps detect potential problems as early as possible.
Guide to dashboard metrics
Broker Performance - Produce
- Producer - RequestQueueTimeMs: It displays the time, in milliseconds, that a producer request waits in the request queue. A long wait in the queue can cause performance degradation.
- Producer - LocalTimeMs: It displays the time, in milliseconds, that the producer request is processed by the local broker.
- Producer - RemoteTimeMs: It displays the time, in milliseconds, that a producer request spends waiting on remote brokers, for example while followers acknowledge the write.
- Producer - ResponseQueueTimeMs: It displays the time, in milliseconds, that a producer request's response waits in the queue.
- Producer - ResponseSendTimeMs: It displays the time, in milliseconds, that the response from the producer request is sent to the client.
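
When a request type is slow, comparing these five values shows which stage contributes most. The following is a minimal sketch, assuming you have copied the five values from the dashboard by hand. The function name `slowest_stage` is a hypothetical helper, and the numbers in the usage comment are made-up sample values.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the stage with the largest time.
# Arguments are the five values in milliseconds, in dashboard order:
# RequestQueueTimeMs LocalTimeMs RemoteTimeMs ResponseQueueTimeMs ResponseSendTimeMs
slowest_stage() {
    # printf reuses the format for each name/value pair, one pair per line.
    printf '%s %s\n' \
        RequestQueueTimeMs "$1" LocalTimeMs "$2" RemoteTimeMs "$3" \
        ResponseQueueTimeMs "$4" ResponseSendTimeMs "$5" |
    awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; name = $1 } END { print name }'
}

# Usage (sample values only):
#   slowest_stage 0.2 1.5 12.0 0.1 0.3
```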
Broker Performance - Consume
- Consumer - RequestQueueTimeMs: It displays the time, in milliseconds, that consumer requests wait in the queue.
- Consumer - LocalTimeMs: It displays the time, in milliseconds, that a consumer request is processed by the local broker.
- Consumer - RemoteTimeMs: It displays the time in milliseconds that a consumer request is sent to and processed by the remote broker.
- Consumer - ResponseQueueTimeMs: It displays the time, in milliseconds, that the response of a consumer request waits in the queue.
- Consumer - ResponseSendTimeMs: It displays the time, in milliseconds, that the response to a consumer request is sent to the client.
Broker Performance - Fetch Follower
- FetchFollower - RequestQueueTimeMs: It displays the time, in milliseconds, that the follower broker's fetch request waits in the queue.
- FetchFollower - LocalTimeMs: It displays the time, in milliseconds, that a follower broker's fetch request is processed by the local broker.
- FetchFollower - RemoteTimeMs: It displays the time, in milliseconds, that the follower broker's fetch request is sent to and processed by the remote broker.
- FetchFollower - ResponseQueueTimeMs: It displays the time, in milliseconds, that the response of the follower broker's fetch request waits in the queue.
- FetchFollower - ResponseSendTimeMs: It displays the time, in milliseconds, that the response to the follower broker's fetch request is sent to the follower.
ZooKeeper
- ZooKeeper Request Latency: It displays the delay times of ZooKeeper requests. A high latency may indicate performance degradation in ZooKeeper.
- ZooKeeper Connections per sec: It displays the number of connections to ZooKeeper per second.
- ZooKeeper Expired Connections per sec: It displays the number of expired ZooKeeper connections per second. Connection expiration can occur due to session timeout.
- ZooKeeper auth failures per sec: It displays the number of ZooKeeper authentication failures per second. If there are many authentication failures, you may suspect security issues or configuration errors.
- ZooKeeper disconnects per sec: It displays the number of ZooKeeper disconnections per second. If connection drops frequently, you may suspect network issues or performance issues of the ZooKeeper server.

Alert

Home > Select Feature Project (Feature) > Alert > Event Configuration

WhaTap KAFKA monitoring provides Composite Metrics alerts by default.

Note

For more information about composite metrics events, see the following.
