Metrics alert

Home > Select Project > Alert > Event configuration > Select Metrics tab

What is the metrics event?

Metrics events let you define more specific and complex events than the basic events (application events, server events, and so on). You can set events based on the metrics data collected in real time from your projects. Depending on your use case, choose one of the following two methods.

Metrics Event
Composite metrics event

Note

For more information about the metrics, see the following.
You can add metrics events from the Event Configuration New menu, which has a new UI for improved usability. For more information, see the following.
This feature is only available to the members with the Alert settings role. For more information about member roles, see the following.

Metrics Event

Select Metrics on the screen in Alert > Event configuration. Select Add Alert Policy on the upper right of the screen. The Metrics event window appears.

Note

For more information about the event templates for metrics, see the following.

Entry of basic information

Event name: Enter the name of the event to add.
Activate events: Select whether or not to activate events.
Templates: Events can be easily set after selecting a template. If the template is not to be used, select Disabled.
Category: The unit to identify the metrics data. It is a mandatory value for setting the metrics events.
- Each Category selection option displays the name, the data collection interval, and the key. When setting up an event, use the key value of the category.
- Category retrieves the metrics data collected from the project within the last 3 hours and displays it in a list. If the collection interval is not displayed in the Category selection options, select the Enter it yourself option and enter a category key.
Level
- It sets the alert level shown when an event occurs. The levels are Critical, Warning, and Info. When the Critical or Warning level is selected, the Additional notifications when the event state is resolved option becomes available.
- Additional notifications when the event state is resolved: Choose whether to send an additional notification when an event that occurred is resolved. Use the toggle button to turn the feature on or off.
Message
- Enter the notification message to be displayed when events occur. Enter ${Tag} or ${Field} to insert the value of a tag or field into the message. The key used in the variable must exist in the selected Category of metrics data. You can check the available tag and field keys in Metrics Search.
- Select the button to see the history of previously entered messages.
Test alert
This function generates a test alert from the required items (Event name, Category, Level, and Message) so that you can check the message.
Note
- During testing, substitutions for actual metric values or variables do not work.
- The Event reception option sends test notifications only to users with the recipient tag set. If Receive all is selected, the test notifications are sent to all users.
- To use a reception test, enter or select required items in Event name, Category, Level, and Message.
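
The ${Tag} and ${Field} substitution described under Message can be pictured with a minimal Python sketch. The `render_message` helper and the sample record are hypothetical and are not part of the product. Leaving unknown keys untouched is an assumption made for illustration.

```python
import re

# Matches placeholders of the form ${key}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def render_message(template: str, record: dict) -> str:
    """Replace each ${key} with the value of that tag or field in `record`.

    Assumption: a key that is missing from the record is left as-is.
    """
    def _substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(record[key]) if key in record else match.group(0)

    return _PLACEHOLDER.sub(_substitute, template)

# Hypothetical metrics record with one tag (oname) and one field (cpu).
sample = {"oname": "node-1", "cpu": 83.5}
message = render_message("The CPU usage of ${oname} is ${cpu}% >= 70%.", sample)
```

Because the key must exist in the selected Category, a placeholder that stays unreplaced in such a sketch would point to a mistyped tag or field key.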
Event rule

Set the event rule by entering a field, selecting an operator, and entering a threshold value.
Filtering event targets

Filter the targets by entering a tag, selecting an operator, and entering a filtering value. If nothing is entered, the event applies to all agents.

Note

For available basic syntaxes and operators in Event rule and Filtering event targets, see the following.
For the Event rule and Filtering event targets options, you can select Selector or Typing.
After the event is saved, the option value is managed in the Typing option. If you later switch to the Selector option, the option value may be reset.
When you enter event conditions and targets, an error may occur if a field name contains special characters (~!@#$%^&*()_+=-[]`) or begins with a number. In this case, select the Typing option and enclose the name in ${}, as shown in the following example:
```
${4xxErrorType} == '401'
```
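
As a rough illustration of the rule above, the following Python sketch decides whether a field name should be wrapped in ${} before it is typed into a rule. The helpers `field_reference` and `build_rule` are hypothetical. The check is deliberately conservative: it wraps any name that is not purely letters and digits starting with a letter, which covers the special characters listed above and names that begin with a number.

```python
import re

# Assumption: a name made only of letters and digits, starting with a letter,
# can be typed as-is. Everything else is wrapped in ${}.
_PLAIN_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")

def field_reference(name: str) -> str:
    """Return the name as it would be typed in the Typing option."""
    return name if _PLAIN_NAME.match(name) else "${" + name + "}"

def build_rule(name: str, operator: str, literal: str) -> str:
    """Assemble a rule string such as ${4xxErrorType} == '401'."""
    return f"{field_reference(name)} {operator} {literal}"

rule = build_rule("4xxErrorType", "==", "'401'")
```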

Notification setting

Number of events: An alert is sent when the events defined in Event rule occur the entered number of times within the selected period.
Note
- If the selected time is set to Disabled, an alert is sent only when the events occur consecutively the entered number of times.
- If the Additional notifications when the event state is resolved option is activated, it is recommended to set the selected time to Disabled.
- In the Category option, the collection cycle for the selected item is 5 seconds.
Event pause: This option prevents excessive alert notifications. After the first alert notification is generated, no alerts are sent for the selected period, and the events in that period are not recorded in Event history.
Related category: You can set up to 5 related categories and view them when checking notifications.
Event reception tag: If you select a tag, notifications are sent to the project members and 3rd-party plug-ins that have the tag. If no event reception tag is selected, alerts are sent to all project members.

Note
In Alert > Notification setting, you can set the tags in project members and 3rd-party plug-ins.
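
The difference between counting events within a selected period and counting consecutive events (selected time set to Disabled) can be sketched as follows. This is a simplified model, not the product's implementation: `should_alert` is a hypothetical function, and the period is expressed as a number of collections rather than as a time span (with a 5-second collection cycle, one minute would correspond to 12 collections).

```python
from typing import Optional, Sequence

def should_alert(matches: Sequence[bool], count: int, window: Optional[int]) -> bool:
    """Decide whether an alert is sent.

    matches: result of the event rule for each collection, oldest first.
    count:   the entered number of events (must be at least 1).
    window:  number of most recent collections to inspect,
             or None when the selected time is Disabled.
    """
    if count < 1 or (window is not None and window < 1):
        raise ValueError("count and window must be at least 1")
    if window is None:
        # Disabled: the latest `count` collections must all match.
        recent = matches[-count:]
        return len(recent) == count and all(recent)
    # Period selected: matches anywhere inside the window are counted.
    return sum(matches[-window:]) >= count
```

For example, with the results `[True, False, True, False]` and a count of 2, a window of 4 collections produces an alert, while the Disabled setting does not, because the two matches are not consecutive.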

Testing event rules

You can check how many alerts would have occurred by applying the event conditions you set to the selected time period. If you select Run, the number of notifications that occurred appears in the upper right corner. The field and thresholds selected in Event rule are displayed on the chart.

Note

For more information about Event rule, see the following.
The Testing event rules feature can test against up to 24 hours of data.

Composite metrics event

To use the Composite metrics, you have to understand the following concepts:

What is the metrics?

MXQL

The composite metrics event can generate events and send alerts by applying more complex rules to the metrics data. Composite metrics can be used effectively in the following situations:

You need to make a comprehensive decision based on data received from multiple agents.
You need to compare past data with current data to make a decision.

Metrics events are evaluated each time metrics are received from the agents. In contrast, the composite metrics event first stores the metrics collected from each agent in the database and then queries them to evaluate the event. Because of this characteristic, data from multiple agents can be evaluated together, and past data can be used. The trade-off is a barrier to entry: you need to use MXQL, WhaTap's own data query language. Event templates are therefore provided so that users who understand only basic MXQL can still set events effectively, by modifying just the query for event target filtering and the conditions.

Select Metrics on the screen in Alert > Event configuration.
In the Composite metrics section, select Add Alert Policy on the right.
When the Composite metrics window appears, select Creating as a chart.

The Event Setting window appears.

Note

To set a composite metrics event, you have to have the event setting role.

Query event data

The composite metrics event creates event conditions by using MXQL, a metrics data query language. The Creating as a chart function provides combo boxes that auto-complete the MXQL. With this template, you query the event data, construct a chart, and then directly enter the event generation conditions. Select the Widget or Text option, and then configure the event.

Widget
Text

By configuring the options for the time series chart, you can auto-complete the MXQL used when setting events.

Event data inquiry

Filter: Select an event condition target. Enter the formula, tag, and filtering value to create filtering conditions.
Group by: Select the grouped metrics data. You can select multiple items.
Time unit: Set the time criterion for dividing the grouped data. You can select sec, Minutes, or Hour.
Field: Select fields to use as event generation conditions. You can select multiple items.

Notification

Enter basic data for alert settings.

Activate events: You can select to enable or disable the events by clicking the toggle button.
Level: Select a level among Fatal, Warning, and Info.

Additional notifications when the event state is resolved: Choose whether to send an additional notification when an event that occurred is resolved. Use the toggle button to enable or disable this function.
Title: Enter the title of the alert.
Message: Enter the notification message to be displayed when events occur. Enter ${Tag} or ${Field} to insert the value of a tag or field into the message. The key used in the variable must exist in the selected Category of metrics data. You can check the available tag and field keys in Metrics Search.

Alert Policy

Enter the conditions to send alerts.

Time Range: Set the time range to view the MXQL real-time data for event conditions. You can use only the fields included for viewing the event data.

Composite metrics events evaluate metrics that have been stored in the database. Therefore, first specify the time range of the data to query. If you select 5 minutes as the data lookup time, the event generation conditions are checked against the data collected during the last 5 minutes. Set a short range to evaluate recent data, or a long range to evaluate statistics over a wider period.

Note
For actual usage examples, see the following.
Condition: Enter the fields, calculation rules, and thresholds reflected in MXQL.
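
The relationship between Time Range and Condition can be illustrated with a small Python model. It is only a sketch: in the product the query and condition are expressed in MXQL, whereas here `Point`, `condition_met`, the averaging step, and the `>` comparison are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable

@dataclass
class Point:
    time: int      # collection time in epoch seconds
    value: float   # value of the selected field

def condition_met(points: Iterable[Point], now: int,
                  range_seconds: int, threshold: float) -> bool:
    """Check the condition against the data stored in the lookup range.

    Assumption: the condition compares the average of the range with the
    threshold. An empty range never satisfies the condition.
    """
    in_range = [p.value for p in points if now - range_seconds < p.time <= now]
    if not in_range:
        return False
    return mean(in_range) > threshold

# Hypothetical stored data: one old spike and three recent collections.
stored = [Point(0, 95.0), Point(100, 50.0), Point(250, 90.0), Point(290, 70.0)]
```

With these values, a 300-second range averages 50, 90, and 70, while a 60-second range averages only 90 and 70, so the same threshold can give different results depending on the selected range.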

Additional information

Set additional options that are related to receiving alerts.

Interval: Check the notification conditions at the selected time interval.
Silent: This option prevents excessive alerts. After the first alert notification is generated, no alerts are sent for the selected period, and the events in that period are not recorded in Event history.
Event reception tag: If you select an event reception tag, alert notifications are sent to the project members and 3rd-party plug-ins that have the tag. If no event reception tag is selected, alerts are sent to all project members.

Note
In Alert > Notification setting, you can set the tags in project members and 3rd-party plug-ins.
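
The effect of the Silent option can be modeled with a short Python sketch. `SilentGate` is a hypothetical class. Treating the end of the silent period as the moment alerts resume is an assumption about the boundary.

```python
from typing import Optional

class SilentGate:
    """Suppress alerts for `silent_seconds` after an alert has been sent."""

    def __init__(self, silent_seconds: int) -> None:
        self.silent_seconds = silent_seconds
        self._last_sent: Optional[int] = None

    def allow(self, now: int) -> bool:
        """Return True if an alert raised at `now` (epoch seconds) is sent."""
        if self._last_sent is not None and now - self._last_sent < self.silent_seconds:
            # Inside the silent period: the alert is dropped and not recorded.
            return False
        self._last_sent = now
        return True

gate = SilentGate(silent_seconds=300)
results = [gate.allow(t) for t in (0, 100, 299, 300)]
```

In this model the silent period is measured from the alert that was actually sent, so suppressed alerts do not extend it.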

Test Event Rules

You can check how many alerts would have occurred by applying the event conditions you set to the selected time period. If you select Run, you can see the number of notifications, and the selected fields and thresholds are displayed on the chart when the event conditions are met.

Most of what is included in Event Setting can be specified using MXQL. A simulation function is provided to check whether the MXQL has been written properly. The simulation queries the data of the past 24 hours, evaluates it, and then reports how many metrics were queried and how many of them met the conditions.

Modifying and deleting metrics events

Go to Alert > Event configuration and then select the Metrics tab.
In the event list, select the button at the far right of the item to modify or delete.
When the metrics or composite metrics event setting window appears, modify each option and then select Save.

To delete the selected event, select Delete on the upper right of the event setting window.

Guide to select generation conditions and targets

Event generation conditions and the selection of event targets on metrics alerts use the same syntax. For event generation conditions, use the field key as a variable. For the selection of event targets, use the tag key as a variable.

Basic syntax rules

If you just enter a string, it is recognized as a variable. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.

oid == "oid"
oid: variable
==: function
"oid": text

// In case oname is ott-1235

// Normal cases
oname == 'ott-1235' or oname == "ott-1235"

// In abnormal cases, notification does not work.
oname == ott-1235

If you just enter a number, it is recognized as a number. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.

oid == 123
oid: variable
==: function
123: number

// In case oid is 123

// Normal cases
oid == 123

// In abnormal cases, notification does not work.
oid == '123' or oid == "123"
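
The recognition rules above (bare string as variable, quoted string as text, bare number as number) can be summarized in a small Python sketch. `classify_operand` is a hypothetical helper and does not describe how the product parses expressions internally.

```python
import re

# A bare integer or decimal, optionally negative.
_NUMBER = re.compile(r"^-?\d+(\.\d+)?$")

def classify_operand(token: str) -> str:
    """Return "text", "number", or "variable" for one operand."""
    token = token.strip()
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        return "text"       # enclosed in single or double quotation marks
    if _NUMBER.match(token):
        return "number"     # bare number
    return "variable"       # any other bare string

kinds = [classify_operand(t) for t in ("oid", '"oid"', "123", "'123'", "ott-1235")]
```

The last case shows why an unquoted `ott-1235` does not work as a value: it is read as a variable name rather than as text.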

List of available operators

Operator	Usage	Description
`==`	operand1 `==` operand2	Checks whether operand1 is equal to operand2.
`!=`	operand1 `!=` operand2	Checks whether operand1 and operand2 have different values.
`>`	operand1 `>` operand2	Checks whether the operand1 value is greater than the operand2 value.
`>=`	operand1 `>=` operand2	Checks whether the operand1 value is greater than or equal to the operand2 value.
`<`	operand1 `<` operand2	Checks whether the operand1 value is less than the operand2 value.
`<=`	operand1 `<=` operand2	Checks whether the operand1 value is less than or equal to the operand2 value.
`like`	operand1 `like` operand2	Checks by pattern matching whether operand1 includes operand2.
`&&`	expression1 `&&` expression2	Checks whether expression1 and expression2 are both `true`.
`and`	expression1 `and` expression2	Checks whether expression1 and expression2 are both `true`. The operator plays the same role as `&&`.
`\|\|`	expression1 `\|\|` expression2	Checks whether at least one of expression1 and expression2 is `true`.
`or`	expression1 `or` expression2	Checks whether at least one of expression1 and expression2 is `true`. The operator plays the same role as `\|\|`.
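
The comparison and logical operators in the table can be mirrored in Python to check how a combined expression would evaluate against one record. This is an illustrative model only. The `compare` and `combine` helpers and the sample record are hypothetical, and the `like` operator is left out here.

```python
import operator
from typing import Any, Callable, Dict, Mapping

COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

LOGICAL: Dict[str, Callable[[bool, bool], bool]] = {
    "&&": lambda a, b: a and b, "and": lambda a, b: a and b,   # both true
    "||": lambda a, b: a or b, "or": lambda a, b: a or b,      # at least one true
}

def compare(record: Mapping[str, Any], key: str, op: str, literal: Any) -> bool:
    """Evaluate `key op literal` against the record."""
    return COMPARISONS[op](record[key], literal)

def combine(left: bool, op: str, right: bool) -> bool:
    """Join two expression results with a logical operator."""
    return LOGICAL[op](left, right)

record = {"cpu": 85.0, "oname": "node-1"}
both = combine(compare(record, "cpu", ">=", 70), "&&", compare(record, "oname", "==", "node-1"))
either = combine(compare(record, "cpu", "<", 10), "||", compare(record, "oname", "==", "node-1"))
```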

Usage of like

You can conveniently search for embedded strings via the wildcard (*).

Searching for strings that start with a specific keyword
```
Key like "Value*"
```
Searching for strings that end with a specific keyword
```
Key like "*Value"
```
Searching for strings that include a specific keyword
```
Key like "*Value*"
```
The wildcard (*) cannot be used in the middle of keywords.
```
// Unsupported syntax
Key like "Va*lue"
```

If you omit the wildcard (*) in the like operator, it operates as equals (==).

// The following two statements have the same result.
Key like "Value"
Key == "Value"
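
The wildcard behavior described in this section can be written as a short Python function. The function `like` below is a sketch of the documented rules (prefix, suffix, containment, and equality when no wildcard is given). It is not the product's implementation, and raising an error for a wildcard in the middle is an assumption about how unsupported syntax is handled.

```python
def like(value: str, pattern: str) -> bool:
    """Match `value` against a pattern that may start and/or end with '*'."""
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    core = pattern.strip("*")
    if "*" in core:
        # Unsupported syntax, for example "Va*lue".
        raise ValueError("a wildcard in the middle of the keyword is not supported")
    if leading and trailing:
        return core in value            # "*Value*": includes the keyword
    if trailing:
        return value.startswith(core)   # "Value*": starts with the keyword
    if leading:
        return value.endswith(core)     # "*Value": ends with the keyword
    return value == core                # no wildcard: same as ==
```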

Available functions

Method	Usage	Description
startsWith	startsWith(param1, param2)	If the value of the key param1 starts with param2, the result is `true`. Otherwise, the result is `false`.
endsWith	endsWith(param1, param2)	If the value of the key param1 ends with param2, the result is `true`. Otherwise, the result is `false`.
isNull	isNull(param1)	If param1 is `null`, the value becomes `true`. Otherwise, the value becomes `false`.
isNotNull	isNotNull(param1)	If param1 is not `null`, the value becomes `true`. Otherwise, the value becomes `false`.
isEmpty	isEmpty(param1)	If param1 is `null` or `EmptyString("")`, the value becomes `true`. Otherwise, the value becomes `false`.
isNotEmpty	isNotEmpty(param1)	If param1 is not `null` nor `EmptyString("")`, the value becomes `true`. Otherwise, the value becomes `false`.

startsWith

startsWith(Key, "Value")

endsWith

endsWith(Key, "Value")

isNull

isNull(Key)

isNotNull

isNotNull(Key)

isEmpty

isEmpty(Key)

isNotEmpty

isNotEmpty(Key)
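
To make the semantics of these functions concrete, the following Python sketch evaluates them against a single record held in a dictionary. The function names and the sample record are hypothetical. Treating a missing key the same as `null` is an assumption.

```python
from typing import Any, Mapping

def starts_with(record: Mapping[str, Any], key: str, prefix: str) -> bool:
    value = record.get(key)
    return isinstance(value, str) and value.startswith(prefix)

def ends_with(record: Mapping[str, Any], key: str, suffix: str) -> bool:
    value = record.get(key)
    return isinstance(value, str) and value.endswith(suffix)

def is_null(record: Mapping[str, Any], key: str) -> bool:
    # Assumption: a key that is absent from the record counts as null.
    return record.get(key) is None

def is_not_null(record: Mapping[str, Any], key: str) -> bool:
    return not is_null(record, key)

def is_empty(record: Mapping[str, Any], key: str) -> bool:
    value = record.get(key)
    return value is None or value == ""

def is_not_empty(record: Mapping[str, Any], key: str) -> bool:
    return not is_empty(record, key)

record = {"oname": "ott-1235", "note": ""}
```

Note the difference between the two pairs: an empty string is not `null`, so `is_null(record, "note")` is false while `is_empty(record, "note")` is true.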

Template

Metrics Event Templates

Note

For more information about the reason field in the Kubernetes event, see Kubernetes official documentation.

BackOff

This notification is sent when BackOff appears at least once in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Evicted

This notification is sent when Evicted appears at least once in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedCreatePodSandBox

This notification is sent when FailedCreatePodSandBox appears at least once in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedMount

This notification is sent when FailedMount appears at least once in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedScheduling

This notification is sent when FailedScheduling appears at least once in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedSync

This notification is sent when FailedSync appears at least once in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

NodeNotReady

This notification is sent when NodeNotReady appears at least once in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Unhealthy

This notification is sent when Unhealthy appears at least once in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Notification of utilization based on container CPU quota

The alert occurs when the total CPU usage (${cpu_per_quota}) based on the container's CPU limit is 70% or more. The example of Message is as follows:

The CPU utilization of container ${oname} in ${okindName} is high, ${cpu_per_quota}% >= 70%.

Container memory fail count

This notification is sent when the container memory limit is reached one or more times. The example of Message is as follows:

Because the ${oname} container of ${okindName} exceeded the limit, the ${mem_failcnt} increased.

Container memory utilization

The alert occurs when the usage (${container.mem_percent}) based on the container's memory limit is 90% or more. The example of Message is as follows:

The memory usage of ${oname} container in ${okindName} is ${container.mem_percent}% >= 90%.

Container DEAD status notification

This notification is sent when the container status code is 100. Status code 100 means DEAD. The example of Message is as follows:

Container ${oname} is in DEAD state.

Cluster CPU request notification

This notification is sent when the value of the CPU allocatable to nodes divided by the total Limit CPU amount and multiplied by 100 is 80% or more. The example of Message is as follows:

The CPU request (minimum required resource) is more than 80% of the cluster CPU allocation.

Cluster memory request notification

The corresponding alert occurs when the available amount of memory for node allocation divided by the total Limit Memory and multiplied by 100 is 80% or more. The example of Message is as follows:

Memory Request (minimum required resource) is more than 80% of the cluster memory allocation.

Cluster CPU request notification

This notification is sent when the value of the CPU allocatable to nodes divided by the total Limit CPU amount and multiplied by 100 is 60% or more. The example of Message is as follows:

The CPU request (minimum required resource) is more than 60% of the cluster CPU allocation.

Cluster memory request notification

The corresponding alert occurs when the available amount of memory for node allocation divided by the total Limit Memory and multiplied by 100 is 60% or more. The example of Message is as follows:

Memory Request (minimum required resource) is more than 60% of the cluster memory allocation.

Notification of the number of master pods

This alert occurs when no Pods can be allocated to the nodes. The example of Message is as follows:

The number of Pods allocable to the master is 0.

Node CPU utilization notification

This notification is sent when the node's CPU utilization (${cpu}) is 70% or more. The example of Message is as follows:

The CPU usage of ${oname} is ${cpu}% >= 70%.

Node memory utilization notification

This notification is sent when the node's memory utilization (${memory_pused}) is 90% or more. The example of Message is as follows:

The memory usage of ${oname} is ${memory_pused}% >= 90%.

Unassignable node notification

The corresponding alert occurs when the available number of Pods that can be assigned to a node is zero.

APDEX

This notification is triggered when transactions exist and the APDEX score is lower than 0.7. The example of Message is as follows:

APDEX is lower than 0.7 (${oname})
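
For readers unfamiliar with the score, the sketch below uses the standard Apdex definition, (satisfied + tolerating / 2) / total, together with the template's condition that transactions exist and the score is lower than 0.7. How the product classifies transactions as satisfied, tolerating, or frustrated is not described here, so the counts are hypothetical inputs and the function names are assumptions.

```python
from typing import Optional

def apdex(satisfied: int, tolerating: int, frustrated: int) -> Optional[float]:
    """Standard Apdex score, or None when there are no transactions."""
    total = satisfied + tolerating + frustrated
    if total == 0:
        return None
    return (satisfied + tolerating / 2) / total

def apdex_alert(satisfied: int, tolerating: int, frustrated: int,
                threshold: float = 0.7) -> bool:
    """True when transactions exist and the score is lower than the threshold."""
    score = apdex(satisfied, tolerating, frustrated)
    return score is not None and score < threshold
```

With 50 satisfied, 20 tolerating, and 30 frustrated transactions, the score is 0.6 and the condition is met. With no transactions at all, no alert is raised.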

Composite Metrics Event Templates

Inactive agents has been found

An alert occurs when the number of active agents is less than the specified value. The example of Message is as follows:

${ip} ${okindName} The number of active agents has decreased to ${num_of_current_agents}.

TPS has changed by more than 30% compared to the previous week

An alert occurs when the application's TPS changes by more than 30% compared to that of the previous week. The example of message is as follows:

${okindName} a week ago : ${prev_week_tps_display}, current : ${current_tps_display}, difference : ${one_week_diff_display}
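
The week-over-week comparison behind this template can be illustrated with simple arithmetic. The actual template is written in MXQL, so the Python below is only a sketch: the function names are hypothetical, and treating both increases and decreases as a change is an assumption based on the wording "changes by more than 30%".

```python
from typing import Optional

def tps_change_percent(prev_week_tps: float, current_tps: float) -> Optional[float]:
    """Percentage change relative to the previous week, or None if undefined."""
    if prev_week_tps == 0:
        return None  # no baseline to compare against
    return (current_tps - prev_week_tps) * 100.0 / prev_week_tps

def tps_changed(prev_week_tps: float, current_tps: float,
                threshold_percent: float = 30.0) -> bool:
    """True when TPS rose or fell by more than the threshold."""
    change = tps_change_percent(prev_week_tps, current_tps)
    return change is not None and abs(change) > threshold_percent
```

For example, a change from 100 TPS a week ago to 140 TPS now is +40% and meets the condition, as does a drop to 60 TPS (-40%), while 125 TPS (+25%) does not.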

Very slow active transactions detected

An alert occurs when the number of transactions that exceed 8 seconds in the application exceeds 10 on average. The example of message is as follows:

${okindName} ${very_slow_tx_cnt_m5_avg_display} active transactions performed for more than 8 seconds were detected.

APDEX score dropped

A notification is triggered when the APDEX score falls below 70. The example of Message is as follows:

The average apdex of ${pname} in the last 5 seconds is ${apdex_display}

CPU % is too high

A notification is sent when the CPU utilization of the node exceeds 80%. The example of message is as follows:

CPU utilization rate of the ${oname} in the last minute > ${_rule_} %

CPU User % is too high

An alert occurs when the user's CPU utilization exceeds 50%. The example of Message is as follows:

CPU User utilization rate of the ${oname} in the last minute > ${_rule_} %

The number of agents with high CPU SYS % is too large

An alert occurs when the system's CPU utilization exceeds 50%. The example of message is as follows:

The number of agents with a CPU SYS of 70% or more in the last minute > ${_rule_} %

The Disk I/O is too high

A notification is sent when the disk I/O utilization exceeds 10%. The example of Message is as follows:

In the last minute, ${oname}'s Disk I/O > ${_rule_} %

The Disk Used % is too high

A notification is sent when the file system utilization exceeds 90%. The example of Message is as follows:

In the last minute, ${oname}'s Disk Used > ${_rule_} %

Network Traffic I/O is too high

An alert occurs when the network inbound/outbound traffic exceeds 10%. The example of Message is as follows:

In the last minute, ${oname}'s Network Traffic I/O > ${_rule_} %

Network Packet I/O is too high

An alert occurs when the network inbound/outbound packets exceed 10%. The example of Message is as follows:

In the last minute, ${oname}'s Network Packet I/O > ${_rule_} %

Network Error I/O is too high

An alert occurs when the network inbound/outbound errors exceed 10%. The example of Message is as follows:

In the last minute, The maximum value of the ${oname}'s Network Error I/O > ${_rule_} %

The kube-apiserver latency over 10 second

A notification is sent when the latency of kube-apiserver among the control plane components exceeds 10 seconds. However, the WATCH operation is excluded. The example of Message is as follows:

Latency of the ${verb} verb in ${instance} of kube-apiserver exceeded ${metricValue} seconds.

The kube-apiserver response increase/decrease rate for error codes

Among control plane components, an alert occurs when the number of error responses from kube-apiserver exceeds 50 and the increase/decrease rate changes by more than 50%. The example of Message is as follows:

Rate of increase in the number of requests for code ${code} on instance ${instance} of kube-apiserver exceeded ${metricValue}.
