Metrics alert

Home > Select Project > Sitemap > Alert > Event Configuration New

The following explains how to create and manage event notifications based on metrics. Metrics events let you configure events that are more specific and complex than those in the default template. You can define conditions for the various events that occur in your monitoring targets and set up notifications so that you receive them efficiently.

Note

For more information about the metrics, see the following.
Only members with the Alert settings role can use this feature. For more information about the member roles, see the following.

Basic screen guide

In Event Configuration New, select the Metrics tab. You can view the list of metrics events added by users, and add, edit, or delete the metrics events.

Event configuration

The major elements of the list of metrics events are as follows:

Activation: It is a toggle button that controls whether to enable or disable each event. You can turn on or off the notifications of the desired events.
Edit: You can modify the event rules. You can update the rules to set notifications more precisely.
Event title: It is the event name. Event names make it easy to distinguish which events occurred. When an event occurs and a notification is received, the name is displayed as the title of the message.
Rule: It includes event conditions for triggering a notification. Each color represents the Critical or Warning level.
Target: You can receive notifications based on events occurring across all or specific monitoring targets.
Number of event: A notification is received when the set number of events occurs within the specified time.
Resolved notification: It indicates whether to receive a notification when an event at the Critical or Warning level returns to normal status (RECOVERED).
Event reception: It indicates the team or user group that receives notifications.

Note

For more information about the Notification Message Settings feature, see the following.
You can edit or share metrics events as JSON files. For more information, see the following document.
For more information about the JSON Batch Edit and JSON Batch Download features that allow you to share the entire event settings via JSON files, see the following.

Adding event

To add a metrics event, select the Add Alert Policy button on the upper right of the screen. When the Add Event screen appears, proceed with configuration by following the steps below:

Define event condition: Set the event conditions to receive notifications.
Select event target: Select the monitoring targets to generate events.
Basic information and notification setting: Set the event name, message, and recipients.

Define event condition

You can set event conditions to receive notifications.

Templates

The event templates tailored to your monitoring platform are provided for easy and quick event configuration. Select a desired template. Then the remaining fields are automatically filled in with predefined event conditions. If a template is not to be used, select Disabled.

Note

After selecting a template, the predefined event conditions can be modified.
Depending on the monitoring platform, the provided templates may differ. For more information about the provided template list, see the following.

Select Category

The category is the unit that distinguishes the metrics data. It is required for defining event conditions.

In the Select Category list, the name, data collection interval (for example, 1 h or 5 min), and key of each category are displayed. The list shows the metrics data collected for the project within the last 3 hours. If the category name does not appear in the list, select the Enter it yourself option and enter the category key directly.

Note

You must select a Category entry to proceed with Indicator settings.
The category information can be checked in Analysis > Metrics Search.

Indicator settings

You can select a notification level and set the threshold for events that trigger a notification.

Notification levels are divided into Critical, Warning, and Info.
Selector: Set the thresholds by selecting the field name and operator and then entering a value. After clicking a field, you can see the list of fields included in the metric category.

Add: You can add the metric settings. You can select &&(and) or ||(or) as the condition.
Typing: Set the event occurrence conditions by directly entering the field name, operators, and values. If you select this option, autocomplete suggests the fields that can be entered.

Tip

The following example is an event condition that triggers a notification when the write_time or the read_time exceeds 3,000 milliseconds, and the io_time is less than 10,000 milliseconds.

example

(write_time > 3000 || read_time > 3000) && io_time < 10000
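To make the grouping of the operators concrete, the same condition can be mirrored in Python. This is only an illustrative sketch: `condition_met` and the sample values are hypothetical and are not part of the product.

```python
def condition_met(metric: dict) -> bool:
    """Mirror of: (write_time > 3000 || read_time > 3000) && io_time < 10000"""
    slow_io = metric["write_time"] > 3000 or metric["read_time"] > 3000
    return slow_io and metric["io_time"] < 10000


# Hypothetical metric samples, all values in milliseconds.
samples = [
    {"write_time": 3500, "read_time": 100, "io_time": 9000},    # slow write, io_time below limit
    {"write_time": 100, "read_time": 100, "io_time": 9000},     # neither write nor read is slow
    {"write_time": 3500, "read_time": 4000, "io_time": 12000},  # io_time is not below the limit
]
results = [condition_met(sample) for sample in samples]
print(results)
```

Only the first sample satisfies the condition, because the parenthesized `||` part and the `&&` part must both hold.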

Note

If the Selector option is selected, the field list appears only when Select Category is set.
For the operators that can be selected in the Selector option, see the following.
An error may occur if you enter a field name that includes special characters (~!@#$%^&*()_+=-[]`) or starts with a number. In this case, select the Typing option and then enclose the field name in ${} as in the following example.
```
${4xxErrorType} == '401'
```
When the input method is switched between the Selector and Typing options, the content you entered is discarded.

Number of event

A notification is sent when the specified number of events occur during the selected period.

Consecutive: A notification is sent when the event occurs the specified number of times in a row.
In the last: A notification is sent when the event occurs the specified number of times within the selected period.
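The difference between the two options can be sketched as follows. The function names and the exact counting behavior are assumptions made for illustration. Each list item represents one collection interval and is `True` when the event condition was met in that interval.

```python
def consecutive_trigger(matches: list, count: int) -> bool:
    """True when the most recent `count` intervals all met the condition."""
    return len(matches) >= count and all(matches[-count:])


def in_the_last_trigger(matches: list, count: int, window: int) -> bool:
    """True when at least `count` of the most recent `window` intervals met it.

    Assumes window >= 1.
    """
    return sum(matches[-window:]) >= count


# Hypothetical history, oldest interval first.
history = [True, False, True, True, False, True]

print(consecutive_trigger(history, 2))     # the last two intervals are False, True
print(in_the_last_trigger(history, 3, 5))  # three of the last five intervals matched
```

With this history the consecutive check fails while the windowed check succeeds, which shows why the windowed option suits intermittent problems.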

Note

The Interval value indicates the data collection interval for the selected category.

Pause

This option can prevent excessive alerts from happening. Alert notifications are not sent for the selected period after the first alert occurs.
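The effect of the pause period can be sketched as a simple filter over event times. The function name, the use of seconds, and the boundary handling (an event exactly at the end of the pause is sent) are assumptions of this sketch, not documented behavior.

```python
def notifications_sent(event_times: list, silent_sec: int) -> list:
    """Return the event times (in seconds) that would produce a notification."""
    sent = []
    last_sent = None
    for t in sorted(event_times):
        # Suppress events that fall inside the pause window of the last alert sent.
        if last_sent is None or t - last_sent >= silent_sec:
            sent.append(t)
            last_sent = t
    return sent


# Hypothetical events at these offsets, with a 300-second pause.
print(notifications_sent([0, 60, 120, 400, 430], 300))
```

Here only the events at 0 and 400 seconds produce notifications. The others fall inside a pause window.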

Note

If you have enabled the Resolved notification feature, no notification is sent for the selected period after receiving a normal (RECOVERED) notification.

Resolved notification

If the Resolved notification option is enabled, the event appears under the In progress menu of Event History while it is active. When an event at the Critical or Warning level is resolved, a normal (RECOVERED) notification is sent. You can enable or disable this feature with the toggle button.

Note

This option appears when Critical or Warning is selected for the notification level.
For more information about Event History, see the following.

Simulation

You can test the event conditions set in Indicator settings. Select Simulation.

The number of notifications that occurred during the queried time appears at the upper right corner.
The red dotted line is the threshold set in Indicator settings.

Note

For more information about Indicator settings, see the following.
The Simulation feature can test against 24 hours of data.
For more information on how to use the time selector, see the following.

Select event target

Select the monitoring targets where events occur. If no target is specified, events are monitored and notifications are sent across the entire project, which may result in a large number of notifications. Set specific targets to avoid excessive notifications.

Select target

Event targets can be specified based on tags. A tag is data that contains unique information for distinguishing the collection targets. Values that rarely change, such as IP, oname, and host, are suitable.

Selector: You can specify targets by selecting tag names and operators and entering values.

Add: You can add targets for the event. You can select &&(and) or ||(or) as the condition.
Typing: You can specify targets by directly entering the tag names, operators, and values. If you select this option, autocomplete suggests the tags that can be entered.

Tip

Refer to the following examples when setting event targets.

ex. endsWith(okindName, 'example_name') && container == 'prod.billing'
ex. ${4xxErrorType} == '401'
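As a rough illustration of how the first target expression narrows the monitored targets, the sketch below applies the same test to a few tag sets. `target_matches` and the tag values are hypothetical.

```python
def target_matches(tags: dict) -> bool:
    """Mirror of: endsWith(okindName, 'example_name') && container == 'prod.billing'"""
    return (
        str(tags.get("okindName", "")).endswith("example_name")
        and tags.get("container") == "prod.billing"
    )


# Hypothetical tag sets for three monitoring targets.
targets = [
    {"okindName": "ns/example_name", "container": "prod.billing"},
    {"okindName": "ns/example_name", "container": "dev.billing"},
    {"okindName": "ns/other", "container": "prod.billing"},
]
selected = [t for t in targets if target_matches(t)]
print(len(selected))
```

Only the first target passes both tests, so events would be evaluated for that target alone.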

Note

Depending on the value selected in Select Category of the Define event condition section, available tags may differ.
If the event target is changed, the number of notifications may also change. Run Simulation again to see the result.
For more information about the selection of event targets and use of operators, see the following.

Basic information and notification setting

Set the event name, message, and recipients.

Basic information

Activate events: You can enable the current events.
Event title: This is used as the title of the notification message. Enter a name that is easy to identify.
Message: Enter the notification message to deliver to users. You can use variables to contain dynamic data.
- By entering ${Tag} or ${Field}, you can apply the variable to the message. The variable must be a value contained in the selected metrics data Category. You can check the ${Tag} and ${Field} variables that can be entered in the Metrics Search menu.
- If the button is clicked, you can see the history of previously entered messages.
Tip
You can write a message by entering the ${Tag} and ${Field} variables in the message field.
In Analysis > Metrics Search, select Category and then check the ${Tag} and ${Field} variables that can be entered. For the Category name of the current event template, see the Category in the following

Note
For the ${Tag} and ${Field} variables that can be entered in the message, see the following.
Test alert: You can send a test notification to check the event title and message in advance. Testing is possible only when the required items (Indicator settings, Event title, and Message) are all entered.
Note
- During testing, substitutions for actual metric values or variables do not work.
- Test notifications are sent only to users with the recipient tag set in the Event reception option. If you select Receive all, a test notification is sent to all users.
- To use this feature, enter or select a value for the required field (*).
Event reception: You can select members to receive notifications from current events.
- Receive all: You can send notifications to all the members in the project.
- Receive selected tags: Notifications are sent to project members and 3rd-party plugins that have the selected tags. When Reception tag appears, click Add tag to select a desired tag from the tag list.
Note
In Alert > Notifications, you can set the tags in project members and 3rd-party plug-ins. For more information, see the following.
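The ${Tag} and ${Field} substitution described for the Message option can be pictured with the sketch below. The product performs this substitution itself. `render_message` is a hypothetical helper, and leaving unknown variables untouched is an assumption of the sketch.

```python
import re

# Matches ${name}; names may contain dots, as in ${container.mem_percent}.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def render_message(template: str, values: dict) -> str:
    """Replace each ${name} with its value; unknown names are left as written."""
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return _VARIABLE.sub(substitute, template)


# Hypothetical tag and field values for one metric record.
record = {"oname": "node-1", "cpu": 82.5}
print(render_message("The CPU usage of ${oname} is ${cpu}% >= 70%.", record))
```

A variable that is not contained in the selected Category has no value to substitute, which is why the variables should be checked in Metrics Search first.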

Modifying and deleting the event

Go to Alert > Event Configuration and then select the Metrics tab.
In the event list, select the Edit button at the far left of the item to edit or delete.
If the metrics event setting window appears, modify each option and then select Save.

To delete the selected event, select Delete on the upper right of the Event configuration window.

Modifying to JSON format

You can modify metrics event settings in JSON format.

On the upper right of the screen, select JSON.
When the editing window appears, modify the content in JSON format.
After all changes are made, select Save on the upper right of the screen.

Note

If the modified content does not match the JSON format, an error message appears at the bottom of the screen and the changes cannot be saved. The displayed error message may differ depending on the format.

The JSON data structure is as follows:

[
  {
    "eventMessage": "APDEX: ${apdex100} > 10 ",
    "select": "",
    "receiver": [],
    "alertLabel": [
      "project"
    ],
    "rule": "apdex100 > 10",
    "silentSec": 0,
    "alertKey": [
      "project"
    ],
    "enabled": false,
    "eventTitle": "APDEX",
    "repeatDuration": 0,
    "eventLevelText": "Critical",
    "id": "z3f41ge464magg",
    "category": "app_counter_project{m5}",
    "repeatCount": 0,
    "stateful": false
  }
]

The fields in JSON data are associated with the following options in event settings.

JSON field	Option	Classification
`eventMessage`	Message	Basic information and notification setting
`select`	Select target > Typing	Select event target
`receiver`	List of key values in the Event reception > Reception tag option	Basic information and notification setting
`alertLabel`	Primary key value used internally when performing notification operations	-
`rule`	Event conditions of the Indicator settings option	Define event condition
`silentSec`	Pause	Define event condition
`alertKey`	Primary key value used internally when performing notification operations	-
`enabled`	Activate events	Basic information and notification setting
`eventTitle`	Event title	Basic information and notification setting
`repeatDuration`	Time selected in the Number of event option	Define event condition
`eventLevelText`	Event level of the Indicator settings option	Define event condition
`id`	Unique identifier value of the event	-
`category`	Select Category	Define event condition
`repeatCount`	Number selected in the Number of event option	Define event condition
`stateful`	Resolved notification	Define event condition
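Before pasting edited JSON into the editing window, it can help to check it locally. The sketch below parses the settings and confirms that each event carries the keys shown in the sample structure. Treating every key as required is an assumption of this sketch, and `check_rules` is a hypothetical helper.

```python
import json

# Keys taken from the sample structure. Requiring all of them is an assumption.
EXPECTED_KEYS = {
    "eventMessage", "select", "receiver", "alertLabel", "rule",
    "silentSec", "alertKey", "enabled", "eventTitle", "repeatDuration",
    "eventLevelText", "id", "category", "repeatCount", "stateful",
}


def check_rules(text: str) -> list:
    """Parse event settings and reject obviously malformed input."""
    rules = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(rules, list):
        raise ValueError("top-level value must be a JSON array of events")
    for index, rule in enumerate(rules):
        missing = EXPECTED_KEYS - set(rule)
        if missing:
            raise ValueError(f"event {index} is missing keys: {sorted(missing)}")
    return rules


sample = """
[
  {
    "eventMessage": "APDEX: ${apdex100} > 10 ",
    "select": "",
    "receiver": [],
    "alertLabel": ["project"],
    "rule": "apdex100 > 10",
    "silentSec": 0,
    "alertKey": ["project"],
    "enabled": false,
    "eventTitle": "APDEX",
    "repeatDuration": 0,
    "eventLevelText": "Critical",
    "id": "z3f41ge464magg",
    "category": "app_counter_project{m5}",
    "repeatCount": 0,
    "stateful": false
  }
]
"""

rules = check_rules(sample)
rules[0]["enabled"] = True  # corresponds to the Activate events option
print(json.dumps(rules, indent=2))
```

The printed result can then be pasted back into the editing window.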

Sharing metrics events

You can save the metrics event settings as a JSON file to share them with other users, or import settings received from other users.

Export

On the upper right of the screen, select JSON.
When the JSON editing window appears, select Export.
Once the JSON file has been downloaded, forward it to others for sharing.

Note

The format of the JSON file name is event-rules-YYYY-MM-DD.json.
After searching for events, you can use the export feature to download only the searched list as a JSON file.

Import

On the upper right of the screen, select the import button.
Select a JSON file that was downloaded with the Export function.
When the JSON editing window appears, select Add to list or Overwrite.

Caution

It is recommended to use this function between products of the same type. Event settings from projects for other products can be imported, but they do not work.
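The two import choices can be thought of as follows. This is a conceptual sketch only: the product's actual merge behavior, for example how duplicate `id` values are handled, is not described here, and `import_rules` and its mode names are hypothetical.

```python
def import_rules(existing: list, imported: list, mode: str) -> list:
    """Combine the current event list with an imported one."""
    if mode == "add":
        # Add to list: keep the current events and append the imported ones.
        return existing + imported
    if mode == "overwrite":
        # Overwrite: discard the current events and keep only the imported ones.
        return list(imported)
    raise ValueError(f"unknown mode: {mode}")


current = [{"eventTitle": "APDEX"}]
incoming = [{"eventTitle": "CPU high"}, {"eventTitle": "Disk high"}]

print(len(import_rules(current, incoming, "add")))
print(len(import_rules(current, incoming, "overwrite")))
```

Under these assumptions, adding yields three events and overwriting yields two.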

Searching event

You can search for events by event name or metric in the event list. Enter a string in the search field and then select the search button.

Note

After searching for events, you can use the export feature to download only the searched list as a JSON file. For more information about the export feature, see the following.

Guide to select generation conditions and targets

Event generation conditions and event target selection on metrics alerts use the same syntax. For event generation conditions, use field keys as variables. For event target selection, use tag keys as variables.

Basic syntax rules

If you just enter a string, it is recognized as a variable. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.

oid == "oid"
oid: variable
==: operator
"oid": text

// When oname is ott-1235

// Valid
oname == 'ott-1235' or oname == "ott-1235"

// Invalid: the notification does not work.
oname == ott-1235

If you just enter a number, it is recognized as number, and if you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.

oid == 123
oid: variable
==: operator
123: number

// When oid is 123

// Valid
oid == 123

// Invalid: the notification does not work.
oid == '123' or oid == "123"
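The rules above amount to classifying each operand by how it is written. A minimal sketch of that classification follows. `classify_operand` is hypothetical and the product's parser may differ in detail.

```python
import re


def classify_operand(token: str) -> str:
    """Classify an operand as 'text', 'number', or 'variable'."""
    token = token.strip()
    # Anything wrapped in matching single or double quotes is text.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return "text"
    # A bare integer or decimal is a number.
    if re.fullmatch(r"-?\d+(\.\d+)?", token):
        return "number"
    # Everything else is read as a variable name.
    return "variable"


for token in ['"oid"', "123", "'123'", "oid", "ott-1235"]:
    print(token, "->", classify_operand(token))
```

The last case shows why an unquoted `ott-1235` fails: it is read as a variable name rather than as text.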

List of available operators

Operator	Usage	Description
`==`	operand1 `==` operand2	Checks whether operand1 is equal to operand2.
`!=`	operand1 `!=` operand2	Checks whether operand1 and operand2 have different values.
`>`	operand1 `>` operand2	Checks whether the operand1 value is greater than the operand2 value.
`>=`	operand1 `>=` operand2	Checks whether the operand1 value is greater than or equal to the operand2 value.
`<`	operand1 `<` operand2	Checks whether the operand1 value is less than the operand2 value.
`<=`	operand1 `<=` operand2	Checks whether the operand1 value is less than or equal to the operand2 value.
`like`	operand1 `like` operand2	Checks by pattern whether operand1 includes operand2.
`&&`	expression1 `&&` expression2	Checks whether expression1 and expression2 are both `true`.
`and`	expression1 `and` expression2	Checks whether expression1 and expression2 are both `true`. Same as `&&`.
`||`	expression1 `||` expression2	Checks whether at least one of expression1 and expression2 is `true`.
`or`	expression1 `or` expression2	Checks whether at least one of expression1 and expression2 is `true`. Same as `||`.

Usage of like

You can conveniently search for embedded strings via the wildcard (*).

Searching for strings that start with a specific keyword
```
Key like "Value*"
```
Searching for strings that end with a specific keyword
```
Key like "*Value"
```
Searching for strings that include a specific keyword
```
Key like "*Value*"
```
The wildcard (*) cannot be used in the middle of keywords.
```
// Unsupported syntax
Key like "Va*lue"
```

If you omit the wildcard (*) in the like operator, it operates as equals (==).

// The following two statements have the same result.
Key like "Value"
Key == "Value"
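The wildcard rules described in this section can be summarized in a short sketch. The function below is hypothetical. It assumes case-sensitive matching and rejects a wildcard in the middle of the keyword, since that form is unsupported.

```python
def like(value: str, pattern: str) -> bool:
    """Emulate the like operator with a leading and/or trailing wildcard."""
    keyword = pattern.strip("*")
    if "*" in keyword:
        raise ValueError("a wildcard in the middle of the keyword is not supported")
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    if leading and trailing:
        return keyword in value            # "*Value*": includes
    if leading:
        return value.endswith(keyword)     # "*Value": ends with
    if trailing:
        return value.startswith(keyword)   # "Value*": starts with
    return value == keyword                # no wildcard: same as ==


print(like("Value123", "Value*"))
print(like("myValue", "*Value"))
print(like("aValueb", "*Value*"))
print(like("Values", "Value"))
```

The first three calls match and the last does not, because without a wildcard the comparison behaves like `==`.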

Available functions

Method	Usage	Description
startsWith	startsWith(param1, param2)	If the value whose param1 is the key starts with param2, the result is `true`. Otherwise, the result is `false`.
endsWith	endsWith(param1, param2)	If the value whose param1 is the key ends with param2, the result is `true`. Otherwise, the result is `false`.
isNull	isNull(param1)	If param1 is `null`, the value becomes `true`. Otherwise, the value becomes `false`.
isNotNull	isNotNull(param1)	If param1 is not `null`, the value becomes `true`. Otherwise, the value becomes `false`.
isEmpty	isEmpty(param1)	If param1 is `null` or `EmptyString("")`, the value becomes `true`. Otherwise, the value becomes `false`.
isNotEmpty	isNotEmpty(param1)	If param1 is neither `null` nor `EmptyString("")`, the value becomes `true`. Otherwise, the value becomes `false`.

startsWith

startsWith(Key, "Value")

endsWith

endsWith(Key, "Value")

isNull

isNull(Key)

isNotNull

isNotNull(Key)

isEmpty

isEmpty(Key)

isNotEmpty

isNotEmpty(Key)
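For readers who prefer code to tables, the six functions can be approximated as below. Each takes a record (a dictionary of tags or fields) and a key. The Python names and the handling of missing keys as `null` are assumptions of this sketch.

```python
def starts_with(record: dict, key: str, prefix: str) -> bool:
    value = record.get(key)
    return isinstance(value, str) and value.startswith(prefix)


def ends_with(record: dict, key: str, suffix: str) -> bool:
    value = record.get(key)
    return isinstance(value, str) and value.endswith(suffix)


def is_null(record: dict, key: str) -> bool:
    # A missing key is treated like null here (assumption).
    return record.get(key) is None


def is_not_null(record: dict, key: str) -> bool:
    return not is_null(record, key)


def is_empty(record: dict, key: str) -> bool:
    # True for null and for the empty string.
    value = record.get(key)
    return value is None or value == ""


def is_not_empty(record: dict, key: str) -> bool:
    return not is_empty(record, key)


# Hypothetical record with one empty and one missing value.
record = {"oname": "ott-1235", "container": ""}
print(starts_with(record, "oname", "ott"), is_empty(record, "container"), is_null(record, "host"))
```

Note the difference between the null and empty checks: an empty string is empty but not null.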

Template

Metrics Event Templates

Note

For more information about the reason field in the Kubernetes event, see Kubernetes official documentation.

BackOff

This notification is sent when BackOff is displayed 0 or more times in the reason field for the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Evicted

This notification is sent when Evicted is displayed 0 or more times in the reason field for the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedCreatePodSandBox

This notification is sent when FailedCreatePodSandBox is displayed 0 or more times in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedMount

This notification is sent when FailedMount is displayed 0 or more times in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedScheduling

This notification is sent when FailedScheduling is displayed 0 or more times in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedSync

This notification is sent when FailedSync is displayed 0 or more times in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

NodeNotReady

This notification is sent when NodeNotReady is displayed 0 or more times in the reason field for the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Unhealthy

This notification is sent when Unhealthy is displayed 0 or more times in the reason field for the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Notification of utilization based on container CPU quota

The alert occurs when the total CPU usage (${cpu_per_quota}) based on the container's CPU limit is 70% or more. The example of Message is as follows:

The CPU utilization of container ${oname} in ${okindName} is high, ${cpu_per_quota}% >= 70%.

Container memory fail count

This notification is sent when the container memory limit is reached one or more times. The example of Message is as follows:

Because the ${oname} container of ${okindName} exceeded the limit, the ${mem_failcnt} increased.

Container memory utilization

The alert occurs when the usage (${container.mem_percent}) based on the container's memory limit is 90% or more. The example of Message is as follows:

The memory usage of ${oname} container in ${okindName} is ${container.mem_percent}% >= 90%.

Container DEAD status notification

This notification is sent when the container status code is 100. Status code 100 means DEAD. The example of Message is as follows:

Container ${oname} is in DEAD state.

Cluster CPU request notification

This notification is sent when the value of the CPU allocatable to nodes divided by the total Limit CPU amount and multiplied by 100 is 80% or more. The example of Message is as follows:

The CPU request (minimum required resource) is more than 80% of the cluster CPU allocation.

Cluster memory request notification

The corresponding alert occurs when the available amount of memory for node allocation divided by the total Limit Memory and multiplied by 100 is 80% or more. The example of Message is as follows:

Memory Request (minimum required resource) is more than 80% of the cluster memory allocation.

Cluster CPU request notification

This notification is sent when the value of the CPU allocatable to nodes divided by the total Limit CPU amount and multiplied by 100 is 60% or more. The example of Message is as follows:

The CPU request (minimum required resource) is more than 60% of the cluster CPU allocation.

Cluster memory request notification

The corresponding alert occurs when the available amount of memory for node allocation divided by the total Limit Memory and multiplied by 100 is 60% or more. The example of Message is as follows:

Memory Request (minimum required resource) is more than 60% of the cluster memory allocation.

Notification of the number of master pods

This alert occurs when the Pods allocable to nodes do not exist. The example of Message is as follows:

The number of Pods allocable to the master is 0.

Node CPU utilization notification

This notification is sent when the node's CPU utilization (${cpu}) is 70% or more. The example of Message is as follows:

The CPU usage of ${oname} is ${cpu}% >= 70%.

Node memory utilization notification

This notification is sent when the node's memory utilization (${memory_pused}) is 90% or more. The example of Message is as follows:

The memory usage of ${oname} is ${memory_pused}% >= 90%.

Unassignable node notification

The corresponding alert occurs when the available number of Pods that can be assigned to a node is zero.

APDEX

This notification is triggered when transactions exist and the APDEX score is lower than 0.7. The example of Message is as follows:

APDEX is lower than 0.7 (${oname})

Composite Metrics Event Templates

Inactive agents has been found

An alert occurs when the number of active agents is less than the specified value. The example of Message is as follows:

${ip} ${okindName} The number of active agents has decreased to ${num_of_current_agents}.

TPS has changed by more than 30% compared to the previous week

An alert occurs when the application's TPS changes by more than 30% compared to that of the previous week. The example of message is as follows:

${okindName} a week ago : ${prev_week_tps_display}, current : ${current_tps_display}, difference : ${one_week_diff_display}
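One plausible reading of "changes by more than 30%" is a relative change against the previous week's value, in either direction. The sketch below uses that reading. The formula, the function names, and the treatment of a zero baseline are assumptions, not the template's documented calculation.

```python
def week_over_week_change(prev_week_tps: float, current_tps: float) -> float:
    """Percent change of the current TPS relative to last week's TPS."""
    if prev_week_tps == 0:
        raise ValueError("previous-week TPS is zero; percent change is undefined")
    return (current_tps - prev_week_tps) / prev_week_tps * 100


def tps_alert(prev_week_tps: float, current_tps: float, threshold_pct: float = 30.0) -> bool:
    """True when TPS rose or fell by more than the threshold."""
    return abs(week_over_week_change(prev_week_tps, current_tps)) > threshold_pct


print(tps_alert(100, 140))  # rise of about 40%
print(tps_alert(100, 75))   # fall of about 25%
print(tps_alert(100, 65))   # fall of about 35%
```

Under this reading, both a sharp rise and a sharp drop trigger the alert, while a 25% drop does not.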

Very slow active transactions detected

An alert occurs when the number of transactions that exceed 8 seconds in the application exceeds 10 on average. The example of message is as follows:

${okindName} ${very_slow_tx_cnt_m5_avg_display} active transactions performed for more than 8 seconds were detected.

APDEX score dropped

A notification is triggered when the APDEX score falls below 70. The example of Message is as follows:

The average apdex of ${pname} in the last 5 seconds is ${apdex_display}

CPU % is too high

A notification is sent when the CPU utilization of the node exceeds 80%. The example of message is as follows:

CPU utilization rate of the ${oname} in the last minute > ${_rule_} %

CPU User % is too high

An alert occurs when the user's CPU utilization exceeds 50%. The example of Message is as follows:

CPU User utilization rate of the ${oname} in the last minute > ${_rule_} %

The number of agents with high CPU SYS % is too large

An alert occurs when the system's CPU utilization exceeds 50%. The example of message is as follows:

The number of agents with a CPU SYS of 70% or more in the last minute > ${_rule_} %

The Disk I/O is too high

A notification is sent when the disk I/O utilization exceeds 10%. The example of Message is as follows:

In the last minute, ${oname}'s Disk I/O > ${_rule_} %

The Disk Used % is too high

A notification is sent when the file system utilization exceeds 90%. The example of Message is as follows:

In the last minute, ${oname}'s Disk Used > ${_rule_} %

Network Traffic I/O is too high

An alert occurs when the network inbound/outbound traffic exceeds 10%. The example of Message is as follows:

In the last minute, ${oname}'s Network Traffic I/O > ${_rule_} %

Network Packet I/O is too high

An alert occurs when the network inbound/outbound packets exceed 10%. The example of Message is as follows:

In the last minute, ${oname}'s Network Packet I/O > ${_rule_} %

Network Error I/O is too high

An alert occurs when the network inbound/outbound errors exceed 10%. The example of Message is as follows:

In the last minute, The maximum value of the ${oname}'s Network Error I/O > ${_rule_} %

The kube-apiserver latency over 10 second

A notification is sent when the latency of kube-apiserver among the control plane components exceeds 10 seconds. However, the WATCH operation is excluded. The example of Message is as follows:

Latency of the ${verb} verb in ${instance} of kube-apiserver exceeded ${metricValue} seconds.

The kube-apiserver response increase/decrease rate for error codes

Among control plane components, an alert occurs when the number of error responses from kube-apiserver exceeds 50 and the increase/decrease rate changes by more than 50%. The example of Message is as follows:

Rate of increase in the number of requests for code ${code} on instance ${instance} of kube-apiserver exceeded ${metricValue}.
