Metrics alert

Home > Select Project > Icon Sitemap > Alert > Event Configuration New

The following explains how to create and manage event notifications based on metrics. Metrics events let you configure events that are more specific and complex than those of the default template. You can define events that occur in the monitoring targets according to conditions and configure notifications so that they are received efficiently.

Note
  • For more information about the metrics, see the following.

  • Only members with the Alert settings role can use this feature. For more information about the member roles, see the following.

Basic screen guide

In Event Configuration New, select the Metrics tab. You can view the list of metrics events added by users, and add, edit, or delete the metrics events.

Event configuration

The major elements of the list of metrics events are as follows:

  • Activation: It is a toggle button that controls whether to enable or disable each event. You can turn on or off the notifications of the desired events.

  • Edit: You can modify the event rules. You can update the rules to set notifications more precisely.

  • Event title: The event name, which makes it easy to distinguish which event occurred. When an event occurs and a notification is received, the event title is displayed as the title of the message.

  • Rule: It includes event conditions for triggering a notification. Each color represents the Critical or Warning level.

  • Target: You can receive notifications based on events occurring across all or specific monitoring targets.

  • Number of event: A notification is received when the set number of events occurs within the specified time.

  • Resolved notification: It indicates whether to receive a notification when an event at the Critical or Warning level returns to the normal status (RECOVERED).

  • Event reception: It indicates the team or user group that receives notifications.

Note
  • For more information about the Notification Message Settings feature, see the following.

  • You can edit or share metrics events as JSON files. For more information, see the following document.

  • For more information about the JSON Batch Edit and JSON Batch Download features that allow you to share the entire event settings via JSON files, see the following.

Adding event

To add a metrics event, select the Add Alert Policy button on the upper right of the screen. When the Add Event screen appears, proceed with configuration by following the steps below:

  1. Define event condition: Set the event conditions to receive notifications.

  2. Select event target: Select the monitoring targets to generate events.

  3. Basic information and notification setting: Set the event name, message, and recipients.

Define event condition

You can set event conditions to receive notifications.

Templates

The event templates tailored to your monitoring platform are provided for easy and quick event configuration. Select a desired template. Then the remaining fields are automatically filled in with predefined event conditions. If a template is not to be used, select Disabled.

Note
  • After selecting a template, the predefined event conditions can be modified.

  • Depending on the monitoring platform, the provided templates may differ. For more information about the provided template list, see the following.

Select Category

It is the unit that distinguishes the metrics data. It is required for defining event conditions.

Select Category

In the Select Category list, the Name, data collection interval (e.g. 1 h, 5 min), and Key are displayed. The list shows the metrics data collected for the project within the last 3 hours. If the category name does not appear in the Select Category list, you can enter the category key by selecting the Enter it yourself option.

Note
  • You must select a Category entry to proceed with Indicator settings.

  • The category information can be checked in Analysis > Metrics Search.

Indicator settings

You can select a notification level and set the threshold for events that trigger a notification.

Indicator settings

  • Notification levels are divided into Critical, Warning, and Info.

  • Selector: Set the thresholds after selecting the field name and operators and then entering values. After clicking a field, you can see the list of fields included in the metric category.

    Add: You can add the metric settings. You can select &&(and) or ||(or) as the condition.

  • Typing: Set the event occurrence conditions by directly entering the field name, operators, and values. If you select this option, the fields (Autocomplete by selecting a field) that can be entered are provided.

Tip

The following example is an event condition that triggers a notification when the write_time or the read_time exceeds 3,000 milliseconds, and the io_time is less than 10,000 milliseconds.

example
(write_time > 3000 || read_time > 3000) && io_time < 10000
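
To see how the grouping in this condition behaves, the following Python sketch re-expresses it as a plain function and evaluates it against a few made-up samples. The function name and the sample values are illustrative assumptions; in the product, the rule is evaluated by the alert engine itself.

```python
def rule_matches(sample: dict) -> bool:
    """Python rendering of: (write_time > 3000 || read_time > 3000) && io_time < 10000."""
    slow_io = sample["write_time"] > 3000 or sample["read_time"] > 3000
    return slow_io and sample["io_time"] < 10000


# Hypothetical samples; all values are in milliseconds.
samples = [
    {"write_time": 3500, "read_time": 100, "io_time": 5000},    # slow write, io_time under the limit
    {"write_time": 100, "read_time": 100, "io_time": 5000},     # neither write nor read is slow
    {"write_time": 3500, "read_time": 4000, "io_time": 12000},  # slow, but io_time is too high
]
results = [rule_matches(s) for s in samples]
print(results)  # [True, False, False]
```

Without the parentheses, && would typically bind more tightly than ||, so the grouping is what makes io_time apply to both the write and read checks.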
Note
  • If the Selector option is selected, the field list appears only when Select Category is set.

  • For the operators that can be selected in the Selector option, see the following.

  • An error may occur if you enter a field name that includes special characters (~!@#$%^&*()_+=-[]`) or starts with a number. In this case, select the Typing option and then enter the field name enclosed in ${} as in the following example.

    ${4xxErrorType} == '401'
  • When you switch the input method between the Selector and Typing options, the conditions you entered are discarded.

Number of event

A notification is sent when the specified number of events occur during the selected period.

Count of event

  • Consecutive: A notification can be received when the specified number of events occur consecutively.

  • In the last: A notification can be received when the specified number of events occur during the selected period.
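
The difference between the two options can be pictured with the following Python sketch. It assumes that each list entry is the result of one rule evaluation per collection interval, that Consecutive requires an unbroken run of matches, and that In the last counts matches anywhere within the period. The function names and the window measured in intervals are hypothetical and only illustrate the idea.

```python
def consecutive_trigger(matches: list[bool], count: int) -> bool:
    """True if the most recent `count` evaluations all matched the rule."""
    return len(matches) >= count and all(matches[-count:])


def in_the_last_trigger(matches: list[bool], count: int, window: int) -> bool:
    """True if at least `count` of the most recent `window` evaluations matched."""
    return sum(matches[-window:]) >= count


# Hypothetical evaluation history, oldest first.
history = [True, False, True, True, False, True]

print(consecutive_trigger(history, 2))     # False: the last two results are False, True
print(in_the_last_trigger(history, 3, 5))  # True: three matches within the last five results
```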

Note

The Interval value indicates the data collection interval for the selected category.

Pause

This option can prevent excessive alerts from happening. Alert notifications are not sent for the selected period after the first alert occurs.

Pause alert

Note

If you have enabled the Resolved notification feature, no notification is sent for the selected period after receiving a normal (RECOVERED) notification.
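
As a rough illustration of the Pause option, the following Python sketch filters a series of event timestamps so that no notification is sent within the pause period after the previous one. The function name, the use of seconds, and the choice to allow a notification exactly when the period ends are assumptions made for this example.

```python
def notifications_sent(event_times: list[int], silent_sec: int) -> list[int]:
    """Return the event timestamps (in seconds) that would produce a notification."""
    sent: list[int] = []
    for t in event_times:
        # Send only if nothing was sent before or the pause period has elapsed.
        if not sent or t - sent[-1] >= silent_sec:
            sent.append(t)
    return sent


# Hypothetical event times with a 300-second pause.
print(notifications_sent([0, 60, 120, 300, 330, 700], silent_sec=300))  # [0, 300, 700]
```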

Resolved notification

When an event at the Critical or Warning level is cleared, a normal (RECOVERED) notification is sent. This function can be enabled or disabled by selecting the toggle button.

Resolved notification

Note

This option appears when Critical or Warning is selected for the notification level.

Simulation

You can test the event conditions set in Indicator settings. Select Simulation.

Simulation

  • (1) The number of notifications that occurred during the queried time appears at the upper right corner.

  • (2) The red dotted line is the value set in Indicator settings.

Note
  • For more information about Indicator settings, see the following.

  • The Simulation feature can test against 24 hours of data.

Select event target

Select monitoring targets where events occur. If not entered, events are monitored and notifications are sent across the entire project. This may result in lots of notifications. Set specific targets to avoid excessive notifications.

Select target

Event targets can be specified based on tags. A tag is data containing unique information that distinguishes the collection targets. Use values that rarely change, such as IP, Oname, and host data.

  • Selector: You can specify targets by selecting tag names and operators and entering values.

    Add: You can add targets for the event. You can select &&(and) or ||(or) as the condition.

  • Typing: You can specify targets by directly entering the tag names, operators, and values. If you select this option, the tags (Autocomplete by selecting a tag) that can be entered are provided.

Tip

Refer to the following examples when setting event targets.

ex. endsWith(okindName, 'example_name') && container == 'prod.billing'
ex. ${4xxErrorType} == '401'
Note
  • Depending on the value selected in Select Category of the Define event condition section, available tags may differ.

  • If the event target is changed, the number of notifications may also change. Select Simulation again to see the result.

  • For more information about the selection of event targets and use of operators, see the following.
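
The first target example in the Tip above can be read as a filter over the tags of each monitoring target. The following Python sketch applies an equivalent predicate to a few made-up targets; the function name and all tag values are hypothetical.

```python
def target_selected(tags: dict) -> bool:
    """Python rendering of: endsWith(okindName, 'example_name') && container == 'prod.billing'."""
    return (
        tags.get("okindName", "").endswith("example_name")
        and tags.get("container") == "prod.billing"
    )


# Hypothetical targets described by their tags.
targets = [
    {"oname": "pod-a", "okindName": "team1_example_name", "container": "prod.billing"},
    {"oname": "pod-b", "okindName": "team1_example_name", "container": "dev.billing"},
    {"oname": "pod-c", "okindName": "other", "container": "prod.billing"},
]
selected = [t["oname"] for t in targets if target_selected(t)]
print(selected)  # ['pod-a']
```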

Basic information and notification setting

Set the event name, message, and recipients.

Basic information

  • Activate events: You can enable the current events.

  • Event title: This is used as the title of the notification message. Enter a name that is easy to identify.

  • Message: Enter the notification message to deliver to users. You can use variables to contain dynamic data.

    Message example

    • By entering ${Tag} or ${Field}, you can apply the variable to the message. The variable must be a value contained in the selected metrics data Category. You can check the ${Tag} and ${Field} variables that can be entered in the Metrics Search menu.

    • If the Start icon button is clicked, you can see the history of previously entered messages.

    Tip

    You can write a message by entering the ${Tag} and ${Field} variables in the message field.

    In Analysis > Metrics Search, select Category and then check the ${Tag} and ${Field} variables that can be entered. For the Category name of the current event template, see the Category in the following

    Note

    For the ${Tag} and ${Field} variables that can be entered in the message, see the following.

  • Test alert: Before events generate actual notifications, you can check in advance how the specified event title and message appear. Testing is possible only when the required items (Indicator settings, Event title, and Message) are all entered.

    Note
    • During testing, actual metric values and variables are not substituted, and notifications cannot be restricted to users with reception tags.

    • To use this feature, enter or select a value for the required field (*).

  • Event reception: You can select members to receive notifications from current events.

    • Receive all: You can send notifications to all the members in the project.

    • Receive selected tags: Notifications are sent to project members and 3rd-party plugins that have the selected tags. When Reception tag appears, click Add tag to select a desired tag from the tag list.

    Note

    In Alert > Notifications, you can set the tags in project members and 3rd-party plug-ins. For more information, see the following.
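
For the Message option described above, the effect of ${Tag} and ${Field} variables can be pictured with the following Python sketch. It is not the product's implementation: the function name is hypothetical, and leaving unknown variables unchanged is an assumption made for this example.

```python
import re

# Matches ${name}, where name may contain letters, digits, underscores, and dots.
VARIABLE = re.compile(r"\$\{([\w.]+)\}")


def render_message(template: str, values: dict) -> str:
    """Replace each ${name} with its value; unknown variables are left untouched (assumption)."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return VARIABLE.sub(substitute, template)


template = "The CPU usage of ${oname} is ${cpu}% >= 70%."
print(render_message(template, {"oname": "node-1", "cpu": 82.5}))
# The CPU usage of node-1 is 82.5% >= 70%.
```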

Modifying and deleting the event

  1. Go to Alert > Event Configuration and then select the Metrics tab.

  2. In the event list, select the Edit icon on the far left of the item to edit or delete.

  3. If the metrics event setting window appears, modify each option and then select Save.

    To delete the selected event, select Delete on the upper right of the Event configuration window.

Modifying to JSON format

You can modify metrics event settings in JSON format.

  1. On the upper right of the screen, select JSON.

  2. When the editing window appears, modify the content in JSON format.

  3. After all changes are made, select Save on the upper right of the screen.

Note

If the modified content does not match the JSON format, an error message appears at the bottom of the screen and the changes cannot be saved. The displayed error message may differ depending on the format.

JSON error

The JSON data structure is as follows:

[
  {
    "eventMessage": "APDEX: ${apdex100} > 10 ",
    "select": "",
    "receiver": [],
    "alertLabel": [
      "project"
    ],
    "rule": "apdex100 > 10",
    "silentSec": 0,
    "alertKey": [
      "project"
    ],
    "enabled": false,
    "eventTitle": "APDEX",
    "repeatDuration": 0,
    "eventLevelText": "Critical",
    "id": "z3f41ge464magg",
    "category": "app_counter_project{m5}",
    "repeatCount": 0,
    "stateful": false
  }
]

The fields in JSON data are associated with the following options in event settings.

| JSON field | Option | Classification |
| --- | --- | --- |
| eventMessage | Message | Basic information and notification setting |
| select | Select target > Typing | Select event target |
| receiver | List of key values in the Event reception > Reception tag option | Basic information and notification setting |
| alertLabel | Primary key value used internally when performing notification operations | - |
| rule | Event conditions of the Indicator settings option | Define event condition |
| silentSec | Pause | Define event condition |
| alertKey | Primary key value used internally when performing notification operations | - |
| enabled | Activate events | Basic information and notification setting |
| eventTitle | Event title | Basic information and notification setting |
| repeatDuration | Time selected in the Number of event option | Define event condition |
| eventLevelText | Event level of the Indicator settings option | Define event condition |
| id | Unique identifier value of the event | - |
| category | Select Category | Define event condition |
| repeatCount | Number selected in the Number of event option | Define event condition |
| stateful | Resolved notification | Define event condition |
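
Before pasting edited settings into the JSON editing window, it can help to check them locally. The following Python sketch parses the text and compares each event against the field names and value types seen in the sample structure above. Treating every field as required is an assumption of this example, and the function name is hypothetical; the product performs its own validation.

```python
import json

# Field names and value types taken from the sample structure above.
# Note: Python treats bool as a subclass of int, so true/false would pass an int check.
EXPECTED_TYPES = {
    "eventMessage": str, "select": str, "receiver": list, "alertLabel": list,
    "rule": str, "silentSec": int, "alertKey": list, "enabled": bool,
    "eventTitle": str, "repeatDuration": int, "eventLevelText": str,
    "id": str, "category": str, "repeatCount": int, "stateful": bool,
}


def check_events(text: str) -> list[str]:
    """Return a list of problems found in the JSON text; an empty list means none were found."""
    try:
        events = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(events, list):
        return ["top-level value must be a list of events"]
    problems = []
    for index, event in enumerate(events):
        if not isinstance(event, dict):
            problems.append(f"event {index}: must be an object")
            continue
        for field, expected in EXPECTED_TYPES.items():
            if field not in event:
                problems.append(f"event {index}: missing {field}")
            elif not isinstance(event[field], expected):
                problems.append(f"event {index}: {field} should be {expected.__name__}")
    return problems


print(check_events('[{"rule": '))  # reports invalid JSON
```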

Sharing metrics events

You can save the Metrics event settings as a JSON file to share them with other users or import them from other users.

Export

  1. On the upper right of the screen, select JSON.

  2. When the JSON editing window appears, select Export.

  3. Once the JSON file has been downloaded, forward it to others for sharing.

Note

The format of the JSON file name is event-rules-YYYY-MM-DD.json.
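
If exported files are collected in a shared folder, the file name format can be used to recognize them and read the export date. The following Python sketch is one way to do that; the function name is hypothetical, and the pattern is derived only from the format stated above.

```python
import re
from datetime import date
from typing import Optional

# Pattern for the event-rules-YYYY-MM-DD.json format.
EXPORT_NAME = re.compile(r"^event-rules-(\d{4})-(\d{2})-(\d{2})\.json$")


def export_date(file_name: str) -> Optional[date]:
    """Return the date embedded in an exported file name, or None if the name does not match."""
    match = EXPORT_NAME.match(file_name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    # date() raises ValueError for impossible dates such as month 13.
    return date(year, month, day)


print(export_date("event-rules-2024-03-07.json"))  # 2024-03-07
print(export_date("rules.json"))                   # None
```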

Import

  1. On the upper right of the screen, select .

  2. Select a JSON file that was downloaded using the Export function.

  3. When the JSON editing window appears, select Add to list or Overwrite.

Caution

It is recommended to use this function between products of the same type. Event settings from projects of other product types can be imported, but they do not work properly.

Searching event

You can search the event list by event name or metric. Enter a string in the search field and then select the search icon.

Guide to select generation conditions and targets

The event generation conditions and the selection of event targets on metrics alerts use the same syntax. For event generation conditions, use the field key as a variable. For the selection of event targets, use the tag key as a variable.

Basic syntax rules

  • If you just enter a string, it is recognized as a variable. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.

    oid == "oid"
    1. oid: variable
    2. ==: operator
    3. "oid": text

    // In case oname is ott-1235

    // Normal cases
    oname == 'ott-1235' or oname == "ott-1235"

    // In abnormal cases, notification does not work.
    oname == ott-1235

  • If you just enter a number, it is recognized as a number. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.

    oid == 123
    1. oid: variable
    2. ==: operator
    3. 123: number

    // In case oid is 123

    // Normal cases
    oid == 123

    // In abnormal cases, notification does not work.
    oid == '123' or oid == "123"
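
The abnormal cases above are consistent with a comparison that does not convert between text and numbers. The following Python sketch shows that behavior under this assumption; the helper name and the tag values are hypothetical.

```python
def equals(left, right) -> bool:
    """Strict equality: values of different types never match (assumed behavior)."""
    return type(left) is type(right) and left == right


# Hypothetical collected values: oname is text, oid is a number.
tags = {"oname": "ott-1235", "oid": 123}

print(equals(tags["oname"], "ott-1235"))  # True: text compared with text
print(equals(tags["oid"], 123))           # True: number compared with number
print(equals(tags["oid"], "123"))         # False: number compared with text
```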

List of available operators

| Operator | Usage | Description |
| --- | --- | --- |
| == | operand1 == operand2 | Checks whether operand1 is equal to operand2. |
| != | operand1 != operand2 | Checks whether operand1 and operand2 have different values. |
| > | operand1 > operand2 | Checks whether the operand1 value is greater than the operand2 value. |
| >= | operand1 >= operand2 | Checks whether the operand1 value is greater than or equal to the operand2 value. |
| < | operand1 < operand2 | Checks whether the operand1 value is less than the operand2 value. |
| <= | operand1 <= operand2 | Checks whether the operand1 value is less than or equal to the operand2 value. |
| like | operand1 like operand2 | Checks by pattern matching whether operand1 includes operand2. |
| && | expression1 && expression2 | Checks whether both expression1 and expression2 are true. |
| and | expression1 and expression2 | Checks whether both expression1 and expression2 are true. The operator plays the same role as &&. |
| \|\| | expression1 \|\| expression2 | Checks whether at least one of expression1 and expression2 is true. |
| or | expression1 or expression2 | Checks whether at least one of expression1 and expression2 is true. The operator plays the same role as \|\|. |

Usage of like

You can conveniently search for embedded strings via the wildcard (*).

  • Searching for strings that start with a specific keyword


    Key like "Value*"

  • Searching for strings that end with a specific keyword


    Key like "*Value"

  • Searching for strings that include a specific keyword


    Key like "*Value*"

  • The wildcard (*) cannot be used in the middle of keywords.


    // Unsupported syntax
    Key like "Va*lue"

  • If you omit the wildcard (*) in the like operator, it operates as equals (==).


    // The following two statements have the same result.
    Key like "Value"
    Key == "Value"
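
The wildcard rules above can be summarized in a short Python sketch. It is an illustration of the documented behavior rather than the product's implementation; the function name is hypothetical, and case-sensitive matching is an assumption.

```python
def like(value: str, pattern: str) -> bool:
    """Sketch of the like operator with a wildcard (*) at the start and/or end only."""
    keyword = pattern.strip("*")
    if "*" in keyword:
        raise ValueError("a wildcard in the middle of a keyword is not supported")
    starts_open = pattern.startswith("*")
    ends_open = pattern.endswith("*")
    if starts_open and ends_open:
        return keyword in value            # "*Value*": includes the keyword
    if ends_open:
        return value.startswith(keyword)   # "Value*": starts with the keyword
    if starts_open:
        return value.endswith(keyword)     # "*Value": ends with the keyword
    return value == keyword                # no wildcard: same as ==


print(like("Value123", "Value*"))  # True
print(like("123Value", "*Value"))  # True
print(like("Values", "Value"))     # False
```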

Available functions

| Method | Usage | Description |
| --- | --- | --- |
| startsWith | startsWith(param1, param2) | True if the value of the key param1 starts with param2; otherwise false. |
| endsWith | endsWith(param1, param2) | True if the value of the key param1 ends with param2; otherwise false. |
| isNull | isNull(param1) | True if param1 is null; otherwise false. |
| isNotNull | isNotNull(param1) | True if param1 is not null; otherwise false. |
| isEmpty | isEmpty(param1) | True if param1 is null or an empty string (""); otherwise false. |
| isNotEmpty | isNotEmpty(param1) | True if param1 is neither null nor an empty string (""); otherwise false. |

startsWith

startsWith(Key, "Value")

endsWith

endsWith(Key, "Value")

isNull

isNull(Key)

isNotNull

isNotNull(Key)

isEmpty

isEmpty(Key)

isNotEmpty

isNotEmpty(Key)
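
For readers who want to reason about these functions outside the product, the following Python sketch gives rough equivalents. None stands in for null, the snake_case names are hypothetical, and returning false from starts_with and ends_with for a null value is an assumption.

```python
from typing import Optional


def starts_with(value: Optional[str], prefix: str) -> bool:
    return value is not None and value.startswith(prefix)


def ends_with(value: Optional[str], suffix: str) -> bool:
    return value is not None and value.endswith(suffix)


def is_null(value: Optional[str]) -> bool:
    return value is None


def is_not_null(value: Optional[str]) -> bool:
    return value is not None


def is_empty(value: Optional[str]) -> bool:
    return value is None or value == ""


def is_not_empty(value: Optional[str]) -> bool:
    return not is_empty(value)


# Hypothetical tag values; a missing key yields None via dict.get().
tags = {"okindName": "billing-example_name", "podName": ""}

print(ends_with(tags.get("okindName"), "example_name"))  # True
print(is_null(tags.get("container")))                    # True
print(is_empty(tags.get("podName")))                     # True
```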

Template

Metrics Event Templates

Note

For more information about the reason field in the Kubernetes event, see Kubernetes official documentation.

BackOff

This notification is sent when BackOff is displayed 0 or more times in the reason field for the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Evicted

This notification is sent when Evicted is displayed 0 or more times in the reason field for the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedCreatePodSandBox

This notification is sent when FailedCreatePodSandBox is displayed 0 or more times in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedMount

This notification is sent when FailedMount is displayed 0 or more times in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedScheduling

This notification is sent when FailedScheduling is displayed 0 or more times in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

FailedSync

This notification is sent when FailedSync is displayed 0 or more times in the reason field of the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

NodeNotReady

This notification is sent when NodeNotReady is displayed 0 or more times in the reason field for the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Unhealthy

This notification is sent when Unhealthy is displayed 0 or more times in the reason field for the Kubernetes event. The example of Message is as follows:

Kubernetes Event (Kube Event ${message})

Notification of utilization based on container CPU quota

The alert occurs when the total CPU usage (${cpu_per_quota}) based on the container's CPU limit is 70% or more. The example of Message is as follows:

The CPU utilization of container ${oname} in ${okindName} is high, ${cpu_per_quota}% >= 70%.

Container memory fail count

This notification is sent when the container memory limit is reached once or more times. The example of Message is as follows:

Because the ${oname} container of ${okindName} exceeded the limit, the ${mem_failcnt} increased.

Container memory utilization

The alert occurs when the usage (${container.mem_percent}) based on the container's memory limit is 90% or more. The example of Message is as follows:

The memory usage of ${oname} container in ${okindName} is ${container.mem_percent}% >= 90%.

Container DEAD status notification

This notification is sent when the container status code is 100. The status code, 100 means DEAD. The example of Message is as follows:

Container ${oname} is in DEAD state.

Cluster CPU request notification

This notification is sent when the value of the CPU allocatable to nodes divided by the total Limit CPU amount and multiplied by 100 is 80% or more. The example of Message is as follows:

The CPU request (minimum required resource) is more than 80% of the cluster CPU allocation.

Cluster memory request notification

The corresponding alert occurs when the available amount of memory for node allocation divided by the total Limit Memory and multiplied by 100 is 80% or more. The example of Message is as follows:

Memory Request (minimum required resource) is more than 80% of the cluster memory allocation.

Cluster CPU request notification

This notification is sent when the value of the CPU allocatable to nodes divided by the total Limit CPU amount and multiplied by 100 is 60% or more. The example of Message is as follows:

The CPU request (minimum required resource) is more than 60% of the cluster CPU allocation.

Cluster memory request notification

The corresponding alert occurs when the available amount of memory for node allocation divided by the total Limit Memory and multiplied by 100 is 60% or more. The example of Message is as follows:

Memory Request (minimum required resource) is more than 60% of the cluster memory allocation.

Notification of the number of master pods

This alert occurs when the Pods allocable to nodes do not exist. The example of Message is as follows:

The number of Pods allocable to the master is 0.

Node CPU utilization notification

This notification is sent when the node's CPU utilization (${cpu}) is 70% or more. The example of Message is as follows:

The CPU usage of ${oname} is ${cpu}% >= 70%.

Node memory utilization notification

This notification is sent when the node's memory utilization (${memory_pused}) is 90% or more. The example of Message is as follows:

The memory usage of ${oname} is ${memory_pused}% >= 90%.

Unassignable node notification

The corresponding alert occurs when the available number of Pods that can be assigned to a node is zero.

APDEX

This notification is triggered when transactions exist and the APDEX score is lower than 0.7. The example of Message is as follows:

APDEX is lower than 0.7 (${oname})

Composite Metrics Event Templates

Inactive agents has been found

An alert occurs when the number of active agents is less than the specified value. The example of Message is as follows:

${ip} ${okindName} The number of active agents has decreased to ${num_of_current_agents}.

TPS has changed by more than 30% compared to the previous week

An alert occurs when the application's TPS changes by more than 30% compared to that of the previous week. The example of message is as follows:

${okindName} a week ago : ${prev_week_tps_display}, current : ${current_tps_display}, difference : ${one_week_diff_display}
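
As a worked illustration of the 30% condition, the following Python sketch computes a week-over-week percentage change. The exact formula used by the template is not stated here, so the calculation, the function names, and the treatment of both increases and decreases as changes are assumptions.

```python
def week_over_week_change(prev_week_tps: float, current_tps: float) -> float:
    """Percentage change of the current TPS relative to the previous week's TPS."""
    if prev_week_tps == 0:
        raise ValueError("the previous week's TPS is zero, so the change is undefined")
    return (current_tps - prev_week_tps) / prev_week_tps * 100


def tps_alert(prev_week_tps: float, current_tps: float, threshold_percent: float = 30.0) -> bool:
    """True if the TPS rose or fell by more than the threshold."""
    return abs(week_over_week_change(prev_week_tps, current_tps)) > threshold_percent


print(tps_alert(100, 135))  # True: +35%
print(tps_alert(100, 65))   # True: -35%
print(tps_alert(100, 120))  # False: +20%
```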

Very slow active transactions detected

An alert occurs when the number of transactions that exceed 8 seconds in the application exceeds 10 on average. The example of message is as follows:

${okindName} ${very_slow_tx_cnt_m5_avg_display} active transactions performed for more than 8 seconds were detected.

APDEX score dropped

A notification is triggered when the APDEX score falls below 70. The example of Message is as follows:

The average apdex of ${pname} in the last 5 seconds is ${apdex_display}

CPU % is too high

A notification is sent when the CPU utilization of the node exceeds 80%. The example of message is as follows:

CPU utilization rate of the ${oname} in the last minute > ${_rule_} %

CPU User % is too high

An alert occurs when the user's CPU utilization exceeds 50%. The example of Message is as follows:

CPU User utilization rate of the ${oname} in the last minute > ${_rule_} %

The number of agents with high CPU SYS % is too large

An alert occurs when the system's CPU utilization exceeds 50%. The example of message is as follows:

The number of agents with a CPU SYS of 70% or more in the last minute > ${_rule_} %

The Disk I/O is too high

A notification is sent when the disk I/O utilization exceeds 10%. The example of Message is as follows:

In the last minute, ${oname}'s Disk I/O > ${_rule_} %

The Disk Used % is too high

A notification is sent when the file system utilization exceeds 90%. The example of Message is as follows:

In the last minute, ${oname}'s Disk Used > ${_rule_} %

Network Traffic I/O is too high

An alert occurs when the network inbound/outbound traffic exceeds 10%. The example of Message is as follows:

In the last minute, ${oname}'s Network Traffic I/O > ${_rule_} %

Network Packet I/O is too high

An alert occurs when the network inbound/outbound packets exceed 10%. The example of Message is as follows:

In the last minute, ${oname}'s Network Packet I/O > ${_rule_} %

Network Error I/O is too high

An alert occurs when the network inbound/outbound errors exceed 10%. The example of Message is as follows:

In the last minute, The maximum value of the ${oname}'s Network Error I/O > ${_rule_} %

The kube-apiserver latency over 10 second

A notification is sent when the latency of kube-apiserver among the control plane components exceeds 10 seconds. However, the WATCH operation is excluded. The example of Message is as follows:

Latency of the ${verb} verb in ${instance} of kube-apiserver exceeded ${metricValue} seconds.

The kube-apiserver response increase/decrease rate for error codes

Among control plane components, an alert occurs when the number of error responses from kube-apiserver exceeds 50 and the increase/decrease rate changes by more than 50%. The example of Message is as follows:

Rate of increase in the number of requests for code ${code} on instance ${instance} of kube-apiserver exceeded ${metricValue}.