Metrics alert
Home > Select Project > Sitemap > Alert > Event Configuration
The following explains how to create and manage event notifications based on metrics. Metrics events let you configure events that are more specific and complex than those of the default templates. You can define conditions for the various events that occur in the monitoring targets and receive notifications efficiently.
-
For more information about the metrics, see the following.
-
Only members with the Alert settings role can use this feature. For more information about the member roles, see the following.
Basic screen guide
In Event Configuration, select the Metrics tab. You can view the list of metrics events added by users, and add, edit, or delete the metrics events.
The major elements of the list of metrics events are as follows:
-
Activation: It is a toggle button that controls whether to enable or disable each event. You can turn on or off the notifications of the desired events.
-
Edit: You can modify the event rules. You can update the rules to set notifications more precisely.
-
Event title: It is the event name. Event names make it easy to distinguish which events occurred. When an event occurs and a notification is received, the event name is displayed as the title of the message.
-
Rule: It includes event conditions for triggering a notification. Each color represents the Critical or Warning level.
-
Target: You can receive notifications based on events occurring across all or specific monitoring targets.
-
Number of event: A notification is received when the set number of events occurs within the specified time.
-
Resolved notification: It indicates whether or not to receive notifications when an event at the Critical or Warning level returns to normal status (RECOVERED).
-
Event reception: It indicates the team or user group that receives notifications.
-
For more information about the Notification Message Settings feature, see the following.
-
You can edit or share metrics events as JSON files. For more information, see the following document.
-
For more information about the JSON Batch Edit and JSON Batch Download features that allow you to share the entire event settings via JSON files, see the following.
Adding event
To add a metrics event, select the Add Alert Policy button on the upper right of the screen. When the Add Event screen appears, proceed with configuration by following the steps below:
-
Define event condition: Set the event conditions to receive notifications.
-
Select event target: Select the monitoring targets to generate events.
-
Basic information and notification setting: Set the event name, message, and recipients.
Define event condition
You can set event conditions to receive notifications.
Templates
The event templates tailored to your monitoring platform are provided for easy and quick event configuration. Select a desired template. Then the remaining fields are automatically filled in with predefined event conditions. If a template is not to be used, select Disabled.
-
After selecting a template, the predefined event conditions can be modified.
-
Depending on the monitoring platform, the provided templates may differ. For more information about the provided template list, see the following.
Select Category
A category is the unit that distinguishes the metrics data. It is required for defining event conditions.
The Select Category list displays the name, data collection interval (e.g. 1 h, 5 min), and key of each category. It lists the metrics data collected for the project within the last 3 hours. If a category name does not appear in the list, you can enter the category key by selecting the Enter it yourself option.
-
You must select the Category entry to proceed with Indicator settings.
-
The category information can be checked in Analysis > Metrics Search.
Indicator settings
You can select a notification level and set the threshold for events that trigger a notification.
-
Notification levels are divided into Critical, Warning, and Info (normal).
-
Selector: Set the thresholds after selecting the field name and operators and then entering values. After clicking a field, you can see the list of fields included in the metric category.
Add: You can add the metric settings. You can select && (and) or || (or) as the condition.
-
Typing: Set the event occurrence conditions by directly entering the field name, operators, and values. If you select this option, autocomplete suggests the fields that can be entered.
The following example is an event condition that triggers a notification when write_time exceeds 3,000 milliseconds or read_time exceeds 3,000 milliseconds, and io_time is less than 10,000 milliseconds.
(write_time > 3000 || read_time > 3000) && io_time < 10000
-
If the Selector option is selected, the field list appears only when Select Category is set.
-
For the operators that can be selected in the Selector option, see the following.
-
An error may occur if you enter a field name that includes special characters (~!@#$%^&*()_+=-[]`) or starts with a number. In this case, select the Typing option and then enter the field name by enclosing it in ${} as in the following example.
${4xxErrorType} == '401'
-
When the input method is changed between Selector and Typing options, the edited result disappears.
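As a minimal sketch of how the example condition in this section evaluates, the following Python function mirrors the expression against one sample of metric fields. The function name and the sample values are hypothetical and only illustrate the logic; the actual rule is evaluated by the monitoring service.

```python
def rule_matches(fields: dict) -> bool:
    """Mirror of: (write_time > 3000 || read_time > 3000) && io_time < 10000"""
    slow_io = fields["write_time"] > 3000 or fields["read_time"] > 3000
    return slow_io and fields["io_time"] < 10000


# Hypothetical samples; the field names come from the example rule.
print(rule_matches({"write_time": 3500, "read_time": 100, "io_time": 5000}))   # True
print(rule_matches({"write_time": 3500, "read_time": 100, "io_time": 20000}))  # False
print(rule_matches({"write_time": 100, "read_time": 100, "io_time": 5000}))    # False
```

The parentheses matter: without them, && would typically bind more tightly than ||, and the io_time check would apply only to read_time.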
Number of event
A notification is sent when the specified number of events occur during the selected period.
-
Consecutive: A notification is sent when the specified number of events occur consecutively.
-
In the last: A notification is sent when the specified number of events occur during the selected period.
The Interval value indicates the data collection interval for the selected category.
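The difference between the two counting options can be sketched as follows. This is an illustration under assumptions, not the product's implementation: each list entry stands for one collection interval and records whether the rule matched, and the function names are hypothetical.

```python
def consecutive_hit(matches: list, count: int) -> bool:
    """True when the most recent `count` evaluations all matched."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return len(matches) >= count and all(matches[-count:])


def in_the_last_hit(matches: list, count: int, window: int) -> bool:
    """True when at least `count` of the most recent `window` evaluations matched."""
    if count < 1 or window < 1:
        raise ValueError("count and window must be at least 1")
    return sum(1 for m in matches[-window:] if m) >= count


# One entry per collection interval, oldest first (hypothetical data).
history = [True, False, True, True]
print(consecutive_hit(history, 2))     # True: the last two intervals matched
print(consecutive_hit(history, 3))     # False: the run is broken by a miss
print(in_the_last_hit(history, 3, 4))  # True: three matches within four intervals
```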
Pause
This option can prevent excessive alerts from happening. Alert notifications are not sent for the selected period after the first alert occurs.
If you have enabled the Resolved notification feature, no notification is sent for the selected period after receiving a normal (RECOVERED) notification.
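A minimal sketch of the Pause behavior, assuming that the pause window starts at the last notification that was actually sent and that an event arriving exactly at the end of the window is delivered. Both points are assumptions for illustration; the parameter name silent_sec is borrowed from the silentSec field of the JSON format.

```python
def notifications_sent(event_times: list, silent_sec: int) -> list:
    """Return the event times (in seconds) that would produce a notification."""
    sent = []
    for t in sorted(event_times):
        # Send only when no notification went out within the pause window.
        if not sent or t - sent[-1] >= silent_sec:
            sent.append(t)
    return sent


# Events at 0 s, 60 s, 120 s, 300 s, and 330 s with a 300-second pause.
print(notifications_sent([0, 60, 120, 300, 330], 300))  # [0, 300]
```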
Resolved notification
If the Resolved notification option is activated, the event is displayed in the In progress menu of Event History. When an event of the Critical or Warning level is resolved, a normal (RECOVERED) notification is sent. You can enable or disable this function with the toggle button.
-
This option appears when Critical or Warning is selected for the notification level.
-
For more information about Event History, see the following.
Simulation
You can test the event conditions set in Indicator settings. Select Simulation.
-
The number of notifications that occurred during the queried time appears at the upper right corner.
-
The red dotted line is the threshold set in Indicator settings.
-
For more information about Indicator settings, see the following.
-
The Simulation feature can test against up to 24 hours of data.
-
For more information on how to use the time selector, see the following.
Select event target
Select the monitoring targets where events occur. If no target is specified, events are monitored and notifications are sent across the entire project, which may result in a large number of notifications. Set specific targets to avoid excessive notifications.
Event targets can be specified based on tags. A tag is data containing unique information that distinguishes the targets to collect. You can use values that rarely change, such as IP, Oname, and host data.
-
Selector: You can specify targets by selecting tag names and operators and entering values.
Add: You can add targets for the event. You can select && (and) or || (or) as the condition.
-
Typing: You can specify targets by directly entering the tag names, operators, and values. If you select this option, autocomplete suggests the tags that can be entered.
Set event targets by referring to the following examples.
ex. endsWith(okindName, 'example_name') && container == 'prod.billing'
ex. ${4xxErrorType} == '401'
-
Depending on the value selected in Select Category of the Define event condition section, available tags may differ.
-
If the event target is changed, the number of notifications may also change. Run the simulation again to see the result.
-
For more information about the selection of event targets and use of operators, see the following.
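To illustrate how a target expression narrows the monitored targets, the sketch below mirrors the first example expression in Python and applies it to a few hypothetical tag sets. The sample targets and the helper name are assumptions made for this illustration.

```python
def is_target(tags: dict) -> bool:
    """Mirror of: endsWith(okindName, 'example_name') && container == 'prod.billing'"""
    okind_name = tags.get("okindName") or ""
    return okind_name.endswith("example_name") and tags.get("container") == "prod.billing"


# Hypothetical monitoring targets described by their tags.
targets = [
    {"oname": "pod-a", "okindName": "team1_example_name", "container": "prod.billing"},
    {"oname": "pod-b", "okindName": "team1_example_name", "container": "dev.billing"},
    {"oname": "pod-c", "okindName": "other", "container": "prod.billing"},
]
selected = [t["oname"] for t in targets if is_target(t)]
print(selected)  # ['pod-a']
```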
Basic information and notification setting
Set the event name, message, and recipients.
-
Activate events: You can enable the current events.
-
Event title: This is used as the title of the notification message. Enter a name that is easy to identify.
-
Message: Enter the notification message to deliver to users. You can use variables to contain dynamic data.
-
By entering ${Tag} or ${Field}, you can apply the variable to the message. The variable must be a value contained in the selected metrics data Category. You can check the ${Tag} and ${Field} variables that can be entered in the Metrics Search menu.
-
If the button is clicked, you can see the history of previously entered messages.
Tip
You can write a message by entering the ${Tag} and ${Field} variables in the message field. In Analysis > Metrics Search, select Category and then check the ${Tag} and ${Field} variables that can be entered. For the Category name of the current event template, see the Category in the following.
Note
For the ${Tag} and ${Field} variables that can be entered in the message, see the following.
-
-
Test alert: Before the event sends real notifications, you can check the specified event name and message in advance. Testing is possible only when the required items (Indicator settings, Event title, and Message) are all entered.
Note
-
During testing, substitutions for actual metric values or variables do not work.
-
Test notifications are sent only to users with the recipient tag set in the Event reception option. If you select Receive all, a test notification is sent to all users.
-
To use this feature, enter or select a value for the required field (*).
-
-
Event reception: You can select members to receive notifications from current events.
-
Receive all: You can send notifications to all the members in the project.
-
Receive selected tags: It sends notifications to project members and 3rd-party plugins with the selected tags. When Reception tag appears, click Add tag to select a desired tag from the tag list.
Note
In Alert > Notifications, you can set the tags in project members and 3rd-party plug-ins. For more information, see the following.
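The ${Tag} and ${Field} message variables described in this section can be pictured as simple placeholder substitution. The sketch below is an assumption-based illustration, not the product's rendering code: the function name is hypothetical, and leaving unknown variables untouched is a choice made for this example only.

```python
import re


def render_message(template: str, values: dict) -> str:
    """Replace each ${name} placeholder with the matching tag or field value."""
    def replace(match):
        key = match.group(1)
        # Assumption: a variable without a value is left as written.
        return str(values.get(key, match.group(0)))

    return re.sub(r"\$\{([^}]+)\}", replace, template)


print(render_message("APDEX: ${apdex100} > 10", {"apdex100": 42}))
# APDEX: 42 > 10
print(render_message("${oname} is down", {}))
# ${oname} is down
```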
-
Modifying and deleting the event
-
Go to Alert > Event Configuration and then select the Metrics tab.
-
In the event list, select the Edit button on the far left of the item to edit or delete.
-
If the metrics event setting window appears, modify each option and then select Save.
To delete the selected event, select Delete on the upper right of the Event configuration window.
Modifying in JSON format
You can modify metrics event settings in JSON format.
-
On the upper right of the screen, select JSON.
-
When the editing window appears, modify the content in JSON format.
-
After all changes are made, select Save on the upper right of the screen.
If the modified content does not match the JSON format, an error message appears at the bottom of the screen and the changes cannot be saved. The displayed error message may differ depending on the format.
The JSON data structure is as follows:
[
  {
    "eventMessage": "APDEX: ${apdex100} > 10 ",
    "select": "",
    "receiver": [],
    "alertLabel": [
      "project"
    ],
    "rule": "apdex100 > 10",
    "silentSec": 0,
    "alertKey": [
      "project"
    ],
    "enabled": false,
    "eventTitle": "APDEX",
    "repeatDuration": 0,
    "eventLevelText": "Critical",
    "id": "z3f41ge464magg",
    "category": "app_counter_project{m5}",
    "repeatCount": 0,
    "stateful": false
  }
]
The fields in JSON data are associated with the following options in event settings.
JSON field | Option | Classification |
---|---|---|
eventMessage | Message | Basic information and notification setting |
select | Select target > Typing | Select event target |
receiver | List of key values in the Event reception > Reception tag option | Basic information and notification setting |
alertLabel | Primary key value used internally when performing notification operations | - |
rule | Event conditions of the Indicator settings option | Define event condition |
silentSec | Pause | Define event condition |
alertKey | Primary key value used internally when performing notification operations | - |
enabled | Activate events | Basic information and notification setting |
eventTitle | Event title | Basic information and notification setting |
repeatDuration | Time selected in the Number of event option | Define event condition |
eventLevelText | Event level of the Indicator settings option | Define event condition |
id | Unique identifier value of the event | - |
category | Select Category | Define event condition |
repeatCount | Number selected in the Number of event option | Define event condition |
stateful | Resolved notification | Define event condition |
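Before pasting edited JSON into the editing window, it can help to check it locally. The following sketch parses the text and reports missing fields. The set of keys treated as required here is an assumption chosen for illustration; the product's own validation rules may differ.

```python
import json

# Assumption: these keys are treated as required for illustration only.
REQUIRED_KEYS = {"eventTitle", "eventMessage", "rule", "category", "eventLevelText"}


def check_events(text: str) -> list:
    """Parse the JSON text and return a list of problems (empty when none)."""
    try:
        events = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(events, list):
        return ["top-level value must be a list of events"]
    problems = []
    for index, event in enumerate(events):
        if not isinstance(event, dict):
            problems.append(f"event {index}: must be an object")
            continue
        missing = sorted(REQUIRED_KEYS - set(event))
        if missing:
            problems.append(f"event {index}: missing {', '.join(missing)}")
    return problems


sample = (
    '[{"eventTitle": "APDEX", "eventMessage": "APDEX: ${apdex100} > 10", '
    '"rule": "apdex100 > 10", "category": "app_counter_project{m5}", '
    '"eventLevelText": "Critical", "enabled": false}]'
)
print(check_events(sample))  # []
```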
Sharing metrics events
You can save the Metrics event settings as a JSON file to share them with other users or import them from other users.
Export
-
On the upper right of the screen, select JSON.
-
If the JSON editing window appears, select Export.
-
Once the JSON file has been downloaded, forward it to others for sharing.
-
The format of the JSON file name is event-rules-YYYY-MM-DD.json.
-
After searching for events, you can use the export feature to download only the searched list as a JSON file.
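If exported files are collected by a script, the file name pattern event-rules-YYYY-MM-DD.json can be reproduced as in the sketch below. The helper name is hypothetical, and the assumption is that the date part is the export date in ISO order.

```python
from datetime import date


def export_file_name(day: date) -> str:
    """Build a name in the event-rules-YYYY-MM-DD.json pattern."""
    return f"event-rules-{day.isoformat()}.json"


print(export_file_name(date(2024, 3, 5)))  # event-rules-2024-03-05.json
```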
Import
-
On the upper right of the screen, select .
-
Select the JSON file that was downloaded using the Export function.
-
If the JSON editing window appears, select Add to list or Overwrite.
It is recommended to use this function between products of the same type. Event settings imported from a project of a different product type do not work.
Searching event
You can search for events by event name or metric in the event list. Enter a string in the search field and then select the search button.
After searching for events, you can use the export feature to download only the searched list as a JSON file. For more information about the export feature, see the following.
Guide to select generation conditions and targets
Event generation conditions and event target selection on metrics alerts use the same syntax. For event generation conditions, use the field key as a variable. For selection of event targets, use the tag key as a variable.
Basic syntax rules
-
If you just enter a string, it is recognized as a variable. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.
oid == "oid"
1. oid: variable
2. ==: operator
3. "oid": text
// In case oname is ott-1235
// Normal cases
oname == 'ott-1235'
oname == "ott-1235"
// In abnormal cases, notification does not work.
oname == ott-1235
-
If you just enter a number, it is recognized as a number. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.
oid == 123
1. oid: variable
2. ==: operator
3. 123: number
// In case oid is 123
// Normal cases
oid == 123
// In abnormal cases, notification does not work.
oid == '123'
oid == "123"
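The quoting rules above can be summarized with a small classifier. This is a sketch of the described behavior only, not the product's parser; the function name and the number pattern (optional sign, optional decimal part) are assumptions.

```python
import re


def classify(token: str) -> str:
    """Classify one operand as 'text', 'number', or 'variable'."""
    quoted = len(token) >= 2 and token[0] == token[-1] and token[0] in "'\""
    if quoted:
        return "text"
    if re.fullmatch(r"-?\d+(\.\d+)?", token):
        return "number"
    return "variable"


print(classify("oid"))         # variable
print(classify('"oid"'))       # text
print(classify("123"))         # number
print(classify("'123'"))       # text
```

This shows why a quoted '123' does not match a numeric oid: the quotes turn the operand into text, so the two sides of the comparison have different types.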
List of available operators
Operator | Usage | Description |
---|---|---|
== | operand1 == operand2 | Checks whether operand1 is equal to operand2. |
!= | operand1 != operand2 | Checks whether operand1 and operand2 have different values. |
> | operand1 > operand2 | Checks whether the operand1 value is greater than the operand2 value. |
>= | operand1 >= operand2 | Checks whether the operand1 value is greater than or equal to the operand2 value. |
< | operand1 < operand2 | Checks whether the operand1 value is less than the operand2 value. |
<= | operand1 <= operand2 | Checks whether the operand1 value is less than or equal to the operand2 value. |
like | operand1 like operand2 | Checks by pattern whether operand1 includes operand2. |
&& | expression1 && expression2 | Checks whether expression1 and expression2 are both true. |
and | expression1 and expression2 | Checks whether expression1 and expression2 are both true. The operator plays the same role as &&. |
|| | expression1 || expression2 | Checks whether at least one of expression1 and expression2 is true. |
or | expression1 or expression2 | Checks whether at least one of expression1 and expression2 is true. The operator plays the same role as ||. |
Usage of like
You can conveniently search for embedded strings via the wildcard (*).
-
Searching for strings that start with a specific keyword
Key like "Value*"
-
Searching for strings that end with a specific keyword
Key like "*Value"
-
Searching for strings that include a specific keyword
Key like "*Value*"
-
The wildcard (*) cannot be used in the middle of keywords.
// Unsupported syntax
Key like "Va*lue"
-
If you omit the wildcard (*) in the like operator, it operates as equals (==).
// The following two statements have the same result.
Key like "Value"
Key == "Value"
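The like rules above can be expressed as a short Python function. It is a sketch of the documented behavior under assumptions: the function name is hypothetical, matching is treated as case-sensitive, and an unsupported mid-keyword wildcard raises an error here, whereas the product may handle it differently.

```python
def like(value: str, pattern: str) -> bool:
    """Match `value` against a pattern with optional leading/trailing '*'."""
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    keyword = pattern.strip("*")
    if "*" in keyword:
        raise ValueError("wildcard in the middle of a keyword is not supported")
    if leading and trailing:
        return keyword in value           # "*Value*": includes
    if trailing:
        return value.startswith(keyword)  # "Value*": starts with
    if leading:
        return value.endswith(keyword)    # "*Value": ends with
    return value == keyword               # no wildcard: same as ==


print(like("ValueStore", "Value*"))    # True
print(like("MyValue", "*Value"))       # True
print(like("MyValueStore", "*Value*")) # True
print(like("Values", "Value"))         # False
```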
Available functions
Method | Usage | Description |
---|---|---|
startsWith | startsWith(param1, param2) | If the value whose param1 is the key starts with param2, the result is true . Otherwise, the result is false . |
endsWith | endsWith(param1, param2) | If the value whose param1 is the key ends with param2, the result is true . Otherwise, the result is false . |
isNull | isNull(param1) | If param1 is null , the value becomes true . Otherwise, the value becomes false . |
isNotNull | isNotNull(param1) | If param1 is not null , the value becomes true . Otherwise, the value becomes false . |
isEmpty | isEmpty(param1) | If param1 is null or EmptyString("") , the value becomes true . Otherwise, the value becomes false . |
isNotEmpty | isNotEmpty(param1) | If param1 is not null nor EmptyString("") , the value becomes true . Otherwise, the value becomes false . |
startsWith
startsWith(Key, "Value")
endsWith
endsWith(Key, "Value")
isNull
isNull(Key)
isNotNull
isNotNull(Key)
isEmpty
isEmpty(Key)
isNotEmpty
isNotEmpty(Key)
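For readers who want to reason about these functions outside the product, the following Python equivalents follow the descriptions in the table. They are illustrative only: the snake_case names are hypothetical, and treating a missing key as null is an assumption.

```python
def starts_with(record: dict, key: str, prefix: str) -> bool:
    value = record.get(key)
    return isinstance(value, str) and value.startswith(prefix)


def ends_with(record: dict, key: str, suffix: str) -> bool:
    value = record.get(key)
    return isinstance(value, str) and value.endswith(suffix)


def is_null(record: dict, key: str) -> bool:
    # Assumption: a key that is absent counts as null.
    return record.get(key) is None


def is_not_null(record: dict, key: str) -> bool:
    return not is_null(record, key)


def is_empty(record: dict, key: str) -> bool:
    value = record.get(key)
    return value is None or value == ""


def is_not_empty(record: dict, key: str) -> bool:
    return not is_empty(record, key)


record = {"oname": "ott-1235", "container": "", "host": None}
print(starts_with(record, "oname", "ott"))  # True
print(is_null(record, "container"))         # False: empty string is not null
print(is_empty(record, "container"))        # True
```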
Template
Metrics Event Templates
For more information about the reason field in the Kubernetes event, see Kubernetes official documentation.
BackOff
This notification is sent when BackOff appears more than 0 times in the reason field of the Kubernetes event. The example of Message is as follows:
Kubernetes Event (Kube Event ${message})
Evicted
This notification is sent when Evicted appears more than 0 times in the reason field of the Kubernetes event. The example of Message is as follows:
Kubernetes Event (Kube Event ${message})
FailedCreatePodSandBox
This notification is sent when FailedCreatePodSandBox appears more than 0 times in the reason field of the Kubernetes event. The example of Message is as follows:
Kubernetes Event (Kube Event ${message})
FailedMount
This notification is sent when FailedMount appears more than 0 times in the reason field of the Kubernetes event. The example of Message is as follows:
Kubernetes Event (Kube Event ${message})
FailedScheduling
This notification is sent when FailedScheduling appears more than 0 times in the reason field of the Kubernetes event. The example of Message is as follows:
Kubernetes Event (Kube Event ${message})
FailedSync
This notification is sent when FailedSync appears more than 0 times in the reason field of the Kubernetes event. The example of Message is as follows:
Kubernetes Event (Kube Event ${message})
NodeNotReady
This notification is sent when NodeNotReady appears more than 0 times in the reason field of the Kubernetes event. The example of Message is as follows:
Kubernetes Event (Kube Event ${message})
Unhealthy
This notification is sent when Unhealthy appears more than 0 times in the reason field of the Kubernetes event. The example of Message is as follows:
Kubernetes Event (Kube Event ${message})
Notification of utilization based on container CPU quota
The alert occurs when the total CPU usage (${cpu_per_quota}) based on the container's CPU limit is 70% or more. The example of Message is as follows:
The CPU utilization of container ${oname} in ${okindName} is high, ${cpu_per_quota}% >= 70%.
Container memory fail count
This notification is sent when the container memory limit is reached one or more times. The example of Message is as follows:
Because the ${oname} container of ${okindName} exceeded the limit, the ${mem_failcnt} increased.
Container memory utilization
The alert occurs when the usage (${container.mem_percent}) based on the container's memory limit is 90% or more. The example of Message is as follows:
The memory usage of ${oname} container in ${okindName} is ${container.mem_percent}% >= 90%.
Container DEAD status notification
This notification is sent when the container status code is 100. The status code 100 means DEAD. The example of Message is as follows:
Container ${oname} is in DEAD state.
Cluster CPU request notification
This notification is sent when the value of the CPU allocatable to nodes divided by the total Limit CPU amount and multiplied by 100 is 80% or more. The example of Message is as follows:
The CPU request (minimum required resource) is more than 80% of the cluster CPU allocation.
Cluster memory request notification
The corresponding alert occurs when the available amount of memory for node allocation divided by the total Limit Memory and multiplied by 100 is 80% or more. The example of Message is as follows:
Memory Request (minimum required resource) is more than 80% of the cluster memory allocation.
Cluster CPU request notification
This notification is sent when the value of the CPU allocatable to nodes divided by the total Limit CPU amount and multiplied by 100 is 60% or more. The example of Message is as follows:
The CPU request (minimum required resource) is more than 60% of the cluster CPU allocation.
Cluster memory request notification
The corresponding alert occurs when the available amount of memory for node allocation divided by the total Limit Memory and multiplied by 100 is 60% or more. The example of Message is as follows:
Memory Request (minimum required resource) is more than 60% of the cluster memory allocation.
Notification of the number of master pods
This alert occurs when the Pods allocable to nodes do not exist. The example of Message is as follows:
The number of Pods allocable to the master is 0.
Node CPU utilization notification
This notification is sent when the node's CPU utilization (${cpu}) is 70% or more. The example of Message is as follows:
The CPU usage of ${oname} is ${cpu}% >= 70%.
Node memory utilization notification
This notification is sent when the node's memory utilization (${memory_pused}) is 90% or more. The example of Message is as follows:
The memory usage of ${oname} is ${memory_pused}% >= 90%.
Unassignable node notification
The corresponding alert occurs when the available number of Pods that can be assigned to a node is zero.
APDEX
This notification is triggered when transactions exist and the APDEX score is lower than 0.7. The example of Message is as follows:
APDEX is lower than 0.7 (${oname})
Composite Metrics Event Templates
Inactive agents has been found
An alert occurs when the number of active agents is less than the specified value. The example of Message is as follows:
${ip} ${okindName} The number of active agents has decreased to ${num_of_current_agents}.
TPS has changed by more than 30% compared to the previous week
An alert occurs when the application's TPS changes by more than 30% compared to that of the previous week. The example of message is as follows:
${okindName} a week ago : ${prev_week_tps_display}, current : ${current_tps_display}, difference : ${one_week_diff_display}
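The week-over-week comparison behind this template can be sketched as a percent change. The exact formula used by the product is not stated in this document, so the calculation below (change relative to the previous week's TPS, in either direction) and the function names are assumptions.

```python
def tps_change_percent(prev_week_tps: float, current_tps: float):
    """Percent change relative to last week's TPS, or None when there is no baseline."""
    if prev_week_tps == 0:
        return None
    return (current_tps - prev_week_tps) * 100 / prev_week_tps


def should_alert(prev_week_tps: float, current_tps: float, threshold: float = 30.0) -> bool:
    """True when TPS rose or fell by more than `threshold` percent."""
    change = tps_change_percent(prev_week_tps, current_tps)
    return change is not None and abs(change) > threshold


print(should_alert(100, 140))  # True: +40%
print(should_alert(100, 60))   # True: -40%
print(should_alert(100, 110))  # False: +10%
```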
Very slow active transactions detected
An alert occurs when the number of transactions that exceed 8 seconds in the application exceeds 10 on average. The example of message is as follows:
${okindName} ${very_slow_tx_cnt_m5_avg_display} active transactions performed for more than 8 seconds were detected.
APDEX score dropped
A notification is triggered when the APDEX score falls below 70. The example of Message is as follows:
The average apdex of ${pname} in the last 5 seconds is ${apdex_display}
CPU % is too high
A notification is sent when the CPU utilization of the node exceeds 80%. The example of message is as follows:
CPU utilization rate of the ${oname} in the last minute > ${_rule_}%
CPU User % is too high
An alert occurs when the user's CPU utilization exceeds 50%. The example of Message is as follows:
CPU User utilization rate of the ${oname} in the last minute > ${_rule_}%
The number of agents with high CPU SYS % is too large
An alert occurs when the system's CPU utilization exceeds 50%. The example of message is as follows:
The number of agents with a CPU SYS of 70% or more in the last minute > ${_rule_}%
The Disk I/O is too high
A notification is sent when the disk I/O utilization exceeds 10%. The example of Message is as follows:
In the last minute, ${oname}'s Disk I/O > ${_rule_}%
The Disk Used % is too high
A notification is sent when the file system utilization exceeds 90%. The example of Message is as follows:
In the last minute, ${oname}'s Disk Used > ${_rule_}%
Network Traffic I/O is too high
An alert occurs when the network inbound/outbound traffic exceeds 10%. The example of Message is as follows:
In the last minute, ${oname}'s Network Traffic I/O > ${_rule_}%
Network Packet I/O is too high
An alert occurs when the network inbound/outbound packets exceed 10%. The example of Message is as follows:
In the last minute, ${oname}'s Network Packet I/O > ${_rule_}%
Network Error I/O is too high
An alert occurs when the network inbound/outbound errors exceed 10%. The example of Message is as follows:
In the last minute, the maximum value of the ${oname}'s Network Error I/O > ${_rule_}%
The kube-apiserver latency over 10 second
A notification is sent when the latency of kube-apiserver among the control plane components exceeds 10 seconds. However, the WATCH operation is excluded. The example of Message is as follows:
Latency of the ${verb} verb in ${instance} of kube-apiserver exceeded ${metricValue} seconds.
The kube-apiserver response increase/decrease rate for error codes
Among control plane components, an alert occurs when the number of error responses from kube-apiserver exceeds 50 and the increase/decrease rate changes by more than 50%. The example of Message is as follows:
Rate of increase in the number of requests for code ${code} on instance ${instance} of kube-apiserver exceeded ${metricValue}.