Metrics alert

Home > Select Project > Alert > Event configuration > Select Metrics tab

What is the metrics event?

Metrics events let you define more specific and complex events than the basic events (application events, server events, and so on). You can set events based on the metrics data collected in real time from your projects. Depending on your use case, choose one of the following two methods.

  • Metrics Event
  • Composite metrics event
Note

For more information about the metrics, see the following.

Metrics Event

Select Metrics on the screen in Alert > Event configuration. Select Add Alert Policy on the upper right of the screen. The Metrics event window appears.

Note

For more information about the event templates for metrics, see the following.

Entry of basic information

  • Event name: Enter the name of the event to add.

  • Activate events: Select whether or not to activate events.

  • Templates: Select a template to set up an event quickly. If you do not want to use a template, select Disabled.

  • Category: The unit to identify the metrics data. It is a mandatory value for setting the metrics events.

    Metrics Event - Category

    • Each Category option displays the name, the data collection interval, and the key. When setting up an event, use the key value of the category.

    • The Category list shows the metrics data collected from the project within the last 3 hours. If the collection interval is not displayed in the Category options, select the Enter it yourself option and enter a category key.

  • Level
    • Sets the alert level used when an event occurs. The levels are Critical, Warning, and Info. When you select the Critical or Warning level, the Notify when event status is resolved. option is enabled.

    • Notify when event status is resolved.: Sets whether to send a notification when the status of an event that occurred is resolved. Use the toggle button to enable or disable this function.

  • Message
    • Enter the notification message to display when an event occurs. You can insert variables into the message in the form ${Tag} or ${Field}. The key used in a variable must exist in the Category of the selected metrics data. You can look up the available tag and field keys in Metrics Search.

      Message example

    • Select the Start icon button to see the history of previously entered messages.

  • Test alert

    Sends a test alert built from the required items (Event name, Category, Level, and Message) so that you can check the message.

    Note

    To use the test alert, first enter or select the required items: Event name, Category, Level, and Message.

  • Event rule

    Event Rule

    Set the event rule by entering the field (4), selecting the operator (5), and entering the threshold value (6).

  • Filtering event targets

    Filtering event targets

    Filter the targets by entering the tag (7), selecting the operator (8), and entering the filtering value (9). If nothing is entered, the event applies to all agents.

Note
  • For available basic syntaxes and operators in Event rule and Filtering event targets, see the following.

  • For the Event rule and Filtering event targets options, you can select Selector or Typing.

  • After the event is saved, the option value is managed in the Typing option. If you later switch to the Selector option, the option value may be reset.

  • When you enter event conditions or targets, an error may occur if a field name contains special characters (~!@#$%^&*()_+=-[]`) or begins with a number. In this case, select the Typing option and enclose the field name in ${} as shown in the following example:

    ${4xxErrorType} == '401'
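
The wrapping rule in the note above can be modelled with a small helper. This is only an illustrative Python sketch, not part of the product: the function names needs_wrapping and wrap_field are hypothetical, and the character set is copied from the note.

```python
# Characters listed in the note above. A field name that contains any of them,
# or that starts with a digit, is written as ${field} in the Typing option.
SPECIAL_CHARS = set("~!@#$%^&*()_+=-[]`")


def needs_wrapping(field: str) -> bool:
    """Return True if the field name should be enclosed in ${...}."""
    if not field:
        return False
    return field[0].isdigit() or any(ch in SPECIAL_CHARS for ch in field)


def wrap_field(field: str) -> str:
    """Enclose the field name in ${...} only when the rule applies."""
    return "${" + field + "}" if needs_wrapping(field) else field


print(wrap_field("4xxErrorType") + " == '401'")
print(wrap_field("cpu") + " >= 70")
```

The first call reproduces the example above, because the field name begins with a number. A plain name such as cpu is left unchanged.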

Notification setting

Notification Setting

  • Number of events: An alert is sent when the event defined in Event rule occurs the entered number of times within the selected period.

    Note
    • If the selected time is set to Disabled, an alert is sent only when the event occurs consecutively the entered number of times.
    • If the Notify when event status is resolved. option is activated, it is recommended to set the selected time to Disabled.
    • In the Category option, the collection cycle for the selected item is 5 seconds.
  • Event pause: Prevents excessive alert notifications. After the first alert notification is generated, no alerts are sent for the selected period, and the suppressed events are not recorded in Event history.

  • Related category: You can set up to 5 related categories and view them when checking notifications.

  • Event reception tag: If this tag is selected, notifications can be sent to project members and 3rd-party plug-ins with the corresponding tags. If the event receiving tag is not selected, alerts are sent to all project members.

    Note

    In Alert > Notification setting, you can set the tags in project members and 3rd-party plug-ins.
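
The interaction between the number of events (with the selected time set to Disabled) and the event pause can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the product's implementation: the class name ConsecutiveNotifier is hypothetical, and resetting the streak after an alert is sent is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsecutiveNotifier:
    """Hypothetical model: alert after `count` consecutive matches, then pause."""

    count: int             # the "number of events" input
    pause_seconds: float   # the "Event pause" period
    _streak: int = 0
    _last_alert: Optional[float] = None

    def observe(self, matched: bool, now: float) -> bool:
        """Feed one rule evaluation; return True if an alert would be sent."""
        if not matched:
            self._streak = 0  # a non-matching sample breaks the streak
            return False
        self._streak += 1
        if self._streak < self.count:
            return False
        paused = (
            self._last_alert is not None
            and now - self._last_alert < self.pause_seconds
        )
        if paused:
            return False  # suppressed while the pause period is running
        self._last_alert = now
        self._streak = 0  # assumption: counting starts over after an alert
        return True


notifier = ConsecutiveNotifier(count=3, pause_seconds=60)
# One evaluation every 5 seconds (the collection cycle), all matching the rule.
results = [notifier.observe(True, now=t) for t in (0, 5, 10, 15, 20, 25)]
print(results)
```

In this run only the third evaluation produces an alert. The sixth evaluation completes another streak of three, but it falls inside the 60-second pause and is suppressed.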

Testing event rules

Alert Test

You can check how many alerts would have occurred if the event conditions you set had been applied during the selected time period. Select Run to see the number of notifications. The selected fields and thresholds are displayed on a chart so that you can see where the event conditions are met.

Composite metrics event

To use the Composite metrics, you have to understand the following concepts:

The Composite metrics event can generate events by using more complex rules along with the metrics data and send alerts. Composite metrics can be used effectively in the following situations:

  • You need to make a comprehensive decision based on data received from multiple agents.
  • You need to compare past data with current data to make a decision.

A metrics event is evaluated each time metrics are received from an agent. A composite metrics event, by contrast, stores the metrics collected from each agent in the database and then queries them to evaluate the event. Because of this, data from multiple agents can be used together, and past data can be used. The barrier to entry is that you must use MXQL, WhaTap's own data query language. For this reason, event templates are provided so that users who understand only basic MXQL can still set events effectively, by just modifying the query for event target filtering and the conditions.

  1. Select Metrics on the screen in Alert > Event configuration.

  2. In the Composite metrics section, select Add Alert Policy on the right.

  3. If the Composite metrics window appears, select Creating as a chart.

The Event Setting window appears.

Composite Metrics Event Setting

Note

To set a composite metrics event, you have to have the event setting role.

Query event data

The composite metrics event creates event conditions by using MXQL, a metrics data query language. The Creating as a chart function provides combo boxes that autocomplete MXQL. With this template, you query the event data, construct a chart, and then directly enter the event generation conditions. Select the Widget or Text option, and then configure the event.

The options used to configure the time series chart autocomplete the MXQL to use when setting events.

Event data inquiry

  • Filter: Select an event condition target. Enter values for formula, tag, and filtering values to create filtering conditions.

    Filter

  • Group by: Select the grouped metrics data. You can select multiple items.

  • Time unit: Set the time unit used to divide the grouped data. Select sec, Minutes, or Hour.

  • Field: Select fields to use as event generation conditions. You can select multiple items.

Notification

Enter basic data for alert settings.

  • Activate events: You can select to enable or disable the events by clicking the toggle button.

  • Level: Select a level among Fatal, Warning, and Info.

    Notify when event status is resolved.: You can set whether or not to send notifications when the occurred event's status is resolved. This function can be enabled or disabled by selecting the toggle button.

  • Title: Enter the title of the alert.

  • Message: Enter a notification message to be displayed when events occur. By entering ${Tag} or ${Field} key, you can apply the variable to the message. The key to enter in the variable must be included in Category of the selected metrics data. You can see the tags or field keys that can be entered in Metrics Search.

    Message example
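
The way ${Tag} and ${Field} variables are filled into a message can be sketched as follows. This is a minimal Python illustration rather than the product's code: the function name render_message is hypothetical, and leaving unknown keys untouched is an assumption. The message text is taken from the node CPU utilization template in this document.

```python
import re

# Matches ${key}, where key is any run of characters other than "}".
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def render_message(template: str, values: dict) -> str:
    """Replace each ${key} with values[key]; unknown keys are left as-is."""

    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return _VARIABLE.sub(substitute, template)


template = "The CPU usage of ${oname} is ${cpu}% >= 70%."
print(render_message(template, {"oname": "node-1", "cpu": 82.5}))
```

The values dictionary stands in for the tags and fields of the selected category, which is why a key that is not in the category cannot be resolved.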

Alert Policy

Enter the conditions to send alerts.

  • Time Range: Set the time range over which MXQL queries the data for the event conditions. You can use only the fields that were included when querying the event data.

    Composite metrics events use metrics that have been stored in the database. Therefore, first specify the time range of the data to query. If you select 5 minutes, the event generation conditions are checked against the data collected during the last 5 minutes. Set a short range to evaluate recent data, or a long range to evaluate statistics over a wide period.

    Note

    For actual usage examples, see the following.

  • Condition: Enter the fields, calculation rules, and thresholds reflected in MXQL.
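
The effect of the time range on a condition can be illustrated with a small Python model. It does not use MXQL and is not how the product evaluates events: the function name check_window, the use of an average, and the sample data are all assumptions made for the illustration.

```python
from statistics import mean


def check_window(samples, now, window_seconds, threshold):
    """Return True if the average value inside the time range exceeds threshold.

    `samples` is a list of (timestamp, value) pairs standing in for stored metrics.
    """
    recent = [v for ts, v in samples if now - window_seconds <= ts <= now]
    if not recent:
        return False  # no data in the time range, so nothing to judge
    return mean(recent) > threshold


# Hypothetical CPU samples: low five minutes ago, high during the last minute.
samples = [(0, 40.0), (120, 85.0), (240, 90.0), (300, 95.0)]

print(check_window(samples, now=300, window_seconds=300, threshold=80))
print(check_window(samples, now=300, window_seconds=60, threshold=80))
```

With the 5-minute range the older low sample pulls the average down to 77.5 and the condition is not met. With the 1-minute range only the recent samples are used and the condition is met.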

Additional information

Set additional options that are related to receiving alerts.

  • Interval: Check the notification conditions at the selected time interval.

  • Silent: This option can prevent excessive alerts from happening. No alerts are sent for the selected period after the first alert notification is generated. In addition, they are not recorded in Event history.

  • Event reception tag: If you select an event receiving tag, alert notifications can be sent to project members and 3rd-party plug-ins with the tag. If the event receiving tag is not selected, alerts are sent to all project members.

    Note

    In Alert > Notification setting, you can set the tags in project members and 3rd-party plug-ins.

Test Event Rules

Testing event rules

You can check how many alerts would have occurred if the event conditions you set had been applied during the selected time period. Select Run to see the number of notifications. The selected fields and thresholds are displayed on a chart so that you can see where the event conditions are met.

Most of the Event Setting items can be specified using MXQL, so a simulation function is provided to check whether the MXQL is written correctly. The simulation queries the data of the past 24 hours, evaluates it, and reports how many metrics were queried and how many of them met the conditions.

Modifying and deleting metrics events

  1. Go to Alert > Event configuration and then select the Metrics tab.

  2. In the event list, select the Edit icon at the far right of the item you want to modify or delete.

  3. When the metrics or composite metrics event setting window appears, modify the options and then select Save.

To delete the selected event, select Delete on the upper right of the event setting window.

Guide to select generation conditions and targets

The event generation conditions and the event target selection of metrics alerts use the same syntax. For event generation conditions, use the field key as a variable. For the selection of event targets, use the tag key as a variable.

Basic syntax rules

  • If you enter a bare string, it is recognized as a variable. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.

    oid == "oid"
    1. oid: variable
    2. ==: operator
    3. "oid": text

    // In case oname is ott-1235

    // Normal cases
    oname == 'ott-1235' or oname == "ott-1235"

    // In abnormal cases, notification does not work.
    oname == ott-1235

  • If you enter a bare number, it is recognized as a number. If you enclose it in single quotation marks ('') or double quotation marks (""), it is recognized as text.

    oid == 123
    1. oid: variable
    2. ==: operator
    3. 123: number

    // In case oid is 123

    // Normal cases
    oid == 123

    // In abnormal cases, notification does not work.
    oid == '123' or oid == "123"
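
The two rules above amount to classifying each operand as a variable, text, or a number. The following Python sketch shows that classification. It is a simplified illustration, not the actual parser: the function name classify_operand is hypothetical, and the number pattern (optional sign, digits, optional decimal part) is an assumption.

```python
import re

# Assumed number format: optional minus sign, digits, optional decimal part.
_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def classify_operand(token: str) -> str:
    """Classify a rule operand as 'text', 'number', or 'variable'."""
    token = token.strip()
    quoted = len(token) >= 2 and token[0] == token[-1] and token[0] in "'\""
    if quoted:
        return "text"
    if _NUMBER.fullmatch(token):
        return "number"
    return "variable"


for tok in ("oid", '"oid"', "123", "'123'", "ott-1235"):
    print(tok, "->", classify_operand(tok))
```

The unquoted ott-1235 is classified as a variable rather than text, which is why the unquoted comparison in the abnormal case above does not trigger a notification.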

Available operators

Operator  Usage                         Description
==        operand1 == operand2          Checks whether operand1 is equal to operand2.
!=        operand1 != operand2          Checks whether operand1 and operand2 have different values.
>         operand1 > operand2           Checks whether operand1 is greater than operand2.
>=        operand1 >= operand2          Checks whether operand1 is greater than or equal to operand2.
<         operand1 < operand2           Checks whether operand1 is less than operand2.
<=        operand1 <= operand2          Checks whether operand1 is less than or equal to operand2.
like      operand1 like operand2        Checks by pattern matching whether operand1 includes operand2.
&&        expression1 && expression2    Checks whether expression1 and expression2 are both true.
and       expression1 and expression2   Checks whether expression1 and expression2 are both true. Plays the same role as &&.
or        expression1 or expression2    Checks whether at least one of expression1 and expression2 is true. Plays the same role as ||.

Usage of like

You can conveniently search for embedded strings via the wildcard (*).

  • Searching for strings that start with a specific keyword


    Key like "Value*"

  • Searching for strings that end with a specific keyword


    Key like "*Value"

  • Searching for strings that include a specific keyword


    Key like "*Value*"

  • The wildcard (*) cannot be used in the middle of keywords.


    // Unsupported syntax
    Key like "Va*lue"

  • If you omit the wildcard (*) in the like operator, it operates as equals (==).


    // The following two statements have the same result.
    Key like "Value"
    Key == "Value"
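
The wildcard rules for like can be summarized in a short Python model. It is an illustrative sketch of the behavior described above, not the product's implementation. The function name like is hypothetical, and raising an error for a wildcard in the middle is an assumption, since the text only says that form is unsupported.

```python
def like(value: str, pattern: str) -> bool:
    """Model of the like operator, with * allowed only at the ends."""
    starts = pattern.startswith("*")
    ends = pattern.endswith("*") and len(pattern) > 1
    core = pattern[1 if starts else 0 : len(pattern) - (1 if ends else 0)]
    if "*" in core:
        raise ValueError("a wildcard in the middle of a keyword is not supported")
    if starts and ends:
        return core in value           # "*Value*": includes the keyword
    if starts:
        return value.endswith(core)    # "*Value": ends with the keyword
    if ends:
        return value.startswith(core)  # "Value*": starts with the keyword
    return value == core               # no wildcard: behaves like ==


print(like("Value-01", "Value*"))
print(like("my-Value", "*Value"))
print(like("my-Value-01", "*Value*"))
print(like("Value-01", "Value"))
```

The last call shows the equality behavior: without a wildcard, Value-01 does not match Value.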

Available functions

Method      Usage                        Description
startsWith  startsWith(param1, param2)   True if the value of the key param1 starts with param2. Otherwise false.
endsWith    endsWith(param1, param2)     True if the value of the key param1 ends with param2. Otherwise false.
isNull      isNull(param1)               True if param1 is null. Otherwise false.
isNotNull   isNotNull(param1)            True if param1 is not null. Otherwise false.
isEmpty     isEmpty(param1)              True if param1 is null or an empty string (""). Otherwise false.
isNotEmpty  isNotEmpty(param1)           True if param1 is neither null nor an empty string (""). Otherwise false.

startsWith

startsWith(Key, "Value")

endsWith

endsWith(Key, "Value")

isNull

isNull(Key)

isNotNull

isNotNull(Key)

isEmpty

isEmpty(Key)

isNotEmpty

isNotEmpty(Key)
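
For readers who think in code, the six functions can be approximated by the following Python equivalents. This is a sketch under assumptions, not the product's implementation: the record dictionary stands in for one metrics data row, a missing key is treated as null, and the snake_case names are hypothetical counterparts of the functions above.

```python
from typing import Mapping, Optional

Record = Mapping[str, Optional[str]]


def starts_with(record: Record, key: str, text: str) -> bool:
    """startsWith(Key, "Value"): the value stored under key starts with text."""
    value = record.get(key)
    return value is not None and value.startswith(text)


def ends_with(record: Record, key: str, text: str) -> bool:
    """endsWith(Key, "Value"): the value stored under key ends with text."""
    value = record.get(key)
    return value is not None and value.endswith(text)


def is_null(record: Record, key: str) -> bool:
    """isNull(Key): the value is null (a missing key counts as null here)."""
    return record.get(key) is None


def is_not_null(record: Record, key: str) -> bool:
    """isNotNull(Key): the value is not null."""
    return not is_null(record, key)


def is_empty(record: Record, key: str) -> bool:
    """isEmpty(Key): the value is null or an empty string."""
    value = record.get(key)
    return value is None or value == ""


def is_not_empty(record: Record, key: str) -> bool:
    """isNotEmpty(Key): the value is neither null nor an empty string."""
    return not is_empty(record, key)


record = {"oname": "ott-1235", "note": "", "ip": None}
print(starts_with(record, "oname", "ott"), is_empty(record, "note"), is_null(record, "ip"))
```

The difference between the null and empty checks shows in the note key: it is not null, but it is empty.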

Template

Metrics Event Templates

Note

For more information about the reason field in the Kubernetes event, see Kubernetes official documentation.

BackOff

The alert occurs when BackOff is entered several times in the reason field in a Kubernetes event. The Message example is as follows:

Kubernetes Event (Kube Event ${message})

Evicted

The alert occurs when Evicted is entered several times in the reason field in a Kubernetes event. The Message example is as follows:

Kubernetes Event (Kube Event ${message})

FailedCreatePodSandBox

The alert occurs when FailedCreatePodSandBox is entered several times in the reason field in a Kubernetes event. The Message example is as follows:

Kubernetes Event (Kube Event ${message})

FailedMount

The alert occurs when FailedMount is entered several times in the reason field in a Kubernetes event. The Message example is as follows:

Kubernetes Event (Kube Event ${message})

FailedScheduling

The alert occurs when FailedScheduling is entered several times in the reason field in a Kubernetes event. The Message example is as follows:

Kubernetes Event (Kube Event ${message})

FailedSync

The alert occurs when FailedSync is entered several times in the reason field in a Kubernetes event. The Message example is as follows:

Kubernetes Event (Kube Event ${message})

NodeNotReady

The alert occurs when NodeNotReady is entered several times in the reason field in a Kubernetes event. The Message example is as follows:

Kubernetes Event (Kube Event ${message})

Unhealthy

The alert occurs when Unhealthy is entered several times in the reason field in a Kubernetes event. The Message example is as follows:

Kubernetes Event (Kube Event ${message})

Notification of utilization based on container CPU quota

The alert occurs when the total CPU usage (${cpu_per_quota}) based on the container's CPU limit is 70% or more. The Message example is as follows:

The CPU utilization of container ${oname} in ${okindName} is high, ${cpu_per_quota}% >= 70%.

Container memory fail count

If the container's memory limit is reached more than once, the alert occurs. The Message example is as follows:

Because the ${oname} container of ${okindName} exceeded the limit, the ${mem_failcnt} increased.

Container memory utilization

The alert occurs when the usage (${container.mem_percent}) based on the container's memory limit is 90% or more. The Message example is as follows:

The memory usage of ${oname} container in ${okindName} is ${container.mem_percent}% >= 90%.

Container DEAD status notification

The corresponding alert occurs when the container status code is 100. The status code 100 indicates DEAD. The example of Message is as follows:

Container ${oname} is in DEAD state.

Cluster CPU request notification

The corresponding alert occurs when the available amount of CPU for node allocation divided by the total limit CPU and multiplied by 100 is 80% or more. The example of Message is as follows:

The CPU request (minimum required resource) is more than 80% of the cluster CPU allocation.

Cluster memory request notification

The corresponding alert occurs when the available amount of memory for node allocation divided by the total limit memory and multiplied by 100 is 80% or more. The example of Message is as follows:

Memory Request (minimum required resource) is more than 80% of the cluster memory allocation.

Cluster CPU request notification

The corresponding alert occurs when the available amount of CPU for node allocation divided by the total limit CPU and multiplied by 100 is 60% or more. The example of Message is as follows:

The CPU request (minimum required resource) is more than 60% of the cluster CPU allocation.

Cluster memory request notification

The corresponding alert occurs when the available amount of memory for node allocation divided by the total limit memory and multiplied by 100 is 60% or more. The example of Message is as follows:

Memory Request (minimum required resource) is more than 60% of the cluster memory allocation.

Notification of the number of master pods

The alert occurs when there are no Pods that can be allocated to the node. The Message example is as follows:

The number of PODs that can be allocated to the master is 0.

Node CPU utilization notification

The alert occurs when the node's CPU utilization (${cpu}) is 70% or more. The Message example is as follows:

The CPU usage of ${oname} is ${cpu}% >= 70%.

Node memory utilization notification

The alert occurs when the node's memory utilization (${memory_pused}) is 90% or more. The Message example is as follows:

The memory usage of ${oname} is ${memory_pused}% >= 90%.

Unassignable node notification

The corresponding alert occurs when the available number of Pods that can be assigned to a node is zero.

APDEX

The alert occurs when transactions exist and the APDEX score is less than 0.7. The Message example is as follows:

APDEX is lower than 0.7 (${oname})

Composite Metrics Event Templates

Inactive agents has been found

An alert occurs when the number of active agents is less than the specified value. An example of Message is as follows:

${ip} ${okindName} The number of active agents has decreased to ${num_of_current_agents}.

TPS has changed by more than 30% compared to the previous week

An alert occurs when the application's TPS changes by more than 30% compared to that of the previous week. An example message is as follows:

${okindName} a week ago : ${prev_week_tps_display}, current : ${current_tps_display},  difference : ${one_week_diff_display}
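
The comparison behind this template can be sketched in Python. This is only an illustration of the arithmetic, not the template's MXQL: the function names are hypothetical, and treating both increases and decreases as a change is an assumption based on the word "changes".

```python
def tps_change_percent(prev_week_tps: float, current_tps: float) -> float:
    """Relative change of the current TPS versus last week's TPS, in percent."""
    if prev_week_tps == 0:
        raise ValueError("previous-week TPS is zero; relative change is undefined")
    return (current_tps - prev_week_tps) * 100 / prev_week_tps


def tps_alert(prev_week_tps: float, current_tps: float, limit_percent: float = 30.0) -> bool:
    """True if the TPS moved up or down by more than limit_percent."""
    return abs(tps_change_percent(prev_week_tps, current_tps)) > limit_percent


print(tps_alert(100, 135))  # 35% higher than a week ago
print(tps_alert(100, 120))  # 20% higher than a week ago
print(tps_alert(100, 60))   # 40% lower than a week ago
```

Under these assumptions the first and third cases raise an alert and the second does not.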

Very slow active transactions are detected

An alert occurs when the number of transactions that exceed 8 seconds in the application exceeds 10 on average. An example message is as follows:

${okindName} ${very_slow_tx_cnt_m5_avg_display} active transactions performed for more than 8 seconds were detected.

APDEX score dropped

An alert occurs when the APDEX score falls below 70. An example of Message is as follows:

The average apdex of ${pname} in the last 5 seconds is ${apdex_display}

CPU % is too high

An alert occurs when the node's CPU utilization exceeds 80%. An example message is as follows:

CPU utilization rate of the ${oname} in the last minute > ${_rule_} %

CPU User % is too high

An alert occurs when the CPU user utilization exceeds 50%. An example of Message is as follows:

CPU User utilization rate of the ${oname} in the last minute > ${_rule_} %

The number of agents with high CPU SYS % is too large

An alert occurs when the system's CPU utilization exceeds 50%. An example message is as follows:

The number of agents with a CPU SYS of 70% or more in the last minute > ${_rule_} %

The Disk I/O is too high

An alert occurs when the disk I/O utilization exceeds 10%. An example of Message is as follows:

In the last minute, ${oname}'s Disk I/O > ${_rule_} %

The Disk Used % is too high

An alert occurs when the file system's utilization exceeds 90%. An example of Message is as follows:

In the last minute, ${oname}'s Disk Used > ${_rule_} %

Network Traffic I/O is too high

An alert occurs when the network inbound/outbound traffic exceeds 10%. An example of Message is as follows:

In the last minute, ${oname}'s Network Traffic I/O > ${_rule_} %

Network Packet I/O is too high

An alert occurs when the network inbound/outbound packets exceed 10%. An example of Message is as follows:

In the last minute, ${oname}'s Network Packet I/O > ${_rule_} %

Network Error I/O is too high

An alert occurs when the network inbound/outbound errors exceed 10%. An example of Message is as follows:

In the last minute, The maximum value of the ${oname}'s Network Error I/O > ${_rule_} %

The kube-apiserver latency over 10 second

Among control plane components, an alert occurs when the latency of kube-apiserver exceeds 10 seconds. However, WATCH actions are excluded. An example of Message is as follows:

Latency of the ${verb} verb in ${instance} of kube-apiserver exceeded ${metricValue} seconds.

The kube-apiserver response increase/decrease rate for error codes

Among control plane components, an alert occurs when the number of error responses from kube-apiserver exceeds 50 and the increase/decrease rate changes by more than 50%. An example of Message is as follows:

Rate of increase in the number of requests for code ${code} on instance ${instance} of kube-apiserver exceeded ${metricValue}.