Parsing the logs

A log parser transforms unstructured logs into a structured form. WhaTap Log Monitoring provides two types of parsers.

  • GROK parser: Parses logs collected in arbitrary formats using regular expressions and GROK syntax.

  • JSON parser: Parses logs collected in JSON format.

Note

Common precautions

  • If multiple parsers are registered in the same category, only the first matching parser is applied.

  • WhaTap may disable parsers that could affect the stability of the WhaTap service.
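The first-match rule can be illustrated with a minimal Python sketch. It assumes that the parsers registered in one category are tried in Priority order and that a parser "matches" when its regular expression matches the log line. The parser names and patterns are hypothetical, and this is not WhaTap's implementation.

```python
import re

# Hypothetical parsers registered in one category, listed in Priority order.
# Assumption: a parser "matches" when its regular expression matches the log.
PARSERS = [
    ("json-like", re.compile(r"^\{.*\}$")),
    ("email", re.compile(r"(?P<username>\w+)@(?P<domain>\w+\.\w+)")),
    ("first-word", re.compile(r"(?P<word>\w+)")),
]


def apply_first_match(line):
    """Return (parser name, tags) from the first matching parser, or (None, {})."""
    for name, regex in PARSERS:
        match = regex.search(line)
        if match:
            # Parsers further down the list are never tried for this log.
            return name, match.groupdict()
    return None, {}


print(apply_first_match("contact dev@whatap.io"))
```

Here the email parser is applied even though first-word would also match the line, because it comes first in the list.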

GROK parser

If logs are collected in an unstructured form, you can parse them with the GROK parser. GROK syntax provides named regular expressions, which make regular expressions easier to use.

Getting started with GROK

GROK provides two types of syntax.

  1. %{SYNTAX:SEMANTIC}: Syntax provided by the GROK library. It uses named regular expressions to extract tags. For a usage example, see %{SYNTAX:SEMANTIC} Usage example below.

    • SYNTAX: Specify the name of a named regular expression provided by GROK.

    • SEMANTIC: Specify the name for the matched value.

    Note

    Named regular expressions

    Named regular expressions are a GROK feature that lets you assign names to complex regular expressions.

    Name        Regular expression
    WORD        \b\w+\b
    SPACE       \s*
    NOTSPACE    \S+
    UUID        [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}

    To see all named regular expressions provided by WhaTap, see the following link.

  2. (?<SEMANTIC>REGX): The named capturing group syntax of regular expressions. It lets you write your own regular expressions to extract tags as you intend. For a usage example, see (?<SEMANTIC>REGX) Usage example below.

    • SEMANTIC: Specify the name for matched values.

    • REGX: Enter regular expressions for matching.

    Note

    Named capturing group

    A named capturing group is standard regular expression syntax.

    • Capturing group: Groups multiple tokens into a single unit for matching and captures the text it matches.

    • Named capturing group: A capturing group that has been assigned a name.

    • For example, consider matching the string dev@whatap.io:

      • Example 1: (\w+)@(\w+\.\w+) matches the email and captures the username and domain in unnamed groups.
      • Example 2: (?<username>\w+)@(?<domain>\w+\.\w+) matches the entire email and additionally captures the username and domain in groups named username and domain.
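The two examples can be tried with Python's re module as a quick illustration. This sketch uses Python's (?P<name>...) spelling of named groups; the (?<name>...) form used in this guide is the equivalent syntax in regex engines such as Java's.

```python
import re

text = "dev@whatap.io"

# Example 1: unnamed capturing groups, referenced by position.
m1 = re.fullmatch(r"(\w+)@(\w+\.\w+)", text)
print(m1.group(1), m1.group(2))  # dev whatap.io

# Example 2: named capturing groups, referenced by name.
m2 = re.fullmatch(r"(?P<username>\w+)@(?P<domain>\w+\.\w+)", text)
print(m2.group(0))     # dev@whatap.io (the entire match)
print(m2.groupdict())  # {'username': 'dev', 'domain': 'whatap.io'}
```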

%{SYNTAX:SEMANTIC} Usage example

The following is an example that uses the %{SYNTAX:SEMANTIC} syntax.

Sample log
[2023-08-08 02:02:30,101 GMT][INFO ][i.w.y.l.c.LogSinkDexScheduleThread.realProcess(159)] 8 VirtualLog 20230808 02:01:00.000 {area=4, city=5} 56ms

By looking at the sample log, you can infer what each part means. Replacing each part with a semantic name gives the following:

semantic replace
[date][logLevel][caller] projectCode logCategory dexBuildStartTime {area=areaEnum, city=cityEnum} dexBuildElapsed

Each semantic part can be matched with a regular expression, and the GROK parser lets you use predefined named regular expressions for this. TIMESTAMP_ISO8601, LOGLEVEL, and DATA, used in the pattern below, are named regular expressions provided by GROK. Each of them is replaced with the following regular expression before matching.

  • name: TIMESTAMP_ISO8601

    • regular expression: %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
  • name: LOGLEVEL

    • regular expression: ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
  • name: DATA

    • regular expression: .*?

GROK parsing pattern
\[%{TIMESTAMP_ISO8601:date}\sGMT\]\[%{LOGLEVEL:level}\s\]\[%{DATA:caller}\]

Parsing the sample log with the above pattern extracts the following tags. As this shows, the %{SYNTAX:SEMANTIC} syntax lets you apply long, complex regular expressions concisely.

Tag extraction
- date : 2023-08-08 02:02:30,101
- caller : i.w.y.l.c.LogSinkDexScheduleThread.realProcess(159)
- level : INFO
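To see what %{SYNTAX:SEMANTIC} does, the following Python sketch expands each reference into a plain regular expression (a named group when SEMANTIC is given) and applies the pattern above to the sample log. The GROK_LIBRARY subset is an assumption, modeled on the commonly used Logstash grok definitions; WhaTap's built-in definitions may differ, and this is not how WhaTap implements GROK.

```python
import re

# Simplified GROK subset (assumption, modeled on common Logstash definitions).
GROK_LIBRARY = {
    "YEAR": r"(?:\d\d){1,2}",
    "MONTHNUM": r"(?:0?[1-9]|1[0-2])",
    "MONTHDAY": r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])",
    "HOUR": r"(?:2[0123]|[01]?[0-9])",
    "MINUTE": r"(?:[0-5][0-9])",
    "SECOND": r"(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)",
    "ISO8601_TIMEZONE": r"(?:Z|[+-]%{HOUR}(?::?%{MINUTE}))",
    "TIMESTAMP_ISO8601": r"%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?",
    "LOGLEVEL": r"([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)",
    "DATA": r".*?",
}

# Matches %{SYNTAX} or %{SYNTAX:SEMANTIC}.
GROK_REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand_grok(pattern, depth=0):
    """Recursively replace GROK references with plain regular expressions."""
    if depth > 10:
        raise ValueError("GROK definitions nest too deeply")

    def replace(match):
        syntax, semantic = match.group(1), match.group(2)
        body = expand_grok(GROK_LIBRARY[syntax], depth + 1)
        # %{SYNTAX:SEMANTIC} becomes a named group; a bare %{SYNTAX} (used inside
        # definitions such as TIMESTAMP_ISO8601) becomes a non-capturing group.
        return f"(?P<{semantic}>{body})" if semantic else f"(?:{body})"

    return GROK_REFERENCE.sub(replace, pattern)


pattern = r"\[%{TIMESTAMP_ISO8601:date}\sGMT\]\[%{LOGLEVEL:level}\s\]\[%{DATA:caller}\]"
log = ("[2023-08-08 02:02:30,101 GMT][INFO ][i.w.y.l.c.LogSinkDexScheduleThread.realProcess(159)] "
       "8 VirtualLog 20230808 02:01:00.000 {area=4, city=5} 56ms")

match = re.search(expand_grok(pattern), log)
for name, value in match.groupdict().items():
    print(f"- {name} : {value}")
```

Because DATA is the lazy .*?, the caller tag stops at the first ] instead of running to the end of the line.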

(?<SEMANTIC>REGX) Usage example

Parts that the named regular expressions do not cover can be parsed with the (?<SEMANTIC>REGX) syntax. In the sample log above, the part that the %{SYNTAX:SEMANTIC} pattern does not parse is as follows.

Unparsed area
8 VirtualLog 20230808 02:01:00.000 {area=4, city=5} 56ms

Replacing each part of it with a semantic name gives the following:

semantic replace
projectCode logCategory dexBuildStartTime {area=areaEnum, city=cityEnum} dexBuildElapsed

These irregular strings can be parsed with the (?<SEMANTIC>REGX) syntax.

Regular expressions matching each parsing keyword in the sample log
Parsing keyword          (?<SEMANTIC>REGX)
8                        (?<projectCode>\d)
VirtualLog               (?<logCategory>\w*)
20230808 02:01:00.000    (?<dexBuildStartTime>\d{8}\s\d{2}:\d{2}:\d{2}\.\d{3})
area=4                   area=(?<areaEnum>\d)
city=5                   city=(?<cityEnum>\d)
56ms                     (?<dexBuildElapsed>\d{2})ms

Basic regular expression syntax
Syntax rule       Meaning                                           Alias
?                 0 or 1                                            -
+                 1 or more                                         -
*                 0 or more                                         -
a{5}              exactly 5                                         -
\w                word character                                    [a-zA-Z_0-9]
\s                white space                                       -
.                 any character except newline                      -
[abc]             any of a, b, or c                                 -
[^abc]            not a, b, or c                                    -
[a-z]             character between a and z                         -
[1-3[7-9]]        union (combining two or more character classes)   -
[1-6&&[3-9]]      intersection                                      -
[0-9&&[^2468]]    subtraction                                       -
a{2,}             2 or more                                         -
a{1,3}            between 1 and 3                                   -
a+?               match as few as possible (lazy)                   -
a{2,3}?           match as few as possible (lazy)                   -
(abc)             capturing group (matches abc as one unit)         -
\d                digit                                             [0-9]
\D                non-digit                                         [^0-9]
\W                non-word character                                -
\S                non-white space                                   -
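The quantifier rows can be checked quickly in Python. Greedy quantifiers match as much as possible, while a trailing ? makes them lazy; laziness is what lets DATA (.*?) stop at the first ] in the GROK example. Note that Python's re module does not support the union, intersection, and subtraction class syntax ([1-3[7-9]], &&) listed in the table; that syntax is supported by engines such as Java's.

```python
import re

text = "aaaa"

# Greedy quantifiers match as much as possible...
print(re.match(r"a+", text).group())       # aaaa
print(re.match(r"a{2,3}", text).group())   # aaa
# ...and a trailing ? makes them lazy, matching as little as possible.
print(re.match(r"a+?", text).group())      # a
print(re.match(r"a{2,3}?", text).group())  # aa

# Lazy matching matters when a delimiter follows, as with DATA (.*?):
line = "[first][second]"
print(re.match(r"\[(.*)\]", line).group(1))   # first][second
print(re.match(r"\[(.*?)\]", line).group(1))  # first

# Shorthand classes: \d is a digit, \S is any non-white-space character.
print(re.findall(r"\d", "area=4, city=5"))      # ['4', '5']
print(re.findall(r"\S+", "8 VirtualLog 56ms"))  # ['8', 'VirtualLog', '56ms']
```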

By joining these expressions with whitespace (\s) and escaping the special characters { and } (\{, \}), you get the following pattern.

GROK parsing pattern
(?<projectCode>\d)\s(?<logCategory>\w*)\s(?<dexBuildStartTime>\d{8}\s\d{2}:\d{2}:\d{2}\.\d{3})\s\{area=(?<areaEnum>\d),\scity=(?<cityEnum>\d)\}\s(?<dexBuildElapsed>\d{2})ms

Parsing the unparsed area with the above pattern extracts the following tags.

Tag extraction
- projectCode : 8
- logCategory : VirtualLog
- dexBuildStartTime : 20230808 02:01:00.000
- areaEnum : 4
- cityEnum : 5
- dexBuildElapsed : 56
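As a check, the pattern above can be run against the unparsed area in Python. The to_python_regex helper is hypothetical: it only rewrites the (?<name>...) groups into Python's (?P<name>...) spelling so that the pattern can be used as written.

```python
import re

# Hypothetical helper: rewrite (?<name>...) as Python's (?P<name>...).
# Lookbehinds such as (?<=...) and (?<!...) do not match and are left alone.
NAMED_GROUP = re.compile(r"\(\?<([A-Za-z]\w*)>")


def to_python_regex(pattern):
    return NAMED_GROUP.sub(r"(?P<\1>", pattern)


# The same pattern as above, split across lines for readability.
PATTERN = (
    r"(?<projectCode>\d)\s(?<logCategory>\w*)\s"
    r"(?<dexBuildStartTime>\d{8}\s\d{2}:\d{2}:\d{2}\.\d{3})\s"
    r"\{area=(?<areaEnum>\d),\scity=(?<cityEnum>\d)\}\s(?<dexBuildElapsed>\d{2})ms"
)
unparsed = "8 VirtualLog 20230808 02:01:00.000 {area=4, city=5} 56ms"

match = re.search(to_python_regex(PATTERN), unparsed)
for name, value in match.groupdict().items():
    print(f"- {name} : {value}")
```

Note that \d{2} accepts exactly two digits, so a value such as 156ms would not match; \d+ may be a safer choice if elapsed times can have more or fewer digits. The same applies to the single \d used for projectCode, areaEnum, and cityEnum.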

Applying GROK

Log Configuration > Log primary parser setting

  1. To apply the GROK pattern parser, go to the Log primary parser setting tab in Log Configuration.

    Log parser list

  2. Select + Add and then select the GROK parser in the Parser field.

    Add Log Parser

  3. When you select Register pattern, the pattern registration and simulation window appears on the right.

  4. Enter Pattern and Log, and then click Simulation to check whether the pattern parses the log successfully.

    Pattern example: \[%{TIMESTAMP_ISO8601:date}\sGMT\]\[%{LOGLEVEL:level}\s\]\[%{DATA:caller}\]\s(?<projectCode>\d)\s(?<logCategory>\w*)\s(?<dexBuildStartTime>\d{8}\s\d{2}:\d{2}:\d{2}\.\d{3})\s\{area=(?<areaEnum>\d),\scity=(?<cityEnum>\d)\}\s(?<dexBuildElapsed>\d{2})ms

    Log example: [2023-08-08 02:02:30,101 GMT][INFO ][i.w.y.l.c.LogSinkDexScheduleThread.realProcess(159)] 8 VirtualLog 20230808 02:01:00.000 {area=4, city=5} 56ms

  5. If the simulation succeeds, the Simulation result and Performance measurement results are displayed.

    Parser simulation and performance measurement

  6. After the simulation, click Apply pattern to apply the entered pattern to the selected parser.

  7. After applying the pattern, enter values for Category, Log detection condition, and Pattern.

    GROK parser input

    • Category

      Select a log category. Category is required.

    • Log detection condition
      • The parser is applied only to logs that meet the conditions.

      • Select or enter values for Search key and Search value.

      • Log detection conditions are evaluated before any parser runs, so you cannot use tags added by a parser in a condition.

  8. Select Add to register a parser.
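Simulation runs the pattern against the log and reports the result and performance. As a rough local analogue (not WhaTap's simulator or its measurement method), the following Python sketch expands the combined pattern from step 4, prints the extracted tags, and reports the average time per match. The GROK subset and the helper names expand_grok, to_python_regex, and simulate are assumptions for illustration.

```python
import re
import time

# Simplified GROK subset (assumption, modeled on common Logstash definitions).
GROK_LIBRARY = {
    "YEAR": r"(?:\d\d){1,2}",
    "MONTHNUM": r"(?:0?[1-9]|1[0-2])",
    "MONTHDAY": r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])",
    "HOUR": r"(?:2[0123]|[01]?[0-9])",
    "MINUTE": r"(?:[0-5][0-9])",
    "SECOND": r"(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)",
    "ISO8601_TIMEZONE": r"(?:Z|[+-]%{HOUR}(?::?%{MINUTE}))",
    "TIMESTAMP_ISO8601": r"%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?",
    "LOGLEVEL": r"([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)",
    "DATA": r".*?",
}
GROK_REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")
NAMED_GROUP = re.compile(r"\(\?<([A-Za-z]\w*)>")


def expand_grok(pattern, depth=0):
    """Recursively replace %{SYNTAX} and %{SYNTAX:SEMANTIC} with plain regex."""
    if depth > 10:
        raise ValueError("GROK definitions nest too deeply")

    def replace(m):
        body = expand_grok(GROK_LIBRARY[m.group(1)], depth + 1)
        return f"(?P<{m.group(2)}>{body})" if m.group(2) else f"(?:{body})"

    return GROK_REFERENCE.sub(replace, pattern)


def to_python_regex(pattern):
    # Expand GROK references, then rewrite (?<name>...) as (?P<name>...).
    return NAMED_GROUP.sub(r"(?P<\1>", expand_grok(pattern))


def simulate(grok_pattern, log, runs=1000):
    """Return (tags or None, average microseconds per search) for one log line."""
    regex = re.compile(to_python_regex(grok_pattern))
    match = regex.search(log)
    start = time.perf_counter()
    for _ in range(runs):
        regex.search(log)
    avg_us = (time.perf_counter() - start) / runs * 1_000_000
    return (match.groupdict() if match else None), avg_us


PATTERN = (
    r"\[%{TIMESTAMP_ISO8601:date}\sGMT\]\[%{LOGLEVEL:level}\s\]\[%{DATA:caller}\]\s"
    r"(?<projectCode>\d)\s(?<logCategory>\w*)\s"
    r"(?<dexBuildStartTime>\d{8}\s\d{2}:\d{2}:\d{2}\.\d{3})\s"
    r"\{area=(?<areaEnum>\d),\scity=(?<cityEnum>\d)\}\s(?<dexBuildElapsed>\d{2})ms"
)
LOG = ("[2023-08-08 02:02:30,101 GMT][INFO ][i.w.y.l.c.LogSinkDexScheduleThread.realProcess(159)] "
       "8 VirtualLog 20230808 02:01:00.000 {area=4, city=5} 56ms")

tags, avg_us = simulate(PATTERN, LOG)
print(tags if tags else "Parsing failed")
print(f"average {avg_us:.1f} us per match")
```

Timings from a local run like this depend on the machine and the regex engine, so they only hint at how expensive a pattern is.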

Note
  • In the log parser list, you can change a parser's Priority, or Enable, Edit, or Delete it.

  • After parser simulation, you can register a pattern.

Note

GROK parser precautions

  • The GROK parser supports two patterns: %{SYNTAX:SEMANTIC} and (?<SEMANTIC>REGX).

  • When using the %{SYNTAX:SEMANTIC} pattern, SEMANTIC must be entered.

  • When using the %{SYNTAX:SEMANTIC} pattern, SEMANTIC must be unique within a parser.

  • When using the (?<SEMANTIC>REGX) pattern, SEMANTIC can contain only letters (a-z, A-Z), digits (0-9), and the special characters ., _, and -.

  • SEMANTIC must start with a letter (a-z, A-Z).

  • SEMANTIC must end with a letter (a-z, A-Z) or digit (0-9).
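The naming rules above can be checked before registering a pattern. The following Python sketch is a hypothetical pre-check, not a WhaTap tool; it assumes that the character, start, and end rules apply to SEMANTIC names in both syntaxes and that uniqueness is checked across the whole pattern.

```python
import re

# Assumed to apply to SEMANTIC names in both syntaxes: letters, digits, '.', '_', '-';
# first character a letter, last character a letter or digit.
SEMANTIC_RULE = re.compile(r"[A-Za-z](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")

# %{SYNTAX} or %{SYNTAX:SEMANTIC}
GROK_REFERENCE = re.compile(r"%\{(\w+)(?::([^}]*))?\}")
# (?<SEMANTIC>...), skipping lookbehinds (?<= and (?<!
NAMED_GROUP = re.compile(r"\(\?<([^>=!][^>]*)>")


def check_semantics(pattern):
    """Hypothetical pre-check: return a list of SEMANTIC problems (empty if none)."""
    problems, names = [], []
    for m in GROK_REFERENCE.finditer(pattern):
        if m.group(2):
            names.append(m.group(2))
        else:
            problems.append(f"%{{{m.group(1)}}} has no SEMANTIC")
    names.extend(NAMED_GROUP.findall(pattern))
    for name in names:
        if not SEMANTIC_RULE.fullmatch(name):
            problems.append(f"invalid SEMANTIC: {name}")
    for name in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"duplicate SEMANTIC: {name}")
    return problems


print(check_semantics(r"\[%{TIMESTAMP_ISO8601:date}\]\s(?<projectCode>\d)"))  # []
print(check_semantics(r"%{WORD}\s(?<1st>\d)\s%{DATA:x}\s%{DATA:x}"))
```

The second call reports a missing SEMANTIC for %{WORD}, an invalid name (1st starts with a digit), and a duplicate name (x).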

JSON parser

If logs are collected in JSON format, they can be easily parsed using the JSON parser.

Applying JSON

Log Configuration > Log primary parser setting

  1. To apply the JSON pattern parser, go to the Log primary parser setting tab in Log Configuration.

    Log parser list

  2. Select + Add and then select the JSON parser in the Parser field.

    Add Log Parser

  3. When you select Register pattern, the pattern registration and simulation window appears on the right.

  4. Enter Pattern and Log, and then click Simulation to check whether the pattern parses the log successfully.

    Example
    2023-08-08 02:43:28,615 -- {"host":"10.21.3.24","method":"POST","status":"200","url":"http://devote.whatap.io/yard/api/flush"} --

    In the example log, if Prefix and Postfix are set to -- and url is specified in Ignore, only three tags (host, method, status) are created.

    • Prefix
      Specify the string that marks where the JSON part of the log begins. If the entire log is in JSON format, leave it empty.

    • Postfix
      Specify the string that marks where the JSON part of the log ends. If the entire log is in JSON format, leave it empty.

    • Ignore
      Specify the keys for which no tags are created during JSON parsing.

  5. If the simulation succeeds, the Simulation result and Performance measurement results are displayed.

    Parser simulation and performance measurement

  6. After the simulation, click Apply pattern to apply the entered pattern to the selected parser.

  7. After applying the pattern, enter values for Category, Log detection condition, and Pattern.

    JSON parser input

    • Category

      Select a log category. Category is required.

    • Log detection condition
      • The parser is applied only to logs that meet the conditions.

      • Select or enter values for Search key and Search value.

      • Log detection conditions are evaluated before any parser runs, so you cannot use tags added by a parser in a condition.

  8. Select Add to register a parser.
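The effect of Prefix, Postfix, and Ignore can be sketched in Python. The sketch assumes that the JSON body starts after the first occurrence of Prefix and ends before the last occurrence of Postfix, and that only a JSON object produces tags; WhaTap's exact matching rules are not documented here, and the function name parse_json_log is hypothetical.

```python
import json


def parse_json_log(line, prefix="", postfix="", ignore=()):
    """Return a dict of tags, or None if no JSON object can be extracted.

    Assumption: the JSON body begins after the first occurrence of prefix and
    ends before the last occurrence of postfix; empty values mean "whole line".
    """
    start, end = 0, len(line)
    if prefix:
        pos = line.find(prefix)
        if pos < 0:
            return None
        start = pos + len(prefix)
    if postfix:
        pos = line.rfind(postfix)
        if pos < start:
            return None
        end = pos
    try:
        data = json.loads(line[start:end])  # surrounding whitespace is allowed
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Keys listed in Ignore do not become tags.
    return {key: value for key, value in data.items() if key not in ignore}


log = ('2023-08-08 02:43:28,615 -- {"host":"10.21.3.24","method":"POST",'
       '"status":"200","url":"http://devote.whatap.io/yard/api/flush"} --')
print(parse_json_log(log, prefix="--", postfix="--", ignore={"url"}))
```

With Prefix and Postfix set to -- and url in Ignore, this yields only the host, method, and status tags, matching the example in step 4.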

Note
  • In the log parser list, you can change a parser's Priority, or Enable, Edit, or Delete it.

  • Before registering a parser, you can use Simulation to check whether the pattern works correctly. The simulation process is the same as for registering a GROK parser.

Example of using JSON format

Sample log
{"host":"10.21.3.24","method":"POST","status":"200","url":"http://devote.whatap.io/yard/api/flush"}

If the above sample log is collected, select JSON parser in the Parser field when adding a parser. Without writing complex parsing logic, you can extract the following tags for log analysis:

Tag extraction
- host : 10.21.3.24
- method : POST
- status : 200
- url : http://devote.whatap.io/yard/api/flush

Usage example when only part of the log is in JSON format

Sample log partially in JSON format
2023-08-08 02:43:28,615 -- {"host":"10.21.3.24","method":"POST","status":"200","url":"http://devote.whatap.io/yard/api/flush"} --

If only part of the log is in JSON format, as in this example, specify Prefix and Postfix (here, --). WhaTap Log Monitoring treats the text between Prefix and Postfix as JSON and parses it.

Tag extraction
- host : 10.21.3.24
- method : POST
- status : 200
- url : http://devote.whatap.io/yard/api/flush