Custom Parser Syntax
Important: The regular expression syntax supported by Taegis XDR Custom Parsers is the Golang variant.
Statements ⫘
!SAMPLE=... ⫘
A sample message. Everything to the right of the = is interpreted literally, all the way up to a newline. This field is optional, but strongly encouraged.
!SCHEMA=... ⫘
This is the schema for this message type, for example scwx.nids, scwx.netflow, or scwx.auth. If not specified, the schema from the parent or closest ancestor is used.
!CONFIRMWITH ⫘
This is either PATTERN or EXPRESSION. It works in tandem with !CONFIRMSTRING to determine whether a message matches this parser. If set to PATTERN, CONFIRMSTRING is a regex pattern. If set to EXPRESSION, CONFIRMSTRING is an expression that evaluates to True/False.
!CONFIRMSTRING= ⫘
See !CONFIRMWITH.
!DISABLED= ⫘
This disables the parser. The parser is completely removed from the runtime catalog. This is useful when you don’t yet know how to handle a message but want to capture minimal documentation of its existence.
!IMPORT= ⫘
Import another parser into this parser at the current line. Variables are shared between the importing and imported parser. This allows repeating lines of parser code to be consolidated into one place.
!IMPORTONLY ⫘
Indicates that this parser is only for import (via !IMPORT). With extremely rare exceptions, all imported parsers should be !IMPORTONLY. This flag exempts the parser from many validation rules (for example, it does not need a parent parser or a CONFIRMWITH/CONFIRMSTRING pair).
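To illustrate !IMPORT and !IMPORTONLY together, a minimal sketch (the parser name common_fields and the field assignments are hypothetical):

```
# common_fields (an import-only parser; the name is hypothetical)
!IMPORTONLY
vendor = "Acme"
product = "Sentry"

# An importing parser pulls those lines in at the current line,
# sharing variables with the imported parser:
!IMPORT=common_fields
```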
!TRIMALLOFF ⫘
This disables the default behavior of running TRIM_ALL() for all parsers. In some cases the default causes problems, as TRIM_ALL() removes leading or trailing braces ({ and }) and brackets ([ and ]), which leads to incorrect data for JSON fields.
!SANITIZEALLOFF ⫘
This disables the default behavior of running SANITIZE_ALL() for all parsers.
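Putting the statements above together, a minimal parser header might look like the following sketch (the sample message is borrowed from a later example in this document; the schema value and confirm pattern are hypothetical):

```
!SAMPLE=%AAA-6-RADIUS_IN_GLOBAL_LIST: radius_db.c:481 RADIUS ACCT
!SCHEMA=scwx.auth
!CONFIRMWITH=PATTERN
!CONFIRMSTRING=%AAA-\d+-\w+:
```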
Regular Expression Capturing Groups ⫘
Capturing groups can be used to extract values from an unstructured log message or portion of a log message.
The syntax for a capture group is {captureVariable} = {sourceString}|({regex pattern}). Resulting matches are stored in a list and can be referenced using the captureVariable and an index; e.g., captureVariable[1].
Capture groups can also be named using {captureVariable} = {sourceString}|(?P<group_name>{regex pattern}). Resulting matches can be referenced using the captureVariable and the group name; e.g., captureVariable["group_name"].
Examples ⫘
# The pattern is read unescaped to the end of the line.
jsonMatch = originalData$|(\{.*})$
# To find patterns such as an IP address
# originalData = Dec 10 16:49:10 10.10.70.10 Dec 10 10:49:10 dddd-aaabbb-01 dddd-aaabbb-01
queryCapture = originalData$|\s\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s
# queryCapture = 10.10.70.10
# To capture a value after a field name
# message = Source Network Address: 10.17.2.186 Source Port: 54692
srcIp = message|Source Network Address:\s+(\S+)\s+
# srcIp = 10.17.2.186
# An example where the '%' character isn't escaped, as in Golang regex
# message = %AAA-6-RADIUS_IN_GLOBAL_LIST: radius_db.c:481 RADIUS ACCT
topLevel = message|\s+%(\w+)-(\w+-)?(\d+)-(\w+):\s+(.+)
part = topLevel[1]
# part = AAA
# Named capture group example
# message = Aug 21 12:12:20 10.194.72.254 1 1566000000.000000000 IDOFDEVICE flows src=192.168.0.5 dst=192.168.0.255 mac=DE:ED:BE:EF:AB:AB protocol=udp sport=49154 dport=1128 pattern: deny (src 192.168.0.0/24)
partA = message|^(?P<prefix>0|1)\s+(?P<timestamp>\d+\.\d+)\s+(?P<idofdevice>\S+)\s+(?P<logtype>ip_flow|events|airmarshal_events|flows|security_event|ids-alerts|urls|.*firewall)\s+(?P<remainder>.*)
timestamp = partA["timestamp"]
device = partA["idofdevice"]
logtype = partA["logtype"]
# timestamp = 1566000000.000000000
# device = IDOFDEVICE
# logtype = flows
Functions ⫘
SPLIT(data, delimiter, makeGreedy) ⫘
Splits data into tokens separated by delimiter. If the optional makeGreedy is "true", then data of 0,,2 with a delimiter of , evaluates to [0,2] instead of [0,'',2].
Example ⫘
data = "aaa,bbb,ccc,eee"
values1 = SPLIT(data, ",", FALSE)
OUTPUT1$ = values1[3]
#OUTPUT1$: eee (String)
data = "aaa,bbb,ccc,,eee"
values1 = SPLIT(data, ",", FALSE)
values2 = SPLIT(data, ",", TRUE)
OUTPUT1$ = values1[3]
OUTPUT2$ = values2[3]
#OUTPUT1$: NULL (null)
#OUTPUT2$: eee (String)
SPLIT_NAME_VALUES(data, delimiter, separator, quoteChar) ⫘
Splits data into a collection of name/value pairs, where delimiter separates the pairs and separator separates the name from the value. quoteChar indicates the character used for quoting the value.
Example ⫘
data = "User: Unknown, InitiatorPackets: 2, ResponderPackets: 1, InitiatorBytes: 120, ResponderBytes: 66"
dict = SPLIT_NAME_VALUES(data, ",", ":", "\\")
OUTPUT$ = dict["InitiatorBytes"]
# OUTPUT$: 120 (String)
JSON(data) ⫘
Converts data into a JSON object that can be accessed with square brackets containing a JSON path. See https://goessner.net/articles/JsonPath/ and https://github.com/ohler55/ojg.
Example ⫘
data= "{ \"store\": { \"book\": [ { \"category\": \"reference\", \"author\": \"Nigel Rees\", \"title\": \"Sayings of the Century\", \"price\": 8.95 }, { \"category\": \"fiction\", \"author\": \"Evelyn Waugh\", \"title\": \"Sword of Honour\", \"price\": 12.99 }, { \"category\": \"fiction\", \"author\": \"Herman Melville\", \"title\": \"Moby Dick\", \"isbn\": \"0-553-21311-3\", \"price\": 8.99 }, { \"category\": \"fiction\", \"author\": \"J.R. R. Tolkien\", \"title\": \"The Lord of the Rings\", \"isbn\": \"0-395-19395-8\", \"price\": 22.99 } ], \"bicycle\": { \"color\": \"red\", \"price\": 19.95 } } }"
json= JSON(data)
OUTPUT$ = json["$.store.book[*].author"]
# OUTPUT$: "Nigel Rees","Evelyn Waugh","Herman Melville","J.R. R. Tolkien" (String)
Example usage for JSON keys that contain dots:
data= "{ \"store\": { \"book\": [ { \"id.category\": \"reference\" } ] } }"
json= JSON(data)
OUTPUT$ = json["$.store.book[0][\"id.category\"]"]
# OUTPUT$: reference (String)
CEF(data) ⫘
Parses data as a CEF-formatted message. The header fields can be accessed with an integer index and the named fields can be accessed by name.
Example ⫘
!SAMPLE=Nov 6 07:49:03 10.42.0.1 %helloWorld: CEF:0|Check Point|VPN-1 & FireWall-1|Check Point|Log|Address spoofing|Unknown|act=Drop cs3Label=Protection Type cs3=IPS
values = CEF(originalData$)
OUTPUT1$= values[2]
OUTPUT2$= values["act"]
OUTPUT3$= values["Protection Type"]
# OUTPUT1$: VPN-1 & FireWall-1 (String)
# OUTPUT2$: Drop (String)
# OUTPUT3$: IPS (String)
LEEF(data, delimiterOverride) ⫘
Parses data as a LEEF-formatted message. The header fields can be accessed with an integer index and the named fields can be accessed by name. Optionally, a delimiter override may be specified. LEEF extensions should either be tab-separated or indicate an alternate delimiter in field 6 of the header. Use the override parameter when you know that a device is not compliant with the standard.
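By analogy with the CEF example above, a sketch of LEEF parsing with a delimiter override (the sample message, vendor/product names, and outputs are illustrative, assuming the same indexing as CEF, not verified):

```
!SAMPLE=Jan 18 11:07:53 10.42.0.1 LEEF:1.0|Acme|Sentry|3.2|PORTSCAN|src=10.0.0.5^dst=10.0.0.9^sev=4
values = LEEF(originalData$, "^")
OUTPUT1$ = values[2]
OUTPUT2$ = values["src"]
# OUTPUT1$: Sentry (String)
# OUTPUT2$: 10.0.0.5 (String)
```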
DATETIME(data, fmt, handle2DigitYear) ⫘
Converts a string to a time value for fields like EventTimeUsec$. Also accepts time.Parse format strings (optional). If handle2DigitYear is TRUE, an appropriate year is chosen; usually the current year, with an edge case around the new year.
Example ⫘
data = "Sep 21 2018 17:35:54"
OUTPUT1$ = DATETIME(data, "Jan 02 2006 15:04:05")
OUTPUT2$ = data
# OUTPUT1$: 2018-09-21 17:35:54 +0000 UTC (time)
# OUTPUT2$: Sep 21 2018 17:35:54 (String)
IS_PRIVATE_IP(string) ⫘
Returns a boolean indicating whether the passed-in IP address string is in a private IP range. Currently only supports IPv4 and tests against the private ranges defined in RFC 1918.
Example ⫘
data1 = "10.0.0.1"
data2 = "11.0.0.1"
OUTPUT1$ = IS_PRIVATE_IP(data1)
OUTPUT2$ = IS_PRIVATE_IP(data2)
# OUTPUT1$: true (bool)
# OUTPUT2$: false (bool)
IS_VALID_IP(string) ⫘
Returns a boolean indicating whether the passed-in string is a valid IP address, leveraging net.ParseIP.
Example ⫘
data1 = "10.0.0.1"
data2 = "999.255.255.255"
data3 = "2001:0db8:85a3:0000:0000:8a2e:0370:7334"
OUTPUT1$ = IS_VALID_IP(data1)
OUTPUT2$ = IS_VALID_IP(data2)
OUTPUT3$ = IS_VALID_IP(data3)
# OUTPUT1$: true (bool)
# OUTPUT2$: false (bool)
# OUTPUT3$: true (bool)
REPLACE(data, oldString, newString) ⫘
Replaces all occurrences of oldString with newString.
Example ⫘
data = "aaaBBBaaaCCC"
OUTPUT$ = REPLACE(data, "aaa", "zzz")
# OUTPUT$: zzzBBBzzzCCC (String)
REPLACE_REGEX(data, pattern, newString) ⫘
Replaces all matches of pattern with newString.
Example ⫘
data = "aaaBBBaaaCCC"
OUTPUT$ = REPLACE_REGEX(data, "a+", "z")
# OUTPUT$: zBBBzCCC (String)
STRLEN(string) ⫘
Returns the length of the passed-in string. On error, returns -1; if a NULL type is passed, returns 0 (zero, no error).
Example ⫘
data = "1234567890"
OUTPUT$ = STRLEN(data)
# OUTPUT$: 10 (int)
UPPERCASE(string) ⫘
Returns the passed-in string with all Unicode letters mapped to their upper case; a wrapper for strings.ToUpper().
Example ⫘
data = "aaabbbccc acme"
OUTPUT$ = UPPERCASE(data)
# OUTPUT$: AAABBBCCC ACME (String)
LOWERCASE(string) ⫘
Returns the passed-in string with all Unicode letters mapped to their lower case; a wrapper for strings.ToLower().
Example ⫘
data = "AAABBBCCC ACME"
OUTPUT$ = LOWERCASE(data)
# OUTPUT$: aaabbbccc acme (String)
SANITIZE_ALL() ⫘
Cleans up null/empty values in event field variables. For example, all of these are set to null: " ", "N/A", "n/a", "null", "nil", "-". This function is run by default on all parsers unless disabled with !SANITIZEALLOFF.
Example ⫘
data = "N/A"
OUTPUT$ = data
# OUTPUT$: NULL (null)
!SANITIZEALLOFF
data = "N/A"
OUTPUT$ = data
# OUTPUT$: N/A (String)
TRIM(data) ⫘
Removes leading and trailing whitespace, quotes, braces, etc.
Example ⫘
data = " aaa bbb bcc "
OUTPUT1$ = "---" + data
OUTPUT2$ = "---" + TRIM(data)
# OUTPUT1$: --- aaa bbb bcc (String)
# OUTPUT2$: ---aaa bbb bcc (String)
TRIM_ALL() ⫘
Removes whitespace from the beginning/end of all event field variables. This function is run by default on all parsers unless disabled with !TRIMALLOFF.
Example ⫘
!TRIMALLOFF
data = " aaa bbb bcc "
OUTPUT2$ = data
# OUTPUT2$: aaa bbb bcc (String)
ADDFIELD(collection, fieldName, fieldValues) ⫘
Adds a field to an array of objects. The values of the field for each object are specified by fieldValues (also an array). The name of the new field is specified by fieldName. If collection is NULL, a new array of objects is created, each with a single field (fieldName) with the provided values.
Example ⫘
keys = ["httpSourceName", "httpSourceId"]
values = [json["$.httpSourceName"], json["$.httpSourceId"]]
eventMetadata$.record$ = ADDFIELD(NULL, "key$", keys)
eventMetadata$.record$ = ADDFIELD(eventMetadata$.record$, "value$", values)
# event_metadata = {
# "httpSourceName": json["$.httpSourceName"]
# "httpSourceId": json["$.httpSourceId"]
# }
URL_PARSE(url, silent) ⫘
Parses a URL.
Tip: For more on working with parsing, see Creating, Editing, and Enabling a Custom Parser in XDR.
If silent is true, this does not throw an error when the URL is invalid; instead, all fields are set to null. For badly formatted URLs, it always attempts to extract as much as possible. The expected URL format is one of:
scheme:opaque?query#fragment
scheme://userinfo@host/path?query#fragment
Examples ⫘
http://user:password@192.1.1.1:8080/1/asdfasdfasdf.html?key=value&key2=value2#topOfTheMorning
hTtps://Example.com:443/here//is/path.html?a=1+6&x=%2f%2Fkey=%41%0Avalue&b=ddd#top
https://example.com/foo/bar/bar/../baz.html?a=1&b=2
example.com/foo/bar/bar/../baz.html?a=1&b=2
If the scheme is not provided (for example, example.com/index.html instead of http://example.com/index.html), then http is assumed and returned in the scheme value.
The resulting collection object contains the following values, if possible, given the URL:
- scheme - normalized; the given scheme converted to lower case, or http if not provided.
- user - the passed-in user, if provided.
- host_raw - not normalized; the passed-in host including the port, for example Example.com:443
- host - normalized host; all lower case and not including the port, for example example.com
- port - the extracted port, if present
- path_raw - not normalized; the passed-in path, for example /foo/bar/bar/../baz.html. Does not include a trailing ?, even if there is a query part of the URI.
- path - normalized path, for example /foo/bar/baz.html. Does not include a trailing ?, even if there is a query part of the URI. Normalizations done:
  - Characters are URI decoded. (Single pass, so %253D is %3D, not =.) /fo%6F/bar.html → /foo/bar.html
  - Multiple forward slashes are reduced to a single one. /foo///bar.html → /foo/bar.html
  - Directory traversal sequences are removed. /foo/../bar/ → /bar/ and /foo/./bar/ → /foo/bar/
- query_raw - not normalized; the passed-in query string, for example a=1+6&x=%2f%2Fkey=%41%0Avalue&b=ddd. Does not include a leading ? but does preserve order.
- query - normalized query string. Does not include a leading ? but does preserve order. URI decoding is done; if URI decoding fails, the name-value pair where the decoding is unsuccessful is included, in the given order, with no normalization done to it. a=1&b=%44%57 → a=1&b=DW
- raw_query - DEPRECATED, do not use. Same as query_raw; provided for legacy compatibility but going away as soon as the parsers are updated.
- password - the passed-in password, if provided.
- fragment - the passed-in fragment, if provided.
Examples ⫘
data = "hTtps://Example.com:443/here//is/path.html?a=1+6&x=%2f%2Fkey=%41%0Avalue&b=ddd#top"
urlParts = URL_PARSE(data, FALSE)
OUTPUT$ = urlParts["path_raw"]
# OUTPUT$: /here//is/path.html (String)
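The scheme-defaulting and path-normalization behavior described above can be sketched as follows (outputs follow the rules listed; illustrative, not verified):

```
data = "example.com/foo/bar/bar/../baz.html?a=1&b=2"
urlParts = URL_PARSE(data, FALSE)
OUTPUT1$ = urlParts["scheme"]
OUTPUT2$ = urlParts["path"]
# OUTPUT1$: http (String)
# OUTPUT2$: /foo/bar/baz.html (String)
```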
CONTAINS(string, substring) ⫘
Wraps Golang's strings.Contains(string, subString) and returns a bool.
Example ⫘
data = "aaabbbccc acme"
OUTPUT1$ = CONTAINS(data, "roadrunner")
OUTPUT2$ = CONTAINS(data, "acme")
# OUTPUT1$: false (bool)
# OUTPUT2$: true (bool)
IDX_OF_TLD(string) ⫘
Returns an int64 indicating where in the string the top-level domain begins, for use with indexOfTopPrivateDomain$. If -1 is returned, set IsTopPrivateDomainParsed$ to false; otherwise set IsTopPrivateDomainParsed$ to true.
Example ⫘
OUTPUT0$ = IDX_OF_TLD("aaa http://example.com")
OUTPUT1$ = IDX_OF_TLD("http://example.com")
OUTPUT2$ = IDX_OF_TLD("")
# OUTPUT0$: 0 (int)
# OUTPUT1$: 0 (int)
# OUTPUT2$: -1 (int)
PARSE_ERROR(string,string) ⫘
Explicitly raises an error in the parser. parameters[0] is the error text, evaluated as a string via coercion.EvaluateAsString(). parameters[1] is an optional string that is cast into a boolean via ParserValue.BoolValue() to denote whether a generic event should be created; it defaults to true if not provided. The message does not normalize to any other schema.
Examples ⫘
Creates a Generic Event ⫘
test = IF someVal != "Expected_value" THEN PARSE_ERROR("bad data received") ELSE "ok"
Doesn’t Create a Generic Event ⫘
tenantId$ = TENANT_LOOKUP("ngav_id", vals["Account"], PARSE_ERROR("Unable to find Taegis tenant id for Deep Armor account " + vals["Account"],"False"))
TENANT_LOOKUP(label, value, default) ⫘
Looks up the tenant id in Taegis Tenant Manager based on a label and a value from the message. If no tenant is found, the specified default expression is evaluated. Note that if a tenant is found, the third parameter is not evaluated. This gives the caller the option to provide a default value or to use the PARSE_ERROR()
function to raise an error.
Example ⫘
tenantId$ = TENANT_LOOKUP("VendorName", messageValues["customerId"], PARSE_ERROR("Customer Id not on file"))
BASE64_DECODE(string) ⫘
Returns the plain-text string of a base64-encoded string input.
Example ⫘
OUTPUT$ = BASE64_DECODE("aG1lZXBcISBobWVlcFwh")
#OUTPUT$: hmeep\! hmeep\! (String)
INT(string, base) ⫘
Returns the integer value of a number string in the specified base.
Example ⫘
OUTPUT$ = INT("4e0", 16)
#OUTPUT$: 1248 (int)
STRING(valueType) ⫘
Attempts to cast the variable input into a string representation.
# In some cases "key" can be a string, empty (NULL), an array, or even map.
key = json["$.requestParameters.key"]
# By calling STRING() you guarantee objectKey is set with a value.
objectKey$ = STRING(key)
# Note: ParserValue.StringValue() isn't used directly because addition logic breaks when appending two valuetype.OBJECT to make a list (addition operator).
# valuetype.LIST, valuetype.OBJECT, and valuetype.JSONDATA return the JSON string representation; all others are cast to their string analogs.
OBJKEYS(value) ⫘
Returns a list of the keys of a map or JSON object.
Example ⫘
# Suppose the original json was:
{
"values" : {
"c" : "x",
"b" : "y",
"a" : "z"
}
}
keys = OBJKEYS(json["$.values"])
# keys is now an array of ["a", "b", "c"]
# NOTE: this function puts the values in alphabetical order
OBJVALUES(value) ⫘
Returns a list of the values of a map or JSON object.
Example ⫘
# Suppose the original json was:
{
"values" : {
"c" : "x",
"b" : {
"foo" : "bar"
},
"a" : "z"
}
}
vals = OBJVALS(json["$.values"])
# vals is now an array of ["z", "{ 'foo' : 'bar' }", "x"]
# NOTE: this function puts the values in alphabetical order by their key. This assures that OBJKEYS and OBJVALS output their elements in the same order which is important when combining these functions with ADDFIELD().
FLATTEN(json, keyLabel, valueLabel) ⫘
Converts arbitrary JSON to a list of objects.
Each object has two fields: a key and a value, both of type string. Parameters keyLabel and valueLabel are optional, with default values "key" and "value" respectively. This function is intended to provide a convenient way to put JSON data into schema fields of type KeyValuePairsIndexed; for example, the tags field on the generic schema or the evidence.sourceData.record field of ThirdPartyAlert.
Example ⫘
# Suppose the original json was:
{
"val" : {
"x": [
"1",
"2",
"3"
]
}
}
# The output would be:
[
{
"key$": "val.x.0",
"value$": "1"
},
{
"key$": "val.x.1",
"value$": "2"
},
{
"key$": "val.x.2",
"value$": "3"
}
]