The extract function retrieves the first substring that matches a regular expression from a source string. Use this function when you need to pull out specific patterns from log messages, URLs, or any text field using regex capture groups.

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

In Splunk SPL, you use rex with named or numbered groups. APL's extract is similar, but you select the group by number through the captureGroup parameter.

```sql Splunk example
| rex field=message "user=(?<username>\w+)"
```

APL equivalent:

['sample-http-logs']
| extend username = extract('user=([A-Za-z0-9_]+)', 1, uri)
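
A rex call with several named groups populates several fields at once. With extract, one way to get the same result is to call the function once per group and select each group by number. The following is a minimal sketch under assumptions: the SPL line in the comment, the `&action=` query parameter, and the `user_action` column name are hypothetical and only illustrate the mapping.

```kusto
// Hypothetical SPL being translated (two named groups in one rex call):
//   | rex field=uri "user=(?<username>\w+)&action=(?<action>\w+)"
['sample-http-logs']
// One extract call per group: group 1 is the user, group 2 is the action
| extend username = extract('user=([A-Za-z0-9_]+)&action=([A-Za-z0-9_]+)', 1, uri)
| extend user_action = extract('user=([A-Za-z0-9_]+)&action=([A-Za-z0-9_]+)', 2, uri)
// Keep only rows where the pattern matched
| where isnotempty(username)
| project uri, username, user_action
```

Both calls use the same pattern so that the group numbers stay aligned. Only the captureGroup argument changes.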

In ANSI SQL, regex extraction varies by database: function names and argument order differ between dialects. APL's extract has a single signature that works the same way on any string field.

```sql SQL example
SELECT REGEXP_SUBSTR(field, 'pattern', 1, 1, NULL, 1) AS extracted FROM logs;
```

APL equivalent:

['sample-http-logs']
| extend extracted = extract('pattern', 1, field)

Usage

Syntax

extract(regex, captureGroup, text)

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| regex | string | Yes | A regular expression pattern with optional capture groups. |
| captureGroup | int | Yes | The capture group to extract. Use 0 for the entire match, 1 for the first group, 2 for the second, etc. |
| text | string | Yes | The source string to search. |

Returns

Returns the substring matched by the specified capture group, or null if no match is found.
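
To see how the captureGroup parameter changes the result, the sketch below applies one pattern to the uri field three times with different group numbers. It assumes URIs shaped like '/users/12345'. The column names `full_match`, `resource`, and `resource_id` are illustrative and not part of the dataset.

```kusto
['sample-http-logs']
// Group 0: the entire match, such as '/users/12345'
| extend full_match = extract('/([a-z]+)/([0-9]+)', 0, uri)
// Group 1: the first parenthesized group, such as 'users'
| extend resource = extract('/([a-z]+)/([0-9]+)', 1, uri)
// Group 2: the second parenthesized group, such as '12345'
| extend resource_id = extract('/([a-z]+)/([0-9]+)', 2, uri)
// Rows where the pattern does not match have no value and are filtered out
| where isnotempty(full_match)
| project uri, full_match, resource, resource_id
| limit 10
```

Filtering with isnotempty after the extraction, as the use case examples on this page do, keeps unmatched rows out of later aggregations.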

Use case examples

Extract user IDs from HTTP request URIs to identify which users are accessing specific endpoints.

Query

['sample-http-logs']
| extend user_id = extract('/users/([0-9]+)', 1, uri)
| where isnotempty(user_id)
| summarize request_count = count() by user_id, method
| sort by request_count desc
| limit 10

Output

| user_id | method | request_count |
| --- | --- | --- |
| 12345 | GET | 234 |
| 67890 | POST | 187 |
| 11111 | GET | 156 |
| 22222 | PUT | 98 |

This query extracts numeric user IDs from URIs like '/users/12345' using a regex capture group, helping analyze per-user API usage patterns.

Extract version numbers from service names to track which service versions are running.

Query

['otel-demo-traces']
| extend version = extract('v([0-9]+[.][0-9]+)', 1, ['service.name'])
| where isnotempty(version)
| summarize span_count = count() by ['service.name'], version
| sort by span_count desc
| limit 10

Output

| service.name | version | span_count |
| --- | --- | --- |
| frontend-v2.1 | 2.1 | 3456 |
| checkout-v1.5 | 1.5 | 2341 |
| cart-v3.0 | 3.0 | 1987 |

This query extracts version numbers from service names, helping track which versions of services are generating traces.

Extract IP addresses from URIs or request headers to identify the source of suspicious requests.

Query

['sample-http-logs']
| extend ip_address = extract('([0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3})', 1, uri)
| where status == '403' or status == '401'
| where isnotempty(ip_address)
| summarize failed_attempts = count() by ip_address, status
| sort by failed_attempts desc
| limit 10

Output

| ip_address | status | failed_attempts |
| --- | --- | --- |
| 192.168.1.100 | 401 | 45 |
| 10.0.0.25 | 403 | 32 |
| 172.16.0.50 | 401 | 28 |

This query extracts IP addresses embedded in URIs from failed authentication requests, helping identify potential attackers or misconfigured systems.

List of related functions

  • extract_all: Extracts all matches of a regex pattern. Use this when you need multiple matches instead of just the first one.
  • parse_json: Parses JSON strings into dynamic objects. Use this when working with structured JSON data rather than regex patterns.
  • split: Splits strings by a delimiter. Use this for simpler tokenization without regex complexity.
  • replace_regex: Replaces regex matches with new text. Use this when you need to modify matched patterns rather than extract them.
