The extract function retrieves the first substring that matches a regular expression from a source string. Use this function when you need to pull out specific patterns from log messages, URLs, or any text field using regex capture groups.
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
In Splunk SPL, you use rex with named or numbered groups. APL's extract is similar, but you select the group to return by its number in the captureGroup parameter.
['sample-http-logs']
| extend username = extract('user=([A-Za-z0-9_]+)', 1, uri)
In ANSI SQL, regex extraction varies by database. APL's extract provides a consistent approach across all data.
['sample-http-logs']
| extend extracted = extract('pattern', 1, field)
Usage
Syntax
extract(regex, captureGroup, text)
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| regex | string | Yes | A regular expression pattern with optional capture groups. |
| captureGroup | int | Yes | The capture group to extract. Use 0 for the entire match, 1 for the first group, 2 for the second, etc. |
| text | string | Yes | The source string to search. |
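The following is a minimal sketch of how the captureGroup parameter selects different parts of one match. It assumes some URIs in the dataset follow a hypothetical `/users/<id>/orders/<order_id>` shape, which isn't guaranteed for the sample data. The field names full_match, user_id, and order_id are illustrative.

```kusto
['sample-http-logs']
// Assumed URI shape (hypothetical): /users/<id>/orders/<order_id>
// Group 0 returns the entire matched substring.
| extend full_match = extract('/users/([0-9]+)/orders/([0-9]+)', 0, uri)
// Group 1 returns the first parenthesized group (the user ID).
| extend user_id = extract('/users/([0-9]+)/orders/([0-9]+)', 1, uri)
// Group 2 returns the second parenthesized group (the order ID).
| extend order_id = extract('/users/([0-9]+)/orders/([0-9]+)', 2, uri)
// Keep only the rows where the pattern matched.
| where isnotempty(full_match)
| project uri, full_match, user_id, order_id
```

Under that assumption, a URI such as '/users/12345/orders/987' would give '/users/12345/orders/987' for group 0, '12345' for group 1, and '987' for group 2.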
Returns
Returns the substring matched by the specified capture group, or null if no match is found.
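Because rows without a match yield no value, it can help to check how many rows a pattern covers before you build further aggregations on it. The sketch below counts matched and unmatched rows. The column names matched and unmatched are illustrative, and the query assumes isempty treats a missing result the same way isnotempty does in the examples on this page.

```kusto
['sample-http-logs']
| extend user_id = extract('/users/([0-9]+)', 1, uri)
// Count the rows where the pattern matched and the rows where it didn't.
| summarize matched = countif(isnotempty(user_id)), unmatched = countif(isempty(user_id))
```

A high unmatched count usually means the pattern is too strict for the data, or that the field doesn't contain the value you expect.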
Use case examples
Extract user IDs from HTTP request URIs to identify which users are accessing specific endpoints.
Query
['sample-http-logs']
| extend user_id = extract('/users/([0-9]+)', 1, uri)
| where isnotempty(user_id)
| summarize request_count = count() by user_id, method
| sort by request_count desc
| limit 10
Output
| user_id | method | request_count |
|---|---|---|
| 12345 | GET | 234 |
| 67890 | POST | 187 |
| 11111 | GET | 156 |
| 22222 | PUT | 98 |
This query extracts numeric user IDs from URIs like '/users/12345' using a regex capture group, helping analyze per-user API usage patterns.
Extract version numbers from service names to track which service versions are running.
Query
['otel-demo-traces']
| extend version = extract('v([0-9]+[.][0-9]+)', 1, ['service.name'])
| where isnotempty(version)
| summarize span_count = count() by ['service.name'], version
| sort by span_count desc
| limit 10
Output
| service.name | version | span_count |
|---|---|---|
| frontend-v2.1 | 2.1 | 3456 |
| checkout-v1.5 | 1.5 | 2341 |
| cart-v3.0 | 3.0 | 1987 |
This query extracts version numbers from service names, helping track which versions of services are generating traces.
Extract IP addresses embedded in request URIs to identify the source of suspicious requests.
Query
['sample-http-logs']
| extend ip_address = extract('([0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3})', 1, uri)
| where status == '403' or status == '401'
| where isnotempty(ip_address)
| summarize failed_attempts = count() by ip_address, status
| sort by failed_attempts desc
| limit 10
Output
| ip_address | status | failed_attempts |
|---|---|---|
| 192.168.1.100 | 401 | 45 |
| 10.0.0.25 | 403 | 32 |
| 172.16.0.50 | 401 | 28 |
This query extracts IP addresses embedded in the URIs of requests that failed with a 401 or 403 status, helping identify potential attackers or misconfigured systems.
List of related functions
- extract_all: Extracts all matches of a regex pattern. Use this when you need multiple matches instead of just the first one.
- parse_json: Parses JSON strings into dynamic objects. Use this when working with structured JSON data rather than regex patterns.
- split: Splits strings by a delimiter. Use this for simpler tokenization without regex complexity.
- replace_regex: Replaces regex matches with new text. Use this when you need to modify matched patterns rather than extract them.