Introduction
The summarize operator in APL enables you to perform data aggregation and create summary tables from large datasets. You can use it to group data by specified fields and apply aggregation functions such as count(), sum(), avg(), min(), max(), and many others. This is particularly useful when analyzing logs, tracing OpenTelemetry data, or reviewing security events. The summarize operator is helpful when you want to reduce the granularity of a dataset to extract insights or trends.
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
In Splunk SPL, the stats command performs a similar function to APL’s summarize operator. Both operators are used to group data and apply aggregation functions. In APL, summarize is more explicit about the fields to group by and the aggregation functions to apply.
['sample-http-logs']
| summarize count() by methodThe summarize operator in APL is conceptually similar to SQL’s GROUP BY clause with aggregation functions. In APL, you explicitly specify the aggregation function (like count(), sum()) and the fields to group by.
['sample-http-logs']
| summarize count() by methodUsage
Syntax
| summarize [[Field1 =] AggregationFunction [, ...]] [by [Field2 =] GroupExpression [, ...]]Parameters
Field1: A field name.AggregationFunction: The aggregation function to apply. Examples includecount(),sum(),avg(),min(), andmax().GroupExpression: A scalar expression that can reference the dataset.
Returns
The summarize operator returns a table where:
- The input rows are arranged into groups having the same values of the
byexpressions. - The specified aggregation functions are computed over each group, producing a row for each group.
- The result contains the
byfields and also at least one field for each computed aggregate. Some aggregation functions return multiple fields.
Use case examples
In log analysis, you can use summarize to count the number of HTTP requests grouped by method, or to compute the average request duration.
Query
['sample-http-logs']
| summarize count() by methodOutput
| method | count_ |
|---|---|
| GET | 1000 |
| POST | 450 |
This query groups the HTTP requests by the method field and counts how many times each method is used.
You can use summarize to analyze OpenTelemetry traces by calculating the average span duration for each service.
Query
['otel-demo-traces']
| summarize avg(duration) by ['service.name']Output
| service.name | avg_duration |
|---|---|
| frontend | 50ms |
| cartservice | 75ms |
This query calculates the average duration of traces for each service in the dataset.
In security log analysis, summarize can help group events by status codes and see the distribution of HTTP responses.
Query
['sample-http-logs']
| summarize count() by statusOutput
| status | count_ |
|---|---|
| 200 | 1200 |
| 404 | 300 |
This query summarizes HTTP status codes, giving insight into the distribution of responses in your logs.
Other examples
['sample-http-logs']
| summarize topk(content_type, 20)['github-push-event']
| summarize topk(repo, 20) by bin(_time, 24h)Returns a table that shows the heatmap in each interval [0, 30], [30, 20, 10], and so on. This example has a cell for HISTOGRAM(req_duration_ms).
['sample-http-logs']
| summarize histogram(req_duration_ms, 30)['github-push-event']
| where _time > ago(7d)
| where repo contains "axiom"
| summarize count(), numCommits=sum(size) by _time=bin(_time, 3h), repo
| take 100Limit behavior with time-binning
When using limit or take with summarize where the first grouping expression is a time bin, Axiom applies the following limit behavior:
- Compute the global top N groups across all time buckets, disregarding the time dimension.
- Limit each time bucket to only include those groups that are in the global top N.
This means the total number of output rows can be more than N rows because each time bucket may contain up to N groups. For example, if you have 10 time buckets and limit to 5 groups, you can get up to 50 rows.
To limit the result set to exactly N rows:
-
Apply a second
summarizestatement after the first to aggregate further and limit the results. For example:['sample-http-logs'] | summarize count() by _time=bin(_time, 1h), status | summarize make_list(count_), make_list(_time) by status | limit 10 -
If you don’t need time as the first grouping expression, reorder your groups so time is not first. For example:
['sample-http-logs'] | summarize count() by status, _time=bin(_time, 1h) | limit 10 -
Convert time to an integer and bin on that instead:
['sample-http-logs'] | extend intTime = toint(_time) | summarize count() by bin(intTime, 3600), status | limit 10