maxif aggregation in APL
Introduction
The maxif aggregation function in APL is useful when you want to return the maximum value from a dataset based on a conditional expression. This allows you to filter the dataset dynamically and only return the maximum for rows that satisfy the given condition. It’s particularly helpful for scenarios where you want to find the highest value of a specific metric, like response time or duration, but only for a subset of the data (for example, successful responses, specific users, or requests from a particular geographic location).
You can use the maxif function when analyzing logs, monitoring system traces, or inspecting security-related data to get insights into the maximum value under certain conditions.
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
In Splunk SPL, you might use the stats max() function alongside a conditional filtering step to achieve a similar result. APL’s maxif function combines both operations into one, streamlining the query.
['sample-http-logs']
| summarize maxif(req_duration_ms, status == "200")In ANSI SQL, you typically use the MAX function in conjunction with a WHERE clause. APL’s maxif allows you to perform the same operation with a single aggregation function.
['sample-http-logs']
| summarize maxif(req_duration_ms, status == "200")Usage
Syntax
summarize maxif(column, condition)Parameters
column: The column containing the values to aggregate.condition: The condition that must be true for the values to be considered in the aggregation.
Returns
The maximum value from column for rows that meet the condition. If no rows match the condition, it returns null.
Use case examples
In log analysis, you might want to find the maximum request duration, but only for successful requests.
Query
['sample-http-logs']
| summarize maxif(req_duration_ms, status == "200")Output
| max_req_duration |
|---|
| 1250 |
This query returns the maximum request duration (req_duration_ms) for HTTP requests with a 200 status.
In OpenTelemetry traces, you might want to find the longest span duration for a specific service type.
Query
['otel-demo-traces']
| summarize maxif(duration, ['service.name'] == "checkoutservice" and kind == "server")Output
| max_duration |
|---|
| 2.05s |
This query returns the maximum span duration (duration) for server spans in the checkoutservice.
For security logs, you might want to identify the longest request duration for any requests originating from a specific country, such as the United States.
Query
['sample-http-logs']
| summarize maxif(req_duration_ms, ['geo.country'] == "United States")Output
| max_req_duration |
|---|
| 980 |
This query returns the maximum request duration for requests coming from the United States (geo.country).
List of related aggregations
- minif: Returns the minimum value from a column for rows that satisfy a condition. Use
minifwhen you’re interested in the lowest value under specific conditions. - max: Returns the maximum value from a column without filtering. Use
maxwhen you want the highest value across the entire dataset without conditions. - sumif: Returns the sum of values for rows that satisfy a condition. Use
sumifwhen you want the total value of a column under specific conditions. - avgif: Returns the average of values for rows that satisfy a condition. Use
avgifwhen you want to calculate the mean value based on a filter. - countif: Returns the count of rows that satisfy a condition. Use
countifwhen you want to count occurrences that meet certain criteria.