The variance aggregation function in APL calculates the variance of a numeric expression across a set of records. Variance measures how spread out the values in a dataset are, based on the squared deviation of each value from the mean. It’s useful for understanding how much variation exists in your data. In scenarios such as performance analysis, network traffic monitoring, or anomaly detection, variance helps identify outliers and patterns by showing how far data points deviate from the mean.
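To make the definition concrete, here is a minimal Python sketch (not APL) of the arithmetic. It takes the mean, sums the squared deviations from it, and divides by the number of values. This page doesn't state whether APL's variance divides by n (population) or n - 1 (sample), so the sketch computes both. The helper name variance and the duration values are hypothetical. Note how a single outlier inflates the result.

```python
import statistics


def variance(values, sample=True):
    """Variance as the mean squared deviation from the mean.

    sample=True divides by n - 1 (sample variance, like SQL's VAR_SAMP);
    sample=False divides by n (population variance, like SQL's VAR_POP).
    """
    n = len(values)
    min_n = 2 if sample else 1
    if n < min_n:
        raise ValueError(f"need at least {min_n} values, got {n}")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


if __name__ == "__main__":
    # Hypothetical request durations in milliseconds.
    steady = [100, 102, 98, 101, 99]
    spiky = [100, 102, 98, 101, 599]  # same pattern plus one outlier

    for name, data in (("steady", steady), ("spiky", spiky)):
        pop = variance(data, sample=False)
        samp = variance(data, sample=True)
        # Cross-check against the standard library implementations.
        assert abs(pop - statistics.pvariance(data)) < 1e-9
        assert abs(samp - statistics.variance(data)) < 1e-9
        print(f"{name}: population={pop}, sample={samp}")
    # steady: population=2.0, sample=2.5
    # spiky: population=39802.0, sample=49752.5
```

One outlying value raises the variance by four orders of magnitude. That sensitivity is why variance is useful for spotting anomalies.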
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
In Splunk SPL, you compute variance with the var function of the stats command. In APL, the variance function provides the same functionality:
['sample-http-logs']
| summarize variance(req_duration_ms)
In ANSI SQL, variance is typically calculated using VAR_POP (population variance) or VAR_SAMP (sample variance). APL provides a simpler approach: you use the variance function without needing to specify population or sample.
['sample-http-logs']
| summarize variance(req_duration_ms)
Usage
Syntax
summarize variance(Expression)
Parameters
Expression: A numeric expression or field for which you want to compute the variance. The expression should evaluate to a numeric data type.
Returns
The function returns the variance of the specified expression as a numeric value. It is computed across all records or, when you use summarize ... by, across the records in each group.
Use case examples
You can use the variance function to measure the variability of request durations, which helps in identifying performance bottlenecks or anomalies in web services.
Query
['sample-http-logs']
| summarize variance(req_duration_ms)
Output
| variance_req_duration_ms |
|---|
| 1024.5 |
This query calculates the variance of request durations from a dataset of HTTP logs. A high variance indicates greater variability in request durations, potentially signaling performance issues.
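Variance is expressed in the square of the input's units, so the 1024.5 in the sample output above is in ms², not ms. Taking the square root gives the standard deviation back in milliseconds. This is a worked check using only the number shown in that output:

$$
\sigma = \sqrt{1024.5\ \text{ms}^2} \approx 32.01\ \text{ms}
$$

A spread of about 32 ms around the mean is usually easier to reason about than 1024.5 ms².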
For OpenTelemetry traces, you can use variance to measure how much span durations differ across service invocations. This helps with performance optimization and anomaly detection.
Query
['otel-demo-traces']
| summarize variance(duration)
Output
| variance_duration |
|---|
| 1287.3 |
This query computes the variance of span durations across traces, which helps in understanding how consistent the service performance is. A higher variance might indicate unstable or inconsistent performance.
You can use the variance function on security logs to detect abnormal patterns in request behavior, such as unusual fluctuations in response times, which may point to potential security threats.
Query
['sample-http-logs']
| summarize variance(req_duration_ms) by status
Output
| status | variance_req_duration_ms |
|---|---|
| 200 | 1534.8 |
| 404 | 2103.4 |
This query calculates the variance of request durations grouped by HTTP status codes. High variance in certain status codes (e.g., 404 errors) can indicate network or application issues.
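The grouping step can be sketched in Python (not APL) as a rough analogue of summarize variance(...) by status. Records are bucketed by a key, and one variance is computed per bucket. This is a sketch under stated assumptions. The helper variance_by and the log records are hypothetical. The sketch uses the sample (n - 1) convention and skips groups with fewer than two values, which is a choice of this sketch and not documented APL behavior.

```python
import statistics
from collections import defaultdict


def variance_by(records, key, value):
    """Return {group: sample variance of `value`} for records grouped by `key`.

    A rough Python analogue of `summarize variance(value) by key`.
    Assumptions of this sketch: it uses the sample (n - 1) convention and
    skips groups with fewer than two values, since sample variance is
    undefined for them.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        group: statistics.variance(values)
        for group, values in groups.items()
        if len(values) >= 2
    }


if __name__ == "__main__":
    # Hypothetical HTTP log records; field names follow the query above.
    logs = [
        {"status": "200", "req_duration_ms": 110},
        {"status": "200", "req_duration_ms": 120},
        {"status": "200", "req_duration_ms": 130},
        {"status": "404", "req_duration_ms": 50},
        {"status": "404", "req_duration_ms": 250},
    ]
    for status, var in sorted(variance_by(logs, "status", "req_duration_ms").items()):
        print(status, var)
```

With these made-up records, the 404 group's variance (20000) dwarfs the 200 group's (100). This is the kind of per-group contrast the query is meant to surface. Swap statistics.variance for statistics.pvariance if you need the population convention instead.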
List of related aggregations
- stdev: Computes the standard deviation, which is the square root of the variance. Use stdev when you need the spread of data in the same units as the original dataset.
- avg: Computes the average of a numeric field. Combine avg with variance to analyze both the central tendency and the spread of data.
- count: Counts the number of records. Use count alongside variance to see how many records each variance value is based on.
- percentile: Returns the value below which a given percentage of observations fall. Use percentile for a more detailed analysis of the distribution.
- max: Returns the maximum value. Use max alongside variance to find the extreme values that may point to anomalies.