Series Pearson Correlation — axiomhq/docs

The series_pearson_correlation function calculates the Pearson correlation coefficient between two numeric dynamic arrays (series). This measures the linear relationship between the two series, returning a value between -1 and 1, where 1 indicates perfect positive correlation, -1 indicates perfect negative correlation, and 0 indicates no linear correlation.
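
For reference, the textbook definition of the Pearson coefficient for two series $x$ and $y$ of equal length $n$, with means $\bar{x}$ and $\bar{y}$, is given below. This is the standard statistical formula, not a description of how APL computes the value internally.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$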

You can use series_pearson_correlation when you need to measure the strength and direction of linear relationships between time-series datasets. This is particularly useful for identifying related metrics, spotting candidate dependencies to investigate (correlation alone does not establish causation), validating hypotheses about system behavior, or finding leading indicators of performance issues.

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

In Splunk SPL, you would typically need to export data and use external statistical tools to calculate correlation. In APL, series_pearson_correlation provides built-in correlation analysis for array data.

```sql Splunk example
... | stats list(metric1) as m1, list(metric2) as m2 by group
... (manual correlation calculation or external tool)
```

```kusto
datatable(series1: dynamic, series2: dynamic)
[
  dynamic([1, 2, 3, 4, 5]), dynamic([2, 4, 6, 8, 10])
]
| extend correlation = series_pearson_correlation(series1, series2)
```

In SQL, correlation functions exist but typically operate on row-based data. In APL, series_pearson_correlation works directly on array columns, making time-series correlation analysis more straightforward.

```sql SQL example
SELECT CORR(metric1, metric2) AS correlation
FROM measurements
GROUP BY group_id;
```

```kusto
datatable(series1: dynamic, series2: dynamic)
[
  dynamic([1, 2, 3, 4, 5]), dynamic([2, 4, 6, 8, 10])
]
| extend correlation = series_pearson_correlation(series1, series2)
```

Usage

Syntax

```kusto
series_pearson_correlation(series1, series2)
```

Parameters

Parameter	Type	Description
`series1`	dynamic	A dynamic array of numeric values.
`series2`	dynamic	A dynamic array of numeric values.

Returns

A numeric value between -1 and 1 representing the Pearson correlation coefficient:

1: Perfect positive linear correlation
0: No linear correlation
-1: Perfect negative linear correlation
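
To sanity-check a result outside APL, the coefficient can be reproduced with a few lines of Python. This is a minimal sketch of the textbook calculation, not Axiom's implementation. The helper name `pearson_correlation` is hypothetical, and its handling of mismatched lengths and constant series (raising an error and returning NaN) is an assumption that may differ from APL's behavior.

```python
import math


def pearson_correlation(xs, ys):
    """Textbook Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        # Assumption: reject mismatched or too-short input; APL may behave differently.
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined when either series is constant.
        return math.nan
    return cov / math.sqrt(var_x * var_y)


# Same series as the datatable examples: a perfect positive relationship.
print(pearson_correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
# Reversing the second series gives a perfect negative relationship.
print(pearson_correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0
```

The two calls cover the endpoints of the return range; values in between indicate a weaker linear relationship.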

Use case examples

In log analysis, you can use series_pearson_correlation to identify relationships between request durations across different geographic regions, helping understand if performance issues are correlated.

Query

```kusto
['sample-http-logs']
| extend city1 = iff(['geo.city'] == 'Tokyo', req_duration_ms, 0)
| extend city2 = iff(['geo.city'] == 'Nagasaki', req_duration_ms, 0)
| summarize tokyo_times = make_list(city1), nagasaki_times = make_list(city2)
| extend correlation = series_pearson_correlation(tokyo_times, nagasaki_times)
| project correlation
```

Output

correlation
0.87

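This query calculates the correlation between request durations in Tokyo and Nagasaki. Each event contributes 0 to the series of any city it does not match, so the two lists have equal length and are aligned by event. To test whether slowdowns in the two regions coincide in time, aggregate durations per time bin for each city before correlating.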

In OpenTelemetry traces, you can use series_pearson_correlation to analyze relationships between service latencies, identifying dependencies and bottlenecks.

Query

```kusto
['otel-demo-traces']
| extend duration_ms = duration / 1ms
| extend frontend_dur = iff(['service.name'] == 'frontend', duration_ms, 0)
| extend checkout_dur = iff(['service.name'] == 'checkout', duration_ms, 0)
| summarize frontend = make_list(frontend_dur), checkout = make_list(checkout_dur)
| extend correlation = series_pearson_correlation(frontend, checkout)
| project correlation
```

Output

correlation
0.65

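This query measures the correlation between frontend and checkout service latencies, which helps you see whether the two services' latencies tend to move together. A high value is a reason to investigate a dependency, not proof that one service affects the other.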

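In security logs, you can use series_pearson_correlation to compare failed requests with successful ones, which can help surface potential attack patterns. The example below uses HTTP status 500 as the failure signal; substitute the status codes that represent failed authentication in your data.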

Query

```kusto
['sample-http-logs']
| extend success_count = iff(status == '200', 1, 0)
| extend failure_count = iff(status == '500', 1, 0)
| summarize successes = make_list(success_count), failures = make_list(failure_count) by bin(_time, 1h)
| extend correlation = series_pearson_correlation(successes, failures)
| project correlation
```

Output

correlation
-0.45

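This query computes, for each one-hour bin, the correlation between per-request success (status 200) and failure (status 500) indicators. Because a single request cannot be both, the value is negative whenever both outcomes occur in the bin. How strongly negative it is in a given hour is one signal to weigh alongside others when looking for attack patterns.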

List of related functions

series_magnitude: Calculates the magnitude of a series. Use when you need vector length instead of correlation.
series_stats: Returns comprehensive statistics. Use when you need variance and covariance components separately.
series_subtract: Performs element-wise subtraction. Often used to compute deviations before correlation analysis.
series_multiply: Performs element-wise multiplication. Use for weighted combinations instead of correlation.
