The series_cosine_similarity function calculates the cosine similarity between two dynamic arrays (series) of numeric values. Cosine similarity measures the cosine of the angle between two vectors, providing a metric of similarity that ranges from -1 to 1. A value of 1 indicates identical direction, 0 indicates orthogonality (no similarity), and -1 indicates opposite directions. This function is particularly useful for comparing patterns, trends, and behaviors in time-series data.
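The definition above can be sketched in plain Python as a reference for what the function computes. This is a minimal illustration under stated assumptions, not Axiom's implementation. The helper name cosine_similarity is hypothetical. Returning None for empty or all-zero input mirrors the behavior described under Returns. Raising an error when the arrays differ in length is an assumption of this sketch only.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> Optional[float]:
    """Cosine of the angle between vectors a and b (hypothetical helper)."""
    if len(a) != len(b):
        # Assumption of this sketch: mismatched lengths are treated as an error.
        raise ValueError("arrays must have the same length")
    dot = sum(x * y for x, y in zip(a, b))    # a . b
    norm_a = math.sqrt(sum(x * x for x in a))  # magnitude of a
    norm_b = math.sqrt(sum(y * y for y in b))  # magnitude of b
    if norm_a == 0 or norm_b == 0:
        # Empty or all-zero input: the angle is undefined, so return None.
        return None
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal
print(cosine_similarity([1, 2], [-1, -2]))      # opposite direction
```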
You can use series_cosine_similarity when you need to identify similar patterns in different datasets, compare user behaviors, detect anomalies by measuring deviation from normal patterns, or find correlations between different metrics. Common applications include recommendation systems, anomaly detection, pattern matching in performance metrics, and behavioral analysis.
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
In Splunk SPL, calculating cosine similarity requires complex mathematical operations using eval commands with square roots and dot products. In APL, series_cosine_similarity provides this calculation directly for dynamic arrays.
datatable(x: dynamic, y: dynamic)
[
dynamic([0.5*pi(), 1.0*pi(), 1.5*pi()]),
dynamic([2.0*pi(), 2.5*pi(), 3.0*pi()])
]
| extend similarity = series_cosine_similarity(x, y)
In SQL, calculating cosine similarity means computing the dot product and both vector magnitudes yourself, typically with aggregate functions such as SUM and SQRT over rows that hold the paired values. In APL, series_cosine_similarity handles this calculation directly on dynamic arrays.
datatable(x: dynamic, y: dynamic)
[
dynamic([0.5*pi(), 1.0*pi(), 1.5*pi()]),
dynamic([2.0*pi(), 2.5*pi(), 3.0*pi()])
]
| extend similarity = series_cosine_similarity(x, y)
Usage
Syntax
series_cosine_similarity(array1, array2)
Parameters
| Parameter | Type | Description |
|---|---|---|
| array1 | dynamic | The first dynamic array of numeric values. |
| array2 | dynamic | The second dynamic array of numeric values. |
Returns
A real value between -1 and 1 representing the cosine similarity between the two arrays. Returns null if either array is empty or contains only zeros.
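As a worked check, here are the arrays from the datatable example in the previous section, computed by hand from the standard definition (dot product divided by the product of the magnitudes). Assuming the function implements that definition, it should return approximately 0.9746 for those inputs. The common factor π cancels, which shows that cosine similarity depends only on direction, not scale.

```latex
\begin{aligned}
x \cdot y &= \pi^2 \,(0.5 \cdot 2.0 + 1.0 \cdot 2.5 + 1.5 \cdot 3.0) = 8\pi^2 \\
\lVert x \rVert &= \pi \sqrt{0.5^2 + 1.0^2 + 1.5^2} = \pi \sqrt{3.5} \\
\lVert y \rVert &= \pi \sqrt{2.0^2 + 2.5^2 + 3.0^2} = \pi \sqrt{19.25} \\
\cos\theta &= \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
            = \frac{8}{\sqrt{3.5 \cdot 19.25}}
            = \frac{8}{\sqrt{67.375}} \approx 0.9746
\end{aligned}
```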
Use case examples
In log analysis, you can use series_cosine_similarity to compare request duration patterns between different users to identify similar usage behaviors.
Query
['sample-http-logs']
| summarize user1_durations = make_list(iff(id == 'user1', req_duration_ms, 0)), user2_durations = make_list(iff(id == 'user2', req_duration_ms, 0))
| extend similarity = series_cosine_similarity(user1_durations, user2_durations)
Output
| user1_durations | user2_durations | similarity |
|---|---|---|
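| [120, 0, 300, 0] | [0, 150, 0, 280] | 0 |
This query compares request duration patterns between two users to identify behavioral similarities. Because iff writes each row's duration to at most one of the two lists and a 0 to the other, the arrays in this output never have non-zero values at the same position. Their dot product, and therefore their cosine similarity, is 0.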
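The following Python sketch mimics the summarize step of the query above to show how the iff(..., 0) pattern fills the two lists. The rows are hypothetical, chosen only to reproduce the arrays in the output table; they are not taken from sample-http-logs. The sketch also assumes both lists are built from the rows in the same order.

```python
import math

# Hypothetical log rows: (user id, request duration in ms).
rows = [("user1", 120), ("user2", 150), ("user1", 300), ("user2", 280)]

# Mimics make_list(iff(id == 'user1', req_duration_ms, 0)) and the same
# expression for 'user2': every row appends to both lists, with 0 for the
# user the row does not belong to. Assumes both lists see rows in the same order.
user1_durations = [d if uid == "user1" else 0 for uid, d in rows]
user2_durations = [d if uid == "user2" else 0 for uid, d in rows]


def cosine_similarity(a, b):
    # Dot product divided by the product of the magnitudes; None if undefined.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return None
    return dot / (norm_a * norm_b)


print(user1_durations)  # [120, 0, 300, 0]
print(user2_durations)  # [0, 150, 0, 280]
# Each position is non-zero in at most one list, so the dot product is 0.
print(cosine_similarity(user1_durations, user2_durations))  # 0.0
```

To compare two users' patterns meaningfully, the two lists need values at the same positions, for example one value per shared time interval.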
In OpenTelemetry traces, you can use series_cosine_similarity to compare span duration patterns between different services to identify similar performance characteristics.
Query
['otel-demo-traces']
| summarize frontend_durations = make_list(iff(['service.name'] == 'frontend', duration / 1ms, 0)), cart_durations = make_list(iff(['service.name'] == 'cart', duration / 1ms, 0))
| extend pattern_similarity = series_cosine_similarity(frontend_durations, cart_durations)
Output
| frontend_durations | cart_durations | pattern_similarity |
|---|---|---|
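| [200, 0, 150, 0] | [0, 80, 0, 120] | 0 |
This query compares performance patterns between the frontend and cart services to identify correlated behaviors. Each span contributes a non-zero duration to at most one of the two lists, so the arrays in this output are orthogonal and the pattern similarity is 0.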
In security logs, you can use series_cosine_similarity to compare request patterns between different HTTP status codes to detect anomalous behavior.
Query
['sample-http-logs']
| summarize success_durations = make_list(iff(status == '200', req_duration_ms, 0)), error_durations = make_list(iff(status == '500', req_duration_ms, 0))
| extend behavior_similarity = series_cosine_similarity(success_durations, error_durations)
Output
| success_durations | error_durations | behavior_similarity |
|---|---|---|
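| [100, 0, 150, 0] | [0, 250, 0, 300] | 0 |
This query compares request duration patterns between successful and error responses to identify potential security anomalies. A request has only one status, so each duration lands in at most one of the two lists. The arrays in this output share no non-zero positions, and the behavior similarity is 0.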
List of related functions
- series_add: Performs element-wise addition between two arrays. Use when you need to combine values rather than measure how similar two arrays are.
- series_divide: Performs element-wise division between two arrays. Use when you need to calculate ratios or normalize values.
- series_dot_product: Calculates the dot product between two arrays. Use when you need the raw dot product value rather than normalized similarity.
- series_sum: Calculates the sum of all elements in a single array. Use when you need to sum elements within one array rather than computing dot products.
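The difference between series_cosine_similarity and series_dot_product can be illustrated with a plain Python sketch. Cosine similarity is the dot product divided by both magnitudes, so it ignores overall scale, while the raw dot product grows with it. The helper names and sample arrays are hypothetical and only stand in for the APL functions conceptually.

```python
import math


def dot_product(a, b):
    # Raw sum of element-wise products (conceptually like series_dot_product).
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product normalized by both magnitudes; assumes neither array is all zeros.
    return dot_product(a, b) / math.sqrt(dot_product(a, a) * dot_product(b, b))


baseline = [100, 200, 150]
current = [110, 190, 160]
doubled = [2 * v for v in current]  # same shape as current, twice the magnitude

# The dot product doubles, but the cosine similarity stays the same.
print(dot_product(baseline, current), dot_product(baseline, doubled))
print(cosine_similarity(baseline, current), cosine_similarity(baseline, doubled))
```

Use the normalized value when only the shape of two series matters. Use the raw dot product when magnitude should influence the result.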