Union Operator — axiomhq/docs

The union operator in APL allows you to combine the results of two or more queries into a single output. The operator is useful when you need to analyze or compare data from different datasets or tables in a unified manner. By using union, you can merge multiple sets of records, keeping all data from the source tables without applying any aggregation or filtering.

The union operator is particularly helpful in scenarios like log analysis, tracing OpenTelemetry events, or correlating security logs across multiple sources. You can use it to perform comprehensive investigations by bringing together information from different datasets into one query.

Union of two datasets

To understand how the union operator works, consider these datasets:

Server requests

_time	status	method	trace_id
12:10	200	GET	1
12:15	200	POST	2
12:20	503	POST	3
12:25	200	POST	4

App logs

_time	trace_id	message
12:12	1	foo
12:21	3	bar
13:35	27	baz

Performing a union on Server requests and Application logs would result in a new dataset with all the rows from both DatasetA and DatasetB.

A union of requests and logs would produce the following result set:

_time	status	method	trace_id	message
12:10	200	GET	1
12:12			1	foo
12:15	200	POST	2
12:20	503	POST	3
12:21			3	bar
12:25	200	POST	4
13:35			27	baz

This result combines the rows and merges types for overlapping fields.

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

In Splunk SPL, the append command works similarly to the union operator in APL. Both operators are used to combine multiple datasets. However, while append in Splunk typically adds one dataset to the end of another, APL’s union merges datasets while preserving all records.

```splunk Splunk example index=web OR index=security ```

['sample-http-logs']
| union ['security-logs']

In ANSI SQL, the UNION operator performs a similar function to the APL union operator. Both are used to combine the results of two or more queries. However, SQL’s UNION removes duplicates by default, whereas APL’s union keeps all rows unless you use union with=kind=unique.

```sql SQL example SELECT * FROM web_logs UNION SELECT * FROM security_logs; ```

['sample-http-logs']
| union ['security-logs']

Usage

Syntax

T1 | union [withsource=FieldName] [T2], [T3], ...

Parameters

T1, T2, T3, ...: Tables or query results you want to combine into a single output.
withsource: Optional, adds a field to the output where each value specifies the source dataset of the row. Specify the name of this additional field in FieldName.

Returns

The union operator returns all rows from the specified tables or queries. If fields overlap, they are merged. Non-overlapping fields are retained in their original form.

Use case examples

In log analysis, you can use the union operator to combine HTTP logs from different sources, such as web servers and security systems, to analyze trends or detect anomalies.

Query

['sample-http-logs']
| union ['security-logs']
| where status == '500'

Output

_time	id	status	uri	method	geo.city	geo.country	req_duration_ms
2024-10-17 12:34:56	user123	500	/api/login	GET	London	UK	345
2024-10-17 12:35:10	user456	500	/api/update-profile	POST	Berlin	Germany	123

This query combines two datasets (HTTP logs and security logs) and filters the combined data to show only those entries where the HTTP status code is 500.

When working with OpenTelemetry traces, you can use the union operator to combine tracing information from different services for a unified view of system performance.

Query

['otel-demo-traces']
| union ['otel-backend-traces']
| where ['service.name'] == 'frontend' and status_code == 'error'

Output

_time	trace_id	span_id	['service.name']	kind	status_code
2024-10-17 12:36:10	trace-1234	span-567	frontend	server	error
2024-10-17 12:38:20	trace-7890	span-345	frontend	client	error

This query combines traces from two different datasets and filters them to show only errors occurring in the frontend service.

For security logs, the union operator is useful to combine logs from different sources, such as intrusion detection systems (IDS) and firewall logs.

Query

['sample-http-logs']
| union ['security-logs']
| where ['geo.country'] == 'Germany'

Output

_time	id	status	uri	method	geo.city	geo.country	req_duration_ms
2024-10-17 12:34:56	user789	200	/api/login	GET	Berlin	Germany	245
2024-10-17 12:40:22	user456	404	/api/nonexistent	GET	Munich	Germany	532

This query combines web and security logs, then filters the results to show only those records where the request originated from Germany.

Other examples

Basic union

This example combines all rows from github-push-event and github-pull-request-event without any transformation or filtering.

['github-push-event']
| union ['github-pull-request-event']

Run in Playground

Filter after union

This example combines the datasets, and then filters the data to only include rows where the method is GET.

['sample-http-logs']
| union ['github-issues-event']
| where method == "GET"

Run in Playground

Aggregate after union

This example combines the datasets and summarizes the data, counting the occurrences of each combination of content_type and actor.

['sample-http-logs']
| union ['github-pull-request-event']
| summarize Count = count() by content_type, actor

Run in Playground

Filter and project specific data from combined log sources

This query combines GitHub pull request event logs and GitHub push events, filters by actions made by github-actions[bot], and displays key event details such as time, repository, commits, head , id.

['github-pull-request-event']
| union ['github-push-event']
| where actor == "github-actions[bot]"
| project _time, repo, ['id'], commits, head

Run in Playground

Union with field removing

This example removes the content_type and commits field in the datasets sample-http-logs and github-push-event before combining the datasets.

['sample-http-logs']
| union ['github-push-event']
| project-away content_type, commits

Run in Playground

Filter after union

This example performs a union and then filters the resulting set to only include rows where the method is GET.

['sample-http-logs']
| union ['github-issues-event']
| where method == "GET"

Run in Playground

Union with order by

After the union, the result is ordered by the type field.

['sample-http-logs']
| union hn
| order by type

Run in Playground

Union with joint conditions

This example performs a union and then filters the resulting dataset for rows where content_type contains the letter a and city is seattle.

['sample-http-logs']
| union ['github-pull-request-event']
| where content_type contains "a" and ['geo.city']  == "Seattle"

Run in Playground

Union and count unique values

After the union, the query calculates the number of unique geo.city and repo entries in the combined dataset.

['sample-http-logs']
| union ['github-push-event']
| summarize UniqueNames = dcount(['geo.city']), UniqueData = dcount(repo)

Run in Playground

Union using withsource

The example below returns the union of all datasets that match the pattern github* and counts the number of events in each.

union withsource=dataset github*
| summarize count() by dataset

Run in Playground

Union with wildcards

The union operator supports wildcards to combine multiple datasets matching a pattern. Use * to match all datasets, or a suffix pattern like github* to match datasets starting with a prefix.

The wildcard `*` is useful to match multiple datasets, but it increases query complexity and decreases performance.

Using union * to query all datasets is very expensive. Avoid it in production. Use specific dataset names or prefix patterns instead.

Match all datasets:

union *
| summarize count() by dataset

Match datasets with a prefix:

union withsource=dataset github*
| summarize count() by dataset

Best practices for the union operator

To maximize the effectiveness of the union operator in APL, here are some best practices to consider:

Before using the union operator, ensure that the fields being merged have compatible data types.
Use project or project-away to include or exclude specific fields. This can improve performance and the clarity of your results, especially when you only need a subset of the available data.

#Union of two datasets

#For users of other query languages

#Usage

#Syntax

#Parameters

#Returns

#Use case examples

#Other examples

#Basic union

#Filter after union

#Aggregate after union

#Filter and project specific data from combined log sources

#Union with field removing

#Filter after union

#Union with order by

#Union with joint conditions

#Union and count unique values

#Union using withsource

#Union with wildcards

#Best practices for the union operator

Good evening

Union of two datasets

For users of other query languages

Usage

Syntax

Parameters

Returns

Use case examples

Other examples

Basic union

Filter after union

Aggregate after union

Filter and project specific data from combined log sources

Union with field removing

Filter after union

Union with order by

Union with joint conditions

Union and count unique values

Union using withsource

Union with wildcards

Best practices for the union operator