SQL: Understanding the UNION command

Those familiar with SQL will have most likely used the UNION command when they want to combine select statements into 1 result set. Most often individuals will use UNION without understanding that additional work is done behind the scenes to remove duplicate entries. That is why it is important to understand your database and to know if there is the possiblity of duplicate entries. If there is no chance for duplicate entries or if you want duplicates in your results then you would want to use the UNION ALL command that simply joins the 2 result sets without using any worktables enhancing its performance.

To illustrate lets assume we have an activity log table that stores users activities within an application. In addition we have a log history table that actions get moved to at the end of the day so that the inserting of daily records does not get bogged down by the table having thousands/millions of rows.

USER_ACTIVITY_LOG
USER_ID USER_ACTION_TS USER_ACTION_NM USER_ACTION_SUB_NM
USER_ACTIVITY_LOG_HIST
USER_ID USER_ACTION_TS USER_ACTION_NM USER_ACTION_SUB_NM

Now lets assume we have a monitoring application that allows priveledged individuals to do searches on a user for a time period to get a list of activities they did within the application. To do this most individuals would do a UNION on the LOG and LOG_HISTORY tables.

SELECT USER_ID, USER_ACTION_TS, USER_ACTION_NM, USERACTION_SUB_NM
FROM USER_ACTIVITY_LOG
WHERE USER_ID = @userId
AND USER_ACTION_TS BETWEEN @startTS AND @endTS

UNION

SELECT USER_ID, USER_ACTION_TS, USER_ACTION_NM, USERACTION_SUB_NM
FROM USER_ACTIVITY_LOG_HIST
WHERE USER_ID = @userId
AND USER_ACTION_TS BETWEEN @startTS AND @endTS

However in this case it is not possible for there to be duplicate records returned from the two tables. So instead to enhance the performance of the query we could use the UNION ALL statement.

SELECT USER_ID, USER_ACTION_TS, USER_ACTION_NM, USERACTION_SUB_NM
FROM USER_ACTIVITY_LOG
WHERE USER_ID = @userId
AND USER_ACTION_TS BETWEEN @startTS AND @endTS

UNION ALL

SELECT USER_ID, USER_ACTION_TS, USER_ACTION_NM, USERACTION_SUB_NM
FROM USER_ACTIVITY_LOG_HIST
WHERE USER_ID = @userId
AND USER_ACTION_TS BETWEEN @startTS AND @endTS
// SQL //

Comments & Questions

Add Your Comment