This is the first post of many to come that talk about new enhancements in the Sql Server 2005 T-Sql language.
Let say you have a products table. I’m going to use the
products table from the AdventureWorks database. Now let’s say
you imported a list of products into the database into a table called
productduplicates. Just for a reason, let’s say you manufacturer
sent you a full list of products, and you need to see which products
you already have in your database.
Normally, if you didn’t have matching primary keys on both tables,
like a productId, you would do an inner join on some columns that
uniquely identify the rows, as such:
Now you can replace this with a simple INTERSECT statement like the following:
Just like if you were using a UNION for two select statements, the
same number of columns must exist in both select statements, and they
must be of the same datatype. In both above examples, you get a
list of products that exist in both tables, so you know which products
from the manufacturer you already have listed.
I haven’t seen any performance difference in the two, but I’ve only
played with this on small tables. The execution plan does show
that while you are doing a hash match inner join on the first query,
the second query performs a hash match right semi join. Both hash
matches perform 50% of the work in the query. The other 50% is
split evenly between the table scan and index scan in both
queries. So what does the fact that a right semi match is
performed when using INTERSECT? I tells me its doing the same
thing as if we were to use the EXISTS, like so:
Same scenario, if you wanted to know which products you have that do
not exist in the duplicates table, you would normally do a NOT EXISTS
And you get back the list of products that you have that aren’t in
the duplicates list. You can flip this around to figure out which
products are in the duplicates that you don’t have in your products.
Let’s simply the code a little bit by using the EXCEPT operator. This looks identical to the INTERSECTS query:
Again, can’t find a performance difference. In all queries
shows, the reads and duration all average out to be virtually
identical. You’d have to test this on some large tables (high
number of rows) to figure out if there is a performance difference at
Looking at the execution plans for everything, it doesn’t seem that
there is a any gain from using INTERSECT and EXCEPT over JOINS or
EXISTS/NOT EXISTS except for one big thing: Much simpler code!