EXISTS Unflawed

Home  RSS

2001
2002
2003
2004
2005
2006
2007
Letras
Libros
Pensar
Cosas

Problem

This page is a comment on the article published on www.firstsql.com, The EXISTS Flaw. You might want to read that article before this one, but it's not required.

Given a table sp with shipment information, and columns sno (supplier number), pno (part number) and qty (quantity, nullable), the following is asked: find supplier numbers for suppliers who are known to supply part P1, but not in a quantity of 1,000. The suggested query to retrieve this information is as follows.

Sample: suggested query

SELECT DISTINCT spx.sno
FROM   sp spx
WHERE  spx.pno = 'P1'
AND    1000 NOT IN
       ( SELECT spy.qty
         FROM   sp spy
         WHERE  spy.sno = spx.sno
         AND    spy.pno = 'P1') ;

The article then goes on to explain how the following "equivalent" query returns different results.


Sample: suggested equivalent query
SELECT DISTINCT spx.sno
FROM   sp spx
WHERE  spx.pno = 'P1'
AND    NOT EXISTS
       ( SELECT spy.qty
         FROM   sp spy
         WHERE  spy.sno = spx.sno
         AND    spy.pno = 'P1'
         AND    spy.qty = 1000) ;

For starters, if the query returns different results, it is not equivalent. Disregard what relational theory might say for a few seconds and get ahold of a dictionary or some nearby repository of common sense: if the results are not the same, they are not equivalent.

They problem seems to arise because given the table rows ('S1', 'P1', NULL), ('S2', 'P1', 200) and ('S3', 'P1', 1000), the first query returns S2, and the second S1 and S2. This is not so strange, because the first query asks for suppliers whose quantity for P1 are known not to be 1000, and the second asks for suppliers whose quantity for P1 are not known to be 1000. If we allow the possibility that we may or may not know some facts, we are obviously asking for different things, which is why we get different results. The original question is ambiguous on whether the "1,000 quantity" restriction has to be known or not.

Moral

When you are working with three-valued logic you must take into account that sometimes things are not known, and you have to be explicit about how the query should behave when missing data are found.

Word of Wisdom

Instead of giving a 'stay away from NULLs' general advice, here's some really good advice: know what NULLs are used for, what problems they solve, and think things through before using them, just like you should do with every non-trivial task you perform.