| Home
RSS 2001 2002 2003 2004 2005 2006 2007 Letras Libros Pensar Cosas |
ProblemThis page is a comment on the article published on www.firstsql.com, The EXISTS Flaw. You might want to read that article before this one, but it's not required. Given a table sp with shipment information, and columns sno (supplier number), pno (part number) and qty (quantity, nullable), the following is asked: find supplier numbers for suppliers who are known to supply part P1, but not in a quantity of 1,000. The suggested query to retrieve this information is as follows.
Sample: suggested query
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND 1000 NOT IN
( SELECT spy.qty
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1') ;
The article then goes on to explain how the following "equivalent" query returns different results.
Sample: suggested equivalent query
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND NOT EXISTS
( SELECT spy.qty
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
AND spy.qty = 1000) ;
For starters, if the query returns different results, it is not equivalent. Disregard what relational theory might say for a few seconds and get ahold of a dictionary or some nearby repository of common sense: if the results are not the same, they are not equivalent. They problem seems to arise because given the table rows ('S1', 'P1', NULL), ('S2', 'P1', 200) and ('S3', 'P1', 1000), the first query returns S2, and the second S1 and S2. This is not so strange, because the first query asks for suppliers whose quantity for P1 are known not to be 1000, and the second asks for suppliers whose quantity for P1 are not known to be 1000. If we allow the possibility that we may or may not know some facts, we are obviously asking for different things, which is why we get different results. The original question is ambiguous on whether the "1,000 quantity" restriction has to be known or not. MoralWhen you are working with three-valued logic you must take into account that sometimes things are not known, and you have to be explicit about how the query should behave when missing data are found. Word of WisdomInstead of giving a 'stay away from NULLs' general advice, here's some really good advice: know what NULLs are used for, what problems they solve, and think things through before using them, just like you should do with every non-trivial task you perform. |