Rajeysh's blog: August 2010

from the forum by

John Spencer

thread:HOWTO: Compare two tables

TO FIND RECORDS IN TABLE1 WHICH DO NOT EXIST IN TABLE2

SELECT column_list
FROM table1
WHERE column_list NOT IN (SELECT column_list
                         FROM table2)

SELECT column_list
FROM table1
MINUS
SELECT column_list
FROM table2;

SELECT column_list
FROM table1
WHERE NOT EXISTS (SELECT 1
                 FROM table2
                 WHERE table1.column1 = table2.column1 and
                       table1.column2 = table2.column2 and
                       ...);

SELECT table1.column_list
FROM table1, table2
WHERE table1.column1 = table2.column1(+) and
     table1.column2 = table2.column2(+) and
     ...  and
     (table2.column1 IS NULL or
      table2.column2 IS NULL or
      ...)
-- Use and instead of or if you are sure that there is no case where
-- the two column1's will match, but the two column2's will not.

TO FIND RECORDS IN TABLE1 WHICH EXIST IN TABLE2.

SELECT column_list
FROM table1
WHERE column_list IN (SELECT column_list
                     FROM table2)

SELECT column_list
FROM table1
INTERSECT
SELECT column_list
FROM table2;

SELECT column_list
FROM table1
WHERE EXISTS (SELECT 1
             FROM table2
             WHERE table1.column1 = table2.column1 and
                   table1.column2 = table2.column2 and
                   ...);

SELECT table1.column_list
FROM table1, table2
WHERE table1.column1 = table2.column1 and
     table1.column2 = table2.column2 and
     ...

Which is better, depends on a number of factors. Some of the most important factors are:

1. Absloute sizes of the two tables
2. The relative sizes of the two tables
3. Cost Based Optimizer versus Rule Based Optimizer
4. Version of Oracle
5. Number and type columns that are being compared.

Number 5 may need a little clarification. If the question is find all Primary Key's in table1 which exist or do not exist in table2, then using EXISTS or a JOIN is probably best. However, if the question is find entire rows in table1 that do or do not have a corresponding entire row in table2 then INTERSECT or MINUS are likely to be best.

In all of the queries above, column_list represents the columns that you want to find the matches or mismatches for. This can be anything from a single column to * (all columns). column1, column2 ... represent each of the columns in your column list.

If we assume that there are no other selection criteria used against either table (which might affect the choice of indexes), then all methods will require at least one full table scan (at best a fast full index scan), and sometimes two. So, the chief factor affecting performance is the method Oracle uses to get the rows in table1 that do or do not exist in table2. On a conceptual basis, this is how I understand the various approaches operate. This is almost certainly not the exact algorithm that Oracle uses.

IN / NOT IN will do a full scan of table2 to get a list of values, then for each row of table1 scan this LOV to see if it finds a match (IN) or does not (NOT IN). If table2 is absolutely small (e.g. state_code table with 50 records) then this may well be the fastest approach since scanning the LOV should be fast in memory.

INTERSECT/ MINUS does a full scan of both tables, sorts each, then scans the two lists together to identify matches or mismatches. If the two tables are relatively similar in size, then this is likely to be one of the faster methods. The optimizer may choose to use index scans if the columns being compared are all indexed. If you are trying to compare all columns, or a large (unindexed) subset, then this is likely to be the fastest method. It may also be faster if you expect a relatively large number of matched or mis-matched records.

In a NOT EXISTS (or EXISTS), for each row in table 1, the correlated sub-query is executed against table2. If the all of the indentity columns in the sub-query are indexed, then this is pretty quick. The relative sizes of the tables do not matter much, but if both are large, this may be a little slower. Also, it is not as efficient when you need to look at values in several columns. If table2 is not appropriately indexed, and it is large, then this could be incredibly slow since it will require a full scan of table2 for each row in table1.

The [OUTER] JOIN approach uses SORT JOIN - MERGE JOIN or SORT JOIN - NESTED LOOP (depending on the Oracle version and optimizer mode). This is pretty efficient, particularly if the two tables are large, and relativley equally sized, and you expect a relativley small number of mismatches, or a relatively large number of matches (INNER JOIN).

There is also a fifth approach to finding records in table1 that do not exist in table2.

In sqlplus run $ORACLE_HOME/rdbms/admin/utlexcpt.sql then

ALTER TABLE table1
ADD constraint t1_t2_fk
FOREIGN KEY (column_list) REFERENCES table2 (column list)
EXCEPTIONS INTO exceptions

The rowid of any rows that do not pass validation will be stored in a table called exception (created by the script). You can then use the rowids to identify and deal with the rows. NOTE: This requires a unique index on column_list in table2.

Probably not useful for a one-off excercise, but it will prevent it happening on an on-going basis.

Caveat emptor

This posting is the work of the Rogue Moderators. It is posted with the best of intentions but is not guaranteed in any way, shape or form. In particular, the code is presented "as is" and you assume full responsibility for running it on your system.

reference:
http://forums.oracle.com/forums/thread.jspa?threadID=2996

Rajeysh's blog

Compare two table Datas

Terminal time for Linux

About Me

Archive