Determining if two tables match

You can determine if two tables contain the same information by comparing the columns of one table to the columns of a second table.

Difficulty

Objective

You have two tables that appear to be the same, but you want to determine if they are indeed exact matches. You can do this by creating an md5 hash of the information across all columns in both tables and then compare the two hashes.

Solution

<meta>empty
</meta>
<base table="pub.doc.retail.altseg.stores"/>
<willbe name="md5_hash" value="r_md5sum(store division subdivision zip 
divisiondesc subdivisiondesc manager sqft format city state dc compeffdate 
pycompeffdate compDate;)"/>
<colord cols="md5_hash"/>
<link table2="pub.doc.retail.altseg.stores" col="md5_hash" col2="md5_hash" 
type="exclude">
  <willbe name="md5_hash" value="r_md5sum(store division subdivision zip 
  divisiondesc subdivisiondesc manager sqft format city state dc compeffdate 
  pycompeffdate compDate;)"/>
</link>

Alternate solution

<base table="pub.doc.retail.altseg.stores"/>
<willbe name="md5_hash" value="r_md5sum(store division subdivision zip 
divisiondesc subdivisiondesc manager sqft format city state dc compeffdate 
pycompeffdate compDate;)"/>
<colord cols="md5_hash"/>
<link table2="pub.doc.retail.altseg.stores" col="md5_hash" col2="md5_hash" 
type="exclude">
  <willbe name="md5_hash" value="r_md5sum(store division subdivision zip 
  divisiondesc subdivisiondesc manager sqft format city state dc compeffdate 
  pycompeffdate compDate;)"/>
  <willbe name="match" value="1"/>
  <colord cols="md5_hash,match"/>
</link>

Discussion

Sometimes two tables can look similar but it's difficult to determine if they are an exact match just by previewing both tables. Instead, you can create an md5 hash of the information contained in both tables and use this hash to compare the tables.

In both solutions shown, a column is created that contains the md5 hash in both the base table and the foreign table. This is created by using the function, r_md5sum(C;S), which hashes the data across a given set of columns. For this recipe, the hash includes data from all columns, but if you want to compare only a subset of the columns from each table, you can adjust accordingly.

The first solution performs a link and exclude with the foreign table, which will result in the selection of rows from the base table that do not exist in the foreign table. In this case, no rows are selected because all rows from the base table have a match in the foreign table.
Note: <meta> empty </meta> is present at the beginning of the query. This allows no rows to be selected as the result of your query. If this was absent from your query, you would receive the error message No rows selected.

The second solution creates an additional column, match, that contains the value 1. When a regular link is performed between the base table and the foreign table on the md5_hash column, match will display a 1 if the row matches the foreign table and a 0 otherwise.

Further reading

If you would like to learn more about the functions and operations discussed in this recipe, click on the links below:

r_md5sum(C;S)

<meta>

<link>