Determining if two tables match
You can determine if two tables contain the same information by comparing the columns of one table to the columns of a second table.
Difficulty
Objective
You have two tables that appear to be the same, but you want to determine if they are indeed exact matches. You can do this by creating an md5 hash of the information across all columns in both tables and then compare the two hashes.
Solution
<meta>empty </meta> <base table="pub.doc.retail.altseg.stores"/> <willbe name="md5_hash" value="r_md5sum(store division subdivision zip divisiondesc subdivisiondesc manager sqft format city state dc compeffdate pycompeffdate compDate;)"/> <colord cols="md5_hash"/> <link table2="pub.doc.retail.altseg.stores" col="md5_hash" col2="md5_hash" type="exclude"> <willbe name="md5_hash" value="r_md5sum(store division subdivision zip divisiondesc subdivisiondesc manager sqft format city state dc compeffdate pycompeffdate compDate;)"/> </link>
Alternate solution
<base table="pub.doc.retail.altseg.stores"/> <willbe name="md5_hash" value="r_md5sum(store division subdivision zip divisiondesc subdivisiondesc manager sqft format city state dc compeffdate pycompeffdate compDate;)"/> <colord cols="md5_hash"/> <link table2="pub.doc.retail.altseg.stores" col="md5_hash" col2="md5_hash" type="exclude"> <willbe name="md5_hash" value="r_md5sum(store division subdivision zip divisiondesc subdivisiondesc manager sqft format city state dc compeffdate pycompeffdate compDate;)"/> <willbe name="match" value="1"/> <colord cols="md5_hash,match"/> </link>
Discussion
Sometimes two tables can look similar but it's difficult to determine if they are an exact match just by previewing both tables. Instead, you can create an md5 hash of the information contained in both tables and use this hash to compare the tables.
In both solutions shown, a column is created that contains the md5 hash in both the base
table and the foreign table. This is created by using the function,
r_md5sum(C;S)
, which hashes the data across a given set of columns. For
this recipe, the hash includes data from all columns, but if you want to compare only a
subset of the columns from each table, you can adjust accordingly.
<meta> empty </meta>
is present at the beginning of
the query. This allows no rows to be selected as the result of your query. If this was
absent from your query, you would receive the error message No rows
selected.The second solution creates an additional column, match
, that contains the
value 1. When a regular link is performed between the base
table and the foreign table on the md5_hash
column, match
will display a 1 if the row matches the foreign table and a
0 otherwise.
Further reading
If you would like to learn more about the functions and operations discussed in this recipe, click on the links below: