Determining the differences between two data sets

You can use 1010data to determine the number of differences between two tables, similar to using COUNTIF in Excel.

Difficulty

Objective

You have two tables and you would like to know how many points differ between the two data sets. You know how to do this in Excel by using COUNTIF and SUM, and you would like to produce the same results using 1010data. For this analogue, the number of differences between the players in the Batting table and the Fielding table are calculated.

Excel solution

After importing the two tables with all duplicates removed into Excel, the two columns of player ID's are placed side by side in the same worksheet. This allows the comparisons to take place. Then the COUNTIF function is used to compare if any single entry is contained in both columns. SUM then adds these results to determine the number of similarities between the tables.

=COUNTIF($B:$B,$A2)
=SUM(C:C)

Another COUNTIF function is used to count the total number of players.

=COUNTIF(A:B,"*")-2

Then finally, the number of differences are calculated by subtracting twice the similarities from the total count.

=E2-2*D2

An image of the Excel worksheet can be seen below.

Calculating these results in Excel is not ideal. Pre-adjustments to the tables are required before you can begin work in Excel. In addition, the time to complete the comparison between the two ranges is lengthy. With 1010data, you can complete all parts of the calculation in one place and at a faster speed.

1010data Macro Language solution

<base table="pub.demo.baseball.batting"/>
<willbe name="first" value="g_first1(lahmanid;;lahmanid)"/>
<sel value="first=1"/>
<colord cols="lahmanid"/>
<merge table2="pub.demo.baseball.fielding" type="intersect">
  <willbe name="first" value="g_first1(lahmanid;;lahmanid)"/>
  <sel value="first=1"/>
  <colord cols="lahmanid"/>
</merge>
<tabu label="Number of Rows">
  <tcol source="lahmanid" fun="cnt" label="Count"/>
</tabu>
<merge table2="pub.demo.baseball.batting">
  <willbe name="first" value="g_first1(lahmanid;;lahmanid)"/>
  <sel value="first=1"/>
  <merge table2="pub.demo.baseball.fielding">
    <willbe name="first" value="g_first1(lahmanid;;lahmanid)"/>
    <sel value="first=1"/>
  </merge>
  <tabu label="Number of Rows">
    <tcol source="lahmanid" fun="cnt" label="Count"/>
  </tabu>
</merge>
<transpose/>
<willbe name="differences" value="m1-2*m0"/>

With 1010data, tabulations are utilized to determine the number of rows, and therefore the number of players.

In the Batting table, a column is created using <willbe> to distinguish the first occurrence of every player with the function g_first1(G;S;O). A selection statement then selects only this first occurrence to eliminate duplicates. Since you are only interested in the differences of players, <colord> is used to show only the lahmanid column.

Then, the Fielding table is merged with the Batting table, with the same transformations applied. A key step here is using type="intersect" with the merge, therefore, only players that appear in both tables will appear in the new merged table. Now that you have a table that contains only the players the two tables have in common, you can count them by performing a tabulation and using fun="cnt".

The total number of players contained in both tables is needed to then determine the number of differences. This is done by again applying the same transformations to each individual table and merging them together. Then the count tabulation is again used to count the total number of players. Additionally, all of this is done inside another merge, so the results can be combined with the results of the previous tabulation.

Due to the nature of the table, a <transpose/> operation is needed to allow both results to be visible. Finally, another <willbe> column is created where its value is equal to the total number of players less twice the similarities. The similarities need to be doubled to account for their presence in both tables. Below is an image of your final table.

Further reading

If you would like to learn more about the functions and operations discussed in this recipe, click on the links below:

g_first1(G;S;O)

<transpose/>

<link>