Combining rows of multiple tables

You can efficiently combine the rows of multiple tables that are located in the same folder and have the same columns using <loop>. The result will be a single worksheet containing the rows from each table.

Difficulty

Objective

You have a folder containing multiple tables that have the same columns but which have data from different years. You would like to combine the rows of each table into a master table, and you would like each row to indicate which year it is from. While it is possible to simply perform a <merge> operation for each table you'd like to include, new tables added to the folder in the future will not be accounted for, and your code could become lengthy. Instead, utilizing the <loop> operation allows you to perform the same <merge> operation on every table in the folder. This method will also allow you to include any tables that might be added in the future without having to modify the code.

Solution

<loop with_="files">
  <outer>
    <directory folder="pub.demo.weather"/>
    <colord cols="path"/>
    <col name="path" format="width:50"/>
    <sel value="contains(path;'pub.demo.weather.hourly')"/>
    <transpose/>
  </outer>
  <inner>
    <base table="{@files.m0}"/>
    <letseq tables="{@files}" keys="{pkg_names(@tables)}">
      <tabu label="Tabulation on Hourly U.S. Weather" breaks="id,date">
        <tcol source="prec" fun="avg" 
         label="Average`Hourly`Precip`(.01 inch)"/>
        <tcol source="snow" fun="avg" 
         label="Average`Snow`Depth`(inches)"/>
        <tcol source="temp" fun="avg" 
         label="Average`Dry Bulb`Temp`(Celsius)"/>
      </tabu>
      <foreach table_year="{@keys}" tally_="@i">
        <if test="{@i>1}">
          <then>
            <merge table2="{@files.{@table_year}}" match="pad">
              <tabu label="Tabulation on Hourly U.S. Weather" 
              breaks="id,date">
                <tcol source="prec" fun="avg" 
                 label="Average`Hourly`Precip`(.01 inch)"/>
                <tcol source="snow" fun="avg" 
                 label="Average`Snow`Depth`(inches)"/>
                <tcol source="temp" fun="avg" 
                 label="Average`Dry Bulb`Temp`(Celsius)"/>
              </tabu>
            </merge>
          </then>
          <else/>
        </if>
      </foreach>
    </letseq>
  </inner>
</loop>

Alternate solution

<loop with_="files">
  <outer>
    <directory folder="pub.demo.weather"/>
    <colord cols="path"/>
    <col name="path" format="width:50"/>
    <sel value="contains(path;'pub.demo.weather.hourly')"/>
    <transpose/>
  </outer>
  <inner>
    <base table="default.lonely"/>
    <letseq tables="{@files}" keys="{pkg_names(@tables)}">
      <foreach table_year="{@keys}" tally_="@i">
        <merge table2="{@files.{@table_year}}" match="pad">
          <tabu label="Tabulation on Hourly U.S. Weather" breaks="id,date">
            <tcol source="prec" fun="avg" label="Average`Hourly`Precip`
            (.01 inch)"/>
            <tcol source="snow" fun="avg" label="Average`Snow`Depth`
            (inches)"/>
            <tcol source="temp" fun="avg" label="Average`Dry Bulb`Temp`
            (Celsius)"/>
          </tabu>
        </merge>
      </foreach>
    </letseq>
    <sel value="date<>NA"/>
    <colord hide="c1"/>
  </inner>
</loop>

Discussion

The <loop> operation is a useful way to perform the same actions repeatedly without the need to duplicate code. This recipe uses <loop> in conjunction with the <merge> operation to merge several tables in the same folder. In this way, you can combine numerous tables without needing to use multiple <merge> operations.

Both of the above solutions use the <directory> operation to first create a table containing the information for all of the tables within a specific folder. Once that information is obtained, the <colord> operation is used to reduce the worksheet's contents to a single column of table names. Since the folder contains additional tables that should not be included in the merged table, the <sel> operation is used to select only the rows that contain the names of the desired tables. Then, the <transpose> operation converts the worksheet from a single column to a single row, and the values are placed in a package by the <outer> part of the <loop> in the variable specified by the with_ attribute.

In the main solution, the <inner> loop then specifies a base table using the first item in the package ({@files.m0}), and the <foreach> operation merges each additional item. In order to avoid merging the base table to itself, you can use an <if> statement to only merge after the first iteration. An initial tabulation is performed to aggregate the data into the desired format, and the same tabulation is then performed on each subsequent table before the final results are merged.

Alternatively, you can use an empty table, default.lonely, as the base table to which all other tables are merged. This will create an empty row at the top of the data table and an empty column labeled c1. However, these can be easily removed by using the <sel> and <colord> operations after the merge.

Further reading

If you would like to learn more about the functions and operations discussed in this recipe, click on the links below:

<loop>

<merge>

<foreach>