# Summarizations and Tabulations

Some of the basic ways to analyze your data using 1010data include quick summaries, tabulations, and cross tabulations.

Oftentimes, you may want to determine the total of all the values in a particular column or the average of those values. Or perhaps you want to find the highest or lowest value in a column. A quick summary allows you to perform calculations on one or more columns in a 1010data table to ascertain these quantities very easily.

Using tabulations, you can summarize the values in a column (or columns) based on the values in another column (or columns) and display those results in a tabular format.

For example, suppose a table has a row for every employee and columns **Name**,
**Title**, **Department**, and **Salary**. Using a tabulation, we can compute the
number of people and average salary for each department. The result will look something
like:

Department | Count | Average Salary |
---|---|---|

228 | 73,946 | |

Finance | 35 | 61,778 |

Marketing | 41 | 87,562 |

Sales | 67 | 92,962 |

Engineering | 34 | 69,276 |

Technology | 51 | 49,481 |

The cells in the top row beneath the column headings are column aggregates. In this example, the total number of employees (which corresponds to the number of rows in the original table) is 228, and the average salary across all departments is 73,946.

A cross tabulation allows you to summarize the values in a column based on the values in two or more other columns and display the result as a matrix.

In our above example, we could use a cross tabulation to compute the average salary in each department for each title. This would result in a table with one row for each department and one column for each title:

Title | VP | AVP | Manager | Grunt | |
---|---|---|---|---|---|

Department |
73,946 | 143,356 | 118,979 | 85,345 | 85,345 |

Finance | 61,778 | 95,774 | 85,812 | 70,553 | 48,944 |

Marketing | 87,562 | 127,344 | 127,664 | 100,359 | 70,686 |

Sales | 92,962 | 155,863 | 125,268 | 100,945 | 74,314 |

Engineering | 69,276 | 116,181 | 94,189 | 78,594 | 53,997 |

Technology | 49,481 | 79,178 | 67,080 | 56,285 | 39,657 |

In this example, we can see that the average salary for AVPs in Engineering is 94,189. The cells in the first row beneath the column headings and the first column next to the row headings are aggregates. The first row of data contains the average salaries across all departments for each title (e.g., the average salary for all VPs is 143,356). Similarly, the first column of data contains the average salaries across all titles for each department (e.g., the average salary for all people in the Finance department is 61,778). The top left-most cell of data is the average for all employees across all departments and titles, which is 73,946.

Certain types of summarizations require a second column to perform the computation. For instance, a weighted average requires two columns: the column on which you are calculating the average and the column that contains the weight. Other examples of summarizations that involve two columns of data include the correlation, dot product, or covariance of two columns.

Missing values in the data are handled automatically in 1010data. For example, if you are computing the average value of a column that has some missing (N/A) values, those values are ignored when computing the average.

Quick summaries, tabulations, and cross tabulations can be performed easily using the
1010data web interface. In addition, the 1010data Macro Language functions (in particular,
G_Functions) provide a comprehensive set of tools for performing similar and more advanced
operations on data in 1010data. For instance, you can obtain the same information from the
tabulation in the above example using a very simple G_Function: To find out the average salary
for each department, you could create a computed column and use the `g_avg`

function as its value expression. The results of the cross tabulation could be determined in a
similar way. And this is just the tip of the iceberg when it comes to G_Functions. The
potential for data discovery using these powerful tools is limitless.