Appendix A: Missing Values

Most databases contain at least some missing data values. The values may be missing due to non availability, non applicability, data processing errors, or even intentional omission. Missing values are also referred to as "N/A" values (for "not available" or "not applicable").

N/A values can also be created as a result of certain analyses. Here are some examples:

Accurate analysis requires understanding the nature and source of N/A values and the appropriate ways of dealing with them. The following sections describe how N/A values are represented in the system and how they are handled in selection and value expressions and tabulations. Some additional techniques for handling such values are discussed as well.

Representing N/A Values

The first thing we should say is that missing values aren't really missing. That may sound like a contradiction in terms, but here is what we mean: When you see a blank value, it isn't that there is no value there at all, rather there is a special value that indicates that meaningful data is missing. The special value may display as blank, but in the database there really is a value. That is why we prefer to refer to a missing value as an N/A value.

Special values are commonly used to represent missing or meaningless data, although different people use different values. The number 999, for example, it often used as a numeric N/A value, especially in a column containing whole numbers. Take the table:

Since it is unlikely that anyone will actually be 999 years old, it is safe to use 999 as the N/A value for age. 999 also has the nice property that it is easily recognizable. Similarly, values like 999.99 or 999.999 are often used for decimal numbers.

In 1010data, we use our own special values. For example, in columns that contain text, we use '' (the empty string). In numeric columns, we use one of two other values, depending on whether the column contains whole numbers or decimal numbers. What these values are isn't really important; what is important is that they are treated differently than other values.

  1. They display as blank.
  2. They are treated specially in selection and value expressions and tabulations. For example, if a computed column's value expression is col1+col2 and col1 (or col2) is N/A on a particular row, the computed column will also be N/A on that row. (See the following sections about N/A handling in selection and value expressions and tabulations.)

Because the system automatically treats these values specially, we try to use them to represent N/As whenever we can. When loading data into a table, if the source data contains other types of N/A values (999, 999.99, NA, etc.), we usually convert those values into our N/A values. We do not convert such values unless the technical documentation for the data specifically says the values are in fact N/A values or if it is patently obvious that is the case. It is therefore entirely possible that some columns in some tables may contain 999, for instance. Please be aware of this when doing computations; if col1 has 999s in it, and you compute col1+col2, the result may not be terribly meaningful!

Looking for N/A Values

Before using a database, it is only prudent to see how many N/A values there are in various columns. There are many ways of doing this.

You can use row selection to select rows where a particular column is N/A. For example:

Or you can tabulate on the column to get all its unique values and look for N/A in the result. 

But perhaps the most convenient method is to use the tabulation summary designed for this very purpose: number of N/As. In a quick summary, this allows you to determine the number of N/As in each of several columns in one operation and, in a tabulation or cross tabulation, to determine the number of N/As for each group. Related summarization methods include number of valid values, number of valid pairs, and number of N/A pairs.