Includes recipes showing how to remove duplicate rows, remove outliers, and otherwise clean data sets.
Removing duplicate rows from a table
A table can sometimes contain multiple copies of the same row. Whether the duplicates were present in the original table or were introduced when data sets were combined, you can locate and remove all of them.
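As a rough illustration of the idea, the sketch below drops exact duplicate rows using only the Python standard library. It assumes the table is a list of tuples with hashable values; the function name `remove_duplicate_rows` and the sample data are hypothetical, not part of any particular library.

```python
def remove_duplicate_rows(rows):
    """Return the rows with exact duplicates dropped, keeping the first copy of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # assumption: every value in a row is hashable
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample table: the third row repeats the first.
table = [
    ("alice", 30),
    ("bob", 25),
    ("alice", 30),
]
deduped = remove_duplicate_rows(table)
```

Keeping the first occurrence preserves the original row order, which is usually what you want after concatenating data sets.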
Removing values n standard deviations from the mean
You can identify outliers by finding values in specific columns that lie more than n standard deviations from that column's mean in a given data set. Rows containing values outside the desired range can then be removed.
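One way this filter might look, using only the Python standard library, is sketched below. It assumes rows are dictionaries and uses the population standard deviation (`statistics.pstdev`); whether the population or sample deviation is appropriate depends on your data. The function name `remove_outliers`, the column name `value`, and the sample readings are hypothetical.

```python
import statistics


def remove_outliers(rows, column, n):
    """Keep only rows whose value in `column` lies within n standard deviations of the mean."""
    values = [row[column] for row in rows]
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [row for row in rows if abs(row[column] - mean) <= n * std]


# Hypothetical sample data: the last reading is far from the others.
readings = [
    {"id": 1, "value": 10.0},
    {"id": 2, "value": 11.0},
    {"id": 3, "value": 9.0},
    {"id": 4, "value": 10.0},
    {"id": 5, "value": 50.0},
]
cleaned = remove_outliers(readings, "value", 1.5)
```

Note that an extreme value inflates both the mean and the standard deviation, so on small data sets a strict threshold such as n = 3 may fail to flag anything.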