 # sample variance

## Terms from Statistics for HCI: Making Sense of Quantitative Data

A measure of the variability of your sampled data. In the simplest case it is the average squared difference between the data and the sample mean ($\bar{x}$), but dividing by N–1 rather than by N as you would for the population variance or the variance of a theoretical distribution. That is, if the data is $x_1, x_2, \ldots, x_N$, the sample variance is $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$.
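
As a minimal sketch of this definition, the Python below computes the sample variance by hand and compares it with the standard library's `statistics.variance` (which divides by N–1) and `statistics.pvariance` (which divides by N). The function name `sample_variance` and the data values are made up for illustration.

```python
from statistics import mean, pvariance, variance


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = mean(data)
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


# Hypothetical task-completion times in seconds (illustration data only).
times = [12.0, 15.0, 11.0, 14.0, 18.0]

print(sample_variance(times))  # divides by N - 1
print(variance(times))         # standard-library equivalent
print(pvariance(times))        # divides by N instead, so it is smaller
```

For these five values the squared deviations sum to 30, so dividing by N–1 = 4 gives 7.5, while dividing by N = 5 gives 6.0.
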
The reason for dividing by N–1 rather than N is that the mean is usually estimated from the sample data and so is slightly closer to the middle of the data than the real (theoretical or population) mean. Think about the extreme case with just two measured values: the sample mean is exactly in the middle, so their spread about it underestimates the true population variance unless it is corrected. This use of N–1 is an example where we can model the bias and hence correct it.
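
The two-value case can be checked exactly rather than by simulation. The sketch below assumes a toy population (the six faces of a fair die) and enumerates every equally likely sample of size two drawn with replacement. It then averages the two candidate estimators over all samples. The population and the helper name `sum_sq_about_sample_mean` are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed toy population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(population), len(population))
true_variance = sum((x - mu) ** 2 for x in population) / len(population)


def sum_sq_about_sample_mean(sample):
    """Sum of squared deviations of a sample from its own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)


# Every ordered sample of size N = 2 drawn with replacement is equally likely.
N = 2
samples = list(product(population, repeat=N))
avg_div_n = sum(sum_sq_about_sample_mean(s) / N for s in samples) / len(samples)
avg_div_n_minus_1 = sum(
    sum_sq_about_sample_mean(s) / (N - 1) for s in samples
) / len(samples)

print(true_variance)      # 35/12
print(avg_div_n)          # 35/24
print(avg_div_n_minus_1)  # 35/12
```

Dividing by N gives, on average, only half the true variance in this two-value case, whereas dividing by N–1 recovers it exactly.
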
More generally, the sample variance is the sum of the squared residuals divided by the degrees of freedom, (N–n), where n is the number of parameters fitted by the statistical model (n = 1 when only the mean is fitted, which gives the N–1 above).
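
To illustrate the general form, here is a sketch that assumes a straight-line model y = a + b·x fitted by ordinary least squares, so n = 2 (intercept and slope). The function name `residual_variance` and the data points are hypothetical.

```python
def residual_variance(xs, ys):
    """Fit y = a + b*x by least squares; return (a, b, SSR / (N - n)), n = 2."""
    N = len(xs)
    n_fitted = 2  # intercept and slope
    if N <= n_fitted:
        raise ValueError("need more data points than fitted parameters")
    x_bar = sum(xs) / N
    y_bar = sum(ys) / N
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residuals are the differences between observed and fitted values.
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    ssr = sum(r ** 2 for r in residuals)
    return a, b, ssr / (N - n_fitted)


# Made-up data points for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b, s2 = residual_variance(xs, ys)
print(a, b, s2)
```

Here the fitted line is roughly y = 2.2 + 0.6x, the squared residuals sum to about 2.4, and dividing by N–n = 3 gives a variance of about 0.8.
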
The sample variance is often strongly affected by outliers, which inflate it and so reduce statistical power; this is one of the reasons for (carefully!) removing outliers during data cleaning.
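
A small sketch of this sensitivity, using made-up response times in which the final value is assumed to be an outlier (say, a participant who was interrupted):

```python
from statistics import variance

# Hypothetical response times in ms (illustration data only).
clean = [510, 495, 530, 505, 520, 490, 515]
with_outlier = clean + [2400]  # one assumed outlier appended

var_clean = variance(clean)
var_outlier = variance(with_outlier)

print(var_clean)
print(var_outlier)
print(var_outlier / var_clean)  # how much one extreme value inflates the variance
```

Because deviations are squared, a single extreme value can dominate the sum. In this example one outlier increases the sample variance by several orders of magnitude.
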

Used on page 81