Doing it – if not p then what

In this part we will look at the major kinds of statistical analysis methods:

  • Hypothesis testing (the dreaded p!) – robust but confusing
  • Confidence intervals – powerful but underused
  • Bayesian stats – mathematically clean but fragile
  • Simulation-based – rare but useful (see the sketch after this list)
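
To make the last of these concrete, here is a minimal sketch of a simulation-based analysis: a bootstrap interval for the difference in means between two conditions. All the numbers are invented purely for illustration.

```python
import random

# Invented illustrative data: task times (seconds) under two interface variants
a = [12.1, 9.8, 11.4, 13.0, 10.2, 12.7, 11.9, 10.8]
b = [10.3, 9.1, 10.9, 11.2, 8.7, 10.1, 9.6, 10.4]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(a) - mean(b)

# Bootstrap: resample each group with replacement many times and record
# the mean difference, to approximate its sampling distribution.
random.seed(1)
diffs = []
for _ in range(10_000):
    ra = [random.choice(a) for _ in a]
    rb = [random.choice(b) for _ in b]
    diffs.append(mean(ra) - mean(rb))

diffs.sort()
lo, hi = diffs[249], diffs[9749]   # central 95% of the bootstrap distribution
print(f"observed difference: {observed:.2f} s")
print(f"95% bootstrap interval: [{lo:.2f}, {hi:.2f}] s")
```

The appeal of simulation is that it replaces a theoretical sampling distribution with brute-force resampling of the data itself, though it needs just as much care over what is resampled and why.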

None of these is a magic bullet; all need care and a level of statistical understanding to apply.

We will discuss how these are related, including the relationship between ‘likelihood’ in hypothesis testing and conditional probability as used in Bayesian analysis. There are common issues for all of them, including the need to clearly report the numbers and the tests/distributions used, avoiding cherry-picking, dealing with outliers, and handling non-independent effects and correlated features. However, there are also issues specific to each method.
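
To pin down that relationship with a toy example: the ‘likelihood’ is the conditional probability of the observed data given a hypothesis, P(data|H), and Bayesian analysis uses Bayes’ rule to invert it into P(H|data). The sketch below uses made-up numbers for a coin that may or may not be biased.

```python
from math import comb

# The 'likelihood' is P(data | hypothesis); Bayes' rule inverts it
# into the conditional probability P(hypothesis | data).

def binom_likelihood(heads, tosses, p_head):
    # Probability of seeing exactly `heads` heads in `tosses` tosses
    return comb(tosses, heads) * p_head**heads * (1 - p_head)**(tosses - heads)

# Data: 8 heads in 10 tosses. Two hypotheses: fair coin, or biased (p=0.8).
like_fair   = binom_likelihood(8, 10, 0.5)   # P(data | fair)
like_biased = binom_likelihood(8, 10, 0.8)   # P(data | biased)

# Illustrative priors: we think a biased coin is unlikely a priori
prior_fair, prior_biased = 0.9, 0.1

# Bayes' rule: P(H | data) = P(data | H) * P(H) / P(data)
evidence    = like_fair * prior_fair + like_biased * prior_biased
post_fair   = like_fair * prior_fair / evidence
post_biased = like_biased * prior_biased / evidence

print(f"P(data|fair) = {like_fair:.3f}   P(data|biased) = {like_biased:.3f}")
print(f"P(fair|data) = {post_fair:.3f}   P(biased|data) = {post_biased:.3f}")
```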

Classic statistical methods used in hypothesis testing and confidence intervals depend on an idea of ‘worse’ for measures, which is sometimes obvious, sometimes needs thought (one- vs. two-tailed tests), and sometimes outright confusing. In addition, care is needed in hypothesis testing to avoid classic fails such as treating a non-significant result as evidence of no effect, and reporting inflated effect sizes.
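
To illustrate the one- vs. two-tailed distinction, the sketch below runs both forms of an independent-samples t-test on the same invented data, assuming scipy is available and that the predicted direction is ‘condition a scores higher than condition b’.

```python
from scipy import stats

# Invented data for two conditions
a = [12.1, 9.8, 11.4, 13.0, 10.2, 12.7, 11.9, 10.8]
b = [10.3, 9.1, 10.9, 11.2, 8.7, 10.1, 9.6, 10.4]

# Two-tailed: 'worse' means a difference at least this extreme in either direction
two_tailed = stats.ttest_ind(a, b, alternative="two-sided")

# One-tailed: 'worse' counts only differences in the predicted direction
one_tailed = stats.ttest_ind(a, b, alternative="greater")

print(f"two-tailed p = {two_tailed.pvalue:.4f}")
print(f"one-tailed p = {one_tailed.pvalue:.4f}")  # half the two-tailed value here
```

When the observed effect lies in the predicted direction, the one-tailed p is half the two-tailed one, which is exactly why the choice of tail must be made before looking at the data.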

In Bayesian statistics different problems arise: the need to decide, in a robust and defensible manner, the expected likelihood of each hypothesis before an experiment; and the danger of common causes leading to inflated probability estimates due to a single initial fluke event or an optimistic prior.
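
The prior problem can be seen even in the simplest conjugate model. This sketch (invented numbers, assuming a Beta-Binomial model) shows the same data giving noticeably different posterior estimates under flat, optimistic and sceptical priors.

```python
# Beta-Binomial update: Beta(a, b) prior + binomial data
# -> Beta(a + successes, b + failures) posterior.
# Invented data: 7 of 10 users succeed with a new design.
successes, failures = 7, 3

def posterior_mean(prior_a, prior_b):
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b)

print(f"flat prior Beta(1,1):       {posterior_mean(1, 1):.2f}")
print(f"optimistic prior Beta(9,1): {posterior_mean(9, 1):.2f}")
print(f"sceptical prior Beta(1,9):  {posterior_mean(1, 9):.2f}")
```

With only ten observations the prior still dominates the estimate, which is why it has to be chosen in a robust and defensible way.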

Crucially, while all methods have problems that need to be avoided, we will see how not using statistics at all can be far worse.

Detailed notes and videos

  1. introduction
  2. probing the unknown
  3. hypothesis testing
  4. confidence intervals
  5. Bayesian statistics
  6. philosophical differences
  7. some dangers
  8. or worse
  9. so which is it?
  10. a quick experiment