Doing it (making sense of statistics) – 1 – introduction

In this part we will look at the major kinds of statistical analysis methods:

  • Hypothesis testing (the dreaded p!) – robust but confusing
  • Confidence intervals – powerful but underused
  • Bayesian stats – mathematically clean but fragile

None of these is a magic bullet; all need care and a level of statistical understanding to apply.

We will discuss how these are related, including the relationship between ‘likelihood’ in hypothesis testing and conditional probability as used in Bayesian analysis. There are common issues, including the need to clearly report the numbers and the tests/distributions used, avoiding cherry picking, dealing with outliers, and handling non-independent effects and correlated features. However, there are also issues specific to each method.
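The link between likelihood and conditional probability can be seen in a tiny sketch: the likelihood P(data | H) used in hypothesis testing is the same conditional probability that Bayes’ rule inverts to give P(H | data). All the numbers below are invented purely for illustration.

```python
def posterior(prior_h, likelihood_h, likelihood_not_h):
    """P(H | data) from P(H), P(data | H) and P(data | not H) via Bayes' rule."""
    evidence = likelihood_h * prior_h + likelihood_not_h * (1 - prior_h)
    return likelihood_h * prior_h / evidence

# The likelihood P(data | H) is fixed at 0.8 here, but the posterior
# P(H | data) also depends on the prior probability of H:
print(posterior(0.5, 0.8, 0.3))   # neutral prior
print(posterior(0.1, 0.8, 0.3))   # same likelihoods, sceptical prior
```

The same observed data yields quite different posteriors under the two priors, which is exactly why the two notions must not be conflated.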

Classic statistical methods used in hypothesis testing and confidence intervals depend on ideas of ‘worse’ for measures, which are sometimes obvious, sometimes need thought (one- vs. two-tailed tests), and sometimes outright confusing. In addition, care is needed in hypothesis testing to avoid classic failures such as treating a non-significant result as evidence of no effect, and reporting inflated effect sizes.
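The one- vs. two-tailed distinction is precisely a choice about what counts as ‘worse’. A minimal sketch for a standard normal test statistic, using only the Python standard library (the helper names are my own):

```python
import math

def one_tailed_p(z):
    """P(Z >= z) for a standard normal: 'worse' means larger only."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p(z):
    """P(|Z| >= |z|): 'worse' means more extreme in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.96
print(one_tailed_p(z))   # roughly 0.025
print(two_tailed_p(z))   # roughly 0.05 -- twice the one-tailed value
```

The same observed statistic gives a p-value twice as large under the two-tailed definition, so the choice must be made (and reported) before looking at the data.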

In Bayesian statistics different problems arise, including the need to decide, in a robust and defensible manner, the expected prior probabilities of different hypotheses before an experiment; the closeness of replication; and the danger of common causes leading to inflated probability estimates due to a single initial fluke event or an optimistic prior.
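Both dangers can be made concrete in a short sketch (all likelihood values here are invented for illustration): an optimistic prior inflates the posterior for the very same evidence, and re-using one fluke result as if it were independent evidence inflates it further with every ‘update’.

```python
def update(prior, lik_h, lik_not_h):
    """One Bayesian update: posterior P(H | data)."""
    return lik_h * prior / (lik_h * prior + lik_not_h * (1 - prior))

# Same (made-up) experimental result, different priors:
for prior in (0.5, 0.9):              # neutral vs. optimistic prior
    print(update(prior, 0.6, 0.4))

# Feeding the SAME result back in as if it were fresh, independent
# evidence drives the estimate ever upwards:
p = 0.5
for _ in range(3):
    p = update(p, 0.6, 0.4)
print(p)                              # inflated after repeated 'updates'
```

A weakly supportive result (0.6 vs. 0.4) looks ever more conclusive when double-counted, which is why independence of evidence matters so much in Bayesian analysis.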

Crucially, while all methods have problems that need to be avoided, not using statistics at all can be far worse.

Things to come …

probing the unknown

  • conditional probability and likelihood
  • statistics as counter-factual reasoning

types of statistics

  • hypothesis testing (the dreaded p!) – robust but confusing
  • confidence intervals – powerful but underused
  • Bayesian stats – mathematically clean … but fragile – issues of independence


  • idea of ‘worse’ for measures
  • what to do with ‘non-sig’
  • priors for experiments?  Finding or verifying
  • significance vs effect size vs power


  • avoiding cherry picking – multiple tests, multiple stats, outliers, post-hoc hypotheses
  • non-independent effects (e.g. fat and sugar)
  • correlated features (e.g. weight, diet and exercise)