Doing it (making sense of statistics) – 1 – introduction

In this part we will look at the major kinds of statistical analysis methods:

  • Hypothesis testing (the dreaded p!) – robust but confusing
  • Confidence intervals – powerful but underused
  • Bayesian stats – mathematically clean but fragile

None of these is a magic bullet; all need care and a level of statistical understanding to apply.

We will discuss how these are related, including the relationship between ‘likelihood’ in hypothesis testing and conditional probability as used in Bayesian analysis. There are common issues, including the need to report clearly the numbers and the tests/distributions used, avoiding cherry picking, and dealing with outliers, non-independent effects and correlated features. However, there are also specific issues for each method.
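To make the likelihood/conditional probability relationship concrete, here is a minimal sketch in Python. The coin example, the two candidate hypotheses (fair vs. a 0.75 bias) and the function names are all illustrative assumptions, not taken from any particular study. The likelihood is the probability of the data given a hypothesis; Bayes' rule turns this around into the probability of each hypothesis given the data, but only once priors are supplied.

```python
from math import comb

def binomial_likelihood(heads, flips, p_heads):
    """P(data | hypothesis): chance of exactly `heads` heads in `flips` flips."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads)**(flips - heads)

def posterior(priors, likelihoods):
    """Bayes' rule: P(hypothesis | data) for each competing hypothesis."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Illustrative data: 8 heads in 10 flips.
heads, flips = 8, 10

# Two hypothetical hypotheses: a fair coin and a coin biased to 0.75 heads.
likelihoods = [
    binomial_likelihood(heads, flips, 0.5),
    binomial_likelihood(heads, flips, 0.75),
]

# Equal priors are an assumption; different priors give different answers.
post = posterior([0.5, 0.5], likelihoods)

print("P(data | fair), P(data | biased):", likelihoods)
print("P(fair | data), P(biased | data):", post)
```

Note that the two likelihoods do not sum to one, whereas the two posterior probabilities do: they answer different questions.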

Classic statistical methods used in hypothesis testing and confidence intervals depend on an idea of which outcomes count as ‘worse’ than the one observed. This is sometimes obvious, sometimes needs thought (one- vs. two-tailed tests), and is sometimes outright confusing. In addition, care is needed in hypothesis testing to avoid classic mistakes such as treating a non-significant result as evidence of no effect, or reporting inflated effect sizes.
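The effect of the choice of ‘worse’ can be sketched with an exact binomial calculation. The data here (8 heads in 10 flips of a supposedly fair coin) and the function names are assumptions chosen for illustration. In the one-tailed version ‘worse’ means ‘at least this many heads’; in the two-tailed version it means ‘at least this far from half, in either direction’.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def one_tailed_p(k, n):
    """'Worse' = k or more heads, assuming a fair coin."""
    return sum(binom_pmf(i, n) for i in range(k, n + 1))

def two_tailed_p(k, n):
    """'Worse' = at least as far from n/2 as k, in either direction."""
    distance = abs(k - n / 2)
    return sum(binom_pmf(i, n) for i in range(n + 1)
               if abs(i - n / 2) >= distance)

one = one_tailed_p(8, 10)   # 56/1024
two = two_tailed_p(8, 10)   # 112/1024

print("one-tailed p:", one)
print("two-tailed p:", two)
```

For a symmetric null such as this one, the two-tailed value is exactly double the one-tailed value, so the same data can sit on either side of a threshold depending on which definition of ‘worse’ was chosen. That choice needs to be made, and reported, before looking at the results.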

In Bayesian statistics different problems arise, including the need to decide, in a robust and defensible manner, the expected prior probabilities of the different hypotheses before an experiment; the closeness of replication; and the danger of common causes leading to inflated probability estimates due to a single initial fluke event or optimistic prior.
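A small sketch shows why the choice of prior matters so much. All the numbers below are hypothetical: it assumes the observed data would occur with probability 0.8 if the effect were real and 0.05 if it were not, and the function name is made up for this example. Only the prior changes between runs.

```python
def posterior_real_effect(prior, p_data_if_real, p_data_if_null):
    """Bayes' rule for two hypotheses: P(effect is real | data)."""
    joint_real = prior * p_data_if_real
    joint_null = (1 - prior) * p_data_if_null
    return joint_real / (joint_real + joint_null)

# Hypothetical likelihoods: the same data in every case.
P_DATA_IF_REAL = 0.8
P_DATA_IF_NULL = 0.05

# Optimistic, cautious and sceptical priors (all assumed values).
for prior in (0.5, 0.1, 0.01):
    post = posterior_real_effect(prior, P_DATA_IF_REAL, P_DATA_IF_NULL)
    print(f"prior {prior:.2f} -> posterior {post:.2f}")
```

With identical data, the posterior ranges from very likely to rather unlikely purely as a result of the prior, which is why an optimistic prior needs to be defended rather than assumed.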

Crucially, while all methods have problems that need to be avoided, not using statistics at all can be far worse.

Things to come …

probing the unknown

  • conditional probability and likelihood
  • statistics as counter-factual reasoning

types of statistics

  • hypothesis testing (the dreaded p!) – robust but confusing
  • confidence intervals – powerful but underused
  • Bayesian stats – mathematically clean but fragile – issues of independence

issues

  • idea of ‘worse’ for measures
  • what to do with ‘non-sig’
  • priors for experiments?  Finding or verifying
  • significance vs effect size vs power

dangers

  • avoiding cherry picking – multiple tests, multiple stats, outliers, post-hoc hypotheses
  • non-independent effects (e.g. fat and sugar)
  • correlated features (e.g. weight, diet and exercise)
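
As a rough illustration of why multiple tests invite cherry picking, the sketch below computes the chance of at least one ‘significant’ result when there is no real effect at all. It assumes the tests are independent and each uses the same threshold; the function name and the 0.05 level are choices made for this example.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """P(at least one test passes the threshold by chance alone),
    assuming independent tests and no real effects."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 5, 20):
    chance = chance_of_false_positive(num_tests)
    print(f"{num_tests:2d} tests -> {chance:.2f}")
```

With twenty such tests a spurious ‘significant’ result is more likely than not, so reporting only the test that worked gives a misleading picture.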