
Statistics for HCI: Making Sense of Quantitative Data

This book is due out in 2020, published by Morgan & Claypool as part of the series Synthesis Lectures on Human-Centered Informatics.

About the book

Sometimes it seems we are bombarded with numbers, from global warming to utility bills. In user research or academic studies we may also encounter more formal statistics such as significance testing (all those p values) or Bayesian methods, and certainly graphs and tables.

For those of us working with people, we know that numbers do not capture the complexities of social activity or the nuances of human feelings, which are often more appropriately explored through rich qualitative studies. Indeed many researchers shun anything numerical as at best simplistic and at worst dehumanising.

However, the truth is that we all use statistics, both in our work and day-to-day lives. This may be obvious if you read an article with explicit statistics, but mostly the statistics we use are informal and implicit. If you eyeball a graph, table of results or simple summary of survey responses and it affects your opinions, you are making a statistical inference. If you interview a selection of people or conduct a user trial of new software and notice that most people mention a particular issue or have a particular problem, you are using statistics.

Beneath the surface, our brains constantly average and weigh odds, and we may be subconsciously aware of statistical patterns in the world well before we explicitly recognise them. Statistics are everywhere and, consciously or unconsciously, we are all statisticians. The core question is how well we understand this.

This book is intended to fill the gap between the ‘how to’ knowledge in basic statistics books and a real understanding of what those statistics mean. It will help you make sense of the various alternative approaches presented in recent articles in HCI and the wider scientific literature. In addition, the later chapters will present aspects of statistical ‘craft’ skill that are rarely considered in standard textbooks. Some of the book relates to more formal statistics, but other parts will be useful if you are only eyeballing graphs or making qualitative judgements about data.

There are some excellent books on advanced statistical techniques within HCI: Robertson and Kaptein’s collection “Modern Statistical Methods for HCI” and Cairns’ “Doing Better Statistics in Human-Computer Interaction”. This book is intended to complement these, allowing you to follow statistical arguments without necessarily knowing how to perform each of the analyses yourself, and also, if you are using more advanced techniques, to understand them more thoroughly.

This book arose from a course on “Understanding Statistics” at CHI 2017, which itself drew on earlier short courses and tutorials from 20 years before. The fundamentals of statistics changed little in those 20 years; indeed I could and should have written this book then. However, there have been two main changes, which have intensified both the need and the timeliness. The first is the increased availability, usability and power of statistical tools such as R. This is wonderful in making it so much easier to apply statistics, but can also lead to a false sense of security when complex methods are applied without understanding their purpose, assumptions and limitations. The second change has been growing publicity about the problems of badly applied statistics – the ‘statistical crisis’: topics that were once only discussed amongst professional statisticians are now a matter of intense debate on the pages of Nature and in the halls of CHI. Again this awareness is a very positive step, but it brings the danger that HCI researchers and UX practitioners may reach for new forms of statistics with even less understanding and greater potential for misuse. Even worse, the fear of doing it wrong may lead some to avoid using statistics where they are appropriate, or to excuse abandoning them entirely.

We are in a world where big data rules, and nowhere more than in HCI, where A–B testing and similar analysis of fine-grained logging mean that automated analysis appears to be overtaking design expertise. To make sense of big data, as well as the results of smaller laboratory experiments, surveys or field studies, it is essential that we understand the statistics needed to interpret quantitative data, the limitations of numbers, and how quantitative and qualitative methods can work together.

By the end of the book you should have a richer understanding of: the nature of random phenomena and different kinds of uncertainty; the different options for analysing data and their strengths and weaknesses; ways to design studies and experiments to increase ‘power’ – the likelihood of successfully uncovering real effects; and the pitfalls to avoid and issues to consider when dealing with empirical data. I hope that you will be better equipped to understand reports, data and academic papers that use statistical techniques, and to critically assess the validity of their results and how they may apply to your own practice or research. Most important, you will be better placed to design studies that efficiently use available resources and appropriately, effectively and reliably analyse the results.

Intended Readership

This book is intended for both experienced researchers and students who have already engaged, or intend to engage, in quantitative analysis of empirical data or other forms of statistical analysis. It will also be of value to practitioners using quantitative evaluation. There will be occasional formulae, but the focus of the book is on conceptual understanding, not mathematical skills.