We don’t like data

John Haman



Here is a model I’d like to fit: What proportion of statisticians actually like data? What does your prior distribution for this proportion look like?
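One common way to encode a prior for a proportion is a Beta distribution, which updates in closed form when the data are counts of "yes" and "no" answers. The sketch below is only an illustration: the Beta(2, 5) prior and the survey counts (3 of 20 statisticians saying they like real data) are hypothetical numbers, not results.

```python
from fractions import Fraction


def beta_binomial_update(a, b, successes, trials):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact with Fraction."""
    return Fraction(a, a + b)


# Hypothetical prior: Beta(2, 5), i.e. a prior mean of 2/7 (about 0.29).
prior = (2, 5)

# Hypothetical survey: 3 of 20 statisticians say they like real data problems.
posterior = beta_binomial_update(*prior, successes=3, trials=20)

print(posterior, float(beta_mean(*posterior)))
```

With these made-up numbers the posterior is Beta(5, 22), and its mean drops from about 0.29 to about 0.19.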

Some statisticians like the idea of data, the theory of data, stuff like that. Real data means data cleaning, which is kind of fun in limited doses. But I suspect that some (many? most?) statisticians do not like working on real data problems. They are just too messy. We’d rather spend our time finding the right methods to apply, helping others apply those methods, advocating, teaching, and so on.

Real data problems are hardest when the data are messy and the research questions are poorly defined. One way to make a real problem more fun is to put in the upfront work of defining the research questions in terms of a good model for the data. In fact, this is one of the best ways to convince people of the usefulness of statistics.