Wednesday, April 22, 2015
There is no "average person"
"Average LA residents drove..."
"The average family has 2.4 kids."
I've heard phrases like this too many times this week, and I have to put my contempt for them in writing. My wife knows how much I hate this phrase, and it's a running joke for her to point it out to me when we hear it.
Why do I hate this phrase? It's partly because it's not accurate. When someone says "The average American owns 1.6 cars" what they mean to say is "Americans own an average of 1.6 cars." Such phrasing is notoriously ridiculous in cases like cars and kids, where it's impossible to own half a car or have half a kid. But it also sneaks in through other statistics that make more sense as fractions (e.g., "The average Californian uses 54.3 gallons of water a day").
I'm a stickler for grammar and accuracy, but that's not the main reason why I hate this phrasing. It goes deeper than that.
1) It makes interpretation of the statistic difficult for a lay audience, and makes it easy to criticize the statistic (or statistics in general). "No one has 2.4 kids!" "Who are these 'average' people?!"
2) There is no "average American" (or any member of any population) but there can be a multivariate average of multiple characteristics (or the most frequently occurring characteristics) among Americans (or any other population). In this "The average person..." phrasing, we're not talking about that, though. If you think through the grammar of the sentence, it implies that they've created some sort of multivariate average from some sample and are reporting a statistic for that group. Really all they're doing is misapplying the term "average" to the people (e.g., Americans, Californians) when it should be applied to the outcome (e.g., cars, kids, gallons of water).
Understanding statistics in popular media is already hard enough, and many of them are easy enough to critique for other reasons. Why confuse the public more and paint a bigger bullseye for critics, when a simple grammatical fix would make things clearer?
Tuesday, April 21, 2015
The elusive rate
I was just listening to a story on NPR about how NYC subway ridership is at its highest since 1940-something. Then they reported a total for current ridership. Comparisons and confidence intervals aside, this is a clear situation where the rate of ridership among the population is more important than total ridership. It's possible that population has grown at the same rate as ridership, in which case there would be no per capita increase in ridership. That's impossible to know from this story alone. Perhaps someday I'll have a chance to look it up.
Just before hearing that story, I was thinking about water use in Southern California. They've been reporting a lot on average gallons used per household. This is enlightening about household habits (and, turns out, type of housing in the area), but it says little about the overall strain on the water supply. A community with 100 gallons per household per day but only 1000 households will use 100,000 gallons of water each day, but a community with a mere 10 gallons per household per day and 1 million households will use 10 million gallons per day. So the community that looks better through the average metric will be doing more damage to our water supply as a whole.
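To make that arithmetic concrete, here is a minimal Python sketch. The community names are made up, and both community sizes are treated as household counts, which is an assumption needed for the totals to work out.

```python
# Hypothetical communities; the figures are the illustrative ones from the text.
# Assumption: both community sizes are counted in households.
communities = {
    "small_heavy_users": {"gallons_per_household": 100, "households": 1_000},
    "large_light_users": {"gallons_per_household": 10, "households": 1_000_000},
}


def total_daily_use(community):
    """Total gallons per day = average per household * number of households."""
    return community["gallons_per_household"] * community["households"]


totals = {name: total_daily_use(c) for name, c in communities.items()}

for name, c in communities.items():
    print(f"{name}: {c['gallons_per_household']} gal/household/day, "
          f"{totals[name]:,} gal/day in total")
```

The per-household average ranks the large community as the better one, while the total ranks it as the heavier draw on the supply.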
Almost every story has some uses that make more sense as an average (usually when inference is about individual units in the population), and some that make more sense as a total (usually when inferences are about the overall situation or effort). For example, in survey data collection we often deal with interview duration as a metric of data collection efficiency. If we care about efficiency in individual interviews or interviewers, a mean or median is best. If we are trying to figure out how many total hours the effort is taking (and thus how much it is costing), total hours is more important. So before you decide whether to report a mean (or proportion) or a total, think about what part of the problem you're focusing on.
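As a rough illustration of the interview example, the sketch below computes both kinds of summary from the same data. The durations and the hourly cost are invented for illustration, not real survey figures.

```python
from statistics import mean, median

# Hypothetical interview durations in minutes (made-up values).
durations_minutes = [32, 45, 28, 60, 35]

# Hypothetical interviewer cost per hour (an assumption for illustration).
cost_per_hour = 30

# Per-interview summaries: useful when the question is about
# the efficiency of individual interviews or interviewers.
mean_minutes = mean(durations_minutes)
median_minutes = median(durations_minutes)

# Totals: useful when the question is about the overall effort and its cost.
total_minutes = sum(durations_minutes)
total_hours = total_minutes / 60
total_cost = total_minutes * cost_per_hour / 60

print(f"mean: {mean_minutes} min, median: {median_minutes} min")
print(f"total: {total_hours:.2f} hours, cost: {total_cost:.2f}")
```

Both sets of numbers come from the same list; which one to report depends on whether the question is about the individual interview or the whole effort.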