
Many people are required to take a statistics class and wonder why.  Common reactions from students include a distaste for formulas of any kind, a general feeling that numbers cannot convey reality, and a concern that statistical formulas use mumbo jumbo either to obscure a simple idea or to mislead people.  Our students complain about stats classes but, by the time they graduate, they have had enough statistics to understand the challenges of making decisions in the face of uncertainty.

The ongoing discussion concerning the 2011 Canadian Census is an example of how much you need to know in order to recognize the issues at stake.  Many users (e.g. [1] [2]) complained about the change in the method of collecting data before it was implemented.  The reason given for the change in method was a concern for privacy, but that is disputed.  Canada’s Chief Statistician resigned during the dispute.

Census data are used in many ways every day, directly or after repackaging by market research companies.  If a retailer is trying to decide where to open a store, then it would really like to have accurate data on the income of, or the number of children in, the families who live nearby.  If a city is trying to understand who is moving in and what services they need, then it would help to have accurate information on the newcomers' ethnicity and ages.  It is possible to ask a bunch of friends for this information, but most people do not have enough friends for the sample to be reliable enough to support a multi-million dollar decision.

As noted in the Globe and Mail, the problem is not that the data are wrong but that they are unreliable.  If the data say that 40,000 families in a market area have children, and that number is known to be too low by 10 percent, then an experienced retailer could easily fix the data problem.  The adjustment is very simple; it takes me about 10 minutes to teach it to my class.
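
The adjustment for a known undercount can be sketched in a few lines of Python.  This is only an illustration under one stated assumption: that "too low by 10 percent" means the reported count is 90 percent of the true count.  The function name `correct_undercount` is made up for this example.

```python
def correct_undercount(reported, undercount_rate):
    """Scale a reported count up to an estimate of the true count.

    Assumes the reported count equals (1 - undercount_rate) times the
    true count. This reading of "too low by 10 percent" is an assumption.
    """
    if not 0 <= undercount_rate < 1:
        raise ValueError("undercount_rate must be in [0, 1)")
    return reported / (1 - undercount_rate)


reported_families = 40_000   # figure from the example in the text
undercount_rate = 0.10       # known size of the error

corrected = correct_undercount(reported_families, undercount_rate)
print(f"Corrected estimate: {corrected:,.0f} families")
```

Under the other plausible reading, that the true count is 10 percent above the reported count, the correction would be 40,000 × 1.10 = 44,000 instead.  Either way the fix is a single multiplication or division, which is why it is easy once the size of the error is known.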

The real problem with using unreliable, incomplete and imperfect data to understand a changing world, as my students are learning this weekend while writing a report, is that life becomes much more complicated if you do not start by knowing the correct answer.  People have identified specific points of concern in the published census data ([1] [2]), but a concern is not a correction.  Without the correct answer as a starting point, a well-intentioned “correction” could make the estimate less accurate.  Since the next census will be conducted in 2016, and those data will be released a couple of years later, many decisions will need to be made before anyone knows the correct size of the correction.
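
A small sketch shows how a well-intentioned correction can backfire.  Every number here other than the 40,000 reported families is hypothetical: the "true" count of 41,000 is invented for illustration and would be unknown in practice, and the 10 percent rate is a guess applied by the analyst.

```python
def correct_undercount(reported, assumed_undercount_rate):
    """Scale a reported count up by an assumed undercount rate."""
    return reported / (1 - assumed_undercount_rate)


reported_families = 40_000   # published figure from the example in the text
true_families = 41_000       # hypothetical; nobody actually knows this
assumed_rate = 0.10          # analyst's guess, not a known value

corrected = correct_undercount(reported_families, assumed_rate)

# Compare how far each figure is from the (hypothetical) truth.
error_before = abs(reported_families - true_families)
error_after = abs(corrected - true_families)

print(f"Error without correction: {error_before:,.0f}")
print(f"Error with guessed correction: {error_after:,.0f}")
```

In this made-up case the uncorrected figure is off by 1,000 families while the "corrected" figure is off by more than 3,000, because the guessed rate overstates the real undercount.  With a different hypothetical truth the correction could just as easily help, which is exactly the difficulty.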

Even if classes in statistical theory do not provide an answer, they provide a language precise enough to communicate.  Especially if you do not start by knowing the answer, there are serious issues to be discussed and hard work is needed for real understanding.  That is why such courses are required so often.

People often dismiss statistics as being worse than “damned lies”.  I prefer to close with a more obscure quote from a BBC TV show (Yes, Prime Minister, in the episode concerning cigarette smoking, “The Smoke Screen”):

Sir Humphrey: You can prove anything with statistics.
Prime Minister: Even the truth.
(soon after)
Prime Minister: Your statistics are facts and my facts are statistics.