Most people think that, when academics meet, they discuss obscure topics of interest to nobody else. If you are curious, you will be surprised at what you can learn. Academics have to be professionally curious, both in order to do worthwhile research and so that today’s students are prepared with ideas newer than those taught to their parents.

At a recent meeting of 11,000 of my best friends from around the world, there were lots of obscure discussions. One of the panel discussions focused on the use of administrative data for research purposes.

Administrative data is information collected as a by-product of a government implementing a policy, as opposed to data collected by a statistical agency for statistical purposes ([1] [2]). The push for Open Data is the most obvious example, and its benefits are starting to be realized as a way to keep governments accountable.

Administrative data has real advantages over a common alternative. People find it incredibly easy to dismiss academic research that uses undergraduate students as research subjects, but only when it reaches a “wrong conclusion”.

Studying real-life behaviour would be more useful, if the data could be collected without bothering the people being studied. The ideal research study finds the effect of a change by comparing the situations before and after a natural change or “experiment”. The problem is that it is easier to collect data after a change than before it but, without both, nothing can be learned. The panel noted that natural experiments help to resolve hard puzzles, including policy debates that regular data sources have left unresolved.

Encouragement is needed to make such data available, first because there are real concerns about privacy. Good researchers care about the research goal and not the individuals, even if some information could be personally identified by matching it with other information. (I have talked with people who work at Statistics Canada about the efforts they take to make sure that no such information is revealed, either directly or indirectly by inference.) Paradoxically, this issue becomes more acute as more people voluntarily reveal more information about themselves online (Facebook, Twitter, …). Privacy concerns are important but they do not necessarily outweigh other concerns.

Second, government cutbacks make it difficult for government officials to organize the data into a form suitable for researchers. Data is not just a bunch of numbers. For the data to be useful, errors need to be “cleaned” and the data need to be organized and documented so clearly that there is no confusion about the meaning of “income” or “family”.

Making administrative data available can have profound implications. It may seem trivial now, but what we learn now shows up in our graduates a few years later. Later still, asking better questions stimulates new methods of analysis and better answers, which are repackaged into the speeches of leading-edge thinkers in the private sector.

So, you can learn a lot if you are curious.

PS Some of the best-quality and most widely used data in Canada come from the census conducted every five years. Recently, this exercise has become controversial and has prompted front-cover stories in leading newspapers and a private member’s bill.

PPS (Jan. 30) Questions related to data availability and data quality interest data geeks, but everybody who is interested in evidence-based decision making should be concerned when widely used statistics are suspect. This article from today’s Globe and Mail offers lots of unusually detailed insights and should provoke more fundamental questions about methods.