Numbers can be enlightening and misleading

More than a century ago, the Belfast-born British physicist William Thomson (Lord Kelvin) argued that numerical data is fundamental in research: “When you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.” That meshes with the contemporary saying that “Without data all you have is an opinion.”

Economists take this to heart. It is difficult to find an economic study that does not include information expressed in numbers. But while quantitative data imposes discipline on scholarly analysis and exposition, it also introduces intellectual snares.

Take the social and political issue of killings by police. Anyone reading news accounts over the past few months would conclude that these happen frequently. But just how frequently? Following the shooting of Philando Castile in Falcon Heights, less than two miles from my house, I heard someone in my neighborhood assert that a U.S. resident is 400 times more likely to be killed by the police than someone in the United Kingdom. That is a striking bit of information, and one much more specific than, “Gee, there seem to be a lot of killings like this these days.”

So I thought I’d check it out. Available data shows the assertion is roughly true. But what I found also revealed complexities. Killings by police carrying out their duties actually are more common here than in Britain. But the numbers for our country are not precise. The FBI and the Department of Justice, which tabulate many crime statistics, do not have a separate category for killings by police. We have a decentralized policing system, and no law requires local or state law enforcement to report such deaths. There is an FBI category for “justifiable homicides,” but its definition does not overlap closely with killings by police.

Given deficiencies in the tabulated data and the ease of searching real-time news, at least two newspapers and a number of nongovernmental organizations have started running tallies of such incidents. These indicate annual totals of 900 to 1,500 deaths among the 330 million people in the country.

That last number highlights an important point. If one compares across states or across nations, one has to adjust for differing population sizes and express the data as rates per 100,000 or per million population, rather than as raw totals.
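To make that concrete, here is a minimal sketch in Python (my choice of language for illustration) that converts the newspaper tallies above into rates, using the rounded population figure cited in this column:

    # Convert raw death counts into rates per million residents,
    # using the rounded figures cited in this column.
    def per_million(count, population):
        return count / population * 1_000_000

    us_population = 330_000_000
    print(per_million(900, us_population))    # low tally: about 2.7 per million
    print(per_million(1500, us_population))   # high tally: about 4.5 per million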

Care also is needed in interpreting small changes in low counts of something. The U.K. had a total of five incidents in 2013 in which police fired one or more shots. There were only three such incidents in 2014. Numerically, that represents a 40 percent decline in the incidence of police firing weapons. But it is probably just random variation in a very low rate.
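The arithmetic behind that caution is easy to sketch; with counts this small, a swing of two incidents reads as a dramatic percentage move:

    # Percent change between two annual counts; tiny counts
    # produce large-looking percentage swings.
    def percent_change(old, new):
        return (new - old) / old * 100

    print(percent_change(5, 3))   # U.K. firing incidents, 2013 to 2014: -40.0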

Also note that these were the numbers of incidents in which police fired at least one shot; not all resulted in deaths. The U.S. population of roughly 330 million is about five times the U.K.’s 64 million. So 1,500 deaths in our country is equivalent, adjusted for population, to about 300 deaths there.
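That equivalence is a one-line scaling by the ratio of the two rounded populations, as this sketch shows:

    # Scale a U.S. count to a U.K.-equivalent count by the
    # ratio of the two populations (roughly five to one).
    us_pop, uk_pop = 330_000_000, 64_000_000
    uk_equivalent = 1500 * uk_pop / us_pop
    print(round(uk_equivalent))   # 291, i.e. roughly 300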

Over the same period, discharges of firearms by police in the Netherlands reportedly increased from 15 to 25. Since the population of that country is only one-fourth that of the U.K., one can correctly calculate that eight times as many incidents among one-fourth as many people means police firearms use was about 32 times as high per million people in the Netherlands. But does that make the Netherlands a European version of the Wild West? In both countries, police firing their weapons is extremely rare.
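To verify that multiple, one can compare the two per-million rates directly. A sketch using the column’s rounded figures, with the Dutch population taken as one-fourth of the U.K.’s 64 million:

    # Rate ratio: incidents per million people, Netherlands vs. U.K., 2014.
    uk_rate = 3 / 64    # 3 incidents among 64 million people
    nl_rate = 25 / 16   # 25 incidents among 16 million people
    print(round(nl_rate / uk_rate))   # 33, in line with the rough 32 above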

This bit of information, that police shootings and killings are far rarer in other countries than here, is striking. But it doesn’t explain much about why the difference exists. Nor does it tell us anything about trends.

The inadequacy of underlying information makes it as difficult to compare U.S. deaths at the hands of police over time as it is between different countries. There is evidence that the rate of such deaths has increased somewhat over time, but the data is simply so poor that it is hard to draw firm conclusions.

The data problems are evident in the fact that The Guardian newspaper tabulated 1,134 such deaths in our country for 2015, while the Washington Post, in a similar effort, came up with only 990. And that was for a single year. Going back in time is harder still, as the results of news searches become more tenuous. So one cannot say much about long-term trends.

One can break these media-tabulated numbers down by state, race and gender. African-Americans were killed at a rate of some seven per million, Hispanics and Native Americans at about 3.5 per million, and whites at about 2.9 per million. But there is no reliable tabulation of the differing circumstances of the shootings, e.g., whether a death occurred during the commission of a verified crime or as the result of a traffic stop.

Public controversy over the lack of reliable numbers might motivate change. FBI director James Comey has termed it “unacceptable” that a British newspaper has more comprehensive data about such shootings than his own agency.

There are similar intricacies of data in most other areas.

There are some areas, such as the “vital statistics” of births, deaths and illnesses, in which data definitions and collection techniques are highly standardized and have been in place for decades. Comparisons between states and nations, or between earlier and later years, generally are easy and reliable. There still are a few quirks, however, even in something as basic as infant mortality: some European nations classify all deaths of children born alive before a certain length of gestation as stillbirths, while others tabulate them as infant deaths.

The “national income and product accounts” that include “gross domestic product,” “national income” and “personal income” have become standardized and are reliable and comparable for the major industrialized nations. There still are some definitional differences between nations in labor market data, such as unemployment rates, but many nations now publish what the headline rate would be under other definitions. One has to delve into the back pages of monthly reports to find the tables, but the information is available.

But just because data is available does not mean that correct interpretation of the results is easy. Some years ago, political controversy arose over a study by Carmen Reinhart and Ken Rogoff showing that economic growth suffered when a nation’s government debt-to-GDP ratio passed 90 percent. It turned out that there was a spreadsheet error that, when corrected, changed the tabulated results. Moreover, the data depended heavily on small nations, such as New Zealand, under economic conditions very different from those of the United States. Reaching a conclusion about our country from this data might have been like reaching a conclusion about corn production from a study that lumped together production of barley, radishes and begonias.

Similarly, I, like many other economists, long relied on research by George Borjas on how legal and illegal immigration affected unemployment rates. Borjas is a highly respected scholar, and he found the effect very small. But it turns out that most of his work focused on one particular metro area at a couple of points in time. To conclude that what is true for one municipal area is therefore true for a whole nation is a gross “fallacy of composition” of the sort warned against in the first chapter of every intro econ text. Yet many renowned economists fell into the error.

Lord Kelvin might have been right that in the absence of numbers, knowledge is meager. But having numbers is no guarantee that one’s conclusions are complete or correct.