The traditional census – counting all the people in a country or locale and tabulating those results – currently sits at the center of a vast web of administrative and commercial data collection taking place globally, notes data scientist Andrew Whitby in his just-released book, The Sum of the People. As he writes in the prologue:
“But this does not guarantee its future. In a world of driver’s licenses and passports, tax returns and benefit checks, fingerprints and retina scans, hourly social media status updates and minute-by-minute location tracking, the traditional census seems increasingly anachronistic – as one group of sociologists put it, ‘an outdated high modernist invention.’ It is infrequent, expensive, and bound by strict privacy rules.”
That’s a rough health update for someone promoting a new book on the subject, but Whitby adds, “for now, though, the traditional census still rules.” In an age when issues of ethnicity and identity matter, as does, in the United States, political representation, the import and impact of censuses, along with how they are structured, carried out and analyzed, matter greatly. And with the U.S. Census being conducted this year – today, April 1, is Census Day, although coronavirus-marred collection of data will continue until August 14 – this is an apt time to talk with Whitby about censuses past, present and future.
Whitby trained on the statistical and computational side of economics. As an undergrad he studied economics and computer science, and his economics PhD from the University of Oxford focused on econometrics, the statistical side of empirical economics. “After grad school, I realized that I enjoyed statistics and I enjoyed the math to a point, but I didn’t see a career in actually doing pencil-and-paper, proof-based mathematical work.”
This epiphany came around the time that new data sources, and the computational ability to analyze these data, were arising. Whitby jumped on that, landing a research fellowship at the British think tank Nesta before relocating to the United States to join the World Bank. At the World Bank he started in their innovation lab, working with things like satellite data and cell phone records, and later transferred to a section using more traditional data sources like household surveys and even some national censuses.
Here in 2020 I see the point of writing a popular book centering on the Census, but given books take time to write, what was the origin?
I recently read a note from 2008 saying, ‘A book about the census might be interesting,’ and that was just as I was starting graduate school. I guess I’d forgotten about it for many years in between, even though I’ve been on and off collecting material. And at the World Bank we’d done some census support for some countries, so that reawakened this idea I’d had.
As exciting and sexy as people find big data science and this new vision of statistics, there’s something really crucial and central about understanding society through the more traditional survey programs that official statistical agencies run. A part of all that is the census: It’s the longest-running, most well-established way we have of quantitatively understanding society.
So at some point, I thought, well, if I’m gonna write this book, it should come out in a census year. So I said goodbye to the World Bank, at least as a full-time staff member, and started the book.
In some sense it is timeless, though. Just because it’s a census book doesn’t mean you couldn’t read it next year and get a lot out of it.
That’s true. But what kind of census book did you see?
I really liked the idea that by looking at this long-running institution, I could pick and choose different time periods and different places where interesting things happened that are threaded together through the census.
To be honest, there are some profoundly depressing parts of the book. It covers eugenics and Nazism, and then we talk about enslaved Black Americans counting as only three-fifths of a person up to the U.S. Civil War, and American Indians not being counted at all. There are uplifting stories, such as how South Africa candidly addressed its undercount in its post-apartheid first count, but there’s a lot of darkness, too.
I’m very aware that there’s a waxing and waning of the threat aspect of the census. I try to be objective in how I’m writing it, but I come at it from a fairly sentimental perspective – I like this idea of having this institution, particularly in the traditional form where you would have people being sent out across the country to physically visit households and get some information from them. There is something democratic about it, right?
But there’s also something terribly totalitarian about it as well, which you can’t get away from. For certain people, the idea that the government comes and acknowledges their existence and gets information from them, and then keeps that information, is a scary prospect. I don’t think it’s a scary prospect in the United States in 2020, but you only have to look at the history to say there’s always an inherent risk.
You can have technical measures, you can have ethical measures, you can have legal measures, but fundamentally, the relationship between individuals and government is such that we empower government by sharing our information with it. And that power can be used for good and evil. It’s an essential part of the complex societies we live in today.
Tell me about registries, and how they differ from censuses, especially since one impression I get from your book is that registries might be the wave of the future.
I come from Australia, which went through a process of trying to bring in a centralized identity card in the ’80s. And it was unpopular then, although the idea resurfaces from time to time. The UK tried to do a similar thing when I was there as a student, and as an immigrant in the UK, I was one of the few people who got one of these cards that was going to be the prototype of the National Identity Card that a Conservative government cancelled. So I came at this thing from a skeptical point of view: I don’t want government having a too-efficient database of everybody. That’s somehow a scary thing.
Then the more I looked into it, the more I concluded governments have that anyway. We have these highly decentralized, overlapping databases that exist in a country like the U.S. today, and it’s a little bit naive to imagine that just because those things aren’t linked in a central database, that somehow protects us. From a background in computer science and big data, I know it’s trivial to merge these databases. So not having the database in a single place, I don’t think, is really much protection from tyranny, which is the ‘slippery slope’ argument people will make.
On the other hand, I think there’s a huge advantage to having some sort of database like that. And it’s not just for statistical purposes. Where these exist, the statistical function of them tends to be secondary to the administrative function. It’s useful when you interact with government to be able to say, ‘I’m Andrew Whitby, this is my number. You have all the information you need about me for this transaction, I don’t need to fill out everything new.’ For me, that is the strong argument in favor of bringing in those kinds of systems. And I think once you have those systems properly in place for initial purposes, then it’s just a natural thing to run your census count out of those systems.
Every country has to approach that slightly differently, in terms of who is going to be missed in a registry, because just as a census will miss people, a registry will miss people, too.
And in the U.S. …
I think the US may be the last place on earth that will ever get that kind of thing.
But I think a lot of other countries, where it’s not such a heated topic of debate, will go in that direction.
Can you talk a little bit about why, in the US specifically, there is such anti-census bias? We see it in things like votes to get rid of the annual American Community Survey or to make it voluntary. Is it just centering on political representation? Or is there something deeper?
As an outsider looking at the country, I do think, in a country whose mythology is a fight against tyranny, there is this resistance to centralized government, a localism still inherent in a certain American mindset. The census, as I talk about it, which is mostly a national census, is inherently a centralizing thing. It’s about the federal government going out and counting everybody. It’s very advantageous to have this centralized federal bureaucracy that runs this, but I definitely think that it clashes with a perspective a lot of Americans have.
I’ll give you another reason. There was a campaign at one point to have a Middle Eastern and North African racial category added to the census, which didn’t in the end go through. There are obviously a lot of people who identify as that ethnicity who want to be recognized. But there are going to be other people in that same group who would very much rather not have that column on the census. If you asked that question immediately in the wake of 9/11, I’m sure some of the same people would not have wanted to see a Middle Eastern or North African column on the census, while some outside that group then would. So there’s always this tension.
You talked at length about the example of the Netherlands, where, because they had such excellent bureaucratic mechanisms, the Jewish population suffered disproportionately when the Nazis invaded.
You ask yourself, is there such a thing as having a government which is too efficient, a government that can too easily call these records up? Are we setting ourselves up for some sort of future bad regime to be able to take advantage of that? I don’t think you can ever fully dismiss that. I don’t know that it’s a worry for me with the US Census as it is now. It’s a very limited questionnaire in reality; the census that everybody gets is not revealing a whole lot of information about you that I can’t see just by looking at you, apart from tying you to an address. But it’s a very limited survey compared with other countries where people are being asked dozens of questions and lots more personal things, as the American Community Survey does.
What is going on today? Maybe the equivalent of that is some of the discussions we’re having around, say, facial recognition, like being able to track people as they move around from place to place. Do we want to share that with the government? The reality is those data points today are not the ones that the census asks, but when you actually start to think about the amount of data that is being collected about ourselves on a daily basis, there’s tons of other things that I think can be alarming.
In that vein, when we talk about a census, in any country, it’s always linked to the national or regional government. But then, in your last chapter, you briefly talk about Facebook, and why it is conceivable that there would be a private census. Is that, or something akin to that, likely?
I guess I’m arguing that Facebook kind of is that already, but they’re doing it for different reasons. Fairly late in my own research tour, I thought about what questions Facebook asks you. It’s like 10 years since I even thought about setting up a Facebook account, so I couldn’t really remember what questions I had actually answered. So I went back, and I created a new account with fake names on Facebook. What they ask is so similar to what you would answer on a government survey or a census. Of course their motivation is about selling advertising and promoting their business model. But actually, it’s also just the questions we ask people now – age and sex, and where do you live and where did you grow up. This is just sort of, like, general background information they’re collecting. I would argue that in some sense it is a private census.
To be clear, it’s not a census in the narrow definition of a census that needs to be a complete enumeration of a group of people who live in a particular territory. It’s a very incomplete enumeration of the people who live in the entire world, but it’s also the closest thing we have to a private census. I don’t imagine anything coming about much beyond that, unless there was a strong business reason for why it made sense to collect that information.
I think you could imagine the idea of a census that occurred in the private sector happening in a situation where we said, ‘Well, we’re no longer going to rely on government-issued documentation to identify ourselves, we’re going to rely on some sort of private source.’ Google in a sense already offers that with an email address that’s effectively some part of identity. So that sort of thing, I think, is a more likely direction than a private census in which they make the data available to everybody.
A lot of the book strikes me as a sort of history of statistics. The creation of statistics, or even social science, seems tied up in censuses and that’s an animating thesis of the book.
It is interesting. One of the points I make in the book is that census-taking diverges from modern statistics where there’s so much focus on methods and sampling and probability and mathematical statistics, whereas censuses, in a pure sense, have none of that. It’s just going around and counting things. So it’s a different kind of statistics from what most statisticians are thinking about and working on today.
But, even if you don’t use census data directly – and a lot of people don’t because the census data in the US particularly is very limited – the phrase I’ve heard used is “the spine of the federal statistical system.” Every other federal survey that’s being conducted using some sort of sampling approach is reliant on having some idea of where people live and the different structure of people within that. If you didn’t have that basic data in a reasonably solid state, it would become much harder and more expensive to do those surveys and achieve the same level of accuracy on those surveys. It underlies everything in terms of government statistics, and beyond that, as well; if you’re running a survey as an academic or privately, you’re probably still at some level using Census data in order to understand who you should be asking and where to ask them.
So if you just took it away today, that’s the biggest impact it would have – it would make all of those things much more expensive. There would be a lot more people reinventing the wheel, trying to deal with the fact that you didn’t have good underlying population data broken down by a few key demographics.
It’s a much more interesting question, in some ways, to ask, ‘Where would statistics and social science be today if there had never been the idea of a census? If we had just never developed this statistical instrument?’ A lot of the people who were working in the 19th century on developing statistics and developing social science were using census data, and it went both ways.
I tell that story about Malthus in the UK, working without the benefit of data, but his theory very much encouraging people to take a census in the UK, or across the Atlantic in the United States. And then it goes back and forth, with government offices starting to use the data that became available in the first couple of censuses. Having that data available helps human beings apply maths and logic and reason to think things through in a kind of ‘population way.’ This happens today in the same way as everything is becoming more quantitative in the way that people think about the human world.
So we’ve been talking a lot about the US Census, but I’m wondering what country or what entity do you believe does the best census? Define that any way you want, but just let me know how you are defining it.
I don’t know if I have an answer to that. I spent so much time looking at countries that do poor censuses for one reason or another, and the US doesn’t do a bad census. There’s undercount, there’s some double counting, there’s a lot of politics which is not unique to the U.S. but is certainly worse in this country than in a lot of other comparable countries. But I think that the Census here is done pretty well. And the Census Bureau here has some of the highest statistical capacity of any country still doing traditional censuses in this way.
I think you see that in one of the below-the-surface controversies going on now about the 2020 census here: this idea of taking a new approach to how they confidentialize the data, using more mathematical, proof-based techniques to edit the data in such a way that, even with all the tables published, nobody can put them together and identify individual people. It’s very controversial.
The US Census Bureau really is at the forefront in terms of method. Where they’re not, it’s very much about political decisions. Through its history it’s been very much tied to politics through the Constitution and the way congressional apportionment works. That’s been great for the Census for a long time because it gave it a reason that it just couldn’t be canceled. Now, I think that holds the census back as it’s very hard for them to make a purely technical decision about the Census that might have political or partisan consequences. It’s a double-edged sword.
But that doesn’t exactly answer your question to say the US is the best, and then tell you it’s not the best! There are small countries that have much more homogeneous populations where I’m sure they do a great job of census taking. Australia’s census, on the whole, seems to have been done very well; I was a user of certain census data for many years. The last one did not do so well as they tried to transition online, which is a bit of a scary warning for the coming US Census.
Might you recount the story of sitting at your parents’ table in 2016 filling out the Australian Census and then tell me what that means to the rest of the world?
This was in the 2016 census. Australia has a census every five years, rather than every 10 years. Like the U.S. they had experimented with offering an optional online response in previous censuses, but for 2016 there was a new level of encouraging everybody to do this. (And I won’t pretend that I didn’t somewhat intentionally plan a return trip to Australia to coincide with the census, but it was convenient timing.) So I was at home with my parents, in the same house where I would have been recorded in the censuses of my childhood. I’m very, very excited and enthusiastic to respond to a census. We followed the instructions on the paper that was sent to us and tried to log in. And yeah, it’s like any bad experience online that you have had – you answer a question, you click next, and then nothing happens or you can’t log in or something else.
We and many other Australians tried this over and over during a period of a couple of days before it was all resolved. People just had huge amounts of trouble responding. Now, I don’t think it had as huge an impact on the response rates as people might have anticipated, which is a good thing, right? I guess people were somewhat tolerant and willing to try many times, but you know, you can easily imagine that the opposite could happen. In New Zealand, they had an online response option on the last census and the response rate was lower. I don’t think they had particular technical problems in the same way Australia did, but the idea that they hadn’t properly promoted or hadn’t properly advertised or explained it was held up as one of the reasons why that might have happened.
The broader implication is that whatever you do, you have to be really careful every time you try a new thing with censuses. And I remember one of the census directors I talked to pointed out that with a census, you don’t have a lot of opportunity to learn from your mistakes. If you screw it up one year, the next time you get to do it is in 10 years’ time! You can test out small things, and some of the surveys that come along the way have, so it’s not like the Census Bureau hasn’t done online in the US. And they had a trial, with a minority of people doing online census in 2010. But you really have to get it right the first time and that puts a lot of pressure on those systems.
In Australia, there was a lot of debate at the time about what was going wrong. Different people blamed not just the Census Bureau itself, but also the commercial supplier of the platform and the security infrastructure that was supposed to protect that. There were various claims made about foreign interference or hackers or denial of service attacks, and there were people outside Australia trying to interfere with the website.
But at the same time, these were all things that should have been anticipated. And certainly, even if you may not have anticipated it in 2016, you absolutely have to be anticipating it in 2020. And I think the Census Bureau here and in other countries had plenty of time to plan for that possibility. But it’s hard to test until the thing actually happens. So I think there’ll be a lot of people sort of waiting nervously, you know, as people start to sign on to the website and enter their information.