Anyone following our newsletter these past years knows that we are not exactly fans of the way that disclosure avoidance (DA) was handled in the 2020 census. Planning for the 2030 census is already well underway, including a review of the DA methodology used in 2020 (see Announcing the 2030 Census Disclosure Avoidance Research Program).
I think we all agree that we should not be able to go to the census files and discover that John and Mary Smith live at 48 Elm Street, Peoria IL, and the age and sex of their three children. Personal information collected by the government for enumeration and statistical purposes must not be disclosed. Specifically, the Census Bureau cannot disclose or publish any private information that identifies an individual or business, including name, address, GPS coordinates, social security number, and telephone number (Federal Law). The census publishes none of this information, so what is the fuss about?
“The database reconstruction theorem, also known as the fundamental law of information reconstruction, tells us that if you publish too many statistics derived from a confidential data source, at too high a degree of accuracy, then after a finite number of queries you will completely expose the confidential data (Dinur & Nissim, 2003).” Michael B. Hawes, “Implementing Differential Privacy: Seven Lessons from the 2020 United States Census”, April 30, 2020, Harvard Data Science Review.
Simply put, the more we tell you, and the more time you have to analyze it, the more likely it is that the cat gets out of the bag.
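To make the idea concrete, here is a toy sketch of reconstruction by brute force. It is not the Bureau's or anyone else's actual attack: the three-person block, the ages, and the function name `consistent_blocks` are all invented for illustration. It simply counts how many possible blocks remain consistent as more exact statistics are published.

```python
from itertools import combinations_with_replacement

# Hypothetical block of three residents; all numbers are invented.
MAX_AGE = 99

# Each published statistic is a function of the sorted tuple of ages.
STATISTICS = {
    "mean": lambda ages: sum(ages) / len(ages),
    "median": lambda ages: ages[len(ages) // 2],  # valid for odd counts only
    "min": lambda ages: ages[0],
}

def consistent_blocks(count, published):
    """Return every sorted tuple of ages that matches all published statistics."""
    candidates = combinations_with_replacement(range(MAX_AGE + 1), count)
    return [
        ages for ages in candidates
        if all(STATISTICS[name](ages) == value for name, value in published.items())
    ]

one_stat = consistent_blocks(3, {"mean": 30})
two_stats = consistent_blocks(3, {"mean": 30, "median": 24})
three_stats = consistent_blocks(3, {"mean": 30, "median": 24, "min": 8})

# The candidate list shrinks with every extra exact statistic.
print(len(one_stat), len(two_stats), len(three_stats))
print(three_stats)  # the only block left: [(8, 24, 58)]
```

With only the mean published, many blocks fit. Adding the median leaves 25 candidates, and adding the minimum leaves exactly one. Note that what comes out is a set of ages, with no names or addresses attached.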
The argument goes like this: suppose we had a consumer or credit list. Under some circumstances, we could match people on that list to households in the block. All it takes is for one characteristic of an individual to be rare in that census block, and I have found you. All true, but in my view the Census Bureau went off the rails by going down this path, for one simple reason: disclosure, by definition, is when I tell you something you didn’t already know and could not have found out from third parties. Unless you are Sasquatch and live off the grid in the mountains, every bit of the information that could ever be ‘disclosed’ is already public information. Public tax records. Consumer list files. Google your own name. The only thing you won’t easily find is your social security number (that you have to pay for on the dark web).
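The matching step in that argument can be sketched in a few lines. This is a minimal illustration with made-up records, not a description of any real linkage study; the names, the fields, and the helper `link_unique` are all hypothetical. A person is "found" only when their combination of traits is unique both in the block and on the outside list.

```python
from collections import Counter

# Hypothetical records, invented for illustration.
reconstructed_block = [
    {"age": 34, "sex": "F"},
    {"age": 36, "sex": "M"},
    {"age": 36, "sex": "M"},
    {"age": 71, "sex": "F"},
]
consumer_list = [
    {"name": "A. Jones", "age": 34, "sex": "F"},
    {"name": "B. Lee", "age": 36, "sex": "M"},
    {"name": "C. Diaz", "age": 36, "sex": "M"},
    {"name": "D. Park", "age": 71, "sex": "F"},
]

def traits(record):
    """The characteristics shared by both files."""
    return (record["age"], record["sex"])

def link_unique(block, outside):
    """Return names whose traits are unique in the block and on the outside list."""
    block_counts = Counter(traits(r) for r in block)
    outside_counts = Counter(traits(r) for r in outside)
    return [
        r["name"] for r in outside
        if block_counts[traits(r)] == 1 and outside_counts[traits(r)] == 1
    ]

matches = link_unique(reconstructed_block, consumer_list)
print(matches)  # ['A. Jones', 'D. Park']
```

The two 36-year-old men cannot be told apart, so only the people with rare traits are matched. In this toy case, everything in a matched pair (name, age, sex) was already on the outside list before the census file was ever consulted.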
But for the moment, let’s pretend that this really does include private information that we can protect. The 2020 method was an operational disaster. The releases were horribly late, and not worth waiting for, since the distortions at the block level were so grievous that we came up with mocking labels like mermaid, ghost, and Lord of the Flies blocks. Meanwhile the Census Bureau, apparently not displaying a sense of humor, told us not to compute average household size by dividing the population in households by the household count. But maybe it was worth it to protect the ‘private’ information from escaping? We only wish. Not only did they damage the data, but they failed to satisfy their primary objective (Paul Francis, “A Note on the Misinterpretation of the US Census Re-Identification Attack”, https://arxiv.org/pdf/2202.04872). So, what we got was mangled data for no benefit.
So why are we talking about this again? The Census folks have already decided to “use formally private noise injection to protect the 2030 Census block-level total population counts” and “to release state-level total population counts as enumerated.” In other words, they are planning to do the same thing in 2030 as they did in 2020. The problem is that, now that the methods they used are known not to have worked, you can likely expect more, not less, distortion in the 2030 product.
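For readers who have not seen noise injection up close, here is a rough sketch of why it produces odd small-area numbers. It is a simplification under stated assumptions: rounded Gaussian noise and a clamp at zero stand in for the Bureau's actual mechanism and post-processing, and the block counts, noise scale, and function names are invented.

```python
import random

def noisy_count(true_count, scale, rng):
    """Add rounded Gaussian noise and clamp at zero (a stand-in, not the real mechanism)."""
    return max(0, true_count + round(rng.gauss(0, scale)))

def average_household_size(population, households):
    """Return population / households, or None when the ratio is undefined."""
    return population / households if households > 0 else None

rng = random.Random(2030)  # fixed seed so repeated runs give the same draws

# Hypothetical true (people in households, occupied households) per block.
true_blocks = [(0, 0), (3, 1), (5, 2), (12, 5), (40, 16)]

for people, households in true_blocks:
    # Noise is drawn separately for each count, so the pair can disagree.
    p = noisy_count(people, 2.0, rng)
    h = noisy_count(households, 2.0, rng)
    inconsistent = (p == 0) != (h == 0)  # people without households, or the reverse
    print((people, households), "->", (p, h), average_household_size(p, h), inconsistent)
```

Because each count gets its own noise, a small block can end up with people but no households, or households but no people, and the ratio of the two published numbers can be undefined or implausible. The larger the true counts, the less the same noise matters, which is why the damage concentrates at the block level.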
There are some glimmers of hope here: they are looking at improving the quality and usability of 2030 census products, which might include actually reporting the total population of the block accurately. Even that would be an improvement, since it would at least avoid the problem of having blocks that have no people but occupied households.
Our fear is that they instead double down on a methodology that doesn’t work. Can we push them in another direction? Probably not, but it is worth a try. We urge you to make your preferences known, and you can do so, politely please, by sending an email to 2030DAS@census.gov. Ask them to reevaluate the notion of ‘private’ information, since their misinterpretation is the root problem here.