During the years prior to the 2020 census, many in the industry understood that the Census Bureau was headed down a path which was sure to reduce the utility of the decennial census. The essential claim behind the change was that it was possible to match census respondents to geocoded consumer lists using just a few variables such as age, sex, and race. The Census Bureau has, for several decades, used techniques aimed at protecting the identity of respondents by swapping household characteristics between block groups. The precise details of the methodology and, more importantly, how widely it was used were never furnished by the Census Bureau. No big deal really, as the modifications never reached a point where the integrity of the data was in serious question.
The early conference proceedings and published papers regarding the “differential privacy” initiatives gave no real hint of what lay ahead. They were, at that point, little more than theoretical scribblings of equations on a blackboard. In other words, nobody had any idea how this would play out in practice.
Within minutes of opening what we thought was a gift from the Census Bureau, known as the redistricting release of the 2020 Census, we knew we had been had. Maps and data extracts started flying back and forth within AGS, and by the end of the first day we had concluded that at least 10% of the census blocks contained statistically impossible data. Those of you who have endured our ongoing grumbling on the subject are probably aware of our evolving terminology – mermaid blocks, ghost blocks, baseball team blocks, and Lord of the Flies blocks. If you aren’t sure what we mean by those terms, check out the AGS white paper on the issues with the 2020 Census.
Late last year, the Census Bureau published a guide to using the data, and gave us a new, kinder and gentler term to use – “disclosure avoidance”. Those interested in demographic analysis of any sort should read this handbook. One of the interesting tidbits? You should not work with the population and housing tables simultaneously, which means that you should not calculate average household size from the PL-94 release but should instead wait for the “Detailed Demographic and Housing Characteristics File” (DHC). We can’t help but wonder how this will yield better results given that the detailed data will be subject to much more stringent disclosure avoidance.
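To make the household size caution concrete, here is a minimal sketch of the arithmetic involved: average household size is simply household population divided by occupied housing units. The block records, function names, and labels below are hypothetical and invented for illustration; they are not actual PL-94 values, Census Bureau terminology, or the AGS methodology. The only rule the sketch leans on is that an occupied housing unit must contain at least one person.

```python
from typing import Optional


def average_household_size(household_pop: int, occupied_units: int) -> Optional[float]:
    """Household population divided by occupied housing units.

    Returns None when there are no occupied units, since the ratio is undefined.
    """
    if occupied_units <= 0:
        return None
    return household_pop / occupied_units


def classify(household_pop: int, occupied_units: int) -> str:
    """Label a block by whether its population and housing counts agree.

    The labels are illustrative only, not official terminology.
    """
    size = average_household_size(household_pop, occupied_units)
    if size is None:
        return "people without occupied housing" if household_pop > 0 else "empty"
    if size < 1:
        # Each occupied unit holds at least one person, so this cannot happen
        # in consistent data.
        return "fewer people than occupied units"
    return "plausible"


# Hypothetical block records: (block id, household population, occupied units).
blocks = [
    ("A", 120, 48),
    ("B", 37, 0),
    ("C", 3, 9),
]

results = {block_id: classify(pop, units) for block_id, pop, units in blocks}
for block_id, label in results.items():
    print(block_id, label)
```

When the population and housing counts are perturbed separately, the ratio between them has no guarantee of staying sensible for small geographies, which is presumably why the handbook warns against combining the two tables.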
I think we all agree that the Census Bureau is obligated to maximize the privacy of respondents, but what if the new methods seriously damaged the utility of the data and did not actually accomplish the stated goal? A recent article by Paul Francis, entitled “A Note on the Misinterpretation of the US Census Re-Identification Attack”, concludes that “the [method] used by the US Census Bureau for the 2020 census in no way prevents attackers from inferring race and ethnicity with high accuracy for a substantial portion of census respondents”. In other words, we destroyed perfectly good data and got nothing for our trouble.
From the outset, AGS has been very transparent on these issues and was certainly the first of the major demographics creators to highlight the issues to the user community.
The others?
One has been completely silent on the issue. They seem blissfully unaware that the data is deeply flawed, and they released data claiming to have incorporated the PL-94 data within weeks of its release.
A second responded briefly to their client base – after one of their clients sent them our paper on the matter – and did so quite laughably by claiming that they have a “special” relationship with the Census Bureau, and thus it was not a problem for them.
The third has honestly acknowledged the issues in their 2022 release documentation but confines the discussion to the block group level, where the problems are far less visible. While we are pleased that they are informing their clients, the issues which are crystal clear at the block level are somewhat obscured at the block group level. And those issues have an adverse effect which ripples through the many models which rely on the integrity of those base counts. Remember that the Bureau admonishes that users should not compute average household size on the raw data.
We at AGS remain committed to providing not only a very high-quality demographic base, but also to the continual effort to educate users on how to effectively utilize data and understand the role of error in decision making.