Living Document, December 10, 2019

Balancing Data Utility and Confidentiality in the 2020 US Census

danah boyd

In the age of commercial data and advanced computer science, the US Census Bureau is implementing a new technical system to ensure the confidentiality of individual data in the 2020 census.

This system is based on a mathematical technique known as “differential privacy.” A new living document by Data & Society Founder and President danah boyd, Balancing Data Utility and Confidentiality in the 2020 US Census, explains how differential privacy works in the context of the US Census and illuminates key conversations, misunderstandings, and anxieties surrounding this disclosure avoidance system.

Differential privacy allows the Census Bureau to mathematically balance privacy against data utility. By measuring how much risk a particular census statistical table poses of reconstructing identifying information, the Census Bureau can ensure a quantifiable degree of confidentiality by inserting noise in strategic places. Previous disclosure avoidance systems could not withstand the reidentification risks created by the growth of commercial data from data-centric technologies.
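The idea of trading accuracy for confidentiality by adding noise can be illustrated with the textbook Laplace mechanism applied to a single count. This is a minimal sketch under stated assumptions, not the Census Bureau's 2020 disclosure avoidance system, which is considerably more involved. The function names, the block population of 37, and the ε (epsilon, the privacy-loss parameter) values are all hypothetical. Smaller ε means more noise and stronger confidentiality; larger ε means less noise and more accurate counts.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with location 0 and scale `scale`.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count: int, epsilon: float,
                   trials: int = 20_000, seed: int = 2020) -> float:
    """Average distance between noisy and true counts over many releases."""
    rng = random.Random(seed)  # fixed seed so the demo is reproducible
    errors = [abs(noisy_count(true_count, epsilon, rng) - true_count)
              for _ in range(trials)]
    return statistics.fmean(errors)


if __name__ == "__main__":
    block_population = 37  # hypothetical count for one census block
    for eps in (0.1, 1.0, 10.0):
        err = mean_abs_error(block_population, eps)
        print(f"epsilon={eps:>5}: mean absolute error ~ {err:.2f}")
```

Running it prints mean absolute errors of roughly 10, 1, and 0.1, i.e. about 1/ε for each setting: the privacy-utility trade-off in miniature.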

Every 10 years, the Census Bureau conducts the census, and the resulting data and statistical tables support the apportionment of the House of Representatives, the allocation of federal tax dollars, and redistricting within each state, as well as social science research and local policymaking. Because this data plays such a vital role in our democracy and is used by government, academia, and civil society, among others, it is important that people trust it. As the paper explains, the introduction of differential privacy has surfaced a range of anxieties about how the data products will be produced for 2020.

With useful insights for academics, civil society, data users, and anyone concerned about their privacy in the 2020 census, boyd uses this living document to clarify the motivations, risks, and unknowns surrounding differential privacy. She calls for communication, collaboration, and understanding between all parties tackling the trade-offs between confidentiality and accuracy.

Key messages:

  • Trust in census data is vital to “democracy, resource allocation, justice, and research.”
  • Differential privacy is a mathematical definition of privacy. The Census Bureau’s disclosure avoidance system for 2020 is based on differential privacy and makes it possible to balance the utility of the data against the risk to confidentiality.
  • Advances in computing and the growth of commercial data collection have made it easier to reidentify individuals from census statistical tables, creating the need for a new kind of privacy protection.
  • Data users and the Census Bureau’s disclosure avoidance team have struggled to communicate with each other and find common ground for balancing the need for accurate data with the need for privacy.
  • Protecting confidentiality is essential for counting hard-to-reach populations.
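The reidentification concern above can be made concrete with a toy reconstruction exercise. The sketch below assumes a hypothetical three-person block whose exact mean age, median age, age-band counts, and adult mean age have all been published; these statistics, the helper names, and the 110-year age cap are invented for illustration. It brute-forces every age combination and keeps only those consistent with the published tables. It is not the Census Bureau's own reconstruction experiment.

```python
from itertools import combinations_with_replacement
from statistics import median

MAX_AGE = 110  # assumed upper bound on any resident's age


def band(ages, lo, hi):
    """Number of residents whose age falls in the inclusive range [lo, hi]."""
    return sum(lo <= a <= hi for a in ages)


def adult_mean_is(ages, target):
    """True if residents aged 18+ exist and their mean age equals `target`."""
    adults = [a for a in ages if a >= 18]
    return bool(adults) and sum(adults) == target * len(adults)


# Hypothetical exact statistics published for one three-person block.
PUBLISHED = [
    ("mean age 30", lambda ages: sum(ages) == 30 * len(ages)),
    ("median age 30", lambda ages: median(ages) == 30),
    ("one resident each aged 0-4, 30-34, 55-59",
     lambda ages: band(ages, 0, 4) == band(ages, 30, 34) == band(ages, 55, 59) == 1),
    ("mean age of adults 43", lambda ages: adult_mean_is(ages, 43)),
]


def consistent_blocks(checks, size=3):
    """Enumerate every sorted age tuple for `size` residents passing all checks."""
    return [ages
            for ages in combinations_with_replacement(range(MAX_AGE + 1), size)
            if all(check(ages) for check in checks)]


if __name__ == "__main__":
    for k in range(1, len(PUBLISHED) + 1):
        solutions = consistent_blocks([check for _, check in PUBLISHED[:k]])
        print(f"after '{PUBLISHED[k - 1][0]}': {len(solutions)} possible blocks")
    print("reconstructed ages:", consistent_blocks([c for _, c in PUBLISHED]))
```

Each additional exact table narrows the possibilities, from 31 candidate blocks after the mean and median, to 4 after the age bands, to exactly one (ages 4, 30, and 56) after the adult mean. Adding calibrated noise is meant to prevent this kind of pinpointing.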