Votes from the 2020 US Presidential election are in, and many are turning to what comes next for the incoming administration. A core concern across the political spectrum is the need to (re)build trust in democratic institutions and government functions. One of the key sources of trust in democratic institutions—not only in the United States, but around the world—is public-sector data, which acts as a bulwark against populism by undergirding evidence-based policymaking. And yet, both in the United States and internationally, we’ve seen public-sector data infrastructures come under attack through means that are both political and technical. Political interference has surrounded the methods and messaging of collecting and using census data, electoral data, climate modeling data, environmental protection data, public health data related to the COVID-19 pandemic, and more. In each of these areas, distrust is rising in both the data itself and in the institutions charged with collecting and protecting it. At the same time, these institutions’ ability to act in a complex world, and to drive evidence-based policy, is diminished. Yet trust—in both the methods and the integrity of the data itself—is essential for evidence-based policymaking to matter. Scholars of agnotology (the study of ignorance), especially those who have tracked battles over climate change, recognize this script. And as with media misinformation and disinformation, the United States has much to learn from other countries and regions that have long acknowledged and grappled with issues of trust and distrust in government data infrastructures.
For centuries, governments have built significant data infrastructures, upon which democracies and economies have been structured. Data has long been a source of power and state legitimacy, as well as a tool to argue for specific policies and defend core values. Yet the history of public-sector data infrastructures is fraught, in no small part because state data has long been used to oppress, colonize, and control. Numbers have politics and politics has numbers. Meanwhile, public-sector data infrastructures have also grounded equity-oriented policies and advances in knowledge: environmental justice movements benefit from climate data, voting rights efforts from census data, and community health efforts from public health data. In the U.S., enhancing state data infrastructure has been viewed as a progressive agenda.
The purpose of this workshop is to bring together scholars who wish to grapple with the state of public-sector data infrastructures, with the longer-term goal of establishing methods of protection, repair, and trust-building at a societal level. While governments collect data for innumerable purposes, we are particularly interested in the data infrastructure underpinning four epistemic efforts we see as operating at the crossroads of societal urgency and long-term democratic resilience: 1) Climate science; 2) Public health (e.g., pandemics, vaccines); 3) Democratic purposes (e.g., voting, census); 4) Economic modeling (e.g., labor statistics, employment data). In an American context, we are thinking about the data infrastructures underpinning federal agencies like the CDC, EPA, NOAA, BLS, HHS, the Census Bureau, the Department of Energy, etc., as well as the various state, tribal, and local data sources that operate within non-federal contexts.
Anti-colonial and anti-racist movements have long challenged what data the state collects, about whom, and for what purposes. Decades of public policy debates about privacy and power have shaped public-sector data infrastructures. Amidst these efforts to ensure that data is used to advance equity—and not to enable abuse—a range of adversarial forces have invested in polluting data for political, financial, or ideological purposes. Some of the work in this area stems from agnotology, the study of ignorance, including ignorance that is purposefully manufactured. Other work comes from studies of conspiracy theories, which trace how perceptions are twisted so that people come to fear the very data that might drive decisions about vaccination or climate policy. There is also, always, the threat that data might be directly manipulated or that certain metrics might be designed to ensure outcomes that benefit a few. Economic measures require externalities, but what counts as central and what counts as an externality is a matter of values.
The legitimacy of public-sector data infrastructures is socially constructed. It is driven not by the quality or quantity of data, but by how the data—and the institution that uses its credibility to guarantee the data—is perceived. When data are manipulated or when political interests contort the appearance of data, data infrastructures are at risk. As with any type of infrastructure, data infrastructures must be maintained, both technically and—crucially—socially. Data infrastructures are rendered visible when they break, but the cracks in the system must be negotiated long before the system has collapsed.
We are a U.S.-based research institute and we acknowledge that our framing is shaped by our U.S.-centric perspective. That said, we are extremely interested in applications from researchers taking an international perspective or focusing on non-U.S. case studies. We also welcome papers that engage the ideas outlined above but take our thinking in directions we did not consider.
To give you a flavor of the types of papers we imagine being appropriate, here are some questions that scholars might be investigating. This list is by no means exhaustive.
- How does the technical and bureaucratic design of government data infrastructures shape what states know? (e.g., Statistics Canada as a centralized national data source vs. the distributed approach in the U.S.; restrictions on U.S. federal access to data from state-managed programs like SNAP, etc.)
- What are the (intended as well as unintended) consequences of using government data infrastructures as resources to enact policy? And how does this, in turn, shape government data infrastructures?
- What data is purposefully not collected by the state in order to render certain types of knowledge invisible? (e.g., France’s approach to race data, Lebanon’s approach to refugees, U.S.’s approach to certain COVID-19 and climate data, etc.)
- What are the ramifications of maintaining data infrastructures whose data is not used, or of using data that hasn’t been maintained?
- How do new policies shape/discredit/limit access to data that typically guides long-standing policy? (e.g., redefining how poverty is measured, the development of 1970s privacy laws, ramifications of FOIA or “open government”, EPA’s redefinition of scientific “transparency” etc.)
- What are the social and cultural ramifications of policy choices concerning how data infrastructures are funded, managed, organized, and maintained? (e.g., NASA’s austerity budgeting, procurement and technical outsourcing, creation of OSTP, rules that govern who can lead statistical agencies, etc.)
- How might defensive strategies (e.g., organizational norms, administrative policies, standards bodies, actions by professional associations, etc.) “check” political power or rebuild trust? (e.g., “Public interest technologists,” new interpretations of administrative or privacy law)
We encourage attendees to approach the Data & Society Workshop series as an opportunity to engage across fields, and to strengthen both relationships and research through participation. While we recognize the value of the workshop for individual authors, we also see it as a field-building exercise that benefits all involved.