Data Voids

danah boyd; Michael Golebiewski

report ⋅ October 29 2019


Where Missing Data Can Easily Be Exploited



Data Voids demonstrates how manipulators expose people to problematic content by exploiting search engine results.

Report Summary

“Data voids are a security vulnerability that must be systematically, intentionally, and thoughtfully managed.”

Michael Golebiewski of Microsoft coined the term “data void” in May 2018 to describe search engine queries that turn up few or no results, especially when the query is obscure or rarely searched.

In Data Voids: Where Missing Data Can Easily Be Exploited, Golebiewski teams up with danah boyd (Microsoft Research; Data & Society) to demonstrate how data voids are exploited by manipulators eager to expose people to problematic content including falsehoods, misinformation, and disinformation.

Data voids are often difficult to detect. Most are harmless until something happens that causes lots of people to search for the same term, such as a breaking news event or a reporter using an unfamiliar phrase. In some cases, manipulators work quickly to produce conspiratorial content to fill a void, whereas other data voids, such as those from outdated terms, are filled slowly over time. Data voids are compounded by the fraught pathways of search-adjacent recommendation systems such as auto-play, auto-fill, and trending topics, each of which is vulnerable to manipulation.
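The dynamic described above, a sparse set of results combined with a sudden surge of searches, can be illustrated with a minimal sketch. This is not a method from the report: the `QueryStats` record, the `is_risky_void` function, and both thresholds are hypothetical, and no real search engine API is implied.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class QueryStats:
    """Hypothetical per-query statistics (an assumption for illustration)."""
    query: str
    result_count: int           # how many documents exist for the query
    daily_searches: List[int]   # search volume per day, oldest first


def is_risky_void(stats: QueryStats,
                  max_results: int = 50,
                  spike_factor: float = 5.0) -> bool:
    """Flag a query that has few results and a sudden jump in searches.

    Both thresholds are illustrative assumptions, not values from the report.
    """
    if len(stats.daily_searches) < 2:
        return False  # not enough history to tell a spike from a baseline
    *history, today = stats.daily_searches
    # Floor the baseline at 1 so tiny volumes do not register as spikes.
    baseline = max(mean(history), 1.0)
    sparse = stats.result_count <= max_results
    spiking = today >= spike_factor * baseline
    return sparse and spiking


if __name__ == "__main__":
    quiet = QueryStats("obscure phrase", 12, [3, 2, 4, 3])
    surging = QueryStats("obscure phrase", 12, [3, 2, 4, 60])
    print(is_risky_void(quiet), is_risky_void(surging))
```

A check like this could only surface candidates for human review. It cannot tell whether the content filling a void is legitimate or manipulative.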

The report identifies five types of data voids in play:

Breaking News: The production of problematic content can be optimized to terms that are suddenly spiking due to a breaking news situation; these voids will eventually be filled by legitimate news content, but are abused before such content exists.
Strategic New Terms: Manipulators create new terms and build a strategically optimized information ecosystem around them before amplifying those terms into the mainstream, often through news media, in order to introduce newcomers to problematic content and frames.
Outdated Terms: When terms go out of date, content creators stop producing content associated with these terms long before searchers stop seeking out content. This creates an opening for manipulators to produce content that exploits search engines’ dependence on freshness.
Fragmented Concepts: By breaking connections between related ideas, and creating distinct clusters of information that refer to different political frames, manipulators can segment searchers into different information worlds.
Problematic Queries: Search results for disturbing or fraught terms that have historically returned problematic results continue to do so unless high-quality content is introduced to contextualize or outrank such problematic content.

Data voids are not unique to search engines; they occur on social media platforms, too, where search is typically limited to information hosted on that particular platform. Golebiewski and boyd emphasize that there is no “quick fix” for data voids. Instead, they urge search engines and content creators to work together to anticipate and identify risky data voids, and to fill them with quality content. “Data voids are a security vulnerability that must be systematically, intentionally, and thoughtfully managed.”

Golebiewski and boyd first introduced data voids in the May 2018 version of this report.