June 29, 2016
Last month, in Spokeo v. Robins, the Supreme Court declined the opportunity to clarify a question that will determine the fate of many consumer privacy laws. What kinds of information-related harms suffice to ground federal lawsuits? In punting, however, the Spokeo opinion also cast light on the Court’s understanding — and misunderstanding — of the role that data and algorithms now play in our lives.
First, some background. In 2010, Thomas Robins filed a class-action lawsuit against Spokeo, a “people search engine” that allows prospective employers to retrieve data about job applicants. Although Robins’ legal theory is somewhat technical, his core objection is not. Robins alleges that his Spokeo profile contained errors — for example, it described him as “married” when, in fact, he is single. In Robins’ view, these errors suggest that Spokeo failed to follow “reasonable procedures” to assure the “maximum possible accuracy” of the information it broadcasts, as required by federal law.
By the time Spokeo made its way to the Supreme Court, neither the veracity of Robins’ allegations nor the legal adequacy of Spokeo’s procedures was at issue. Instead, the question was more fundamental: Does Robins’ alleged harm suffice, in the first place, to anchor his claim? To have standing to bring a lawsuit in federal court, a plaintiff must demonstrate a “concrete and particularized” injury. It is not enough, in other words, for a would-be plaintiff to allege that someone broke the law; she must allege that someone broke the law in a manner that injured her. In 2014, the Ninth Circuit, applying this rule, concluded that Robins’ suit could proceed, because the claimed injury — the existence of inaccurate information — was specific to him.
Last Monday, the Supreme Court bounced the case back. According to the Court, the Ninth Circuit’s opinion, though not necessarily wrong, was incomplete; it failed to analyze whether Robins’ grievance, in addition to being specific to him, was sufficiently “concrete.” The reason both components matter, the Court explained, is that although the “dissemination of false information” certainly qualifies as a particularized injury, “not all inaccuracies cause harm or present any material risk of harm.” By way of example, the Court offered “incorrect zip code” data. “It is difficult,” the Court wrote, “to imagine how the dissemination of an incorrect zip code, without more, could work any concrete harm.”
USA 10-cent postage stamp, 1973, designed by Randall McDougall.
The Supreme Court’s imagination — and its limits — notwithstanding, data science tells a different story. Today, a growing number of sensitive decisions, from employment opportunities to policing strategies, rely (at least in part) on algorithms that source data from brokers like Spokeo. Moreover, zip code information often plays a central role in these algorithms, operating as a proxy for status and privilege — two inputs that invariably shape the behavior of big companies and governmental actors.
Some decisions informed by zip code data — like targeted advertising — are largely innocuous, and inaccuracy, in most cases, is unlikely to result in anything more serious than nuisance. But other decisions are less innocuous. For example, in a January 2016 report, the Federal Trade Commission found that online retailers have begun using zip code data to engage in price discrimination, since zip codes predict both consumers’ ability to pay and the presence (or absence) of competition from brick-and-mortar stores. Similarly, zip code data almost certainly plays a role in predictive policing algorithms like the “Beware” tool — recently adopted by the Fresno Police Department — which churns through vast datasets to assign “threat scores” to individuals and residences that have been targeted by law enforcement. (I say that zip code data “almost certainly” plays a role in the “Beware” tool because its inner workings, like those of many algorithms developed in the private sector and then used by the state, are proprietary and therefore opaque.)
The list could go on. The upshot is that in most contexts, both grand and quotidian, algorithmic decision-making will soon be the norm — and once it is, zip code data stands to impact everyday life in countless ways. Which makes it a bit disheartening to hear the Supreme Court — an institution made up of men and women who, at least at this point in their lives, presumably do not worry much about paying elevated prices for consumer goods, much less unwanted entanglement with law enforcement — profess difficulty “imagin[ing] how the dissemination of an incorrect zip code . . . could [do] harm.” The point, of course, is not that incorrect zip code data will always be harmful. The point is that it is easy to envision how it could be. And the question in Spokeo, as the Court understood it, was exactly that: whether the complained-of inaccuracies posed a material risk — not a certainty — of harm. With respect to zip code data, the answer is a resounding yes.
To be fair, the most plausible explanation behind the Court’s misfired zip code example is good-faith ignorance. Few members of the public have a working sense of the role algorithms currently play — much less the ballooned role they soon will play — in our lives. It hardly comes as a surprise that the Justices of the Supreme Court are similarly bereft. Although the law rarely keeps pace with technology, in most settings it also does a more-or-less competent job of catching up. The Supreme Court has time to refine its understanding of information harms.
The problem of accurate data
But the real crux here is not Spokeo; nor is it even the problem of inaccurate data, important though that problem is. The real crux, counter-intuitively enough, is the problem of accurate data. What kinds of information — assuming conditions of perfect accuracy — should drive algorithms? Most people believe, for example, that race is an illegitimate input to many, if not all, decisions made by big companies and state actors. But if not race, what about zip codes? Since long before the age of big data, zip code data has operated as a proxy for race, at times intentionally — as in the shameful history of mortgage-redlining throughout the United States — but at other times unintentionally. Decisions based on zip code (and other geographical shorthand) are often not meant to yield disparate outcomes along racial lines, but they do so all the same. Furthermore, zip code information is just the tip of the iceberg: a particularly crisp, but by no means unique, example of how forbidden variables easily resurface by proxy.
Going forward, this problem — the problem of regulating proxy variables — will be among the key questions of algorithmic governance.
Although Spokeo did not raise these questions, it gestured toward the fast-approaching horizon where they lurk. And it made clear that headway on algorithmic governance will require all of us, and judges most especially, to expand our ability to “imagine” — as the Court put it — how information, in today’s world, can “work concrete harm.”
Points: “The Supreme Court’s Big Data Problem” — Kiel Brennan-Marquez unpacks Spokeo v. Robins, with an eye to effective governance of algorithms in the future.— Ed.
Kiel Brennan-Marquez is a postdoctoral research fellow at NYU Law School. His research focuses on how legal institutions respond (and fail to respond) to technological change.
