

How to Cite Like a Badass Tech Feminist Scholar of Color

points | 08.22.19

Rigoberto Lara Guzmán, Sareeta Amrute

How can citation practices be used as a strategy to decolonize tech research, ask Data & Society Events Production Assistant Rigoberto Lara Guzmán and Director of Research Sareeta Amrute in their new zine.

“So, make it a habit to do a ‘badass feminist tech scholar of color’ scan on everything you write, every speech you are about to give, and all those emails you are about to answer. Ask yourself, for each topic you present, each yes or no you give to a request, where are the women of color? Who can I suggest who would be a better person than me to be the expert here? Who do I want to be in community with?”


2018-19 Data & Society Fellow Jessie Daniels offers strategies for racial literacy in tech grounded in intellectual understanding, emotional intelligence, and a commitment to take action. In this podcast, Daniels describes how the biggest barrier to racial literacy in tech is “thinking that race doesn’t matter in tech.” She argues that “without racial literacy in tech, without a specific and conscious effort to address race, we will certainly be recreating a high-tech Jim Crow: a segregated, divided, unequal future, sped-up, spread out, and automated through algorithms, AI, and machine learning.”

Listen to the podcast with transcript at:
https://listen.datasociety.net/why-now-is-the-time-for-racial-literacy-in-tech/.

Jessie Daniels, PhD, is a Professor at Hunter College (Sociology) and at The Graduate Center, CUNY (Africana Studies, Critical Social Psychology, and Sociology). She earned her PhD from the University of Texas at Austin and held a Charles Phelps Taft postdoctoral fellowship at the University of Cincinnati. Her main area of interest is race and digital media technologies; she is an internationally recognized expert on Internet manifestations of racism. Daniels is the author or editor of five books, has bylines at The New York Times, DAME, The Establishment, and Entropy, and writes a regular column for the Huffington Post.

Her recent paper, “Advancing Racial Literacy in Tech,” co-authored with 2018-19 Fellow Mutale Nkonde and 2017-18 Fellow Darakhshan Mir, can be found at http://www.racialliteracy.tech.


Research Analyst Kinjal Dave urges us to move past the individual framing of “bias” to critically examine broader socio-technical systems.

“When we stop overusing the word ‘bias,’ we can begin to use language that has been designed to theorize at the level of structural oppression.”


Advancing Racial Literacy in Tech

paper | 05.22.19

Jessie Daniels, Mutale Nkonde, Darakhshan Mir

How can we do less harm to communities of color with the technology we create?


In their new paper Advancing Racial Literacy in Tech, Data & Society 2018-19 Fellows Jessie Daniels and Mutale Nkonde and 2017-18 Fellow Darakhshan Mir urge tech companies to adopt racial literacy practices in order to break out of old patterns.

Conceived and launched under Data & Society’s fellowship program, this paper moves past conversations of implicit bias to think about racism in tech at a systems-level. The authors offer strategies grounded in intellectual understanding, emotional intelligence, and a commitment to take action.

“The real goal of building capacity for racial literacy in tech is to imagine a different world, one where we can break free from old patterns. This will take real leadership to take this criticism seriously and a willingness to assess the role that tech products, company culture, and supply chain practices may have in perpetuating structural racism.”

To follow the project and learn more, visit https://racialliteracy.tech/.


D&S founder and president praises Virginia Eubanks’s new book Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor.

“This book should be mandatory for anyone who works in social services, government, or the technology sector because it forces you to really think about what algorithmic decision-making tools are doing to our public sector, and the costs that this has on the people that are supposedly being served. It’s also essential reading for taxpayers and voters who need to understand why technology is not the panacea that it’s often purported to be. Or rather, how capitalizing on the benefits of technology will require serious investment and a deep commitment to improving the quality of social services, rather than a tax cut.”


D&S Researcher Madeleine Clare Elish discusses the implications of biased AI in different contexts.

She said that when AI is applied to areas like targeted marketing or customer service, this kind of bias is essentially an inconvenience. Models won’t deliver good results, but at the end of the day, no one gets hurt.

The second type of bias, though, can be far more consequential. Elish talked about how AI is increasingly seeping into areas like insurance, credit scoring, and criminal justice. Here, biases, whether they result from unrepresentative data samples or from the unconscious partialities of developers, can have much more severe effects.


D&S founder danah boyd discusses machine learning algorithms and prejudice, digital white flight on social media, trust in the media, and more on The Ezra Klein Show.

“Technology is made by people in a society, and it has a tendency to mirror and magnify the issues that affect everyday life.”


Code of Silence

Washington Monthly | 06.13.17

Rebecca Wexler

D&S lawyer-in-residence Rebecca Wexler unpacks how private companies hide flaws in software that the government uses to convict and exonerate people in the criminal justice system.

What’s alarming about protecting trade secrets in criminal cases is that it allows private companies to withhold information not from competitors, but from individual defendants like Glenn Rodríguez. Generally, a defendant who wants to see evidence in someone else’s possession has to show that it is likely to be relevant to his case. When the evidence is considered “privileged,” the bar rises: he often has to convince the judge that the evidence could be necessary to his case—something that’s hard to do when, by definition, it’s evidence the defense hasn’t yet seen.


D&S resident Rebecca Wexler describes the flaws of an increasingly automated criminal justice system

The root of the problem is that automated criminal justice technologies are largely privately owned and sold for profit. The developers tend to view their technologies as trade secrets. As a result, they often refuse to disclose details about how their tools work, even to criminal defendants and their attorneys, even under a protective order, even in the controlled context of a criminal proceeding or parole hearing.


D&S researcher Alex Rosenblat explains how and why Uber & Lyft drivers surveil their passengers during rides.

Passenger shaming is partly a consequence of the Uber/Lyft business model. Drivers can’t get reliable accountability from their employers or passengers, so they turn to tools like dash-cams. These are part of the externalized costs of the lean gig economy employment model.


D&S researcher Mark Latonero provides an overview of the role of large tech companies in refugee crises.

While the 40-page brief is filled with arguments in support of immigration, it hardly speaks about refugees, except to note that those seeking protection should be welcomed. Any multinational company with a diverse workforce would be concerned about limits to international hiring and employee travel. But tech companies should also be concerned about the refugee populations that depend on their digital services for safety and survival.


Creating Simple Rules for Complex Decisions

Harvard Business Review | 04.19.17

Jongbin Jung, Connor Concannon, Ravi Shroff, Sharad Goel, Daniel G. Goldstein

Jongbin Jung, Connor Concannon, D&S fellow Ravi Shroff, Sharad Goel, and Daniel G. Goldstein explore new methods for machine learning in criminal justice.

Simple rules certainly have their advantages, but one might reasonably wonder whether favoring simplicity means sacrificing performance. In many cases the answer, surprisingly, is no. We compared our simple rules to complex machine learning algorithms. In the case of judicial decisions, the risk chart above performed nearly identically to the best statistical risk assessment techniques. Replicating our analysis in 22 varied domains, we found that this phenomenon holds: Simple, transparent decision rules often perform on par with complex, opaque machine learning methods.
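The comparison the authors describe can be illustrated with a minimal sketch: fit a full logistic regression, then build a simple rule from a few rounded integer weights and compare the two. The data, feature count, and weights below are synthetic and hypothetical; this is an illustration of the general idea, not the authors' actual method.

```python
# Sketch: compare a simple integer-weight "risk chart" against a full
# logistic regression, in the spirit of the simple-rules argument above.
# All data and feature names here are synthetic/hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 10))                      # 10 hypothetical features
true_w = np.array([1.2, -0.8, 0.5, 0, 0, 0, 0, 0, 0, 0])
y = (X @ true_w + rng.normal(size=n) > 0).astype(int)

X_train, X_test = X[:4000], X[4000:]
y_train, y_test = y[:4000], y[4000:]

# "Complex" model: logistic regression on all features.
full = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc_full = roc_auc_score(y_test, full.predict_proba(X_test)[:, 1])

# Simple rule: keep the 3 largest-magnitude coefficients, round them to
# small integers, and score by a weighted sum (a "risk chart").
coefs = full.coef_[0]
top = np.argsort(-np.abs(coefs))[:3]
weights = np.clip(np.round(coefs[top] / np.abs(coefs[top]).max() * 3), -3, 3)
simple_score = X_test[:, top] @ weights
auc_simple = roc_auc_score(y_test, simple_score)

print(f"AUC, full model:  {auc_full:.3f}")
print(f"AUC, simple rule: {auc_simple:.3f}")   # often nearly as good
```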


Julia Angwin, Jeff Larson, Lauren Kirchner, and Surya Mattu complete the Black Box series with an analysis of premiums and payouts in California, Illinois, Texas and Missouri that shows that some major insurers charge minority neighborhoods as much as 30 percent more than other areas with similar accident costs.

But a first-of-its-kind analysis by ProPublica and Consumer Reports, which examined auto insurance premiums and payouts in California, Illinois, Texas and Missouri, has found that many of the disparities in auto insurance prices between minority and white neighborhoods are wider than differences in risk can explain. In some cases, insurers such as Allstate, Geico and Liberty Mutual were charging premiums that were on average 30 percent higher in zip codes where most residents are minorities than in whiter neighborhoods with similar accident costs.


Combatting Police Discrimination in the Age of Big Data

paper | 04.02.17

Sharad Goel, Maya Perelman, Ravi Shroff, David Alan Sklansky

Sharad Goel, Maya Perelman, D&S fellow Ravi Shroff, and David Alan Sklansky examine a method to “reduce the racially disparate impact of pedestrian searches and to increase their effectiveness.” The abstract is below:

The exponential growth of available information about routine police activities offers new opportunities to improve the fairness and effectiveness of police practices. We illustrate the point by showing how a particular kind of calculation made possible by modern, large-scale datasets — determining the likelihood that stopping and frisking a particular pedestrian will result in the discovery of contraband or other evidence of criminal activity — could be used to reduce the racially disparate impact of pedestrian searches and to increase their effectiveness. For tools of this kind to achieve their full potential in improving policing, though, the legal system will need to adapt. One important change would be to understand police tactics such as investigatory stops of pedestrians or motorists as programs, not as isolated occurrences. Beyond that, the judiciary will need to grow more comfortable with statistical proof of discriminatory policing, and the police will need to be more receptive to the assistance that algorithms can provide in reducing bias.
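A minimal sketch of the kind of calculation the abstract describes, estimating the probability that a stop yields contraband and comparing outcomes across groups, might look like the following. The file name and column names are hypothetical, and this is an illustration rather than the authors' model.

```python
# Sketch: estimate the probability that a stop recovers contraband and
# compare "hit rates" across racial groups. File and columns are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

stops = pd.read_csv("stops.csv")   # hypothetical stop-and-frisk records

features = ["suspect_age", "time_of_day", "location_precinct", "stop_reason_code"]
X = pd.get_dummies(stops[features], columns=["location_precinct", "stop_reason_code"])
y = stops["contraband_found"]      # 1 if the search recovered contraband

model = LogisticRegression(max_iter=1000).fit(X, y)
stops["p_hit"] = model.predict_proba(X)[:, 1]

# A threshold rule: only search when the estimated hit probability is high.
threshold = 0.10
stops["would_search"] = stops["p_hit"] >= threshold

# Compare hit rates under current practice with search rates under the
# threshold rule, broken out by race.
summary = stops.groupby("suspect_race").agg(
    current_hit_rate=("contraband_found", "mean"),
    searches_under_rule=("would_search", "mean"),
)
print(summary)
```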


Close Calls

Real Life Magazine | 01.26.17

Zara Rahman

D&S fellow Zara Rahman writes about how immigrant families use social media and digital technologies.

The consequence is that the home of our deeply personal information has gone from treasured letters stored in a box at our houses, to servers owned by corporate companies that we’ll never see. Those personal notes, the ways of showing our family that we’re happy and content in our new lives, despite what we’ve lost — they live online now. The more you share with that corporation, the stronger those family ties get. There is a third party in these relationships.


D&S advisor Baratunde Thurston details his exploration of The Glass Room exhibit.

I want to see The Glass Room everywhere there is an Apple Store… And anyone founding or working for a tech company should have to prove they’ve gone through this space and understood its meaning.


D&S affiliate Mimi Onuoha profiles the Asian American Performers Action Coalition (AAPAC) of Broadway and off-Broadway and their efforts “to track racial demographic data in the industry.”


D&S founder danah boyd’s prepared remarks for a public roundtable in the European Parliament on algorithmic accountability and transparency in the digital economy were adapted in this Points piece.

I believe that algorithmic transparency creates false hope. Not only is it technically untenable, but it obfuscates the real politics that are at stake.


Phones, but No Papers

points | 11.18.16

Julia Ticona

D&S post-doctoral scholar Julia Ticona responds to “Gig Work, Online Selling and Home Sharing” from Pew Research Center.

Contingent work has always been prevalent in communities where workers have been historically excluded from secure jobs, from union membership, and even from wider public forms of social welfare through systemic forms of discrimination. For these workers, there was no “golden era” of plentiful stable work and a strong social safety net. Despite these long-standing trends, emerging forms of on-demand labor, and the data-driven technologies that workers interact with, can deepen the vulnerabilities of certain populations of workers.


How to Hold Algorithms Accountable

MIT Technology Review | 11.17.16

Nicholas Diakopoulos, Sorelle Friedler

D&S affiliate Sorelle Friedler, with Nicholas Diakopoulos, discusses five principles for holding algorithmic systems accountable.

Recent investigations show that risk assessment algorithms can be racially biased, generating scores that, when wrong, more often incorrectly classify black defendants as high risk. These results have generated considerable controversy. Given the literally life-altering nature of these algorithmic decisions, they should receive careful attention and be held accountable for negative consequences.


D&S fellow Anne L. Washington published a Points piece responding to Cathy O’Neil’s Weapons of Math Destruction.

Complex models with high stakes require rigorous periodic taste tests. Unfortunately most organizations using big data analytics have no mechanism for feedback because the models are used in secrecy.

Producing predictions, like making sausage, is currently an obscure practice. If botulism spreads, someone should be able to identify the supply chain that produced it. Since math is the factory that produces the sausage that is data science, some form of reasoning should be leveraged to communicate the logic behind predictions.


D&S fellow Ravi Shroff examines Cathy O’Neil’s analysis of criminal justice algorithms, like predictive policing.

There are a few minor mischaracterizations and omissions in this chapter of Weapons of Math Destruction that I would have liked O’Neil to address. CompStat is not, as she suggests, a program like PredPol’s. This is a common misconception; CompStat is a set of organizational and management practices, some of which use data and software. In the section on stop-and-frisk, the book implies that a frisk always accompanies a stop, which is not the case; in New York, only about 60% of stops included a frisk. Moreover, the notion of “probable cause” is conflated with “reasonable suspicion,” which are two distinct legal standards. In the section on recidivism, O’Neil asks of prisoners,

“is it possible that their time in prison has an effect on their behavior once they step out? […] prison systems, which are awash in data, do not carry out this highly important research.”

Although prison systems may not conduct this research, there have been numerous academic studies that generally indicate a criminogenic effect of harsh incarceration conditions. Still, “Civilian Casualties” is a thought-provoking exploration of modern policing, courts, and incarceration. By highlighting the scale and opacity of WMDs in this context, as well as their vast potential for harm, O’Neil has written a valuable primer for anyone interested in understanding and fixing our broken criminal justice system.


D&S researcher Josh Scannell responds to Georgetown Center on Privacy & Technology’s “The Perpetual Line-Up” study.

Reports like “The Perpetual Line-Up” force a fundamental question: What do we want technologies like facial recognition to do? Do we want them to automate narrowly “unbiased” facets of the criminal justice system? Or do we want to end the criminal justice system’s historical role as an engine of social injustice? We can’t have both.


D&S researcher Claire Fontaine looks at how school performance data can lead to segregation.

In our technocratic society, we are predisposed toward privileging the quantitative. So, we need to find ways to highlight what is truly helpful in the data, but also insert an element of creative distrust. We need to encourage data consumers to think deeply about their values, rather than using data to reify knee-jerk prejudicial attitudes. Data scientists and engineers are in the position to help shift the conversation around data as truth.


Breaking the Black Box: When Machines Learn by Experimenting on Us

ProPublica | 10.12.16

Julia Angwin, Terry Parris Jr., Surya Mattu, Seongtaek Lim

D&S affiliate Surya Mattu, with Julia Angwin, Terry Parris Jr., and Seongtaek Lim, continues the Black Box series.

Depending on what data they are trained on, machines can “learn” to be biased. That’s what happened in the fall of 2012, when Google’s machines “learned” in the run-up to the presidential election that people who searched for President Obama wanted more Obama news in subsequent searches, but people who searched for Republican nominee Mitt Romney did not. Google said the bias in its search results was an inadvertent result of machine learning.

Sometimes machines build their predictions by conducting experiments on us, through what is known as A/B testing. This is when a website will randomly show different headlines or different photos to different people. The website can then track which option is more popular, by counting how many users click on the different choices.
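The click-counting the excerpt describes can be sketched with a simple two-proportion comparison; the headline counts below are hypothetical.

```python
# Sketch of the A/B test described above: randomly assign a headline
# variant, count clicks, and compare click-through rates. Counts are
# hypothetical.
from statistics import NormalDist

def ab_test(clicks_a, shown_a, clicks_b, shown_b):
    """Two-proportion z-test on click-through rates."""
    p_a, p_b = clicks_a / shown_a, clicks_b / shown_b
    p_pool = (clicks_a + clicks_b) / (shown_a + shown_b)
    se = (p_pool * (1 - p_pool) * (1 / shown_a + 1 / shown_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

p_a, p_b, z, p = ab_test(clicks_a=420, shown_a=10_000, clicks_b=505, shown_b=10_000)
print(f"Headline A CTR: {p_a:.2%}, Headline B CTR: {p_b:.2%}")
print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```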


D&S affiliate Surya Mattu, with Julia Angwin, examines how Amazon’s shopping algorithm directs customers to buy merchandise from Amazon or Amazon-affiliated sellers, even when products from other sellers on the platform cost much less.

Through its rankings and algorithm, Amazon is quietly reshaping online commerce almost as dramatically as it reshaped offline commerce when it burst onto the scene more than 20 years ago. Just as the company’s cheap prices and fast shipping caused a seismic shift in retailing that shuttered stores selling books, electronics and music, now Amazon’s pay-to-play culture is forcing online sellers to choose between paying hefty fees or leaving the platform altogether.


D&S researcher Claire Fontaine writes a compelling piece analyzing whether school performance data reinforces segregation.

Data is great at masking its own embedded bias, and school performance data allows privileged parents to reinforce educational inequality. The best interests of some individuals are optimized at the expense of society. Accountability programs, particularly when coupled with school choice models, serve to keep middle and upper middle class families invested in public schools, but in an uneven and patterned way, causing segregated school environments to persist despite racial and socioeconomic residential diversity.


D&S advisor Ethan Zuckerman defends the use of video recordings of police officers.

If video doesn’t lead to the indictment of officers who shoot civilians, are we wrong to expect justice from sousveillance? The police who shot Castile and Sterling knew they were likely to be captured on camera—from their police cars, surveillance cameras, and cameras held by bystanders—but still used deadly force in situations that don’t appear to have merited it. Is Mann’s hope for sousveillance simply wrong?

Not quite. While these videos rarely lead to grand jury indictments, they have become powerful fuel for social movements demanding racial justice and fairer policing. In the wake of Sterling and Castile’s deaths, protests brought thousands into the streets in major U.S. cities and led to the temporary closure of interstate highways.


The FBI recently announced its plan to request that its massive biometrics database, called the Next Generation Identification (NGI) system, be exempted from basic requirements under the Privacy Act. These exemptions would prevent individuals from finding out if they are included within the database, whether their profile is being shared with other government entities, and whether their profile is accurate or contains false information. Forty-four organizations, including Data & Society, sent a letter to the Department of Justice asking for a 30-day extension to review the proposal.

Points: In this Points original, Robyn Caplan highlights the First Amendment implications of the FBI’s request for exemptions from the Privacy Act for its Next Generation Identification system. Public comment on the FBI’s proposal is being accepted until July 6, 2016.


Code is key to civic life, but we need to start looking under the hood and thinking about the externalities of our coding practices, especially as we’re building code as fast as possible with few checks and balances.

Points: “Be Careful What You Code For” is danah boyd’s talk from Personal Democracy Forum 2016 (June 9, 2016); her remarks have been modified for Points. danah exhorts us to mind the externalities of code and proposes audits as a way to reckon with the effects of code in high stakes areas like policing. Video is available here.


Machine Bias: Risk Assessments in Criminal Sentencing

ProPublica | 05.23.16

Julia Angwin, Jeff Larson, Surya Mattu, Lauren Kirchner, ProPublica

D&S fellow Surya Mattu investigated bias in risk assessments, algorithmically generated scores predicting the likelihood of a person committing a future crime. These scores are increasingly used in courtrooms across America to inform decisions about who can be set free at every stage of the criminal justice system, from assigning bond amounts to fundamental decisions about a defendant’s freedom:

We obtained the risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014 and checked to see how many were charged with new crimes over the next two years, the same benchmark used by the creators of the algorithm.

The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.

When a full range of crimes were taken into account — including misdemeanors such as driving with an expired license — the algorithm was somewhat more accurate than a coin flip. Of those deemed likely to re-offend, 61 percent were arrested for any subsequent crimes within two years.

We also turned up significant racial disparities, just as Holder feared. In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.

  • The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
  • White defendants were mislabeled as low risk more often than black defendants.

Could this disparity be explained by defendants’ prior crimes or the type of crimes they were arrested for? No. We ran a statistical test that isolated the effect of race from criminal history and recidivism, as well as from defendants’ age and gender. Black defendants were still 77 percent more likely to be pegged as at higher risk of committing a future violent crime and 45 percent more likely to be predicted to commit a future crime of any kind.
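ProPublica released the data behind this analysis; a rough sketch of the two calculations described above (error rates by race, then a regression that isolates race from criminal history, recidivism, age, and gender) might look like the following. The file and column names are assumptions based on that release and may need adjusting.

```python
# Sketch of the two calculations described above, using ProPublica's
# published COMPAS data (column names are assumptions based on that
# release; adjust if they differ).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("compas-scores-two-years.csv")
df["high_risk"] = (df["decile_score"] >= 5).astype(int)

# False positive / false negative rates by race.
for race, g in df.groupby("race"):
    fpr = ((g.high_risk == 1) & (g.two_year_recid == 0)).sum() / (g.two_year_recid == 0).sum()
    fnr = ((g.high_risk == 0) & (g.two_year_recid == 1)).sum() / (g.two_year_recid == 1).sum()
    print(f"{race:20s}  FPR={fpr:.2f}  FNR={fnr:.2f}")

# Logistic regression isolating race from criminal history, recidivism,
# age, and gender, in the spirit of ProPublica's statistical test.
model = smf.logit(
    "high_risk ~ C(race) + priors_count + two_year_recid + age + C(sex)",
    data=df,
).fit()
print(model.summary())
```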


D&S Researcher Robyn Caplan considers whether Facebook is saving journalism or ruining it:

The question of whether Facebook is saving or ruining journalism is not relevant here because, like it or not, Facebook is a media company. That became more apparent recently as human editors became a visible part of Facebook’s news curation process. In truth, this team is only a tiny fraction of a network of actors whose decisions affect the inner workings of Facebook’s platform and the content we see.


Researchers Alexandra Mateescu and Alex Rosenblat, with D&S Founder danah boyd, published a paper examining police-worn body cameras and their potential to provide avenues for police accountability and foster improved police-community relations. The authors examine the potential harmful consequences of constant surveillance, which has prompted civil rights groups to warn that body-worn cameras may violate privacy and exacerbate existing police practices that have historically victimized people of color and vulnerable populations. They consider whether one can demand greater accountability without increased surveillance and suggest that “the trajectory laid out by body-worn cameras towards greater surveillance is clear, if not fully realized, while the path towards accountability has not yet been adequately defined, let alone forged.”

The intimacy of body-worn cameras’ presence—which potentially enables the recording of even mundane interpersonal interactions with citizens—can be exploited with the application of technologies like facial recognition; this can exacerbate existing practices that have historically victimized people of color and vulnerable populations. Not only do such technologies increase surveillance, but they also conflate the act of surveilling citizens with the mechanisms by which police conduct is evaluated. Although police accountability is the goal, the camera’s view is pointed outward and away from its wearer, and audio recording captures any sounds within range. As a result, it becomes increasingly difficult to ask whether one can demand greater accountability without increased surveillance at the same time.

Crafting better policies on body-worn camera use has been one of the primary avenues for balancing the right of public access with the need to protect against this technology’s invasive aspects. However, no universal policies or norms have been established, even on simple issues such as whether officers should notify citizens that they are being recorded. What is known is that body-worn cameras present definite and identifiable risks to privacy. By contrast, visions of accountability have remained ill-defined, and the role to be played by body-worn cameras cannot be easily separated from the wider institutional and cultural shifts necessary for enacting lasting reforms in policing. Both the privacy risks and the potential for effecting accountability are contingent upon an ongoing process of negotiation, shaped by beliefs and assumptions rather than empirical evidence.



D&S Researcher Alex Rosenblat examines and problematizes Uber’s stance against tipping and the resulting effects on Uber drivers.


Accountable Algorithms: Reflections

Balkin.blogspot.com | 03.31.16

Joshua A. Kroll, Joanna Huey, Solon Barocas, Edward W. Felten, Joel R. Reidenberg, David G. Robinson, and Harlan Yu

Reflections from D&S Affiliate Solon Barocas and Advisors Edward W. Felten and Joel Reidenberg on the recent “Unlocking the Black Box” Conference held on April 2 at Yale Law School:

Our work on accountable algorithms shows that transparency alone is not enough: we must have transparency of the right information about how a system works. Both transparency and the evaluation of computer systems as inscrutable black boxes, against which we can only test the relationship of inputs and outputs, fail on their own to effect even the most basic procedural safeguards for automated decision making. And without a notion of procedural regularity on which to base analysis, it is fruitless to inquire as to a computer system’s fairness or compliance with norms of law, politics, or social acceptability. Fortunately, the tools of computer science provide the necessary means to build computer systems that are fully accountable. Both transparency and black-box testing play a part, but if we are to have accountable algorithms, we must design for this goal from the ground up.


Hiring by Algorithm

paper | 03.10.16

Ifeoma Ajunwa, Sorelle Friedler, Carlos E Scheidegger, Suresh Venkatasubramanian

D&S Fellow Sorelle Friedler and D&S Affiliate Ifeoma Ajunwa argue in this essay that well-settled legal doctrines prohibiting discrimination against job applicants on the basis of sex or race dictate an examination of how algorithms are employed in the hiring process, with the specific goals of: 1) predicting whether such algorithmic decision-making could generate decisions having a disparate impact on protected classes; and 2) repairing input data in such a way as to prevent disparate impact from algorithmic decision-making.

 

Abstract:

Major advances in machine learning have encouraged corporations to rely on Big Data and algorithmic decision making with the presumption that such decisions are efficient and impartial. In this Essay, we show that protected information that is encoded in seemingly facially neutral data could be predicted with high accuracy by algorithms and employed in the decision-making process, thus resulting in a disparate impact on protected classes. We then demonstrate how it is possible to repair the data so that any algorithm trained on that data would make non-discriminatory decisions. Since this data modification is done before decisions are applied to any individuals, this process can be applied without requiring the reversal of decisions. We make the legal argument that such data modifications should be mandated as an anti-discriminatory measure. And akin to Professor Ayres’ and Professor Gerarda’s Fair Employment Mark, such data repair that is preventative of disparate impact would be certifiable by teams of lawyers working in tandem with software engineers and data scientists. Finally, we anticipate the business necessity defense that such data modifications could degrade the accuracy of algorithmic decision-making. While we find evidence for this trade-off, we also found that on one data set it was possible to modify the data so that despite previous decisions having had a disparate impact under the four-fifths standard, any subsequent decision-making algorithm was necessarily non-discriminatory while retaining essentially the same accuracy. Such an algorithmic “repair” could be used to refute a business necessity defense by showing that algorithms trained on modified data can still make decisions consistent with their previous outcomes.
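For readers unfamiliar with the four-fifths standard mentioned in the abstract, it is a simple ratio check on selection rates; here is a minimal sketch with hypothetical counts.

```python
# Sketch: the four-fifths (80%) rule referenced in the abstract. A hiring
# outcome shows prima facie disparate impact if the selection rate for a
# group is less than 80% of the highest group's rate. Counts are hypothetical.
def selection_rate(hired, applicants):
    return hired / applicants

rates = {
    "group_a": selection_rate(hired=50, applicants=100),   # 0.50
    "group_b": selection_rate(hired=30, applicants=100),   # 0.30
}
highest = max(rates.values())
for group, rate in rates.items():
    ratio = rate / highest
    flag = "POTENTIAL DISPARATE IMPACT" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.2f}, ratio {ratio:.2f} -> {flag}")
```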


Accountable Algorithms

University of Pennsylvania Law Review | 03.02.16

Joshua A. Kroll, Joanna Huey, Solon Barocas, Edward W. Felten, Joel R. Reidenberg, David G. Robinson, and Harlan Yu

D&S Affiliate Solon Barocas and Advisors Edward W. Felten and Joel Reidenberg collaborate on a paper outlining the importance of algorithmic accountability and fairness, proposing several tools that can be used when designing decision-making processes.

Abstract: Many important decisions historically made by people are now made by computers. Algorithms count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an IRS audit, and grant or deny immigration visas.

The accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? Additional approaches are needed to make automated decision systems — with their potentially incorrect, unjustified or unfair results — accountable and governable. This Article reveals a new technological toolkit to verify that automated decisions comply with key standards of legal fairness.

We challenge the dominant position in the legal literature that transparency will solve these problems. Disclosure of source code is often neither necessary (because of alternative techniques from computer science) nor sufficient (because of the complexity of code) to demonstrate the fairness of a process. Furthermore, transparency may be undesirable, such as when it permits tax cheats or terrorists to game the systems determining audits or security screening.

The central issue is how to assure the interests of citizens, and society as a whole, in making these processes more accountable. This Article argues that technology is creating new opportunities — more subtle and flexible than total transparency — to design decision-making algorithms so that they better align with legal and policy objectives. Doing so will improve not only the current governance of algorithms, but also — in certain cases — the governance of decision-making in general. The implicit (or explicit) biases of human decision-makers can be difficult to find and root out, but we can peer into the “brain” of an algorithm: computational processes and purpose specifications can be declared prior to use and verified afterwards.

The technological tools introduced in this Article apply widely. They can be used in designing decision-making processes from both the private and public sectors, and they can be tailored to verify different characteristics as desired by decision-makers, regulators, or the public. By forcing a more careful consideration of the effects of decision rules, they also engender policy discussions and closer looks at legal standards. As such, these tools have far-reaching implications throughout law and society.

Part I of this Article provides an accessible and concise introduction to foundational computer science concepts that can be used to verify and demonstrate compliance with key standards of legal fairness for automated decisions without revealing key attributes of the decision or the process by which the decision was reached. Part II then describes how these techniques can assure that decisions are made with the key governance attribute of procedural regularity, meaning that decisions are made under an announced set of rules consistently applied in each case. We demonstrate how this approach could be used to redesign and resolve issues with the State Department’s diversity visa lottery. In Part III, we go further and explore how other computational techniques can assure that automated decisions preserve fidelity to substantive legal and policy choices. We show how these tools may be used to assure that certain kinds of unjust discrimination are avoided and that automated decision processes behave in ways that comport with the social or legal standards that govern the decision. We also show how algorithmic decision-making may even complicate existing doctrines of disparate treatment and disparate impact, and we discuss some recent computer science work on detecting and removing discrimination in algorithms, especially in the context of big data and machine learning. And lastly in Part IV, we propose an agenda to further synergistic collaboration between computer science, law and policy to advance the design of automated decision processes for accountability.
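One building block behind declaring computational processes prior to use and verifying them afterwards is a cryptographic commitment. The sketch below shows only that commit-and-reveal idea, not the Article's full toolkit (which also draws on zero-knowledge techniques), and the policy string is hypothetical.

```python
# Sketch of commit-and-reveal, one building block behind declaring a
# decision policy before use and verifying it afterwards. This is not the
# paper's full protocol, just the core idea of a cryptographic commitment.
import hashlib
import os

def commit(policy_source: bytes) -> tuple[str, bytes]:
    """Publish the hash now; keep the policy and nonce secret until audit."""
    nonce = os.urandom(16)
    digest = hashlib.sha256(nonce + policy_source).hexdigest()
    return digest, nonce

def verify(published_digest: str, policy_source: bytes, nonce: bytes) -> bool:
    """Later, an auditor checks the revealed policy matches the commitment."""
    return hashlib.sha256(nonce + policy_source).hexdigest() == published_digest

policy = b"def decide(applicant): return applicant['score'] >= 70"   # hypothetical
digest, nonce = commit(policy)          # published before any decisions are made
print(verify(digest, policy, nonce))    # True: the policy used is the one committed to
```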


Auditing Black-box Models by Obscuring Features

paper | 02.23.16

Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, Suresh Venkatasubramanian

The ubiquity and power of machine learning models, which determine and control an increasing number of real-world decisions, present a challenge for society. D&S fellow Sorelle Friedler and a team of researchers have developed a technique for black-box auditing of machine-learning classification models to gain a deeper understanding of the behavior of these complex and opaque models.

Abstract: Data-trained predictive models are widely used to assist in decision making. But they are used as black boxes that output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular how different attributes influence the model prediction. This is very important when trying to interpret the behavior of complex models, or ensure that certain problematic attributes (like race or gender) are not unduly influencing decisions. In this paper, we present a technique for auditing black-box models: we can study the extent to which existing models take advantage of particular features in the dataset without knowing how the models work. We show how a class of techniques originally developed for the detection and repair of disparate impact in classification models can be used to study the sensitivity of any model with respect to any feature subsets. Our approach does not require the black-box model to be retrained. This is important if (for example) the model is only accessible via an API, and contrasts our work with other methods that investigate feature influence like feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available datasets and models. We also validate our procedure using techniques from interpretable learning and feature selection.
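A rough sense of the audit can be conveyed with a simplified proxy: train a stand-in black-box model, obscure one feature at a time, and measure the change in accuracy. Note that this permutation-style sketch neutralizes only the feature itself, whereas the paper's procedure also removes information about the feature that leaks through the other attributes; everything below is synthetic.

```python
# Simplified proxy for the audit idea above: measure how much a black-box
# model's accuracy changes when one feature is obscured. The paper's
# procedure additionally removes indirect information about the feature
# (so it cannot be reconstructed from the others); this sketch only
# neutralizes the feature directly, as a rough first pass.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 6))                 # hypothetical features
y = ((X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n)) > 0).astype(int)
X_tr, X_te, y_tr, y_te = X[:3000], X[3000:], y[:3000], y[3000:]

black_box = GradientBoostingClassifier().fit(X_tr, y_tr)   # stand-in black box
baseline = accuracy_score(y_te, black_box.predict(X_te))

for j in range(X.shape[1]):
    X_obscured = X_te.copy()
    X_obscured[:, j] = rng.permutation(X_obscured[:, j])    # obscure feature j
    acc = accuracy_score(y_te, black_box.predict(X_obscured))
    print(f"feature {j}: accuracy drop {baseline - acc:+.3f}")
```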


D&S fellow Mimi Onuoha thinks through the implications of the moment of data collection and offers a compact set of reminders for those who work with and think about data.

The conceptual, practical, and ethical issues surrounding “big data” and data in general begin at the very moment of data collection. Particularly when the data concern people, not enough attention is paid to the realities entangled within that significant moment and spreading out from it.

The point of data collection is a unique site for unpacking change, abuse, unfairness, bias, and potential. We can’t talk about responsible data without talking about the moment when data becomes data.


D&S Fellow Mark Latonero considers the digital infrastructure for movement of refugees — the social media platforms, mobile apps, online maps, instant messaging, translation websites, wire money transfers, cell phone charging stations, and Wi-Fi hotspots — that is accelerating the massive flow of people from places like Syria, Iraq, and Afghanistan to Greece, Germany, and Norway. He argues that while the tools that underpin this passage provide many benefits, they are also used to exploit refugees and raise serious questions about surveillance.

Refugees are among the world’s most vulnerable people. Studies have shown that undue surveillance towards marginalized populations can drive them off the grid. Both perceived and real fears around data collection may result in refugees seeking unauthorized routes to European destinations. This avoidance strategy can make them invisible to officials and more susceptible to criminal enterprises. Data collection on refugees should balance security and public safety with the need to preserve human dignity and rights. Governments and refugee agencies need to establish trust when collecting data from refugees. Technology companies should acknowledge their platforms are used by refugees and smugglers alike and create better user safety measures. As governments and leaders coordinate a response to the crisis, appropriate safeguards around data and technology need to be put in place to ensure the digital passage is safe and secure.


Certifying and removing disparate impact

working paper | 07.16.15

Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian

D&S fellow Sorelle Friedler and her research colleagues investigate the ways that algorithms make decisions in all aspects of our lives and ask whether we can determine if these algorithms are biased, involve illegal discrimination, or are unfair. In this paper, they introduce and address two problems, with the goals of quantifying and then removing disparate impact.

Abstract: What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process.
When the process is implemented using computers, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the algorithm, we propose making inferences based on the data the algorithm uses.
We make four contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that while known, has received relatively little attention. Second, we propose a test for disparate impact based on analyzing the information leakage of the protected class from the other data attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
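The leakage test the abstract describes can be sketched as follows: try to predict the protected class from the other attributes and look at how far the balanced accuracy rises above chance. The dataset and column names are hypothetical, and the paper formalizes the threshold in terms of the balanced error rate and the 80% rule.

```python
# Sketch of the paper's disparate-impact test: if the protected attribute
# can be predicted from the remaining attributes, the data "leaks" that
# attribute and a decision rule trained on it can have disparate impact.
# Dataset and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

df = pd.read_csv("applicants.csv")                      # hypothetical dataset
protected = (df["race"] == "minority").astype(int)
other = pd.get_dummies(df.drop(columns=["race", "hired"]))

X_tr, X_te, z_tr, z_te = train_test_split(other, protected, test_size=0.3, random_state=0)
leak_model = LogisticRegression(max_iter=1000).fit(X_tr, z_tr)
bal_acc = balanced_accuracy_score(z_te, leak_model.predict(X_te))

# Balanced accuracy near 0.5 means the other attributes reveal little about
# the protected class; values well above 0.5 signal leakage, and hence the
# potential for disparate impact even if "race" is never used directly.
print(f"Balanced accuracy predicting protected class: {bal_acc:.2f}")
```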


What Amazon Taught the Cops

magazine article | 05.27.15

Ingrid Burrington

D&S artist in residence Ingrid Burrington writes about the history of the term “predictive policing,” the pressures on police forces that are driving them to embrace data-driven policing, and the many valid causes for concern and outrage among civil-liberties advocates around these techniques and tactics.

It’s telling that one of the first articles to promote predictive policing, a 2009 Police Chief Magazine piece by the LAPD’s Charlie Beck and consultant Colleen McCue, poses the question “What Can We Learn From Wal-Mart and Amazon About Fighting Crime in a Recession?” The article likens law enforcement to a logistics dilemma, in which prioritizing where police officers patrol is analogous to identifying the likely demand for Pop-Tarts. Predictive policing has emerged as an answer to police departments’ assertion that they’re being asked to do more with less. If we can’t hire more cops, the logic goes, we need these tools to deploy them more efficiently.

 


“[D&S advisor] Dr. Alondra Nelson studies gender and black studies at the intersection of science, technology, and medicine. She is the author of numerous articles, including, ‘Bio Science: Genetic Genealogy Testing and the Pursuit of African Ancestry,’ as well as Body and Soul: The Black Panther Party and the Fight Against Medical Discrimination and the forthcoming The Social Life of DNA. We talked to her in the Trustees’ Room at Columbia University where she is professor of sociology and gender studies and the Dean of Social Science.”

Jamie Courville, Interview with Alondra Nelson: Race + Gender + Technology + Medicine, JSTOR Daily, February 18, 2015


Data & Civil Rights: Criminal Justice Primer

primer | 10.30.14

Alex Rosenblat, Kate Wikelius, danah boyd, Seeta Peña Gangadharan, Corrine Yu

Discrimination and racial disparities persist at every stage of the U.S. criminal justice system, from policing to trials to sentencing. The United States incarcerates a higher percentage of its population than any of its peer countries, with 2.2 million people behind bars. The criminal justice system disproportionately harms communities of color: while they make up 30 percent of the U.S. population, they represent 60 percent of the incarcerated population. There has been some discussion of how “big data” can be used to remedy inequalities in the criminal justice system; civil rights advocates recognize potential benefits but remain fundamentally concerned that data-oriented approaches are being designed and applied in ways that also disproportionately harm those who are already marginalized by criminal justice processes.

This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.


Data & Civil Rights: Consumer Finance Primer

primer | 10.30.14

Alex Rosenblat, Rob Randhava, danah boyd, Seeta Peña Gangadharan, Corrine Yu

New data analytics tools, predictive technologies, and an increasingly available range of data sources have enabled new financial instruments and services to be developed, but access to high-quality services remains restricted, often along racial and socio-economic class lines. How data is used and how algorithms and scores are designed have the potential to minimize or maximize discrimination and inequity. Yet, because of the complexity of many of these systems, developing mechanisms of oversight and accountability is extremely challenging. Not only is there little transparency for those being assessed, but the very nature of the new types of algorithms being designed makes it difficult for those with technical acumen to truly understand what is unfolding and why. This raises significant questions for those invested in making certain that finance and pricing are fair.

This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.


Data & Civil Rights: Technology Primer

primer | 10.30.14

Solon Barocas, Alex Rosenblat, danah boyd, Seeta Peña Gangadharan, Corrine Yu

Data have assumed a significant role in routine decisions about access, eligibility, and opportunity across a variety of domains. These are precisely the kinds of decisions that have long been the focus of civil rights campaigns. The results have been mixed. Companies draw on data in choosing how to focus their attention or distribute their resources, finding reason to cater to some of their customers while ignoring others. Governments use data to enhance service delivery and increase transparency, but also to decide whom to subject to special scrutiny, sanction, or punishment. The technologies that enable these applications are sometimes designed with a particular practice in mind, but more often are designed more abstractly, such that technologists are often unaware of, and do not test for, the ways in which they might benefit some and hurt others.

The technologies and practices that are driving these shifts are often described under the banner of “big data.” This concept is both vague and controversial, particularly to those engaged in the collection, cleaning, manipulation, use, and analysis of data. More often than not, the specific technical mechanisms that are being invoked fit under a different technical banner: “data mining.”

Data mining has a long history in many industries, including marketing and advertising, banking and finance, and insurance. As the technologies have become more affordable and the availability of data has increased, both public and private sectors—as well as civil society—are envisioning new ways of using these techniques to wrest actionable insights from once intractable datasets. The discussion of these practices has prompted fear and anxiety as well as hopes and dreams. There is a significant and increasing gap in understanding between those who are and those who are not technically fluent, making conversations about what’s happening with data challenging. That said, it’s important to understand that transparency and technical fluency are not always enough. For example, those who lack technical understanding are often frustrated because they are unable to provide oversight or determine the accuracy of what is produced, while those who build these systems realize that even they cannot meaningfully assess the product of many algorithms.

This primer provides a basic overview of some of the core concepts underpinning the “big data” phenomenon and the practice of data mining. The purpose of this primer is to enable those who are unfamiliar with the relevant practices and technical tools to at least have an appreciation for different aspects of what’s involved.

This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.


Future of Labor: Networked Employment Discrimination

primer | 10.08.14

Alex Rosenblat, Tamara Kneese, danah boyd

As businesses begin implementing algorithms to sort through applicants and use third party services to assess the quality of candidates based on their networks, personality tests, and other scores, how do we minimize the potential discriminatory outcomes of such hiring processes?

This document was produced as a part of the Future of Work Project at Data & Society Research Institute. This effort is supported by the Open Society Foundations’ U.S. Programs Future of Work inquiry, which is bringing together a cross-disciplinary and diverse group of thinkers to address some of the biggest questions about how work is transforming and what working will look like 20-30 years from now. The inquiry is exploring how the transformation of work, jobs and income will affect the most vulnerable communities, and what can be done to alter the course of events for the better.


Unionization emerged as a way of protecting labor rights when society shifted from an agricultural ecosystem to one shaped by manufacturing and industrial labor. New networked work complicates the organizing mechanisms that are inherent to unionization. How then do we protect laborers from abuse, poor work conditions, and discrimination?

This document was produced as a part of the Future of Work Project at Data & Society Research Institute. This effort is supported by the Open Society Foundations’ U.S. Programs Future of Work inquiry, which is bringing together a cross-disciplinary and diverse group of thinkers to address some of the biggest questions about how work is transforming and what working will look like 20-30 years from now. The inquiry is exploring how the transformation of work, jobs and income will affect the most vulnerable communities, and what can be done to alter the course of events for the better.


In this op-ed, Data & Society fellow Seeta Peña Gangadharan argues that the “rise of commercial data profiling is exacerbating existing inequities in society and could turn de facto discrimination into a high-tech enterprise.” She urges us to “respond to this digital discrimination by making civil rights a core driver of data-powered innovations and getting companies to share best practices in detecting and avoiding discriminatory outcomes.”

