The New York Times | 05.07.19
The Algorithmic Accountability Act is a step forward, but there’s still room for improvement. Postdoctoral Scholar Andrew Selbst and Margot Kaminski explain.
“The bill is a meaningful first step in addressing the problems with algorithmic decision-making. Companies must be pushed to consider and document what goes into algorithm design. They should be pushed, too, to come up with solutions. But the bill is lacking in three main areas.”
WNYC The Takeaway | 08.17.16
D&S lawyer-in-residence Rebecca Wexler describes the intersection of automated technologies, trade secrets, and the criminal justice system.
For-profit companies dominate the criminal justice technologies industry and produce computer programs that are widespread throughout the justice system. These automated programs deploy cops, analyze forensic evidence, and assess the risk levels of inmates. But these technological advances may be making the system less fair, and without access to the source code, it’s impossible to hold computers to account.
D&S resident Rebecca Wexler describes the flaws of an increasingly automated criminal justice system
The root of the problem is that automated criminal justice technologies are largely privately owned and sold for profit. The developers tend to view their technologies as trade secrets. As a result, they often refuse to disclose details about how their tools work, even to criminal defendants and their attorneys, even under a protective order, even in the controlled context of a criminal proceeding or parole hearing.
Ford Foundation blog | 05.30.17
D&S affiliate Wilneida Negrón details the role of bots and automation in activism today.
As everyone from advertisers to political adversaries jockey for attention, they are increasingly using automated technologies and processes to raise their own voices or drown out others. In fact, 62 percent of all Internet traffic is made up of programs acting on their own to analyze information, find vulnerabilities, or spread messages. Up to 48 million of Twitter’s 320 million users are bots, or applications that perform automated tasks. Some bots post beautiful art from museum collections, while some spread abuse and misinformation instead. Automation itself isn’t cutting edge, but the prevalence and sophistication of how automated tools interact with users is.
Harvard Business Review | 04.19.17
Jongbin Jung, Connor Concannon, D&S fellow Ravi Shroff, Sharad Goel, and Daniel G. Goldstein explore new methods for machine learning in criminal justice.
Simple rules certainly have their advantages, but one might reasonably wonder whether favoring simplicity means sacrificing performance. In many cases the answer, surprisingly, is no. We compared our simple rules to complex machine learning algorithms. In the case of judicial decisions, the risk chart above performed nearly identically to the best statistical risk assessment techniques. Replicating our analysis in 22 varied domains, we found that this phenomenon holds: Simple, transparent decision rules often perform on par with complex, opaque machine learning methods.
D&S advisor Anil Dash discusses how interviews can exclude people from the tech industry.
When we mimic patterns from tech culture without knowing why we do them, we often take good ideas and turn them into terrible barriers.
Ford Foundation blog | 01.12.17
points | 12.06.16
D&S advisor Baratunde Thurston details his exploration of The Glass Room exhibit.
I want to see The Glass Room everywhere there is an Apple Store…And anyone founding or working for a tech company should have to prove they’ve gone through this space and understood its meaning.
D&S fellow Mark Ackerman develops a checklist to address the sociotechnical issues demonstrated in Cathy O’Neil’s Weapons of Math Destruction.
These checklist items for socio-technical design are all important for policy as well. Yet the book makes it clear that not all “sins” can be reduced to checklist form. The book also explicates other issues that cannot easily be foreseen and are almost impossible for implementers to see in advance, even if well-intentioned. One example from the book is college rankings, where the attempt to be data-driven slowly created an ecology where universities and colleges paid more attention to the specific criteria used in the algorithm. In other situations, systems will be profit-generating in themselves, and therefore implemented, but suboptimal or societally harmful — this is especially true, as the book nicely points out, for systems that operate over time, as happened with mortgage pools. Efficiency may not be the only societal goal — there is also fairness, accountability, and justice. One of the strengths of the book is to point this out and make it quite clear.
points | 10.25.16
D&S researcher Josh Scannell responds to Georgetown Center on Privacy & Technology’s “The Perpetual Line-Up” study.
Reports like “The Perpetual Line-Up” force a fundamental question: What do we want technologies like facial recognition to do? Do we want them to automate narrowly “unbiased” facets of the criminal justice system? Or do we want to end the criminal justice system’s historical role as an engine of social injustice? We can’t have both.
D&S researcher Josh Scannell wrote an extensive analysis of predictive policing algorithms, showing that, while they were not built to be racist, they mirror a racist system.
Northpointe’s algorithms will always be racist, not because their engineers may be bad but because these systems accurately reflect the logic and mechanics of the carceral state — mechanics that have been digitized and sped up by the widespread implementation of systems like CompStat.
MIT Technology Review | 07.11.16
D&S advisor Ethan Zuckerman defends usage of video recording of police officers.
If video doesn’t lead to the indictment of officers who shoot civilians, are we wrong to expect justice from sousveillance? The police who shot Castile and Sterling knew they were likely to be captured on camera—from their police cars, surveillance cameras, and cameras held by bystanders—but still used deadly force in situations that don’t appear to have merited it. Is Mann’s hope for sousveillance simply wrong?
Not quite. While these videos rarely lead to grand jury indictments, they have become powerful fuel for social movements demanding racial justice and fairer policing. In the wake of Sterling and Castile’s deaths, protests brought thousands into the streets in major U.S. cities and led to the temporary closure of interstate highways.
points | 06.21.16
The FBI recently announced its plan to request that their massive biometrics database, called the Next Generation Identification (NGI) system, be exempted from basic requirements under the Privacy Act. These exemptions would prevent individuals from finding out if they are included within the database, whether their profile is being shared with other government entities, and whether their profile is accurate or contains false information. Forty-four organizations, including Data & Society, sent a letter to the Department of Justice asking for a 30-day extension to review the proposal.
Points: In this Points original, Robyn Caplan highlights the First Amendment implications of the FBI’s request for exemptions from the Privacy Act for its Next Generation Identification system. Public comment on the FBI’s proposal is being accepted until July 6, 2016.
Code is key to civic life, but we need to start looking under the hood and thinking about the externalities of our coding practices, especially as we’re building code as fast as possible with few checks and balances.
Points: “Be Careful What You Code For” is danah boyd’s talk from Personal Democracy Forum 2016 (June 9, 2016); her remarks have been modified for Points. danah exhorts us to mind the externalities of code and proposes audits as a way to reckon with the effects of code in high stakes areas like policing. Video is available here.
points | 06.13.16
In this Points piece “Real Life Harms of Student Data,” D&S researcher Mikaela Pitcan argues that assessing real harms connected with student data forces us to acknowledge the mundane, human causes. And she asks: “What do we do now?”
“Overall, cases where student data has led to harms aren’t about data per se, but about the way that people interact with the data…
…accidental data leaks, data being hacked, data being lost, school officials using off-campus information for discipline, oversight in planning for data handling when companies are sold, and faulty data and data systems resulting in negative outcomes. Let’s break it down.”
ProPublica | 05.23.16
D&S fellow Surya Mattu investigated bias in risk assessments, algorithmically generated scores predicting the likelihood of a person committing a future crime. These scores are increasingly used in courtrooms across America to inform decisions about who can be set free at every stage of the criminal justice system, from assigning bond amounts to fundamental decisions about a defendant’s freedom:
We obtained the risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014 and checked to see how many were charged with new crimes over the next two years, the same benchmark used by the creators of the algorithm.
The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.
When a full range of crimes were taken into account — including misdemeanors such as driving with an expired license — the algorithm was somewhat more accurate than a coin flip. Of those deemed likely to re-offend, 61 percent were arrested for any subsequent crimes within two years.
We also turned up significant racial disparities, just as Holder feared. In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.
- The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
- White defendants were mislabeled as low risk more often than black defendants.
Could this disparity be explained by defendants’ prior crimes or the type of crimes they were arrested for? No. We ran a statistical test that isolated the effect of race from criminal history and recidivism, as well as from defendants’ age and gender. Black defendants were still 77 percent more likely to be pegged as at higher risk of committing a future violent crime and 45 percent more likely to be predicted to commit a future crime of any kind.
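The distinction drawn above, between making mistakes at a similar overall rate and making different kinds of mistakes, can be sketched with per-group false positive and false negative rates. This is a minimal illustration, not ProPublica's analysis code: the function name, the group labels "A" and "B", and every record below are made-up assumptions chosen so that both groups have the same overall accuracy.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, predicted_high_risk, reoffended) tuples."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            counts[group]["tp"] += 1
        elif predicted and not actual:
            counts[group]["fp"] += 1  # flagged high risk, did not re-offend
        elif not predicted and actual:
            counts[group]["fn"] += 1  # labeled low risk, did re-offend
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not re-offend
        positives = c["fn"] + c["tp"]  # people who did re-offend
        total = negatives + positives
        rates[group] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return rates

# Hypothetical toy data (NOT the Broward County records).
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", True, True)] * 7 + [("A", False, True)] * 3
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", True, True)] * 5 + [("B", False, True)] * 5
)

rates = error_rates_by_group(toy_records)
for group in sorted(rates):
    print(group, rates[group])
```

In this toy data both groups are classified correctly 65 percent of the time, yet group A is wrongly flagged twice as often as group B, while group B is wrongly cleared more often. An overall accuracy figure alone would hide that difference.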
D&S fellow Mark Latonero considers recent attempts by policymakers, big tech companies, and advocates to address the deepening refugee and migrant crisis and, in particular, the educational needs of displaced children through technology and app development projects. He cautions developers and policymakers to consider the risks of failing to understand the unique challenges facing refugee children living without running water, let alone a good mobile network.
The reality is that no learning app or technology will improve education by itself. It’s also questionable whether mobile apps used with minimal adult supervision can improve a refugee child’s well-being. A roundtable at the Brookings Center for Universal Education noted that “children have needs that cannot be addressed where there is little or no human interaction. A teacher is more likely to note psychosocial needs and to support children’s recovery, or to refer children to other services when they are in greater contact with children.” Carleen Maitland, a technology and policy professor who led the Penn State team, found through her experience at Zaatari that in-person interactions with instructors and staff in the camp’s many community centers could provide far greater learning opportunities for young people than sitting alone with a mobile app.
In fact, unleashing ed tech vendors or Western technologists to solve development issues without the appropriate cultural awareness could do more harm than good. Children could come to depend on technologies that are abandoned by developers once the attention and funding have waned. Plus, the business models that sustain apps through advertising, or collecting and selling consumer data, are unethical where refugees are concerned. Ensuring data privacy and security for refugee children using apps should be a top priority for any software developer.
In cases where no in-person education is available, apps can still play a role, particularly for children who feel unsafe to travel outside their shelters or are immobile owing to injuries or disabilities. But if an app is to stand a chance of making a real difference, it needs to arise not out of a tech meet-up in New York City but on a field research trip to a refugee camp, where it will be easier to see how mobile phones are actually accessed and used. Researchers need to ask basic questions about the value of education for refugees: Is the goal to inspire learning on traditional subjects? Empower students with academic credentials or job skills? Assimilate refugees into their host country? Provide a protected space where children can be fed and feel safe? Or combat violent extremism at an early age?
To decide, researchers need to put the specific needs of refugee children first—whether economic, psychosocial, emotional, or physical—and work backward to see whether technology can help, if at all.
Medium | 05.18.16
D&S Researcher Alex Rosenblat on the fallout of Austin’s showdown with Uber and Lyft:
Uber allied with Lyft in Austin to lobby against an ordinance passed by the city council which requires ridehail drivers to undergo fingerprint-based background checks. The two companies spent $8.1 million combined to encourage (i.e. bombard with robo-texts) Austin voters to oppose the ordinance in a referendum vote called Proposition 1. If local cities take a stand against Uber or Lyft’s demand about background checks, and they prevail, that could produce a ripple effect in other cities that have regulatory demands. The local impact on Austin is a secondary concern to the global and national ambitions of imperial Uber and parochial Lyft. When they lost the vote on Prop. 1, they followed through on their threats to withdraw their services.
D&S Fellow Natasha Singer looks into exploitative interactive website design techniques known as “dark patterns”.
Persuasive design is a longstanding practice, not just in marketing but in health care and philanthropy. Countries that nudge their citizens to become organ donors — by requiring them to opt out if they don’t want to donate their body parts — have a higher rate of participation than the United States, where people can choose to sign up for organ donation when they obtain driver’s licenses or ID cards.
But the same techniques that encourage citizens to do good may also be used to exploit consumers’ cognitive biases. User-experience designers and marketers are well aware that many people are so eager to start using a new service or complete a task, or are so loath to lose a perceived deal, that they will often click one “Next” button after another as if on autopilot — without necessarily understanding the terms they have agreed to along the way.
“That’s when things start to drift into manipulation,” said Katie Swindler, director of user experience at FCB Chicago, an ad agency. She and Mr. Brignull are part of an informal effort among industry experts trying to make a business case for increased transparency.
Surveillance & Society | 08.16.11
Researchers Alexandra Mateescu and Alex Rosenblat published a paper with D&S Founder danah boyd examining police-worn body cameras and their potential to provide avenues for police accountability and foster improved police-community relations. The authors raise concerns about the potentially harmful consequences of constant surveillance, which has alarmed civil rights groups over how body-worn cameras may violate privacy and exacerbate existing police practices that have historically victimized people of color and vulnerable populations. They consider whether one can demand greater accountability without increased surveillance at the same time and suggest that “the trajectory laid out by body-worn cameras towards greater surveillance is clear, if not fully realized, while the path towards accountability has not yet been adequately defined, let alone forged.”
The intimacy of body-worn cameras’ presence—which potentially enables the recording of even mundane interpersonal interactions with citizens—can be exploited with the application of technologies like facial recognition; this can exacerbate existing practices that have historically victimized people of color and vulnerable populations. Not only do such technologies increase surveillance, but they also conflate the act of surveilling citizens with the mechanisms by which police conduct is evaluated. Although police accountability is the goal, the camera’s view is pointed outward and away from its wearer, and audio recording captures any sounds within range. As a result, it becomes increasingly difficult to ask whether one can demand greater accountability without increased surveillance at the same time.
Crafting better policies on body-worn camera use has been one of the primary avenues for balancing the right of public access with the need to protect against this technology’s invasive aspects. However, no universal policies or norms have been established, even on simple issues such as whether officers should notify citizens that they are being recorded. What is known is that body-worn cameras present definite and identifiable risks to privacy. By contrast, visions of accountability have remained ill-defined, and the role to be played by body-worn cameras cannot be easily separated from the wider institutional and cultural shifts necessary for enacting lasting reforms in policing. Both the privacy risks and the potential for effecting accountability are contingent upon an ongoing process of negotiation, shaped by beliefs and assumptions rather than empirical evidence.
CultureDigitally.org | 05.09.16
D&S Advisor Tarleton Gillespie responds to Gizmodo’s recent piece alleging bias in Facebook’s Trending Topics list. He argues that information algorithms like the ones used to identify “trends” on Facebook do not work alone, and cannot work alone, “in so many ways that we must simply discard the fantasy that they do, or ever will.”
People are in the algorithm because how could they not be? People produce the Facebook activity being measured, people design the algorithms and set their evaluative criteria, people decide what counts as a trend, people name and summarize them, and people look to game the algorithm with their next posts.
Trending algorithms are undeniably becoming part of the cultural landscape, and revelations like Gizmodo’s are helpful steps in helping us shed the easy notions of what they are and how they work, notions the platforms have fostered. Social media platforms must come to fully realize that they are newsmakers and gatekeepers, whether they intend to be or not, whether they want to be or not. And while algorithms can chew on a lot of data, it is still a substantial, significant, and human process to turn that data into claims about importance that get fed back to millions of users. This is not a realization that they will ever reach on their own — which suggests to me that they need the two countervailing forces that journalism has: a structural commitment to the public, imposed if not inherent, and competition to force them to take such obligations seriously.
Medium | 04.29.16
D&S Researcher Alex Rosenblat examines and problematizes Uber’s stance against tipping and the resulting effects on Uber drivers.
Harvard Business Review | 04.06.16
D&S Researcher Alex Rosenblat examines how Uber’s app design and deployment redistributes management functions to semiautomated and algorithmic systems, as well as to consumer ratings systems, creating ambiguity around who is in charge and what is expected of workers. Alex also raises questions about Uber’s neutral branding as an intermediary between supply (drivers) and demand (passengers) and considers the employment structures and hierarchies that emerge through its software platform:
Most conversations about the future of work and automation focus on issues of worker displacement. We’re only starting to think about the labor implications in the design of platforms that automate management and coordination of workers. Tools like the rating system, performance targets and policies, algorithmic surge pricing, and insistent messaging and behavioral nudges are part of the “choice architecture” of Uber’s system: it can steer drivers to work at particular places and at particular times while maintaining that its system merely reflects demand to drivers. These automated and algorithmic management tools complicate claims that drivers are independent workers whose employment opportunities are made possible through a neutral, intermediary software platform.
In many ways, automation can obscure the role of management, but as our research illustrates, algorithmic management cannot be conflated with worker autonomy. Uber’s model clearly raises new challenges for companies that aim to produce scalable, standardized services for consumers through the automation of worker-employer relationships.
Balkin.blogspot.com | 03.31.16
Reflections from D&S Affiliate Solon Barocas and Advisors Edward W. Felten and Joel Reidenberg on the recent “Unlocking the Black Box” Conference held on April 2 at Yale Law School:
Our work on accountable algorithms shows that transparency alone is not enough: we must have transparency of the right information about how a system works. Transparency and the evaluation of computer systems as inscrutable black boxes, against which we can only test the relationship of inputs and outputs, both fail on their own to effect even the most basic procedural safeguards for automated decision making. And without a notion of procedural regularity on which to base analysis, it is fruitless to inquire as to a computer system’s fairness or compliance with norms of law, politics, or social acceptability. Fortunately, the tools of computer science provide the necessary means to build computer systems that are fully accountable. Both transparency and black-box testing play a part, but if we are to have accountable algorithms, we must design for this goal from the ground up.
D&S Research Analyst Mikaela Pitcan gives us a round-up of news events from January through March 2016 addressing data and equity in schools with a focus on efforts to combat bias in data in New York’s Specialized High Schools.
Medium | 03.27.16
D&S Advisor Andrew McLaughlin reflects on Facebook’s approach to implementing their Free Basics program:
In opening a door to the Internet, Facebook doesn’t need to be a gatekeeper. The good news, though, is that Facebook could quite easily fix its two core flaws and move forward with a program that is effective, widely supported, and consistent with Internet ideals and good public policy.
In this gatekeeper-less model, neither the user nor the online service has to ask Facebook’s permission to connect with each other. And that’s what makes all the difference. Rather than referring to an approved set of ~300 companies, the word “Basics” in Free Basics would denote any site or service anywhere in the world that provides a standards-compliant, low-bandwidth, mobile-optimized version.
Student data can and has served as an equalizer, but it also has the potential to perpetuate discriminatory practices. In order to leverage student data to move toward equity in education, researchers, parents, and educators must be aware of the ways in which data serves to equalize as well as disenfranchise. Common discourse surrounding data as an equalizer can fall along a spectrum of “yes, it’s the fix” or “this will never work.” Reality is more complicated than that.
Points: “Does data-driven learning improve equity?” That depends, says Mikaela Pitcan in this Points original. Starting assumptions, actual data use practices, interpretation, context context context — all complicate the story around education data and must be kept in mind if equity is our objective.
D&S Research Analyst Mikaela Pitcan gives a brief recap of student data & privacy related news from January – March 2016. Special attention is given to the Every Student Succeeds Act.
paper | 03.10.16
D&S Fellow Sorelle Friedler and D&S Affiliate Ifeoma Ajunwa argue in this essay that well settled legal doctrines that prohibit discrimination against job applicants on the basis of sex or race dictate an examination of how algorithms are employed in the hiring process with the specific goals of: 1) predicting whether such algorithmic decision-making could generate decisions having a disparate impact on protected classes; and 2) repairing input data in such a way as to prevent disparate impact from algorithmic decision-making.
Major advances in machine learning have encouraged corporations to rely on Big Data and algorithmic decision making with the presumption that such decisions are efficient and impartial. In this Essay, we show that protected information that is encoded in seemingly facially neutral data could be predicted with high accuracy by algorithms and employed in the decision-making process, thus resulting in a disparate impact on protected classes. We then demonstrate how it is possible to repair the data so that any algorithm trained on that data would make non-discriminatory decisions. Since this data modification is done before decisions are applied to any individuals, this process can be applied without requiring the reversal of decisions. We make the legal argument that such data modifications should be mandated as an anti-discriminatory measure. And akin to Professor Ayres’ and Professor Gerarda’s Fair Employment Mark, such data repair that is preventative of disparate impact would be certifiable by teams of lawyers working in tandem with software engineers and data scientists. Finally, we anticipate the business necessity defense that such data modifications could degrade the accuracy of algorithmic decision-making. While we find evidence for this trade-off, we also found that on one data set it was possible to modify the data so that despite previous decisions having had a disparate impact under the four-fifths standard, any subsequent decision-making algorithm was necessarily non-discriminatory while retaining essentially the same accuracy. Such an algorithmic “repair” could be used to refute a business necessity defense by showing that algorithms trained on modified data can still make decisions consistent with their previous outcomes.
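The four-fifths standard mentioned in the abstract compares selection rates between groups: a protected group's rate below 80 percent of the reference group's rate is commonly treated as evidence of disparate impact. A minimal sketch of that check follows. It covers only the test, not the authors' data-repair procedure, and the function names and toy hiring outcomes are illustrative assumptions.

```python
def selection_rate(outcomes):
    """Fraction of applicants selected; outcomes are 1 (selected) or 0 (rejected)."""
    return sum(outcomes) / len(outcomes)

def disparate_impact_ratio(protected_outcomes, reference_outcomes):
    """Selection rate of the protected group divided by that of the reference group."""
    return selection_rate(protected_outcomes) / selection_rate(reference_outcomes)

def passes_four_fifths(protected_outcomes, reference_outcomes, threshold=0.8):
    """True when the ratio meets the four-fifths (80 percent) threshold."""
    return disparate_impact_ratio(protected_outcomes, reference_outcomes) >= threshold

# Hypothetical toy outcomes: 3 of 10 protected-group applicants selected,
# 5 of 10 reference-group applicants selected.
protected = [1] * 3 + [0] * 7
reference = [1] * 5 + [0] * 5

ratio = disparate_impact_ratio(protected, reference)
print(f"ratio = {ratio:.2f}, passes four-fifths: {passes_four_fifths(protected, reference)}")
```

Here the ratio is 0.30 / 0.50 = 0.60, which falls below 0.8, so these made-up decisions would be flagged under the four-fifths standard.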
University of Pennsylvania Law Review | 03.02.16
D&S Affiliate Solon Barocas and Advisors Edward W. Felten and Joel Reidenberg collaborate on a paper outlining the importance of algorithmic accountability and fairness, proposing several tools that can be used when designing decision-making processes.
Abstract: Many important decisions historically made by people are now made by computers. Algorithms count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an IRS audit, and grant or deny immigration visas.
The accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? Additional approaches are needed to make automated decision systems — with their potentially incorrect, unjustified or unfair results — accountable and governable. This Article reveals a new technological toolkit to verify that automated decisions comply with key standards of legal fairness.
We challenge the dominant position in the legal literature that transparency will solve these problems. Disclosure of source code is often neither necessary (because of alternative techniques from computer science) nor sufficient (because of the complexity of code) to demonstrate the fairness of a process. Furthermore, transparency may be undesirable, such as when it permits tax cheats or terrorists to game the systems determining audits or security screening.
The central issue is how to assure the interests of citizens, and society as a whole, in making these processes more accountable. This Article argues that technology is creating new opportunities — more subtle and flexible than total transparency — to design decision-making algorithms so that they better align with legal and policy objectives. Doing so will improve not only the current governance of algorithms, but also — in certain cases — the governance of decision-making in general. The implicit (or explicit) biases of human decision-makers can be difficult to find and root out, but we can peer into the “brain” of an algorithm: computational processes and purpose specifications can be declared prior to use and verified afterwards.
The technological tools introduced in this Article apply widely. They can be used in designing decision-making processes from both the private and public sectors, and they can be tailored to verify different characteristics as desired by decision-makers, regulators, or the public. By forcing a more careful consideration of the effects of decision rules, they also engender policy discussions and closer looks at legal standards. As such, these tools have far-reaching implications throughout law and society.
Part I of this Article provides an accessible and concise introduction to foundational computer science concepts that can be used to verify and demonstrate compliance with key standards of legal fairness for automated decisions without revealing key attributes of the decision or the process by which the decision was reached. Part II then describes how these techniques can assure that decisions are made with the key governance attribute of procedural regularity, meaning that decisions are made under an announced set of rules consistently applied in each case. We demonstrate how this approach could be used to redesign and resolve issues with the State Department’s diversity visa lottery. In Part III, we go further and explore how other computational techniques can assure that automated decisions preserve fidelity to substantive legal and policy choices. We show how these tools may be used to assure that certain kinds of unjust discrimination are avoided and that automated decision processes behave in ways that comport with the social or legal standards that govern the decision. We also show how algorithmic decision-making may even complicate existing doctrines of disparate treatment and disparate impact, and we discuss some recent computer science work on detecting and removing discrimination in algorithms, especially in the context of big data and machine learning. And lastly in Part IV, we propose an agenda to further synergistic collaboration between computer science, law and policy to advance the design of automated decision processes for accountability.
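One way to picture a rule being "declared prior to use and verified afterwards" is a hash commitment: the decision-maker publishes a digest of its rule before any decisions are made, and later reveals the rule so anyone can check it against the digest. The abstract above does not name a specific technique, so this is an assumed illustration rather than the Article's method. It is also weaker than what the abstract describes, because verification here requires revealing the rule. The function names and the sample policy text are hypothetical.

```python
import hashlib
import hmac
import secrets

def commit(policy: bytes):
    """Return (digest, nonce). Publish the digest now; keep the nonce and policy private."""
    nonce = secrets.token_bytes(32)  # random salt so the policy cannot be guessed from the digest
    digest = hashlib.sha256(nonce + policy).hexdigest()
    return digest, nonce

def verify(digest: str, nonce: bytes, policy: bytes) -> bool:
    """Check that a later-revealed policy matches the digest announced in advance."""
    recomputed = hashlib.sha256(nonce + policy).hexdigest()
    return hmac.compare_digest(recomputed, digest)

# Hypothetical policy text, used only for illustration.
announced_policy = b"select winners uniformly at random from all valid entries"
digest, nonce = commit(announced_policy)

# Later: an auditor receives the nonce and the policy and checks them.
print(verify(digest, nonce, announced_policy))                     # policy as announced
print(verify(digest, nonce, b"select winners from a curated list"))  # a swapped policy
```

The design point is ordering: because the digest is fixed before decisions are made, the decision-maker cannot quietly substitute a different rule afterwards without the check failing.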
D&S Board Member Anil Dash contrasts two recent approaches to making internet connectivity more widely available. Comparing the efforts to build consensus behind Facebook’s Free Basics initiative to LinkNYC, the recently-launched program to bring free broadband wifi to New York City, Dash views each situation as a compelling example of who gets heard, and when, any time a big institution tries to create a technology infrastructure to serve millions of people.
There’s one key lesson we can take from these two attempts to connect millions of people to the Internet: it’s about building trust. Technology infrastructure can be good or bad, extractive or supportive, a lifeline or a raw deal. Objections to new infrastructure are often dismissed by the people pushing them, but people’s concerns are seldom simply about advertising or being skeptical of corporations. There are often very good reasons to look a gift horse in the mouth.
Whether we believe in the positive potential of getting connected simply boils down to whether we feel the people providing that infrastructure have truly listened to us. The good news is, we have clear examples of how to do exactly that.
paper | 02.23.16
The ubiquity and power of machine learning models in society to determine and control an increasing number of real-world decisions present a challenge. D&S fellow Sorelle Friedler and a team of researchers have developed a technique to do black-box auditing of machine-learning classification models to gain a deeper understanding of these complex and opaque model behaviors.
Abstract: Data-trained predictive models are widely used to assist in decision making. But they are used as black boxes that output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular of how different attributes influence the model prediction. This is very important when trying to interpret the behavior of complex models, or ensure that certain problematic attributes (like race or gender) are not unduly influencing decisions. In this paper, we present a technique for auditing black-box models: we can study the extent to which existing models take advantage of particular features in the dataset without knowing how the models work. We show how a class of techniques originally developed for the detection and repair of disparate impact in classification models can be used to study the sensitivity of any model with respect to any feature subsets. Our approach does not require the black-box model to be retrained. This is important if (for example) the model is only accessible via an API, and contrasts our work with other methods that investigate feature influence like feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available datasets and models. We also validate our procedure using techniques from interpretable learning and feature selection.
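The idea of probing a model only through its predictions can be illustrated with a much simpler stand-in than the paper's procedure: shuffle one feature across records and measure how much accuracy drops, without retraining or inspecting the model. This is a permutation-style sensitivity check, not the obscuring method the abstract describes. The `black_box` function, the toy records, and the feature names are all hypothetical.

```python
import random
from typing import Callable, Dict, List

Record = Dict[str, float]
Model = Callable[[Record], int]


def accuracy(model: Model, rows: List[Record], labels: List[int]) -> float:
    """Fraction of rows for which the model's prediction equals the label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def accuracy_drop(model: Model, rows: List[Record], labels: List[int],
                  feature: str, seed: int = 0) -> float:
    """Accuracy lost when `feature` is shuffled across rows.

    The model is only queried, never retrained or opened up.
    """
    rng = random.Random(seed)
    values = [row[feature] for row in rows]
    rng.shuffle(values)
    shuffled = [dict(row, **{feature: value}) for row, value in zip(rows, values)]
    return accuracy(model, rows, labels) - accuracy(model, shuffled, labels)


# Hypothetical black box: we may call it, but pretend we cannot read it.
def black_box(row: Record) -> int:
    return 1 if row["income"] > 50 else 0


rows = [{"income": float(i), "shoe_size": float(i % 7)} for i in range(100)]
labels = [black_box(row) for row in rows]
```

Here `accuracy_drop(black_box, rows, labels, "shoe_size")` is zero because the toy model never reads that feature, while shuffling `"income"` lowers accuracy. A simple shuffle like this does not account for features that act as proxies for one another, which is one of the problems the paper's approach is designed to address.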
D&S fellow Mimi Onuoha thinks through the implications of the moment of data collection and offers a compact set of reminders for those who work with and think about data.
The conceptual, practical, and ethical issues surrounding “big data” and data in general begin at the very moment of data collection. Particularly when the data concern people, not enough attention is paid to the realities entangled within that significant moment and spreading out from it.
The point of data collection is a unique site for unpacking change, abuse, unfairness, bias, and potential. We can’t talk about responsible data without talking about the moment when data becomes data.
D&S Fellow Mark Latonero considers the digital infrastructure for movement of refugees — the social media platforms, mobile apps, online maps, instant messaging, translation websites, wire money transfers, cell phone charging stations, and Wi-Fi hotspots — that is accelerating the massive flow of people from places like Syria, Iraq, and Afghanistan to Greece, Germany, and Norway. He argues that while the tools that underpin this passage provide many benefits, they are also used to exploit refugees and raise serious questions about surveillance.
Refugees are among the world’s most vulnerable people. Studies have shown that undue surveillance towards marginalized populations can drive them off the grid. Both perceived and real fears around data collection may result in refugees seeking unauthorized routes to European destinations. This avoidance strategy can make them invisible to officials and more susceptible to criminal enterprises. Data collection on refugees should balance security and public safety with the need to preserve human dignity and rights. Governments and refugee agencies need to establish trust when collecting data from refugees. Technology companies should acknowledge their platforms are used by refugees and smugglers alike and create better user safety measures. As governments and leaders coordinate a response to the crisis, appropriate safeguards around data and technology need to be put in place to ensure the digital passage is safe and secure.
TheRideShareGuy.com | 11.25.15
“On today’s podcast, I get to interview Alex Rosenblat, a researcher from the Data & Society Research Institute. Now that name may seem familiar because in addition to spending the last 9 months studying how Uber drivers interact with the driver app, Alex has also published several very popular articles on things like Uber’s phantom cabs and a technical paper on the subject of driver control.”
Harry Campbell, Alex Rosenblat On How Much Control Uber Really Has Over Its Drivers, The Rideshare Guy Podcast, November 25, 2015
working paper | 07.16.15
D&S fellow Sorelle Friedler and her research colleagues investigate the ways that algorithms make decisions in all aspects of our lives and whether we can determine if these algorithms are biased, involve illegal discrimination, or are unfair. In this paper, they introduce and address two problems with the goals of quantifying and then removing disparate impact.
Abstract: What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process.
When the process is implemented using computers, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the algorithm, we propose making inferences based on the data the algorithm uses.
We make four contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on analyzing the information leakage of the protected class from the other data attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
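To make the abstract's terms concrete, here is a minimal sketch of two quantities it alludes to. The first is the ratio of selection rates between groups, which U.S. enforcement guidelines commonly compare against a four-fifths (80%) threshold; the abstract does not name that threshold, so its use here is an assumption. The second is a balanced error rate for predictions of the protected attribute made from the other data, one plausible way to quantify leakage: a low error rate means the protected class is easy to infer. All data and function names are hypothetical.

```python
from typing import List


def selection_rate(outcomes: List[int]) -> float:
    """Fraction of a group that received the favorable outcome (1)."""
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(protected: List[int], others: List[int]) -> float:
    """Selection rate of the protected group divided by that of everyone else."""
    return selection_rate(protected) / selection_rate(others)


def balanced_error_rate(truth: List[int], predicted: List[int]) -> float:
    """Average of the per-class error rates for classes 0 and 1."""
    per_class_errors = []
    for cls in (0, 1):
        indices = [i for i, t in enumerate(truth) if t == cls]
        wrong = sum(1 for i in indices if predicted[i] != cls)
        per_class_errors.append(wrong / len(indices))
    return sum(per_class_errors) / 2


# Hypothetical outcomes: 1 = selected, 0 = not selected.
protected_outcomes = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # 2 of 10 selected
other_outcomes = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]      # 5 of 10 selected

ratio = disparate_impact_ratio(protected_outcomes, other_outcomes)
flagged = ratio < 0.8  # four-fifths rule of thumb (an assumption, see above)
```

With these toy numbers the protected group is selected at 20% and the other group at 50%, so the ratio falls well below the four-fifths mark and `flagged` is `True`.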
“The collapse of the financial system starting in 2008 shattered public confidence in the traditional intermediaries of the financial system – the regulated banks. Not only did the mainstream financial system implode, leaving millions of borrowers bearing an extraordinary debt burden, the contraction that followed left individuals and small businesses cut off from fresh sources of credit. “Disintermediation,” the idea that we can have credit without banks, became a political rallying cry for those interested in reforming the financial system to better serve the interests of consumers. As the Financial Times has put it, peer-to-peer lending companies offered to “revolutionize credit by cutting out, or disintermediating, banks from the traditional lending process.” Although the amount of credit available through peer-to-peer lending is minuscule in comparison to traditional credit, the public attention given to this phenomenon is significant.”
This primer maps the peer-to-peer/marketplace lending ecosystem in order to ground the Data & Fairness initiative’s investigations into its benefits and challenges and potential for fairness and discrimination.
writeup | 06.19.15
On May 19, 2015, a group of about 20 individuals gathered at New America in Washington, DC for a discussion co-hosted by The Leadership Conference on Civil and Human Rights, Data & Society Research Institute, Upturn, and New America’s Open Technology Institute. The group was composed of technologists, researchers, civil rights advocates, and law enforcement representatives with the goal to broaden the discussion surrounding police-worn body cameras within their respective fields and to understand the various communities’ interests and concerns. The series of discussions focused on what the technology behind police cameras consists of, how the cameras can be implemented to protect civil rights and public safety, and what the consequences of implementation might be.
causes for concern and outrage among civil-liberties advocates around these techniques and tactics.
It’s telling that one of the first articles to promote predictive policing, a 2009 Police Chief Magazine piece by the LAPD’s Charlie Beck and consultant Colleen McCue, poses the question “What Can We Learn From Wal-Mart and Amazon About Fighting Crime in a Recession?” The article likens law enforcement to a logistics dilemma, in which prioritizing where police officers patrol is analogous to identifying the likely demand for Pop-Tarts. Predictive policing has emerged as an answer to police departments’ assertion that they’re being asked to do more with less. If we can’t hire more cops, the logic goes, we need these tools to deploy them more efficiently.
D&S founder danah boyd considers recent efforts at reforming laws around student privacy and what it would mean to actually consider the privacy rights of the most marginalized students.
The threats that poor youth face? That youth of color face? And the trade-offs they make in a hypersurveilled world? What would it take to get people to care about how we keep building out infrastructure and backdoors to track low-status youth in new ways? It saddens me that the conversation is constructed as being about student privacy, but it’s really about who has the right to monitor which youth. And, as always, we allow certain actors to continue asserting power over youth.
primer | 10.30.14
Many education reformers see the merging of student data, predictive analytics, processing tools, and technology-based instruction as the key to the future of education and a means to further opportunity and equity in education. However, despite widespread discussion of the potential benefits and costs of using data in educational reform, it is difficult to determine who benefits from reforms since there has been little assessment of these programs and few oversight mechanisms.
This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.
primer | 10.30.14
New data analytics tools, predictive technologies, and an increasingly available range of data sources have enabled new financial instruments and services to be developed, but access to high-quality services remains restricted, often along racial and socio-economic class lines. How data is used and how algorithms and scores are designed have the potential to minimize or maximize discrimination and inequity. Yet, because of the complexity of many of these systems, developing mechanisms of oversight and accountability is extremely challenging. Not only is there little transparency for those being assessed, but the very nature of the new types of algorithms being designed makes it difficult for those with technical acumen to truly understand what is unfolding and why. This raises significant questions for those invested in making certain that finance and pricing are fair.
This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.
primer | 10.30.14
Data has always played an important role in housing policies, practices, and financing. Housing advocates worry that new sources of data are being used to extend longstanding discriminatory practices, particularly as it affects those who have access to credit for home ownership as well as the ways in which the rental market is unfolding. Open data practices, while potentially shedding light on housing inequities, are currently more theoretical than actionable. Far too little is known about the ways in which data analytics and other data-related practices may expand or relieve inequities in housing.
This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.
primer | 10.30.14
The complexity of hiring algorithms, which fold all kinds of data into scoring systems, makes it difficult to detect and therefore challenge hiring decisions, even when outputs appear to disadvantage particular groups within a protected class. When hiring algorithms weigh many factors to reach an unexplained decision, job applicants and outside observers are unable to detect and challenge factors that may have a disparate impact on protected groups.
This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.
primer | 10.30.14
Data have assumed a significant role in routine decisions about access, eligibility, and opportunity across a variety of domains. These are precisely the kinds of decisions that have long been the focus of civil rights campaigns. The results have been mixed. Companies draw on data in choosing how to focus their attention or distribute their resources, finding reason to cater to some of their customers while ignoring others. Governments use data to enhance service delivery and increase transparency, but also to decide whom to subject to special scrutiny, sanction, or punishment. The technologies that enable these applications are sometimes designed with a particular practice in mind, but more often are designed more abstractly, such that technologists are often unaware of and not testing for the ways in which they might benefit some and hurt others.
The technologies and practices that are driving these shifts are often described under the banner of “big data.” This concept is both vague and controversial, particularly to those engaged in the collection, cleaning, manipulation, use, and analysis of data. More often than not, the specific technical mechanisms that are being invoked fit under a different technical banner: “data mining.”
Data mining has a long history in many industries, including marketing and advertising, banking and finance, and insurance. As the technologies have become more affordable and the availability of data has increased, both public and private sectors, as well as civil society, are envisioning new ways of using these techniques to wrest actionable insights from once intractable datasets. The discussion of these practices has prompted fear and anxiety as well as hopes and dreams. There is a significant and increasing gap in understanding between those who are and are not technically fluent, making conversations about what’s happening with data challenging. That said, it’s important to understand that transparency and technical fluency are not always enough. For example, those who lack technical understanding are often frustrated because they are unable to provide oversight or determine the accuracy of what is produced, while those who build these systems realize that even they cannot meaningfully assess the product of many algorithms.
This primer provides a basic overview of some of the core concepts underpinning the “big data” phenomenon and the practice of data mining. The purpose of this primer is to enable those who are unfamiliar with the relevant practices and technical tools to at least have an appreciation for different aspects of what’s involved.
This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.
primer | 10.08.14
Unionization emerged as a way of protecting labor rights when society shifted from an agricultural ecosystem to one shaped by manufacturing and industrial labor. New networked work complicates the organizing mechanisms that are inherent to unionization. How then do we protect laborers from abuse, poor work conditions, and discrimination?
This document was produced as a part of the Future of Work Project at Data & Society Research Institute. This effort is supported by the Open Society Foundations’ U.S. Programs Future of Work inquiry, which is bringing together a cross-disciplinary and diverse group of thinkers to address some of the biggest questions about how work is transforming and what working will look like 20-30 years from now. The inquiry is exploring how the transformation of work, jobs and income will affect the most vulnerable communities, and what can be done to alter the course of events for the better.
testimony | 08.15.14
In this letter to the Federal Trade Commission (FTC), New America Foundation’s Open Technology Institute is joined by Data & Society and Solon Barocas, an independent researcher, in asking the FTC to address the ethical problems, legal constraints, and technical difficulties associated with building a body of evidence of big data harms, the issue of whether intentions should matter in the evaluation of big data harms, and the unique context of vulnerable populations and implications for problem solving and taking steps to protect them.
The letter was submitted in response to an FTC request for comments in advance of its workshop, Big Data: A Tool for Inclusion or Exclusion?
Accountability is fundamentally about checks and balances to power. In theory, both government and corporations are kept accountable through social, economic, and political mechanisms. Journalism and public advocates serve as an additional tool to hold powerful institutions and individuals accountable. But in a world of data and algorithms, accountability is often murky. Beyond questions about whether the market is sufficient or governmental regulation is necessary, how should algorithms be held accountable? For example, what is the role of the fourth estate in holding data-oriented practices accountable?
This document is a workshop primer from The Social, Cultural & Ethical Dimensions of “Big Data”.
Stanford Law Review | 09.03.13
In the Stanford Law Review symposium issue on privacy and big data (September 2013), Cynthia Dwork and Data & Society advisor Deirdre Mulligan argue that “privacy controls and increased transparency fail to address concerns with the classifications and segmentation produced by big data analysis.” “If privacy and transparency are not the panacea to the risks posed by big data,” they ask, “what is?” They offer a quartet of approaches/areas of focus.