For Slate, Data & Society Researcher Jacob Metcalf argues that we should be more concerned about the behavioral models developed by entities like Cambridge Analytica, which can be traded between political entities, than about the voter data itself.
“In other words, the one thing we can be sure of about psychographic profiling is that it provided one more way to transfer knowledge and economic value between campaigns and organizations.”
Medium | 11.30.17
Data & Society Researcher Jacob Metcalf argues for an ethical approach to data science and offers strategies for future research.
“On the one hand, it is banally predictable that the consequences of machine-learning-enabled surveillance will fall disproportionately on demographic minorities. On the other hand, queer folks hardly need data scientists scrutinizing their jawlines and hairstyles to warn them about this. They have always known this.”
D&S founder and President danah boyd & affiliate Solon Barocas investigate the practice of ethics in data science.
“Critical commentary on data science has converged on a worrisome idea: that data scientists do not recognize their power and, thus, wield it carelessly. These criticisms channel legitimate concerns about data science into doubts about the ethical awareness of its practitioners. For these critics, carelessness and indifference explains much of the problem—to which only they can offer a solution.”
points | 10.19.16
D&S fellow Zara Rahman introduces her upcoming research at Data & Society, where she will examine the work of translators in technology projects.
Whatever it’s called, it’s also under-appreciated. In our tech-focused world, we often hold those with so-called “hard” programming skills up on a pedestal, and we relegate those with “soft” communication skills to being invisible caretakers. It’s not an accident that this binary correlates strongly with traditionally male-dominated roles of programming and largely female-dominated roles of community management or emotional labour. It’s worth noting, too, that one is paid much more than the other.
ProPublica | 10.12.16
D&S affiliate Surya Mattu, with Julia Angwin, Terry Parris Jr., and Seongtaek Lim, continues the Black Box series.
Depending on what data they are trained on, machines can “learn” to be biased. That’s what happened in the fall of 2012, when Google’s machines “learned” in the run-up to the presidential election that people who searched for President Obama wanted more Obama news in subsequent searches, but people who searched for Republican nominee Mitt Romney did not. Google said the bias in its search results was an inadvertent result of machine learning.
Sometimes machines build their predictions by conducting experiments on us, through what is known as A/B testing. This is when a website will randomly show different headlines or different photos to different people. The website can then track which option is more popular, by counting how many users click on the different choices.
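The A/B-testing mechanism described above can be sketched in a few lines: randomly split visitors between two variants, count impressions and clicks for each, and compare click-through rates. This is an illustrative simulation, not the code of any real website; the variant names and click probabilities are invented for the example.

```python
import random
from collections import Counter

def assign_variant(user_id, variants=("headline_a", "headline_b")):
    """Randomly assign a visitor to one variant of the page.

    Real systems often hash the user id so the same visitor always
    sees the same variant; a plain coin flip suffices here.
    """
    return random.choice(variants)

# Hypothetical underlying click probabilities for each headline.
click_rate = {"headline_a": 0.10, "headline_b": 0.14}

impressions = Counter()  # how many visitors saw each variant
clicks = Counter()       # how many of them clicked

random.seed(0)  # fixed seed so the simulation is reproducible
for user_id in range(10_000):
    variant = assign_variant(user_id)
    impressions[variant] += 1
    if random.random() < click_rate[variant]:
        clicks[variant] += 1

# The "more popular" option is the one with the higher observed
# click-through rate (clicks divided by impressions).
for variant in sorted(impressions):
    print(variant, round(clicks[variant] / impressions[variant], 3))
```

With enough visitors, the observed click-through rates converge on the underlying probabilities, which is exactly the signal the website uses to pick a winner.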
D&S affiliate Wilneida Negrón writes five tips to allow for more inclusive AI research.
Although a step in the right direction, the Partnership on AI does highlight a certain conundrum — what exactly is it that we want from Silicon Valley’s tech giants? Do we want a seat at their table? Or are we asking for a deeper and more sustaining type of participation? Or perhaps, more disturbingly, is it too late for any truly inclusive and meaningful participation in the development of future AI technologies?
teaching | 09.19.16
University campuses provide an ecosystem of support to technical researchers, including computer scientists, as they navigate emerging issues of privacy, ethics, security, and consent in big data research. These support systems have varying levels of coordination and may be implicit or explicit.
As part of the Supporting Ethics in Data Research project at Data & Society, we held workshops with twelve to sixteen student researchers, professors, information technology leaders, repository managers, and research librarians at a handful of universities. The goal was to tease out the individual components of ethical, technical, and legal support that were available or absent on each campus, and to better understand the interactions between different actors as they encounter common ethical quandaries.
Materials: sticky notes, scratch paper, pens, and markers
Case Study of a Technical Researcher: provides a fictional scenario involving a researcher who needs assistance navigating a number of obstacles during her technical research.
Data Clinic Model: facilitates a brainstorming session about the components needed for a drop-in clinic to offer peer and professional support.
Ethics Conversation: asks participants to link words, feelings, and thoughts to the word “ethics,” followed by a discussion.
Read More: For the results of this project, please see the final report, Supporting Ethical Data Research: An Exploratory Study of Emerging Issues in Big Data and Technical Research, which provides detailed findings.
primer | 08.04.16
In the era of big data, how do researchers ethically collect, analyze, and store data? danah boyd, Emily F. Keller, and Bonnie Tijerina explore this question and examine issues ranging from how to achieve informed consent from research subjects in big data research to how to store data securely in case of breaches. The primer concludes with a discussion of how libraries can collaborate with computer scientists to examine ethical issues in big data research.
Communications of the ACM | 07.01.16
Data & Society Researcher Jacob Metcalf reconsiders research ethics in the wake of big data.
“Many of the familiar norms and regulations of research ethics were formulated for prior paradigms of research risks and harms, and thus the formal triggers for ethics review are miscalibrated. We need to reevaluate longstanding assumptions of research ethics in light of the emergence of ‘big data’ analytics.”
EDUCAUSE review | 06.27.16
paper | 05.23.16
The Council for Big Data, Ethics, and Society has released a comprehensive white paper consolidating conversations and ideas from two years of meetings and discussions:
Today’s release marks a major milestone for the Council, which began in 2014 with support from the National Science Foundation and the goal of providing critical social and cultural perspectives on “big data” research initiatives. The work of the Council consistently surfaced conflicts between big data research methods and existing norms. Should big data methods be exempted from those norms? Pushed into them? Are entirely new paradigms needed? The white paper provides recommendations in the areas of policy, pedagogy, and network building, and identifies crucial areas for further research. From the Executive Summary:
The Council’s findings, outputs, and recommendations—including those described in this white paper as well as those in earlier reports—address concrete manifestations of these disjunctions between big data research methods and existing research ethics paradigms. We have identified policy changes that would encourage greater engagement and reflection on ethics topics. We have indicated a number of pedagogical needs for data science instructors, and endeavored to fulfill some of them. We have also explored cultural and institutional barriers to collaboration between ethicists, social scientists, and data scientists in academia and industry around ethics challenges. Overall, our recommendations are geared toward those who are invested in a future for data science, big data analytics, and artificial intelligence guided by ethical considerations along with technical merit.
Big Data and Society | 05.14.16
D&S Researcher Jake Metcalf and D&S Affiliate Kate Crawford examine the growing discontinuities between research practices of data science and established tools of research ethics regulation.
Abstract: There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce.
Balkin.blogspot.com | 03.31.16
Reflections from D&S Affiliate Solon Barocas and Advisors Edward W. Felten and Joel Reidenberg on the recent “Unlocking the Black Box” Conference held on April 2 at Yale Law School:
Our work on accountable algorithms shows that transparency alone is not enough: we must have transparency of the right information about how a system works. Both transparency and the evaluation of computer systems as inscrutable black boxes, against which we can only test the relationship of inputs and outputs, fail on their own to effect even the most basic procedural safeguards for automated decision making. And without a notion of procedural regularity on which to base analysis, it is fruitless to inquire as to a computer system’s fairness or compliance with norms of law, politics, or social acceptability. Fortunately, the tools of computer science provide the necessary means to build computer systems that are fully accountable. Both transparency and black-box testing play a part, but if we are to have accountable algorithms, we must design for this goal from the ground up.
University of Pennsylvania Law Review | 03.02.16
D&S Affiliate Solon Barocas and Advisors Edward W. Felten and Joel Reidenberg collaborate on a paper outlining the importance of algorithmic accountability and fairness, proposing several tools that can be used when designing decision-making processes.
Abstract: Many important decisions historically made by people are now made by computers. Algorithms count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an IRS audit, and grant or deny immigration visas.
The accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? Additional approaches are needed to make automated decision systems — with their potentially incorrect, unjustified or unfair results — accountable and governable. This Article reveals a new technological toolkit to verify that automated decisions comply with key standards of legal fairness.
We challenge the dominant position in the legal literature that transparency will solve these problems. Disclosure of source code is often neither necessary (because of alternative techniques from computer science) nor sufficient (because of the complexity of code) to demonstrate the fairness of a process. Furthermore, transparency may be undesirable, such as when it permits tax cheats or terrorists to game the systems determining audits or security screening.
The central issue is how to assure the interests of citizens, and society as a whole, in making these processes more accountable. This Article argues that technology is creating new opportunities — more subtle and flexible than total transparency — to design decision-making algorithms so that they better align with legal and policy objectives. Doing so will improve not only the current governance of algorithms, but also — in certain cases — the governance of decision-making in general. The implicit (or explicit) biases of human decision-makers can be difficult to find and root out, but we can peer into the “brain” of an algorithm: computational processes and purpose specifications can be declared prior to use and verified afterwards.
The technological tools introduced in this Article apply widely. They can be used in designing decision-making processes from both the private and public sectors, and they can be tailored to verify different characteristics as desired by decision-makers, regulators, or the public. By forcing a more careful consideration of the effects of decision rules, they also engender policy discussions and closer looks at legal standards. As such, these tools have far-reaching implications throughout law and society.
Part I of this Article provides an accessible and concise introduction to foundational computer science concepts that can be used to verify and demonstrate compliance with key standards of legal fairness for automated decisions without revealing key attributes of the decision or the process by which the decision was reached. Part II then describes how these techniques can assure that decisions are made with the key governance attribute of procedural regularity, meaning that decisions are made under an announced set of rules consistently applied in each case. We demonstrate how this approach could be used to redesign and resolve issues with the State Department’s diversity visa lottery. In Part III, we go further and explore how other computational techniques can assure that automated decisions preserve fidelity to substantive legal and policy choices. We show how these tools may be used to assure that certain kinds of unjust discrimination are avoided and that automated decision processes behave in ways that comport with the social or legal standards that govern the decision. We also show how algorithmic decision-making may even complicate existing doctrines of disparate treatment and disparate impact, and we discuss some recent computer science work on detecting and removing discrimination in algorithms, especially in the context of big data and machine learning. And lastly in Part IV, we propose an agenda to further synergistic collaboration between computer science, law and policy to advance the design of automated decision processes for accountability.
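One of the simpler building blocks behind this notion of procedural regularity is the cryptographic commitment: a decision-maker publishes a hash of its decision rule before applying it, so an auditor can later verify that the announced rule is the one that was actually used, without the rule being revealed in advance. The sketch below illustrates the idea only; the example rule, salt, and thresholds are invented for illustration and are not taken from the Article.

```python
import hashlib

def commit(decision_rule: str, salt: str) -> str:
    """Publish a commitment to a decision rule before any decisions
    are made. The hash reveals nothing about the rule itself, but
    binds the decision-maker to it."""
    payload = (salt + decision_rule).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def verify(commitment: str, decision_rule: str, salt: str) -> bool:
    """At audit time, the rule and salt are disclosed; anyone can
    recompute the hash and check it matches the prior commitment."""
    return commit(decision_rule, salt) == commitment

# Hypothetical decision rule and random salt (kept secret until audit).
rule = "approve if income > 3 * monthly_payment and credit_score >= 650"
salt = "d4c3b2a1"

published = commit(rule, salt)   # announced before decisions are made

# A later audit succeeds for the committed rule...
print(verify(published, rule, salt))
# ...and fails if the decision-maker quietly swapped in a different rule.
print(verify(published, rule + " or applicant_id == 42", salt))
```

The full toolkit in the Article goes further (e.g., zero-knowledge proofs that a specific decision followed the committed rule), but even this minimal scheme shows how a rule can be fixed in advance and checked afterwards without total transparency.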
D&S fellow Mimi Onuoha thinks through the implications of the moment of data collection and offers a compact set of reminders for those who work with and think about data.
The conceptual, practical, and ethical issues surrounding “big data” and data in general begin at the very moment of data collection. Particularly when the data concern people, not enough attention is paid to the realities entangled within that significant moment and spreading out from it.
The point of data collection is a unique site for unpacking change, abuse, unfairness, bias, and potential. We can’t talk about responsible data without talking about the moment when data becomes data.
blog post | 08.03.15
International Journal of Communication | 05.18.15
“Communication technologies increasingly mediate data exchanges rather than human communication. We propose the term data valences to describe the differences in expectations that people have for data across different social settings. Building on two years of interviews, observations, and participation in the communities of technology designers, clinicians, advocates, and users for emerging mobile data in formal health care and consumer wellness, we observed the tensions among these groups in their varying expectations for data. This article identifies six data valences (self-evidence, actionability, connection, transparency, “truthiness,” and discovery) and demonstrates how they are mediated and how they are distinct across different social domains. Data valences give researchers a tool for examining the discourses around, practices with, and challenges for data as they are mediated across social settings.”
other | 07.09.14
As data moves between actors and organizations, what emerges is a data supply chain. Unlike manufacturing supply chains, transferred data is often duplicated in the process, challenging the essence of ownership. What does ethical data labor look like? How are the various stakeholders held accountable for being good data guardians? What does clean data transfer look like? What kinds of best practices can business and government put into place? What upstream rights do data providers have over downstream commercialization of their data?
This document is a workshop primer from The Social, Cultural & Ethical Dimensions of “Big Data”.
Just because data can be made more accessible to broader audiences does not mean that those people are equipped to interpret what they see. Limited topical knowledge, statistical skills, and contextual awareness can prompt people to read inferences into, be afraid of, and otherwise misinterpret the data they are given. As more data is made more available, what other structures and procedures need to be in place to help people interpret what’s available?
This document is a workshop primer from The Social, Cultural & Ethical Dimensions of “Big Data”.