Criminal Justice and Behavior | 11.23.18
Data & Society Fellow Cynthia Conti-Cook and co-authors assess the bias involved in risk assessment tools.
“In the top layer, we identify challenges to fairness within the risk-assessment models themselves. We explain types of statistical fairness and the tradeoffs between them. The second layer covers biases embedded in data. Using data from a racially biased criminal justice system can lead to unmeasurable biases in both risk scores and outcome measures. The final layer engages conceptual problems with risk models: Is it fair to make criminal justice decisions about individuals based on groups?”
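The tradeoff between types of statistical fairness mentioned above can be shown with a minimal Python sketch. It is not the authors' analysis. It assumes two hypothetical groups with different recorded base rates, and every count is made up for illustration. With these counts the tool has the same positive predictive value (0.80) in both groups but different false positive rates (0.20 versus 0.05), so meeting one fairness criterion does not mean meeting the other. The sketch also takes recorded outcomes at face value, which is exactly the assumption the second layer questions.

```python
from dataclasses import dataclass


@dataclass
class GroupRates:
    false_positive_rate: float        # FP / (FP + TN)
    positive_predictive_value: float  # TP / (TP + FP)


def make_group(tp: int, fp: int, fn: int, tn: int) -> list[tuple[bool, bool]]:
    """Build hypothetical (flagged_high_risk, recorded_outcome) records from counts."""
    return ([(True, True)] * tp + [(True, False)] * fp
            + [(False, True)] * fn + [(False, False)] * tn)


def group_rates(records: list[tuple[bool, bool]]) -> GroupRates:
    """Compute two common statistical-fairness metrics for one group."""
    tp = sum(1 for flagged, outcome in records if flagged and outcome)
    fp = sum(1 for flagged, outcome in records if flagged and not outcome)
    tn = sum(1 for flagged, outcome in records if not flagged and not outcome)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return GroupRates(fpr, ppv)


# Hypothetical counts: group A has a 50% recorded base rate, group B has 20%.
group_a = make_group(tp=40, fp=10, fn=10, tn=40)
group_b = make_group(tp=16, fp=4, fn=4, tn=76)

for name, records in (("A", group_a), ("B", group_b)):
    rates = group_rates(records)
    print(name, f"FPR={rates.false_positive_rate:.2f}",
          f"PPV={rates.positive_predictive_value:.2f}")
```

Equalizing the false positive rates instead would require changing who gets flagged in one group, which would then pull the two predictive values apart as long as the base rates differ.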
Washington Journal of Law, Technology & Arts | 06.07.18
Data & Society Data & Human Rights Research Lead Mark Latonero and Zachary Gold look at the history of web crawler usage and the legal issues surrounding it.
“This paper discusses the history of web crawlers in courts as well as the uses of such programs by a wide array of actors. It addresses ethical and legal issues surrounding the crawling and scraping of data posted online for uses not intended by the original poster or by the website on which the information is hosted. The article further suggests that stronger rules are necessary to protect the users’ initial expectations about how their data would be used, as well as their privacy.”
New Media & Society | 05.15.18
Data & Society Postdoctoral Scholar Julia Ticona and Research Analyst Alexandra Mateescu investigate the consequences of “visibility” in carework apps.
“Based on a discourse analysis of carework platforms and interviews with workers using them, we illustrate that these platforms seek to formalize employment relationships through technologies that increase visibility. We argue that carework platforms are “cultural entrepreneurs” that create and maintain cultural distinctions between populations of workers, and institutionalize those distinctions into platform features. Ultimately, the visibility created by platforms does not realize the formalization of employment relationships, but does serve the interests of platform companies and clients and exacerbate existing inequalities for workers.”
Journal of Computer-Mediated Communication | 04.06.18
Social Media + Society | 03.20.18
Data & Human Rights Research Lead Mark Latonero investigates the impact of digitally networked technologies on the safe passage of refugees and migrants.
“…in making their way to safe spaces, refugees rely not only on a physical but increasingly also digital infrastructure of movement. Social media, mobile devices, and similar digitally networked technologies comprise this infrastructure of ‘digital passages’—sociotechnical spaces of flows in which refugees, smugglers, governments, and corporations interact with each other and with new technologies.”
Data & Society Postdoctoral Scholar Andrew Selbst argues for regulating the use of big data in policing.
“The way police are adopting and using these technologies means more people of color are arrested, jailed, or physically harmed by police, while the needs of communities being policed are ignored.”
Big Data & Society | 02.14.18
How do algorithms & data-driven tech induce similarity across an industry? Data & Society Researcher Robyn Caplan and Founder & President danah boyd trace Facebook’s impact on news media organizations and journalists.
“This type of analysis sheds light on how organizational contexts are embedded into algorithms, which can then become embedded within other organizational and individual practices. By investigating technical practices as organizational and bureaucratic, discussions about accountability and decision-making can be reframed.”
Social Media + Society | 02.01.18
Data & Society Media Manipulation Lead Joan Donovan investigates the development of InterOccupy, a virtual organization operated by participants in the Occupy Movement.
“InterOccupy took infrastructure building as a political strategy to ensure the movement endured beyond the police raids on the encampments. I conclude that NSMs create virtual organizations when there are routine and insurmountable failures in the communication milieu, where the future of the movement is at stake. My research follows the Occupy Movement ethnographically to understand what happens after the keyword.”
American Academy of Pediatrics Journal | 11.03.17
D&S researcher Monica Bulger co-authored a cross-national review of how children engage with technology.
“Beyond revealing pressing and sizeable gaps in knowledge, this cross-national review also reveals the importance of understanding local values and practices regarding the use of technologies. This leads us to stress that future researchers must take into account local contexts and existing inequalities and must share best practices internationally so that children can navigate the balance between risks and opportunities.”
D&S founder and President danah boyd & affiliate Solon Barocas investigate the practice of ethics in data science.
“Critical commentary on data science has converged on a worrisome idea: that data scientists do not recognize their power and, thus, wield it carelessly. These criticisms channel legitimate concerns about data science into doubts about the ethical awareness of its practitioners. For these critics, carelessness and indifference explains much of the problem—to which only they can offer a solution.”
Sage Journals | 05.30.17
First Monday | 05.01.17
In First Monday, Philip Napoli and D&S researcher Robyn Caplan examine why companies like Google and Facebook insist that they are merely technology companies with no media impact, and why they are wrong. Abstract is below:
A common position amongst social media platforms and online content aggregators is their resistance to being characterized as media companies. Rather, companies such as Google, Facebook, and Twitter have regularly insisted that they should be thought of purely as technology companies. This paper critiques the position that these platforms are technology companies rather than media companies, explores the underlying rationales, and considers the political, legal, and policy implications associated with accepting or rejecting this position. As this paper illustrates, this is no mere semantic distinction, given the history of the precise classification of communications technologies and services having profound ramifications for how these technologies and services are considered by policy-makers and the courts.
PLOS Computational Biology | 03.30.17
Matthew Zook, D&S affiliate Solon Barocas, D&S founder danah boyd, D&S affiliate Kate Crawford, Emily Keller, D&S affiliate Seeta Peña Gangadharan, Alyssa Goldman, Rachelle Hollander, Barbara A. Koenig, D&S researcher Jacob Metcalf, Arvind Narayanan, D&S advisor Alondra Nelson, and Frank Pasquale wrote a paper detailing ten rules for responsible big data research. Introduction is below:
The use of big data research methods has grown tremendously over the past five years in both academia and industry. As the size and complexity of available datasets has grown, so too have the ethical questions raised by big data research. These questions become increasingly urgent as data and research agendas move well beyond those typical of the computational and natural sciences, to more directly address sensitive aspects of human behavior, interaction, and health. The tools of big data research are increasingly woven into our daily lives, including mining digital medical records for scientific and economic insights, mapping relationships via social media, capturing individuals’ speech and action via sensors, tracking movement across space, shaping police and security policy via “predictive policing,” and much more.
The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone. Social scientists now grapple with data structures and cloud computing, while computer scientists must contend with human subject protocols and institutional review boards (IRBs). While the connection between individual datum and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data creates a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm. This complexity challenges any normative set of rules and makes devising universal guidelines difficult.
Nevertheless, the need for direction in responsible big data research is evident, and this article provides a set of “ten simple rules” for addressing the complex ethical issues that will inevitably arise. Modeled on PLOS Computational Biology’s ongoing collection of rules, the recommendations we outline involve more nuance than the words “simple” and “rules” suggest. This nuance is inevitably tied to our paper’s starting premise: all big data research on social, medical, psychological, and economic phenomena engages with human subjects, and researchers have the ethical responsibility to minimize potential harm.
The variety in data sources, research topics, and methodological approaches in big data belies a one-size-fits-all checklist; as a result, these rules are less specific than some might hope. Rather, we exhort researchers to recognize the human participants and complex systems contained within their data and make grappling with ethical questions part of their standard workflow. Towards this end, we structure the first five rules around how to reduce the chance of harm resulting from big data research practices; the second five rules focus on ways researchers can contribute to building best practices that fit their disciplinary and methodological approaches. At the core of these rules, we challenge big data researchers who consider their data disentangled from the ability to harm to reexamine their assumptions. The examples in this paper show how often even seemingly innocuous and anonymized data have produced unanticipated ethical questions and detrimental impacts.
This paper is a result of a two-year National Science Foundation (NSF)-funded project that established the Council for Big Data, Ethics, and Society, a group of 20 scholars from a wide range of social, natural, and computational sciences (http://bdes.datasociety.net/). The Council was charged with providing guidance to the NSF on how to best encourage ethical practices in scientific and engineering research, utilizing big data research methods and infrastructures.
New Media & Society | 01.16.17
D&S researcher Monica Bulger, with Patrick Burton, Brian O’Neill, and Elisabeth Staksrud, writes “Where policy and practice collide: Comparing United States, South African and European Union approaches to protecting children online”.
That children have a right to protection when they go online is an internationally well-established principle, upheld in laws that seek to safeguard children from online abuse and exploitation. However, children’s own transgressive behaviour can test the boundaries of this protection regime, creating new dilemmas for lawmakers the world over. This article examines the policy response from both the Global North and South to young people’s online behaviour that may challenge adult conceptions of what is acceptable, within existing legal and policy frameworks. It asks whether the ‘childhood innocence’ implied in much protection discourse is a helpful basis for promoting children’s rights in the digital age. Based on a comparative analysis of the emerging policy trends in Europe, South Africa and the United States, the article assesses the implications for policymakers and child welfare specialists as they attempt to redraw the balance between children’s online safety while supporting their agency as digital citizens.
Sage Journals | 12.13.16
D&S affiliate Kate Crawford co-wrote, with Mike Ananny, this piece discussing transparency and algorithmic accountability.
Models for understanding and holding systems accountable have long rested upon ideals and logics of transparency. Being able to see a system is sometimes equated with being able to know how it works and govern it—a pattern that recurs in recent work about transparency and computational systems. But can “black boxes” ever be opened, and if so, would that ever be sufficient? In this article, we critically interrogate the ideal of transparency, trace some of its roots in scientific and sociotechnical epistemological cultures, and present 10 limitations to its application. We specifically focus on the inadequacy of transparency for understanding and governing algorithmic systems and sketch an alternative typology of algorithmic accountability grounded in constructive engagements with the limitations of transparency ideals.
Philosophical Transactions of the Royal Society A | 11.14.16
D&S advisor Deirdre Mulligan, with co-authors Colin Koopman and Nick Doty, released “Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy”.
The meaning of privacy has been much disputed throughout its history in response to wave after wave of new technological capabilities and social configurations. The current round of disputes over privacy fuelled by data science has been a cause of despair for many commentators and a death knell for privacy itself for others. We argue that privacy’s disputes are neither an accidental feature of the concept nor a lamentable condition of its applicability. Privacy is essentially contested. Because it is, privacy is transformable according to changing technological and social conditions. To make productive use of privacy’s essential contestability, we argue for a new approach to privacy research and practical design, focused on the development of conceptual analytics that facilitate dissecting privacy’s multiple uses across multiple contexts.
D&S affiliate Kate Crawford, with Ryan Calo, wrote this piece discussing risks in AI.
Artificial intelligence presents a cultural shift as much as a technical one. This is similar to technological inflection points of the past, such as the introduction of the printing press or the railways. Autonomous systems are changing workplaces, streets and schools. We need to ensure that those changes are beneficial, before they are built further into the infrastructure of everyday life.
paper | 10.24.16
Low-income communities have historically been subject to a wide range of governmental monitoring and related privacy intrusions in daily life. The privacy harms poor communities and their residents suffer as a result of this pervasive surveillance are especially acute when considering the economic and social consequences they experience, and the low likelihood that they will be able to bear the costs associated with remedying those harms. In the “big data” era, there are growing concerns that low-status internet users may be further differentially impacted by certain forms of internet-enabled data collection, surveillance, and marketing. They may be both unfairly excluded from opportunities and unfairly targeted based on determinations made by predictive analytics and scoring systems—growing numbers of which rely on some form of social media input. These new kinds of “networked privacy” harms, in which users are simultaneously held liable for their own behavior and the actions of those in their networks, could have particularly negative impacts on the poor.
In addition to the harms created by targeting (e.g., predatory marketing) or exclusion from opportunity, the poor may face magnified privacy vulnerabilities as a result of community-specific patterns around technology use, and knowledge gaps about privacy- and security-protective tools. Legal scholars have identified a broad group of consumers as “privacy vulnerable” when they “misunderstand the scope of data collection and falsely believe that relevant privacy rights are enshrined in privacy policies and guaranteed by law.” These misconceptions are common across all socioeconomic categories, but this article suggests that these conditions may be exacerbated by poor communities’ higher reliance on mobile connectivity and lower likelihood to take various privacy-protective measures online. When low-income adults rely on devices and apps that make them more vulnerable to surveillance, and they wittingly or unwittingly do not restrict access to the content they post online, they may be further exposed to forms of commercial data collection that can affect the way they are assessed in various employment, education and law enforcement contexts.
Part I of this article provides a historical overview of the ways in which the poor have been subject to uniquely far-reaching surveillance across many aspects of life, and how their experiences of harm may be impacted by evolving practices in big-data-driven decision making. In using the term “poor” to signify a condition of economic deprivation, this article recognizes that low-income people in America are a diverse and multifaceted group and that each person has his or her own individualized narrative. Despite this diversity, this article highlights a shared reality for many poor people, which is heightened vulnerability to on-line surveillance and associated adverse outcomes. Part II presents new empirical findings from a nationally representative survey to highlight various technology-related behaviors and concerns that suggest low-status internet users may be especially vulnerable to surveillance and networked privacy-related harms. In Part III, we show why and how this matters through a legal examination of several timely case studies that demonstrate how on-line activity, and the emerging use of social media data in particular, might have detrimental impacts on the poor when used in high-stakes decision-making systems. This Part explains why current legal frameworks fail to shield the poor from negative outcomes. Finally, in Part IV, we assess major proposals for protecting on-line, personal data through the lens of class vulnerability. In other words, we evaluate how these proposals might impact poor people. We agree with other scholars that additional technical and non-technical reforms are needed to address the risks associated with the use of social media data. As policymakers consider reforms, we urge greater attention to impacts on low-income persons and communities.
Journal of Law, Medicine and Ethics | 09.12.16
D&S affiliates Ifeoma Ajunwa and Kate Crawford, with Joel Ford, co-wrote this piece discussing how big data is used in wellness programs instituted by large corporations and how that can impact workers’ privacy and can spark employment discrimination.
IEEE Annals of the History of Computing | 09.01.16
D&S post-doctoral scholar Caroline Jack analyzes the history of how businesses donated personal computers to classrooms as a way to engage students.
In late 1982, the corporate-funded business education nonprofit Junior Achievement (JA) distributed 121 donated personal computers to classrooms across the United States as part of its new high school course, Applied Economics. Studying JA’s use of computers in Applied Economics reveals how a corporate-sponsored nonprofit group used personal computers to engage students, adapt its traditional outreach methods to the classroom, and bolster an appreciation of private enterprise in American economic life. Mapping the history of how business advocacy and education groups came to adopt software as a means of representing work and commerce offers a new perspective on how systems of cultural meaning have been attached to, and expressed through, computers and computing.
The mythology surrounding “big data” rests on the notion that technical systems can increase efficiency and decrease bias. Such “neutral” systems are supposedly good for implementing legal logic because, like these systems, law relies on binaries in decision-making, removing the gray and fuzzy from the equation. The problem with this formulation is that efficiency is not necessarily desirable, bias is baked into the data sets and reified technically as well as through interpretation, and legal binaries are neither socially productive nor logically sound.
D&S founder danah boyd responds to Margaret Hu’s work in “Big Data Blacklisting” with arguments that extend Hu’s assertions. boyd discusses how procedure and efficiency make algorithmic decision-making attractive to policymakers and bureaucrats, yet the flawed systems in place do not make data neutral; in fact, ‘blacklists purposefully distance decision-makers from the humanity of those who are being labeled’.
International Journal of Communication | 07.31.16
D&S researcher Alex Rosenblat and Luke Stark published a case study of Uber drivers that highlights the information and power asymmetries produced by the Uber application. Abstract is below:
Uber manages a large, disaggregated workforce through its ridehail platform, one that delivers a relatively standardized experience to passengers while simultaneously promoting its drivers as entrepreneurs whose work is characterized by freedom, flexibility, and independence. Through a nine-month empirical study of Uber driver experiences, we found that Uber does leverage significant indirect control over how drivers do their jobs. Our conclusions are twofold: First, the information and power asymmetries produced by the Uber application are fundamental to its ability to structure control over its workers; second, the rhetorical invocations of digital technology and algorithms are used to structure asymmetric corporate relationships to labor, which favor the former. Our study of the Uber driver experience points to the need for greater attention to the role of platform disintermediation in shaping power relations and communications between employers and workers.
ACM.org | 07.15.16
D&S researcher Jacob Metcalf writes “Big Data Analytics and Revision of the Common Rule”. Abstract is below:
“Big data” is a major technical advance in terms of computing expense, speed, and capacity. But it is also an epistemic shift wherein data is seen as infinitely networkable, indefinitely reusable, and significantly divorced from the context of collection.[1,7] The statutory definitions of “human subjects” and “research” are not easily applicable to big data research involving sensitive human data. Many of the familiar norms and regulations of research ethics formulated to prior paradigms of research risks and harms, and thus the formal triggers for ethics review are miscalibrated. We need to reevaluate long-standing assumptions of research ethics in light of the emergence of “big data” analytics.[6,10,13]
The U.S. Department of Health and Human Services (HHS) released a Notice of Proposed Rule-Making (NPRM) in September 2015 regarding proposed major revisions (the first in three decades) to the research ethics regulations known as the Common Rule.[a] The proposed changes grapple with the consequences of big data, such as informed consent for bio-banking and universal standards for privacy protection. The Common Rule does not apply to industry research, and some big data science in universities might not fall under its purview, but the Common Rule addresses the burgeoning uses of big data by setting the tone and agenda for research ethics in many spheres.
Taylor & Francis Online | 07.12.16
D&S advisor Tarleton Gillespie writes “Algorithmically recognizable: Santorum’s Google problem, and Google’s Santorum problem”. Abstract is below:
Because information algorithms make judgments that can have powerful consequences, those interested in having their information selected will orient themselves toward these algorithmic systems, making themselves algorithmically recognizable, in the hopes that they will be amplified by them. Examining this interplay, between information intermediaries and those trying to be seen by them, connects the study of algorithmic systems to long-standing concerns about the power of intermediaries – not an algorithmic power, uniquely, but the power to grant visibility and certify meaning, and the challenge of discerning who to grant it to and why. Here, I consider Dan Savage’s attempt to redefine the name of U.S. Senator Rick Santorum, a tactical intervention that topped Google’s search results for nearly a decade, and then mysteriously dropped during the 2012 Republican nominations. Changes made to Google’s algorithm at the time may explain the drop; here, they help to reveal the kind of implicitly political distinctions search engines must invariably make, between genuine patterns of participation and tactical efforts to approximate them.
Communications of the ACM | 07.01.16
Data & Society Researcher Jacob Metcalf reconsiders research ethics in the wake of big data.
“Many of the familiar norms and regulations of research ethics formulated to prior paradigms of research risks and harms, and thus the formal triggers for ethics review are miscalibrated. We need to reevaluate longstanding assumptions of research ethics in light of the emergence of ‘big data’ analytics.”
Science, Technology, & Human Values | 06.01.16
D&S fellow and technical writer Martha Poon responds to Viktor Mayer-Schönberger and Kenneth Cukier’s Big Data: A Revolution That Will Transform How We Live, Work, and Think, challenging their premise that “companies that can situate themselves in the middle of information flows and can collect data will thrive.”
In this Review Essay, which also engages with Dan Bouk’s How Our Days Became Numbered: Risk and the Rise of the Statistical Individual and Liz McFall’s Devising Consumption: Cultural Economies of Insurance, Credit and Spending, Martha takes us on a tour of the history of capitalism and consumption and brings us back to first principles to ask: “What’s old and what’s new about data science? What kinds of outcomes can digital data predict?”
What kind of power is big data?
Data are used to run the system, but data analytics do not describe the process by which the system generates its outcomes. That’s why data science should be categorized as operational control and not as an exercise in the practice of knowledge.
Big Data and Society | 05.14.16
D&S Researcher Jake Metcalf and D&S Affiliate Kate Crawford examine the growing discontinuities between research practices of data science and established tools of research ethics regulation.
Abstract: There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce.
Surveillance & Society | 08.16.11
Researchers Alexandra Mateescu and Alex Rosenblat published a paper with D&S Founder danah boyd that examines police body-worn cameras and their potential to provide avenues for police accountability and to foster improved police-community relations. The authors raise concerns about the potentially harmful consequences of constant surveillance, which has sparked concerns among civil rights groups about how body-worn cameras may violate privacy and exacerbate existing police practices that have historically victimized people of color and vulnerable populations. They consider whether one can demand greater accountability without increased surveillance at the same time and suggest that “the trajectory laid out by body-worn cameras towards greater surveillance is clear, if not fully realized, while the path towards accountability has not yet been adequately defined, let alone forged.”
The intimacy of body-worn cameras’ presence—which potentially enables the recording of even mundane interpersonal interactions with citizens—can be exploited with the application of technologies like facial recognition; this can exacerbate existing practices that have historically victimized people of color and vulnerable populations. Not only do such technologies increase surveillance, but they also conflate the act of surveilling citizens with the mechanisms by which police conduct is evaluated. Although police accountability is the goal, the camera’s view is pointed outward and away from its wearer, and audio recording captures any sounds within range. As a result, it becomes increasingly difficult to ask whether one can demand greater accountability without increased surveillance at the same time.
Crafting better policies on body-worn camera use has been one of the primary avenues for balancing the right of public access with the need to protect against this technology’s invasive aspects. However, no universal policies or norms have been established, even on simple issues such as whether officers should notify citizens that they are being recorded. What is known is that body-worn cameras present definite and identifiable risks to privacy. By contrast, visions of accountability have remained ill-defined, and the role to be played by body-worn cameras cannot be easily separated from the wider institutional and cultural shifts necessary for enacting lasting reforms in policing. Both the privacy risks and the potential for effecting accountability are contingent upon an ongoing process of negotiation, shaped by beliefs and assumptions rather than empirical evidence.
The State Education Standard | 05.09.16
In a piece written for the National Association of State Boards of Education journal, The State Education Standard, D&S affiliate Elana Zeide writes about student data privacy and the need for states to shift towards proactive management of education records in order to address parents’ fears that student data will be abused.
FERPA governs education agencies’ disclosures to outsiders but does not restrict their own practices regarding data collection, use, protection, and retention. This fact, coupled with its minimal transparency obligations and reticence about enforcement, contributes to stakeholder resistance. It is only natural to fear that which is unknown and over which one has no control. Accordingly, I advise individuals and entities with access to personal student information to go beyond mere compliance toward proactive management of information and privacy. Adopting and articulating public policies can go a long way in reassuring parents that education agencies are thoughtful stewards of student information.
Nature | 05.05.16
D&S fellow Sorelle Friedler’s latest research appears in Nature. She and her team document their creation of a machine-learning algorithm that predicts reaction conditions for making new crystals. The team trained the algorithm on data from both successful and failed (“dark”) experiments.
Inorganic–organic hybrid materials such as organically templated metal oxides, metal–organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure–property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions (failed or unsuccessful hydrothermal syntheses) collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success.
When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions for new organically templated inorganic product formation with a success rate of 89 per cent. Inverting the machine-learning model reveals new hypotheses regarding the conditions for successful product formation.
Most chemical reactions that are performed are never reported because they are deemed “unsuccessful”. Normally this is because they do not yield sufficient (or any) product, or do not do so to a required level of purity. Nevertheless, such data are important because they define bounds on the space of successful reactions. Moreover, they are important to the understanding of the physical parameters that govern those chemical reactions. This project seeks to use historical synthesis data to train machine learning models in order to make better hypotheses and predictions about the success of reactions ahead of time.
Big Data & Society | 03.25.16
In this Big Data & Society commentary, Karen Levy and Dave Johns suggest that “legislative efforts that invoke the language of data transparency can sometimes function as ‘Trojan Horses’ through which other political goals are pursued.”
Openness and transparency are becoming hallmarks of responsible data practice in science and governance. Concerns about data falsification, erroneous analysis, and misleading presentation of research results have recently strengthened the call for new procedures that ensure public accountability for data-driven decisions. Though we generally count ourselves in favor of increased transparency in data practice, this Commentary highlights a caveat. We suggest that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” through which other political goals are pursued. Framing these maneuvers in the language of transparency can be strategic, because approaches that emphasize open access to data carry tremendous appeal, particularly in current political and technological contexts. We illustrate our argument through two examples of pro-transparency policy efforts, one historical and one current: industry-backed “sound science” initiatives in the 1990s, and contemporary legislative efforts to open environmental data to public inspection. Rules that exist mainly to impede science-based policy processes weaponize the concept of data transparency. The discussion illustrates that, much as Big Data itself requires critical assessment, the processes and principles that attend it—like transparency—also carry political valence, and, as such, warrant careful analysis.
D&S Affiliates Ifeoma Ajunwa, Kate Crawford, and Jason Schultz examine the effectiveness of the law as a check on worker surveillance, given recent technological innovations. This law review article focuses on popular trends in worker tracking – productivity apps and worker wellness programs – to argue that current legal constraints are insufficient and may leave American workers at the mercy of 24/7 employer monitoring. They also propose a new comprehensive framework for worker privacy protections that should withstand current and future trends.
From the Pinkerton private detectives of the 1850s, to the closed-circuit cameras and email monitoring of the 1990s, to contemporary apps that quantify the productivity of workers, American employers have increasingly sought to track the activities of their employees. Along with economic and technological limits, the law has always been presumed as a constraint on these surveillance activities. Recently, technological advancements in several fields – data analytics, communications capture, mobile device design, DNA testing, and biometrics – have dramatically expanded capacities for worker surveillance both on and off the job. At the same time, the cost of many forms of surveillance has dropped significantly, while new technologies make the surveillance of workers even more convenient and accessible. This leaves the law as the last meaningful avenue to delineate boundaries for worker surveillance.
Computers and Society | 03.01.16
Abstract: What claims are made about the objectivity of machines versus that of human experts? Whereas most current debates focus on the growing impact of algorithms in the age of Big Data, I argue here in favor of taking a longer historical perspective on these developments. Drawing on Daston and Galison’s analysis of scientific production since the eighteenth century, I show that their distinction among three forms of objectivity (“truth-to-nature,” “mechanical objectivity,” and “trained judgment”) sheds light on existing discussions about algorithmic objectivity and accountability in expert fields.
Feminist Media Studies | 02.25.16
In this commentary, D&S fellow Karen Levy considers the gendered dimensions of shifting work cultures in response to the growing demands of the technologized, mediated workplace. She also explores the impact of new digital surveillance technologies on constructions of masculinity in the male-dominated US long-haul trucking industry.
New workplace technologies are often met with resistance from workers, particularly to the degree that they challenge traditional workplace norms and practices. These conflicts may be all the more acute when a work culture is deeply and historically gendered. In this Commentary, I draw from one such context, long-haul trucking, to consider the role a hypermasculine work culture plays in the reception of new digital monitoring technologies.
I base my analysis on ethnographic study of the United States long-haul trucking industry between 2011 and 2014. My research focused on the use of digital fleet management systems to achieve legal and organizational compliance. The research was multi-sited, taking me to eleven states in total, and to many sites of trucking-related work, including large and small firms, trucking conventions, regulatory meetings, inspection stations, and truck stops. Throughout the work, I spoke with and observed a wide variety of industry participants— truckers themselves, of course, but also fleet managers, technology vendors, trucking historians, insurance agents, lawyers, police officers, and many others.
Social Science Computer Review | 12.13.15
Abstract: Trauma-based interventions are common in mental health practice, and yet there is a gap in services because social media has created new ways of managing trauma. Practitioners identify treatments for traumatic experiences and are trained to implement evidence-based practices, but there is limited research that uses social media as a data source. We use a case study to explore over 400 Twitter communications of a gang member in Chicago’s Southside, Gakirah Barnes, who mourned the death of her friend on Twitter. We further explore how, following her own death, members of her Twitter network mourn her. We describe expressions of trauma that are difficult to uncover in traditional trauma-based services. We discuss practice and research implications regarding using Twitter to address trauma among gang-involved youth.
Sur le journalisme | 11.23.15
What kinds of articles are read most often on news websites? In web newsrooms, journalists now have access to software programs that allow them to track the preferences of their readers in real time. Based on this data, web journalists often claim that “sex, scandals, and celebrities” are the best recipe for attracting readers and “clicks.” Conversely, journalists assert that articles on world news, politics, and culture fare poorly online. Journalists also strongly distinguish between groups of readers, depending on whether they “click,” “like,” “tweet,” or write comments. This paper compares qualitative material and quantitative data to explore whether journalists’ representations of their “quantified audience” diverge from the actual behavior of their readers. Drawing on an original data set composed of 13,159 articles published between 2007 and 2012 on a French news site, I find that articles about sex, scandals, and celebrities indeed attract more readers than articles about world news or culture. Yet articles about politics, long articles, and user-generated contributions are also highly popular. In addition, the preferences of readers vary depending on whether one measures this by the number of “clicks,” “likes,” “tweets,” or comments on the articles. Though regression models do not reveal significant differences, a more qualitative approach indicates that the most popular articles on Facebook and Twitter are humorous pieces and user-generated contributions, whereas controversial political topics attract more comments. I propose several explanations in order to make sense of this gap between the journalists’ representations and the behavior of online readers. These involve the editorial line and audience of the French website under consideration, the changing political context and media coverage of politics in France, and the specific temporal structure of internet traffic for longer articles.
new media & society | 11.09.15
Abstract: Increasing broadband adoption among members of underserved populations remains a high priority among policymakers, advocates, corporations, and affected communities. But questions about the risks entailed in the flow of personal information are beginning to surface and shine light on the tension between broadband’s benefits and harms. This article examines broadband adoption programs at community-based and public institutions in the United States in order to understand the ways in which privacy and surveillance issues emerge and are engaged in these settings. While adults who enroll in introductory digital literacy classes and access the Internet at public terminals feel optimistic about broadband “opportunities,” they encounter “privacy-poor, surveillance-rich” broadband. Users experience myriad anxieties, while having few meaningful options to meet their concerns.
Abstract: The user has become central to the way technology is conceptualized, designed, and studied in sociotechnical research and human-computer interaction; recently, non-users have also become productive foci of scholarly analysis. This paper argues that a focus on individualized users and non-users is incomplete, and conflates multiple modes of complex relation among people, institutions, and technologies. Rather than the use/non-use conception, I argue for conceptualizing users as networks: as constellations of power relations and institutional entanglements, mediated through technologies. To illustrate, I offer a case study of Nexafed, a tamperproof formulation of pseudoephedrine. The market for Nexafed seems nonexistent in traditional use/non-use terms, but when we construe the user more broadly — as a network of interpersonal, legal, and institutional relationships, consisting of multiple modes of relation between people and technology — not only does the drug’s market make sense, but we also understand how new motivations (social shame, mistrust, robbery, gossip) can act as salient drivers of technology use. The Nexafed case illustrates the utility of a networked perspective to develop more nuanced theoretical understandings of use and non-use in sociotechnical relations, beyond the direct human-technology interface.
social media + society | 09.30.15
Abstract: In this essay, we reconstruct a keyword for communication—affordance. Affordance, adopted from ecological psychology, is now widely used in technology studies, yet the term lacks a clear definition. This is especially problematic for scholars grappling with how to theorize the relationship between technology and sociality for complex socio-technical systems such as machine-learning algorithms, pervasive computing, the Internet of Things, and other such “smart” innovations. Within technology studies, emerging theories of materiality, affect, and mediation all necessitate a richer and more nuanced definition for affordance than the field currently uses. To solve this, we develop the concept of imagined affordance. Imagined affordances emerge between users’ perceptions, attitudes, and expectations; between the materiality and functionality of technologies; and between the intentions and perceptions of designers. We use imagined affordance to evoke the importance of imagination in affordances—expectations for technology that are not fully realized in conscious, rational knowledge. We also use imagined affordance to distinguish our process-oriented, socio-technical definition of affordance from the “imagined” consensus of the field around a flimsier use of the term. We also use it in order to better capture the importance of mediation, materiality, and affect. We suggest that imagined affordance helps to theorize the duality of materiality and communication technology: namely, that people shape their media environments, perceive them, and have agency within them because of imagined affordances.
Journal of Technology Science | 09.01.15
For decades, The Princeton Review has prepared students for a battery of standardized tests, for a price. In some cases, that price varies by ZIP code (United States postal code). The Princeton Review’s website asks users to enter their ZIP code before receiving a price for the individualized tutoring service. We at ProPublica analyzed the price variations for an online SAT tutoring service offered by The Princeton Review. The Princeton Review told ProPublica that the regional pricing differences for its “online tutoring package” were based on the “differential costs of running our business and the competitive attributes of the given market” and that any “differences in impact” were “incidental.”
Results summary: We collected the price for The Princeton Review’s “24-hr Online Tutoring” package from each U.S. ZIP code and found that the prices varied by as much as $1,800. We compared the price in each ZIP code to that ZIP code’s demographics and income. Our analysis showed that Asians were disproportionately represented in ZIP codes that were quoted a higher price. As a result, Asians were 1.8 times as likely as non-Asians to be quoted a higher price. Our analysis also showed an increased likelihood of being quoted a higher price for ZIP codes with high median incomes.
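A figure like “1.8 times as likely” is a ratio of two shares. The Python sketch below shows one way such a ratio could be computed, assuming it compares the fraction of each group’s population that lives in ZIP codes quoted a higher price. The record layout, field names, and numbers are hypothetical toy values chosen for easy arithmetic; they are not ProPublica’s data or its documented method.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ZipRecord:
    """Hypothetical per-ZIP-code record (not ProPublica's actual schema)."""
    quoted_higher_price: bool  # was this ZIP code quoted the higher price?
    asian_pop: int             # Asian residents in the ZIP code
    non_asian_pop: int         # non-Asian residents in the ZIP code


def share_quoted_high(records: list[ZipRecord], group: str) -> Fraction:
    """Fraction of a group's total population living in higher-priced ZIP codes."""
    total = sum(getattr(r, group) for r in records)
    high = sum(getattr(r, group) for r in records if r.quoted_higher_price)
    return Fraction(high, total)


def relative_likelihood(records: list[ZipRecord]) -> float:
    """How many times as likely an Asian resident is to be quoted the higher
    price as a non-Asian resident, computed as a ratio of the two shares."""
    ratio = share_quoted_high(records, "asian_pop") / share_quoted_high(records, "non_asian_pop")
    return float(ratio)


# Toy numbers only: 60 of 100 Asian residents and 200 of 500 non-Asian
# residents live in the ZIP code quoted the higher price.
toy_records = [
    ZipRecord(quoted_higher_price=True, asian_pop=60, non_asian_pop=200),
    ZipRecord(quoted_higher_price=False, asian_pop=40, non_asian_pop=300),
]

print(relative_likelihood(toy_records))  # prints 1.5 (that is, 0.6 / 0.4)
```

Using `Fraction` keeps the intermediate shares exact, so the toy ratio comes out as exactly 1.5 rather than a rounded floating-point value.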
Interface | 08.04.15
D&S advisor Janet Vertesi adapts her presentation from Theorizing the Web 2014 into an essay:
The past five years have seen the rise and expansion of a new infrastructural layer to the Internet. Inspired by the unprecedented success of Google AdWords and Facebook’s “social” data collection regime, this middle layer has expanded to include a plethora of bots, cookies, trackers, canvases, and other data sniffers intent on recording user clicks, likes, and purchases. Contemporary consumers do not even need to buy: browsing, searching, or clicking on a quiz can all be indicators that a user is looking for new shoes or a winter coat, triggering targeted ads. In the world of contemporary Internet companies, personal data reigns supreme.
This is not only an online phenomenon. Offline, too, loyalty card programs and credit card purchase data translate into targeted mail catalogs and direct-to-consumer marketing programs. Across social media companies and third-party data brokers, certain consumers are designated high-value targets. For example, according to a Financial Times report, information pertaining to prospective new mothers, who are likely to be making new and lasting brand choices, can sell for much higher prices than information about everyday users. Attempts to coordinate online and offline datasets to better identify such high-value targets continue apace, with personal data gleaned through commercial services changing hands among brokers for high prices.
I have spent the past three years attempting to evade various aspects of this middle layer of the Internet: first, through studious avoidance of Google-related products and services, and most recently through concealing the impending arrival of a family member from data detection. I undertook this latter practice as an experiment in infrastructural inversion: an attempt to make visible the embedded nature of this tracking infrastructure in daily life, as well as the values and assumptions that such technologies make about everyday users. While I have described the experiments elsewhere, resisting these data dragnets yielded several key findings relevant to the study of Internet infrastructures and online behavior that our research community needs to address.
GeoJournal | 08.01.15
Journal of Children and Media | 07.30.15
In this Journal of Children and Media article, D&S researcher Monica Bulger seeks to address the dynamics of information evaluation among youth, investigating the attitudes of youth (aged 11–18) toward information credibility, their information evaluation practices, and the effects of developmental and demographic differences and information literacy training on young people’s information evaluation skills.
Abstract: Young people are increasingly turning to the Internet more than to traditional media and information sources to find information. Yet, research demonstrates suboptimal online information literacy among youth today, suggesting potential shortcomings in young people’s information consumption behaviors. To assess this, this study investigates several predictors of young people’s success in online information evaluation, including their awareness of credibility problems associated with digital information, their use of specific information evaluation practices, and their accuracy in credibility assessment. Results from a study of 2,747 11–18-year-old Internet users indicate both expected and surprising influences of young people’s cognitive development, decision-making style, demographic background, and digital information literacy training on their information evaluation awareness, skills, and practices. Theoretical implications and those for redesigning online information literacy interventions are discussed.
Information, Communication & Society | 07.27.15
D&S researcher Monica Bulger, Jonathan Bright, and Cristóbal Cobo investigate why some Massive Open Online Course (MOOC) users organize face-to-face meetings, exploring the hypothesis that meetup invites would reflect students’ learning goals. They also investigate the extent to which these patterns vary between industrialised and developing countries.
Abstract: Massive open online courses (MOOCs) offer the possibility of entirely virtual learning environments, with lectures, discussions, and assignments all distributed via the internet. The virtual nature of MOOCs presents considerable advantages to students in terms of flexibility to learn what they want, when they want. Yet despite their virtual focus, some MOOC users also seek to create face-to-face communities with students taking similar courses or using similar platforms. This paper aims to assess the learner motivations behind creation of these offline communities. Do these face-to-face meetings represent an added extra to the learning experience, with students taking advantage of the context of the MOOCs to create new personal and professional connections? Or, are offline meetups filling a gap for students who feel that not all learning can take place online? We also assess the extent to which these patterns vary between developing and industrialised regions, thus testing the claim that MOOCs are helping to democratise access to education around the world. Our research is based on a unique source of socially generated big data, drawn from the website ‘meetup.com’, which gives us a data set of over 4000 MOOC related events taking place in over 140 countries around the world over a two year period. We apply a mixed methods approach to this data, combining large-scale analysis with more in-depth thematic hand coding, to more fully explore the reasons why some learners add a ‘real’ component to their virtual learning experience.
European Journal of Cultural Studies | 06.17.15
Abstract: “The recent proliferation of wearable self-tracking devices intended to regulate and measure the body has brought contingent questions of controlling, accessing and interpreting personal data. Given a socio-technical context in which individuals are no longer the most authoritative source on data about themselves, wearable self-tracking technologies reflect the simultaneous commodification and knowledge-making that occurs between data and bodies. In this article, we look specifically at wearable, self-tracking devices in order to set up an analytical comparison with a key historical predecessor, the weight scale. By taking two distinct cases of self-tracking – wearables and the weight scale – we can situate current discourses of big data within a historical framing of self-measurement and human subjectivity. While the advertising promises of both the weight scale and the wearable device emphasize self-knowledge and control through external measurement, the use of wearable data by multiple agents and institutions results in a lack of control over data by the user. In the production of self-knowledge, the wearable device is also making the user known to others, in a range of ways that can be both skewed and inaccurate. We look at the tensions surrounding these devices for questions of agency, practices of the body, and the use of wearable data by courtrooms and data science to enforce particular kinds of social and individual discipline.”
PLOS ONE | 05.20.15
International Journal of Communication | 05.18.15
“Communication technologies increasingly mediate data exchanges rather than human communication. We propose the term data valences to describe the differences in expectations that people have for data across different social settings. Building on two years of interviews, observations, and participation in the communities of technology designers, clinicians, advocates, and users for emerging mobile data in formal health care and consumer wellness, we observed the tensions among these groups in their varying expectations for data. This article identifies six data valences (self-evidence, actionability, connection, transparency, “truthiness,” and discovery) and demonstrates how they are mediated and how they are distinct across different social domains. Data valences give researchers a tool for examining the discourses around, practices with, and challenges for data as they are mediated across social settings.”
“Social media platforms don’t just guide, distort, and facilitate social activity, they also delete some of it. They don’t just link users together, they also suspend them. They don’t just circulate our images and posts, they also algorithmically promote some over others. Platforms pick and choose.”
Theory, Culture & Society | 05.04.15
Abstract: “Since the very origins of urban planning in the late nineteenth century, the field has aspired to establish a firm scientific footing for the nature of cities, their cycles of growth and decline, and ways that we can better plan and predict the outcomes of interventions through design and policy. While these efforts have long been stymied by a lack of sufficiently detailed data and limited computing power, these obstacles are rapidly being overcome. In response, a growing number of city governments and a new cadre of academic research centers are investing in data-intensive analysis and simulation of cities. While this campaign shares many similarities with computer-based efforts to study cities and inform urban policy during the United States’ urban crisis in the 1960s, the rise of the Internet presents new opportunities to involve citizens more actively, on a larger scale, and in more empowered roles than in the past. This development offers an opportunity to develop more transparent, ethical, and effective models for collaborative urban research involving universities, local government, and citizen science networks.”
Public Understanding of Science | 04.01.15
Abstract: When economists ask questions about basic financial principles, most ordinary people answer incorrectly. Economic experts call this condition “financial illiteracy,” which suggests that poor financial outcomes are due to a personal deficit of reading-related skills. The analogy to reading is compelling because it suggests that we can teach our way out of population-wide financial failure. In this comment, we explain why the idea of literacy appeals to policy makers in the advanced industrial nations. But we also show that the narrow skill set laid out by economists does not satisfy the politically inclusive definition of literacy that literacy studies fought for. We identify several channels through which people engage with ideas about finance and demonstrate that not all forms of literacy will lead people to the educational content prescribed by academic economists. We argue that truly financially literate people can defy the demands of financial theory and financial institutions.
The Information Society | 03.19.15
“This article examines the implications of electronic monitoring systems for organizational information flows and worker control, in the context of the U.S. trucking industry. Truckers, a spatially dispersed group of workers with a traditionally independent culture and a high degree of autonomy, are increasingly subjected to performance monitoring via fleet management systems that record and transmit fine-grained data about their location and behaviors. These systems redistribute operational information within firms by accruing real-time aggregated data in a remote company dispatcher. This redistribution results in a seemingly incongruous set of effects. First, abstracted and aggregated data streams allow dispatchers to quantitatively evaluate truckers’ job performance across new metrics, and to challenge truckers’ accounts of local and biophysical conditions. Second, even as these data are abstracted, information about truckers’ activities is simultaneously resocialized via its strategic deployment into truckers’ social relationships with their coworkers and families. These disparate dynamics operate together to facilitate firms’ control over truckers’ daily work practices in a manner that was not previously possible. The trucking case reveals multifaceted pathways to the entrenchment of organizational control via electronic monitoring.”
magazine article | 08.14.11
Today, press ethics are intertwined with platform design ethics, and press freedom is shared with software designers. The people at Facebook, Twitter, Flipboard, Pulse and elsewhere have a new and significant role in how news circulates and what we see on our screens. We’re only just beginning to understand how these companies’ algorithms work…
D&S affiliate Kate Crawford and Mike Ananny conducted a study of designers who create news apps and of how these designers perceive their work in the space between software design and news circulation. In this article, they describe four themes that emerged from the study about the relationship between designers’ work and journalism, pointing to some of the reasons why certain actions taken by platforms can spark controversies that highlight the need for a new ethics of press responsibility.
Within some public policy and scholarly accounts, human trafficking is increasingly understood as a technological problem that invites collaborative anti-trafficking solutions. A growing cohort of state, non-governmental, and corporate actors in the United States have come together around the shared contention that technology functions as both a facilitator and disrupting force of trafficking, specifically sex trafficking. Despite increased attention to the trafficking-technology nexus, scant research to date has critically unpacked these shifts or mapped how technology reconfigures anti-trafficking collaborations. In this article, we propose that widespread anxieties and overzealous optimism about technology’s role in facilitating and disrupting trafficking have simultaneously promoted a tri-part anti-trafficking response, one animated by a law and order agenda, operationalized through augmented internet, mobile, and networked surveillance, and maintained through the integration of technology experts and advocates into organized anti-trafficking efforts. We suggest that an examination of technology has purchase for students of gender, sexuality, and neoliberal governmentality in its creation of new methods of surveillance, exclusion, and expertise.
New Media & Society | 07.21.14
Information, Communication & Society | 07.08.14
Abstract: Intense media and policy focus on issues of online child protection have prompted a resurgence of moral panics about children and adolescents’ Internet use, with frequent confounding of different types of risk and harm and little reference to empirical evidence of actual harm. Meanwhile, within the academic literature, the quantity and quality of studies detailing the risks and opportunities of online activity for children and young people have risen substantially in the past 10 years, but this literature is also largely focused on risk rather than evidence of harm. Whilst this is understandable given the methodological and ethical challenges of studying Internet-related harms to minors, the very concept of risk is dependent on some prior understanding of harm, meaning that without efforts to study what harms are connected with children’s online experiences, discussions of risk lack a strong foundation. This article makes a key contribution to the field by reviewing available evidence about the scale and scope of online harms from across a range of disciplines and identifying key obstacles in this research area as well as the major policy implications. The findings are based on a review of 148 empirical studies. Results were found in relation to the main types of harm: health-related harms as a result of using pro-eating disorder, self-harm or pro-suicide websites; sex-related harms such as Internet-initiated sexual abuse of minors and cyber-bullying.
Boston College Law Review | 01.29.14
Abstract: The rise of "Big Data" analytics in the private sector poses new challenges for privacy advocates. Through its reliance on existing data and predictive analysis to create detailed individual profiles, Big Data has exploded the scope of personally identifiable information ("PII"). It has also effectively marginalized regulatory schemes by evading current privacy protections with its novel methodology. Furthermore, poor execution of Big Data methodology may create additional harms by rendering inaccurate profiles that nonetheless affect an individual's life and livelihood. To respond to Big Data's evolving practices, this Article examines several existing privacy regimes and explains why these approaches inadequately address current Big Data challenges. This Article then proposes a new approach to mitigating predictive privacy harms: a right to procedural data due process. Although current privacy regimes offer limited, nominal due-process-like mechanisms, a more rigorous framework is needed to address their shortcomings. By examining due process's role in the Anglo-American legal system and building on previous scholarship about due process for public administrative computer systems, this Article argues that individuals affected by Big Data should have rights similar to those of parties in the legal system with respect to how their personal data is used in such adjudications. Using these principles, this Article proposes, by analogy, a system of regulation that would provide such rights against private Big Data actors.
Stanford Law Review | 09.03.13
In the Stanford Law Review symposium issue on privacy and big data (September 2013), Cynthia Dwork and Data & Society advisor Deirdre Mulligan argue that "privacy controls and increased transparency fail to address concerns with the classifications and segmentation produced by big data analysis." "If privacy and transparency are not the panacea to the risks posed by big data," they ask, "what is?" They offer four approaches, or areas of focus, in response.