filtered by: big data

3sat TV interviewed Data & Society Founder and President danah boyd at re:publica 2018 about gaining control of our data privacy. The video is in German.


points | 04.25.18

Proceed With Caution

Kadija Ferryman, Elaine O. Nsoesie

After the Cambridge Analytica scandal, can internet data be used ethically for research? Data & Society Postdoctoral Scholar Kadija Ferryman and Elaine O. Nsoesie, PhD, of the Institute for Health Metrics and Evaluation recommend “proceeding with caution” when it comes to internet data and precision medicine.

“Despite the public attention and backlash stemming from the Cambridge Analytica scandal — which began with an academic inquiry and resulted in at least 87 million Facebook profiles being disclosed — researchers argue that Facebook and other social media data can be used to advance knowledge, as long as these data are accessed and used in a responsible way. We argue that data from internet-based applications can be a relevant resource for precision medicine studies, provided that these data are accessed and used with care and caution.”

primer | 04.18.18

Algorithmic Accountability: A Primer

Robyn Caplan, Joan Donovan, Lauren Hanson, and Jeanna Matthews

Algorithmic Accountability examines the process of assigning responsibility for harm when algorithmic decision-making results in discriminatory and inequitable outcomes.

The primer, originally prepared for the Progressive Congressional Caucus’ Tech Algorithm Briefing, explores the trade-offs and debates about algorithms and accountability across several key ethical dimensions, including fairness and bias; opacity and transparency; and lack of standards for auditing.

For Slate, Data & Society Researcher Jacob Metcalf argues that we should be more concerned about behavioral models developed by entities like Cambridge Analytica, which can be traded between political entities, rather than the voter data itself.

“In other words, the one thing we can be sure of psychographic profiling is that it provided one more way to transfer knowledge and economic value between campaigns and organizations.”

Data & Society Researcher Jacob Metcalf co-authored an op-ed in Slate discussing how giving researchers more access to Facebook users’ data could prevent unethical data mining.

“This case raises numerous complicated ethical and political issues, but as data ethicists, one issue stands out to us: Both Facebook and its users are exposed to the downstream consequences of unethical research practices precisely because like other major platforms, the social network does not proactively facilitate ethical research practices in exchange for access to data that users have consented to share.”

working paper | 03.02.18

The Intuitive Appeal of Explainable Machines

Andrew Selbst, Solon Barocas

In this paper, Data & Society Postdoctoral Scholar Andrew Selbst and Affiliate Solon Barocas respond to calls for explainable machines.

“We argue that calls for explainable machines have failed to recognize the connection between intuition and evaluation and the limitations of such an approach. A belief in the value of explanation for justification assumes that if only a model is explained, problems will reveal themselves intuitively. Machine learning, however, can uncover relationships that are both non-intuitive and legitimate, frustrating this mode of normative assessment. If justification requires understanding why the model’s rules are what they are, we should seek explanations of the process behind a model’s development and use, not just explanations of the model itself.”

Data & Society Postdoctoral Scholar Andrew Selbst argues for regulations in big data policing.

“The way police are adopting and using these technologies means more people of color are arrested, jailed, or physically harmed by police, while the needs of communities being policed are ignored.”

points | 02.14.18

Health Data Rush

Kadija Ferryman

As data becomes more prevalent in the health world, Data & Society Postdoctoral Scholar Kadija Ferryman urges us to consider how we will regulate its collection and usage.

“As precision medicine rushes on in the US, how can we understand where there might be tensions between fast-paced technological advancement and regulation and oversight? What regulatory problems might emerge? Are our policies and institutions ready to meet these challenges?”

Miranda Katz of WIRED interviews D&S founder and president danah boyd on the evolving public discourse around disinformation and how the tech industry can help rebuild American society.

“It’s actually really clear: How do you reknit society? Society is produced by the social connections that are knit together. The stronger those networks, the stronger the society. We have to make a concerted effort to create social ties, social relationships, social networks in the classic sense that allow for strategic bridges across the polis so that people can see themselves as one.”

On Wednesday, Nov. 29, the Supreme Court heard Carpenter v. U.S., a Fourth Amendment case on cell data access. Postdoctoral scholars Julia Ticona and Andrew Selbst urged the court to understand that cell phones aren’t voluntary in this day and age.

“The justices will surely understand that without any alternatives for accessing online services, vulnerable (and over-policed) populations will be unable to make meaningful choices to protect their privacy, amplifying the disadvantages they already face.”

D&S founder danah boyd discusses machine learning algorithms and prejudice, digital white flight on social media, trust in the media, and more on The Ezra Klein Show.

“Technology is made by people in a society, and it has a tendency to mirror and magnify the issues that affect everyday life.”

Ford Foundation blog | 05.30.17

Why you Should Care about Bots if you Care about Social Justice

Wilneida Negrón, Morgan Hargrave

D&S affiliate Wilneida Negrón details the role of bots and automation in activism today.

As everyone from advertisers to political adversaries jockey for attention, they are increasingly using automated technologies and processes to raise their own voices or drown out others. In fact, 62 percent of all Internet traffic is made up of programs acting on their own to analyze information, find vulnerabilities, or spread messages. Up to 48 million of Twitter’s 320 million users are bots, or applications that perform automated tasks. Some bots post beautiful art from museum collections, while some spread abuse and misinformation instead. Automation itself isn’t cutting edge, but the prevalence and sophistication of how automated tools interact with users is.

D&S researcher Mary Madden was interviewed by the American Press Institute about her recent Knight Foundation-supported report, “How Youth Navigate the News Landscape.”

However, one of my favorite quotes was from a participant who described a future where news would be delivered by hologram: “I think like it’s going to be little holograms. You’re going to open this thing and a little guy’s going to come out and tell you about stuff.”

Given that some participants said they already found notifications annoying, I’m not sure how successful the little hologram guy would be, but it was clear that the participants fully expected that the news industry would continue to evolve and innovate in creative ways moving forward.

paper | 04.02.17

Combatting Police Discrimination in the Age of Big Data

Sharad Goel, Maya Perelman, Ravi Shroff, David Alan Sklansky

Sharad Goel, Maya Perelman, D&S fellow Ravi Shroff, and David Alan Sklansky examine a method that can “reduce the racially disparate impact of pedestrian searches and to increase their effectiveness”. Abstract is below:

The exponential growth of available information about routine police activities offers new opportunities to improve the fairness and effectiveness of police practices. We illustrate the point by showing how a particular kind of calculation made possible by modern, large-scale datasets — determining the likelihood that stopping and frisking a particular pedestrian will result in the discovery of contraband or other evidence of criminal activity — could be used to reduce the racially disparate impact of pedestrian searches and to increase their effectiveness. For tools of this kind to achieve their full potential in improving policing, though, the legal system will need to adapt. One important change would be to understand police tactics such as investigatory stops of pedestrians or motorists as programs, not as isolated occurrences. Beyond that, the judiciary will need to grow more comfortable with statistical proof of discriminatory policing, and the police will need to be more receptive to the assistance that algorithms can provide in reducing bias.

D&S researcher danah boyd discusses the problem with asking companies like Facebook and Google to ‘solve’ fake news. boyd argues that this kind of solutionism ignores the context of the complex social problems behind it.

Although a lot of the emphasis in the “fake news” discussion focuses on content that is widely spread and downright insane, much of the most insidious content out there isn’t in your face. It’s not spread widely, and certainly not by people who are forwarding it to object. It’s subtle content that is factually accurate, biased in presentation and framing, and encouraging folks to make dangerous conclusions that are not explicitly spelled out in the content itself.

Washington University Law Review | 03.11.17

Privacy, Poverty and Big Data: A Matrix of Vulnerabilities for Poor Americans

Mary Madden, Michele E. Gilman, Karen Levy, Alice E. Marwick

D&S researcher Mary Madden, Michele Gilman, D&S affiliate Karen Levy, and D&S fellow Alice Marwick examine how poor Americans are impacted by privacy violations and discuss how to protect digital privacy for the vulnerable. Abstract is as follows:

This Article examines the matrix of vulnerabilities that low-income people face as a result of the collection and aggregation of big data and the application of predictive analytics. On the one hand, big data systems could reverse growing economic inequality by expanding access to opportunities for low-income people. On the other hand, big data could widen economic gaps by making it possible to prey on low-income people or to exclude them from opportunities due to biases that get entrenched in algorithmic decision-making tools. New kinds of “networked privacy” harms, in which users are simultaneously held liable for their own behavior and the actions of those in their networks, may have particularly negative impacts on the poor. This Article reports on original empirical findings from a large, nationally-representative telephone survey with an oversample of low-income American adults and highlights how these patterns make particular groups of low-status internet users uniquely vulnerable to various forms of surveillance and networked privacy-related problems. In particular, a greater reliance on mobile connectivity, combined with lower usage of privacy-enhancing strategies may contribute to various privacy and security-related harms. The article then discusses three scenarios in which big data – including data gathered from social media inputs – is being aggregated to make predictions about individual behavior: employment screening, access to higher education, and predictive policing. Analysis of the legal frameworks surrounding these case studies reveals a lack of legal protections to counter digital discrimination against low-income people. In light of these legal gaps, the Article assesses leading proposals for enhancing digital privacy through the lens of class vulnerability, including comprehensive consumer privacy legislation, digital literacy, notice and choice regimes, and due process approaches. 
As policymakers consider reforms, the article urges greater attention to impacts on low-income persons and communities.


D&S affiliate Mimi Onuoha details the process of completely deleting data.

This overwriting process is a bit like painting a wall: If you start with a white wall and paint it red, there’s no way to erase the red. If you want the red gone or the wall returned to how it was, you either destroy the wall or you paint it over, several times, so that it’s white again.

Tampa Bay Times | 11.11.16

How data failed us in calling an election

Steve Lohr, Natasha Singer

D&S affiliate Natasha Singer and Steve Lohr discuss the role of data in the 2016 election.

Virtually all the major vote forecasters, including Nate Silver’s FiveThirtyEight site, the New York Times’ Upshot and the Princeton Election Consortium, put Clinton’s chances of winning in the 70 percent to 99 percent range. The election prediction business is one small aspect of a far-reaching change across industries that have increasingly become obsessed with data, the value of it and the potential to mine it for cost-saving and profitmaking insights. It is a behind-the-scenes technology that quietly drives everything from the ads that people see online to billion-dollar acquisition deals.

D&S founder danah boyd spoke at the Algorithmic Accountability and Transparency in the Digital Economy panel at the EU Parliament.

D&S artist-in-residence Heather Dewey-Hagborg’s work was recently profiled on Blumhouse.

Undoubtedly the most shocking (but fascinating) of Dewey-Hagborg’s projects is “Stranger Visions,” which she launched in 2013 to much acclaim. The work not only stunned the art community, but sparked the curiosity of the scientific world… for disturbing reasons I’ll get into shortly.

D&S advisor Susan Crawford discusses the possibility and implications of an AT&T and Time Warner merger.

The high-speed internet access market in America is entirely stuck on an expensive plateau of uncompetitive mediocrity, with only city fiber networks providing a public option or, indeed, any alternative at all. The AT&T/TWX deal will not prompt a drop of additional competition in that market. Nor will it mean that the entertainment industry will see more competition or new entrants — just that one player will get an unfair distribution advantage. It’s hard to think of a single positive thing this merger will accomplish, other than shining a bright light on just how awful the picture is for data transmission in this nation.

This deal should be dead on arrival. In fact, AT&T should spare us by dropping the idea now. This merger must not happen.

D&S fellow Anne L. Washington published a Points piece responding to Cathy O’Neil’s Weapons of Math Destruction.

Complex models with high stakes require rigorous periodic taste tests. Unfortunately most organizations using big data analytics have no mechanism for feedback because the models are used in secrecy.

Producing predictions, like making sausage, is currently an obscure practice. If botulism spreads, someone should be able to identify the supply chain that produced it. Since math is the factory that produces the sausage that is data science, some form of reasoning should be leveraged to communicate the logic behind predictions.

interview | 10.31.16

Heather Dewey-Hagborg Questions DNA as Big Data

Joel Kuennen, Heather Dewey-Hagborg

D&S artist-in-residence Heather Dewey-Hagborg’s work was recently profiled in ArtSlant.

At the same time, DNA extraction and sequencing has never been cheaper or easier. In light of this and the continued reliance on DNA as forensic proof, artist Heather Dewey-Hagborg approaches the cultural conception of DNA through a hacker mindset, exploiting vulnerabilities in our legal code to expose society’s unwarranted reliance on DNA as an object of truth.

points | 10.27.16

Shining a light on the darkness

Mark Van Hollebeke

D&S practitioner-in-residence Mark Van Hollebeke discusses Weapons of Math Destruction in this Points piece.

O’Neil’s analysis doesn’t just apply to mathematical models; it applies to societal models. Most of the WMDs that Cathy O’Neil describes are inextricably linked to unjust social structures.

We all, data scientists included, need to act with some humility and reflect on the nature of our social ills. As O’Neil writes, “Sometimes the job of a data scientist is to know when you don’t know enough” (216). Those familiar with Greek moral philosophy know that this type of Socratic wisdom can be very fruitful.

It’s not just the dark side of Big Data she shows us, but shady business practices and unjust social regimes. We will never disarm the WMDs without addressing the social injustice they mask and perpetuate. O’Neil deserves credit for shining a bright light on this fact.

D&S fellow Mark Ackerman develops a checklist to address the sociotechnical issues demonstrated in Cathy O’Neil’s Weapons of Math Destruction.

These checklist items for socio-technical design are all important for policy as well. Yet the book makes it clear that not all “sins” can be reduced to checklist form. The book also explicates other issues that cannot easily be foreseen and are almost impossible for implementers to see in advance, even if well-intentioned. One example from the book is college rankings, where the attempt to be data-driven slowly created an ecology where universities and colleges paid more attention to the specific criteria used in the algorithm. In other situations, systems will be profit-generating in themselves, and therefore implemented, but suboptimal or societally harmful — this is especially true, as the book nicely points out, for systems that operate over time, as happened with mortgage pools. Efficiency may not be the only societal goal — there is also fairness, accountability, and justice. One of the strengths of the book is to point this out and make it quite clear.

D&S post-doctoral scholar Caroline Jack attended the O’Reilly Next:Economy Summit and continues the conversation from the summit.

In the next economy, the most important skills may be difficult to quantify or commodify—but optimizing for human welfare demands that the people driving the innovation economy take them seriously. Care work requires workers to build trust and practice kindness. It is “emotional labor” that demands skills such as calmness, empathy and interpersonal creativity. Given this outlook, the greatest victory of our tech industry could be in turning away from systems that incentivize efficiency and profit and toward designing systems that optimize workers’ and consumers’ dignity, sustenance and welfare.

In the era of big data, how do researchers ethically collect, analyze, and store data? danah boyd, Emily F. Keller, and Bonnie Tijerina explore this question and examine issues from how to achieve informed consent from research subjects in big data research to how to store data securely in case of breaches. The primer evolves into a discussion on how libraries can collaborate with computer scientists to examine ethical big data research issues.

The Engine Room | 09.19.16

Responsible Data in Agriculture

Lindsay Ferris, Zara Rahman

D&S fellow Zara Rahman and Lindsay Ferris wrote an analysis of how data is used in agriculture, concluding with how to use that data responsibly.

The responsibility for addressing this does not lie solely with the smaller players in the sector, though. Practising responsible data approaches should be a key concern and policy of the larger actors, from Ministries of Agriculture to companies gathering and dealing with large amounts of data on the sector. Developing policies to proactively identify and address these issues will be an important step to making sure data-driven insights can benefit everyone in the sector.

D&S researcher Claire Fontaine writes a compelling piece analyzing whether or not school performance data reinforces segregation.

Data is great at masking its own embedded bias, and school performance data allows privileged parents to reinforce educational inequality. The best interests of some individuals are optimized at the expense of the society. Accountability programs, particularly when coupled with school choice models, serve to keep middle and upper middle class families invested in public schools, but in an uneven and patterned way, causing segregated school environments to persist despite racial and socioeconomic residential diversity.

D&S affiliate Anthony Townsend writes about his research in data and city charters.

Now, there’s a number of organizations that are working hard on pushing cities up this maturity hill. CityMart is figuring out how to help cities overhaul their innovation process from within. Bloomberg Philanthropies is driving hard to get city governments to focus on achieving measurable innovation. But it’s all too much within the existing framework of governance systems that are usually fundamentally dysfunctional, structurally incapable of delivering. Digital maturity seems to want to engage a larger conversation about the transformation of governance that is missing. No one seems to be willing to go out on a limb — with the exception of the radical political movements like Podemos and Syriza (but they haven’t engaged the smart city meme in any real way yet) — and call the whole incremental update campaign into question. (n.b. while the Pirate Party has engaged ‘smart’ in a legitimate way, they don’t represent a coherent political movement in my opinion).

D&S researcher Jacob Metcalf writes “Big Data Analytics and Revision of the Common Rule”. Abstract is below:

“Big data” is a major technical advance in terms of computing expense, speed, and capacity. But it is also an epistemic shift wherein data is seen as infinitely networkable, indefinitely reusable, and significantly divorced from the context of collection [1,7]. The statutory definitions of “human subjects” and “research” are not easily applicable to big data research involving sensitive human data. Many of the familiar norms and regulations of research ethics were formulated for prior paradigms of research risks and harms, and thus the formal triggers for ethics review are miscalibrated. We need to reevaluate long-standing assumptions of research ethics in light of the emergence of “big data” analytics [6,10,13].

The U.S. Department of Health and Human Services (HHS) released a Notice of Proposed Rule-Making (NPRM) in September 2015 regarding proposed major revisions (the first in three decades) to the research ethics regulations known as the Common Rule. The proposed changes grapple with the consequences of big data, such as informed consent for bio-banking and universal standards for privacy protection. The Common Rule does not apply to industry research, and some big data science in universities might not fall under its purview, but the Common Rule addresses the burgeoning uses of big data by setting the tone and agenda for research ethics in many spheres.

D&S affiliate Anthony Townsend writes more on city charters and big data.

The point is… what we now think of as ‘hidebound obsolete bureaucracy’ was not so long ago the cutting edge analytics and evidence-based administrative technology of its day. It’s outlived its usefulness for sure, but these zombie public organizations will shamble on for a long time without a better vision that can plot a transition path to within the reform process that’s required by law.

D&S fellow Mimi Onuoha is interviewed about design processes and current projects.

Data & Society Researcher Jacob Metcalf reconsiders research ethics in the wake of big data.

“Many of the familiar norms and regulations of research ethics were formulated for prior paradigms of research risks and harms, and thus the formal triggers for ethics review are miscalibrated. We need to reevaluate longstanding assumptions of research ethics in light of the emergence of ‘big data’ analytics.”

D&S researcher Bonnie Tijerina offers an overview of the work undertaken by the Supporting Ethics in Data Research project.

Complex data sets raise challenging ethical questions about risk to individuals who are not sufficiently covered by computer science training, ethics codes, or Institutional Review Boards (IRBs). The use of publicly available, corporate, and government data sets may reveal human practices, behaviors, and interactions in unintended ways, creating the need for new kinds of ethical support. Secondary data use invokes privacy and consent concerns. A team at Data & Society recently conducted interviews and campus visits with computer science researchers and librarians at eight U.S. universities to examine the role of research librarians in assisting technical researchers as they navigate emerging issues of privacy, ethics, and equitable access to data at different phases of the research process.

D&S fellow and technical writer Martha Poon responds to Viktor Mayer-Schönberger and Kenneth Cukier’s Big Data: A Revolution That Will Transform How We Live, Work, and Think, challenging their premise that “companies that can situate themselves in the middle of information flows and can collect data will thrive.”

In this Review Essay, which also engages with Dan Bouk’s How Our Days Became Numbered: Risk and the Rise of the Statistical Individual and Liz McFall’s Devising Consumption: Cultural Economies of Insurance, Credit and Spending, Martha takes us on a tour of the history of capitalism and consumption and brings us back to first principles to ask: “What’s old and what’s new about data science? What kinds of outcomes can digital data predict?”

What kind of power is big data?

Data are used to run the system, but data analytics do not describe the process by which the system generates its outcomes. That’s why data science should be categorized as operational control and not as an exercise in the practice of knowledge.

paper | 05.23.16

Perspectives on Big Data, Ethics, and Society

Jacob Metcalf, Emily F. Keller, danah boyd

The Council for Big Data, Ethics, and Society has released a comprehensive white paper consolidating conversations and ideas from two years of meetings and discussions:

Today’s release marks a major milestone for the Council, which began in 2014 with support from the National Science Foundation and the goal of providing critical social and cultural perspectives on “big data” research initiatives. The work of the Council consistently surfaced conflicts between big data research methods and existing norms. Should big data methods be exempted from those norms? Pushed into them? Are entirely new paradigms needed? The white paper provides recommendations in the areas of policy, pedagogy, and network building, as well as identifying crucial areas for further research. From the Executive Summary:

The Council’s findings, outputs, and recommendations—including those described in this white paper as well as those in earlier reports—address concrete manifestations of these disjunctions between big data research methods and existing research ethics paradigms. We have identified policy changes that would encourage greater engagement and reflection on ethics topics. We have indicated a number of pedagogical needs for data science instructors, and endeavored to fulfill some of them. We have also explored cultural and institutional barriers to collaboration between ethicists, social scientists, and data scientists in academia and industry around ethics challenges. Overall, our recommendations are geared toward those who are invested in a future for data science, big data analytics, and artificial intelligence guided by ethical considerations along with technical merit.

D&S Researcher Jake Metcalf and D&S Affiliate Kate Crawford examine the growing discontinuities between research practices of data science and established tools of research ethics regulation.

Abstract: There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce.

In this background primer, D&S Research Analyst Laura Reed and D&S Founder danah boyd situate the current debate around the role of technology in the public sphere within a historical context. They identify and tease out some of the underlying values, biases, and assumptions present in the current debate surrounding the relationship between media and democracy, and connect them to existing scholarship within media history that is working to understand the organizational, institutional, social, political, and economic factors affecting the flow of news and information. They also identify a set of key questions to keep in mind as the conversation around technology and the public sphere evolves.

Algorithms play an increasingly significant role in shaping the digital news and information landscape, and there is growing concern about the potential negative impact that algorithms might have on public discourse. Examples of algorithmic biases and increasingly curated news feeds call into question the degree to which individuals have equal access to the means of producing, disseminating, and accessing information online. At the same time, these debates about the relationship between media, democracy, and publics are not new, and linking those debates to these emerging conversations about algorithms can help clarify the underlying assumptions and expectations. What do we want algorithms to do in an era of personalization? What does a successful algorithm look like? What form does an ideal public sphere take in the digital age? In asking these and other questions, we seek to highlight what’s at stake in the conversation about algorithms and publics moving forward.

video | 03.04.16

Technology, Privacy, and the Future of Education Symposium

Ted Magder, Richard Arum, Anya Kamenetz, Kouross Esmaeli, Natasha Singer, Brett Frischmann, Mitchell Stevens, Elana Zeide, Helen Nissenbaum

Data & Society Fellow Natasha Singer participated in a panel alongside D&S Affiliates Elana Zeide and Helen Nissenbaum on the Implications of Data-driven Education as part of the Technology, Privacy, and the Future of Education symposium, hosted by the Department of Media, Culture, and Communication at NYU Steinhardt.

The symposium brought together educational specialists, journalists, and academics to open a dialogue around the pedagogical, legal, and ethical repercussions of the use of new technologies in educational environments.

The symposium took place on March 4, 2016.


Limn.it | 03.01.16

Just What Are We Archiving?

Geoffrey C. Bowker

D&S Affiliate Geoffrey C. Bowker considers our collective archives as central cultural, social, and economic constructions and asks “What kind of people will we become if we keep trying to archive everything?”:

I have graduated to the dawn of the era of Big Data. It seems as if all aspects of our lives are being tracked, monitored, and stored for future use, and by “us” here I include vast swathes of the nonhuman as well as human world. We leave traces everywhere, often without realizing it, and these are potentially stored forever; collectively, they build a picture of ourselves that can be exploited by commercial companies (by way of Google, Facebook), governments, and aspiring political candidates.

I grew up thinking that archives were dusty, dry places that only aspiring historians such as myself could find exciting…and I still treasure the peace of roaming through a nineteenth-century set of police reports on a political group (the National Union of the Working Classes) in the 1830s in England, as well as continue to feel the anguish of what I found there. I have graduated to seeing archives as performative: they constitute the present as much as document the past.

D&S fellow Mimi Onuoha thinks through the implications of the moment of data collection and offers a compact set of reminders for those who work with and think about data.

The conceptual, practical, and ethical issues surrounding “big data” and data in general begin at the very moment of data collection. Particularly when the data concern people, not enough attention is paid to the realities entangled within that significant moment and spreading out from it.

The point of data collection is a unique site for unpacking change, abuse, unfairness, bias, and potential. We can’t talk about responsible data without talking about the moment when data becomes data.

D&S fellow Natasha Singer explores the differences in how the United States and Europe treat data protection and privacy.

In the United States, a variety of laws apply to specific sectors, like health and credit. In the European Union, data protection is considered a fundamental right, which can have far-reaching consequences in all 28 member states.

All the talk about data privacy can get caught up in political wrangling. But the different approaches have practical consequences for people, too.

D&S Artist in Residence Ingrid Burrington contemplates network infrastructure, underlining the fact that today’s infrastructure can’t last much longer under the strain of exponentially expanding connectivity demands. She suggests that the tech industry will soon have to face long-term questions concerning time, maintenance, and scale.

The impact of data centers—really, of computation in general—isn’t something that really galvanizes the public, partly because that impact typically happens at a remove from everyday life. The average amount of power to charge a phone or a laptop is negligible, but the amount of power required to stream a video or use an app on either device invokes services from data centers distributed across the globe, each of which uses energy to perform various processes that travel through the network to the device. One study (weirdly enough, sponsored by the American Coal Association, its purpose to enthuse about how great coal is for technology) estimated that a smartphone streaming an hour of video on a weekly basis uses more power annually than a new refrigerator.


D&S artist in residence Ingrid Burrington shares impressions from a tour of Facebook’s massive Altoona data center, and wonders about the extent to which Facebook might be creating an infrastructure to rival the internet itself.

The entrance to the server room where all of this hardware lives is behind both an ID-card reader and a fingerprint scanner. The doors open dramatically, and they close dramatically. It is only one of several server rooms at Altoona, but just this one room also seems endless. It is exactly the glimmering-LED cyberpunk server-porn dreamscape that it is supposed to be.

D&S fellow Natasha Singer details Clever, a software service — free for schools — that enables schools to send student information to web and mobile apps at the click of a button.

Clever is clearly benefiting from the increasing adoption of education software for prekindergarten through 12th grade in the United States, a market estimated at nearly $8.4 billion last year by the Software and Information Industry Association. It has positioned itself as a partial answer to questions from politicians and parents about how much data those kinds of tools may collect on students and how that information is secured and used.

D&S Affiliate Kate Crawford writes about the use of social media platforms and mobile phone data to mine and produce accounts of how people are responding in the aftermath of crisis events. She considers the limitations of social and mobile datasets, and warns that if not sufficiently understood and accounted for, these blindspots can produce specific kinds of analytical and ethical oversights.

In this paper, we analyze some of the problems that emerge from the reliance on particular forms of crisis data, and we suggest ways forward through a deeper engagement with ethical frameworks and a more critical questioning of what crisis data actually represents. In particular, the use of Twitter data and crowdsourced text messages during crisis events such as Hurricane Sandy and the Haiti Earthquake raised questions about the ways in which crisis data act as a system of knowledge. We analyze these events from ontological, epistemological, and ethical perspectives and assess the challenges of data collection, analysis and deployment. While privacy concerns are often dismissed when data is scraped from public-facing platforms such as Twitter, we suggest that the kinds of personal information shared during a crisis—often as a way to find assistance and support—present ongoing risks. We argue for a deeper integration of critical data studies into crisis research, and for researchers to acknowledge their role in shaping norms of privacy and consent in data use.

video | 07.02.15

Data Ethics in the Age of the Quantified Society

Kate Crawford, Jonathan Zittrain, Ashkan Soltani, Alexis Madrigal

“Leading thinkers from business, government, civil society, and academia [including D&S affiliate Kate Crawford] explore and debate ethics in the age of the quantified society. What role do ethics play in guiding existing efforts to develop and deploy data and information technologies? Does data ethics need to develop as a field to help guide policy, research, and practice — just as bioethics did in order to guide medicine and biology? Why or why not?”

Data & Society submitted comments to the National Telecommunications and Information Administration (NTIA) in response to its “Request for Comment on Stakeholder Engagement on Cybersecurity in the Digital Ecosystem.”

The digital ecosystem is quickly changing as more services are offered online and as the devices that make up the Internet of Things (IoT) proliferate. We recommended that NTIA’s multistakeholder effort attempt to address, among other things, cybersecurity in the Internet of Things, user notification and choice in regard to data collection, and possible civil liberties dilemmas raised by big data and monitoring by numerous devices and websites.

These concerns about the effects of the Internet of Things on cybersecurity and civil liberties need to be addressed while the ecosystem is young. Failure to consider these questions now could leave users vulnerable to a number of threats in the future. Unless devices and services are adequately secured, users will be vulnerable to breaches that could expose intimate information about their bodies and homes to people who were never given permission to access that data. Additionally, without giving users proper notification and obtaining actual consent, users will be unaware of the privacy risks involved in using these technologies and unable to protect the information they consider private. Finally, data collection by online services and by devices that monitor our bodies and environments could lead to abuses of users’ civil liberties.

magazine article | 05.27.15

What Amazon Taught the Cops

Ingrid Burrington

D&S artist in residence Ingrid Burrington writes about the history of the term “predictive policing”, the pressures on police forces that are driving them to embrace data-driven policing, and the many valid causes for concern and outrage among civil-liberties advocates around these techniques and tactics.

It’s telling that one of the first articles to promote predictive policing, a 2009 Police Chief Magazine piece by the LAPD’s Charlie Beck and consultant Colleen McCue, poses the question “What Can We Learn From Wal-Mart and Amazon About Fighting Crime in a Recession?” The article likens law enforcement to a logistics dilemma, in which prioritizing where police officers patrol is analogous to identifying the likely demand for Pop-Tarts. Predictive policing has emerged as an answer to police departments’ assertion that they’re being asked to do more with less. If we can’t hire more cops, the logic goes, we need these tools to deploy them more efficiently.


“Communication technologies increasingly mediate data exchanges rather than human communication. We propose the term data valences to describe the differences in expectations that people have for data across different social settings. Building on two years of interviews, observations, and participation in the communities of technology designers, clinicians, advocates, and users for emerging mobile data in formal health care and consumer wellness, we observed the tensions among these groups in their varying expectations for data. This article identifies six data valences (self-evidence, actionability, connection, transparency, “truthiness,” and discovery) and demonstrates how they are mediated and how they are distinct across different social domains. Data valences give researchers a tool for examining the discourses around, practices with, and challenges for data as they are mediated across social settings.”

“According to [D&S fellows] Karen Levy and Tim Hwang, the metaphors we use to describe our digital life matter, because the metaphors we use have baggage. In a recent article, ‘The Cloud’ and Other Dangerous Metaphors, they say that the assumptions embedded in metaphors become embedded in our discussions. And that can make us lose track of what it is we’re really talking about.”

Surfing, streams and clouds – the dangers of digital metaphor, CBC Spark, February 1, 2015

Excerpt: “What’s more, metaphors matter because they shape laws and policies about data collection and use. As technology advances, law evolves (slowly, and somewhat clumsily) to accommodate new technologies and social norms around them. The most typical way this happens is that judges and regulators think about whether a new, unregulated technology is sufficiently like an existing thing that we already have rules about—and this is where metaphors and comparisons come in.”

primer | 10.30.14

Data & Civil Rights: Technology Primer

Solon Barocas, Alex Rosenblat, danah boyd, Seeta Peña Gangadharan, Corrine Yu

Data have assumed a significant role in routine decisions about access, eligibility, and opportunity across a variety of domains. These are precisely the kinds of decisions that have long been the focus of civil rights campaigns. The results have been mixed. Companies draw on data in choosing how to focus their attention or distribute their resources, finding reason to cater to some of its customers while ignoring others. Governments use data to enhance service delivery and increase transparency, but also to decide whom to subject to special scrutiny, sanction, or punishment. The technologies that enable these applications are sometimes designed with a particular practice in mind, but more often are designed more abstractly, such that technologists are often unaware of and not testing for the ways in which they might benefit some and hurt others.

The technologies and practices that are driving these shifts are often described under the banner of “big data.” This concept is both vague and controversial, particularly to those engaged in the collection, cleaning, manipulation, use, and analysis of data. More often than not, the specific technical mechanisms that are being invoked fit under a different technical banner: “data mining.”

Data mining has a long history in many industries, including marketing and advertising, banking and finance, and insurance. As the technologies have become more affordable and the availability of data has increased, both public and private sectors—as well as civil society—are envisioning new ways of using these techniques to wrest actionable insights from once intractable datasets. The discussion of these practices has prompted fear and anxiety as well as hopes and dreams. There is a significant and increasing gap in understanding between those who are and are not technically fluent, making conversations about what’s happening with data challenging. That said, it’s important to understand that transparency and technical fluency is not always enough. For example, those who lack technical understanding are often frustrated because they are unable to provide oversight or determine the accuracy of what is produced while those who build these systems realize that even they cannot meaningfully assess the product of many algorithms.

This primer provides a basic overview to some of the core concepts underpinning the “big data” phenomenon and the practice of data mining. The purpose of this primer is to enable those who are unfamiliar with the relevant practices and technical tools to at least have an appreciation for different aspects of what’s involved.

This document is a workshop primer from Data & Civil Rights: Why “Big Data” is a Civil Rights Issue.

testimony | 08.15.14

Re: “Big Data: A Tool for Inclusion or Exclusion?”

Seeta Peña Gangadharan, danah boyd, Solon Barocas

In this letter to the Federal Trade Commission (FTC), New America Foundation’s Open Technology Institute is joined by Data & Society and Solon Barocas, an independent researcher, in asking the FTC to address the ethical problems, legal constraints, and technical difficulties associated with building a body of evidence of big data harms, the issue of whether intentions should matter in the evaluation of big data harms, and the unique context of vulnerable populations and implications for problem solving and taking steps to protect them.

The letter was submitted in response to an FTC request for comments in advance of its workshop, Big Data: A Tool for Inclusion or Exclusion?

In this op-ed, Data & Society fellow Seeta Peña Gangadharan argues that the “rise of commercial data profiling is exacerbating existing inequities in society and could turn de facto discrimination into a high-tech enterprise.” She urges us to “respond to this digital discrimination by making civil rights a core driver of data-powered innovations and getting companies to share best practices in detecting and avoiding discriminatory outcomes.”

The New Inquiry | 05.30.14

The Anxieties of Big Data

Kate Crawford

In this essay Data & Society affiliate Kate Crawford asks, “What does the lived reality of Big Data feel like?” She offers “surveillant anxiety — the fear that all the data we are shedding every day is too revealing of our intimate selves but may also misrepresent us.” And she pairs the anxiety of the surveilled with the anxiety of the surveillers: “that no matter how much data they have, it is always incomplete, and the sheer volume can overwhelm the critical signals in a fog of possible correlations.”

Data-oriented systems are inferring relationships between people based on genetic material, behavioral patterns (e.g., shared geography imputed by phone carriers), and performed associations (e.g., “friends” online or shared photographs). What responsibilities do entities who collect data that imputes connections have to those who are implicated by association? For example, as DNA and other biological materials are collected outside of medicine (e.g., at point of arrest, by informatics services like 23andme, for scientific inquiry), what rights do relatives (living, dead, and not-yet-born) have? In what contexts is it acceptable to act based on inferred associations and in which contexts is it not?

This document is a workshop primer from The Social, Cultural & Ethical Dimensions of “Big Data”.

primer | 03.17.14

Primer: Data Supply Chains

Data & Society

As data moves between actors and organizations, what emerges is a data supply chain. Unlike manufacturing supply chains, transferred data is often duplicated in the process, challenging the essence of ownership. What does ethical data labor look like? How are the various stakeholders held accountable for being good data guardians? What does clean data transfer look like? What kinds of best practices can business and government put into place? What upstream rights do data providers have over downstream commercialization of their data?

This document is a workshop primer from The Social, Cultural & Ethical Dimensions of “Big Data”.

The availability of data is not evenly distributed. Some organizations, agencies, and sectors are better equipped to gather, use, and analyze data than others. If data is transformative, what are the consequences of defense and security agencies having greater capacity to leverage data than, say, education or social services? Financial wherewithal, technical capacity, and political determinants all affect where data is employed. As data and analytics emerge, who benefits and who doesn’t, both at the individual level and the institutional level? What about the asymmetries between those who provide the data and those who collect it? How does uneven data access affect broader issues of inequality? In what ways does data magnify or combat asymmetries in power?

This document is a workshop primer from The Social, Cultural & Ethical Dimensions of “Big Data”.

Accountability is fundamentally about checks and balances to power. In theory, both government and corporations are kept accountable through social, economic, and political mechanisms. Journalism and public advocates serve as an additional tool to hold powerful institutions and individuals accountable. But in a world of data and algorithms, accountability is often murky. Beyond questions about whether the market is sufficient or governmental regulation is necessary, how should algorithms be held accountable? For example, what is the role of the fourth estate in holding data-oriented practices accountable?

This document is a workshop primer from The Social, Cultural & Ethical Dimensions of “Big Data”.

Countless highly accurate predictions can be made from trace data, with varying degrees of personal or societal consequence (e.g., search engines predict hospital admission, gaming companies can predict compulsive gambling problems, government agencies predict criminal activity). Predicting human behavior can be both hugely beneficial and deeply problematic depending on the context. What kinds of predictive privacy harms are emerging? And what are the implications for systems of oversight and due process protections? For example, what are the implications for employment, health care and policing when predictive models are involved? How should varied organizations address what they can predict?

This document is a workshop primer from The Social, Cultural & Ethical Dimensions of “Big Data”.

Abstract: The rise of “Big Data” analytics in the private sector poses new challenges for privacy advocates. Through its reliance on existing data and predictive analysis to create detailed individual profiles, Big Data has exploded the scope of personally identifiable information (“PII”). It has also effectively marginalized regulatory schema by evading current privacy protections with its novel methodology. Furthermore, poor execution of Big Data methodology may create additional harms by rendering inaccurate profiles that nonetheless impact an individual’s life and livelihood. To respond to Big Data’s evolving practices, this Article examines several existing privacy regimes and explains why these approaches inadequately address current Big Data challenges. This Article then proposes a new approach to mitigating predictive privacy harms—that of a right to procedural data due process. Although current privacy regimes offer limited nominal due process-like mechanisms, a more rigorous framework is needed to address their shortcomings. By examining due process’s role in the Anglo-American legal system and building on previous scholarship about due process for public administrative computer systems, this Article argues that individuals affected by Big Data should have similar rights to those in the legal system with respect to how their personal data is used in such adjudications. Using these principles, this Article analogizes a system of regulation that would provide such rights against private Big Data actors.

Stanford Law Review | 09.03.13

It’s Not Privacy, and It’s Not Fair

Cynthia Dwork, Deirdre K. Mulligan

In the Stanford Law Review symposium issue on privacy and big data (September 2013), Cynthia Dwork and Data & Society advisor Deirdre Mulligan argue that “privacy controls and increased transparency fail to address concerns with the classifications and segmentation produced by big data analysis.” “If privacy and transparency are not the panacea to the risks posed by big data,” they ask, “what is?” They offer a quartet of approaches/areas of focus.

paper | 09.21.11

Six Provocations for Big Data

danah boyd, Kate Crawford

This essay offers a multi-disciplinary social analysis of the “Big Data” phenomenon with the goal of sparking a conversation, and it continues to provide a point of reference for the launch and development of Data & Society.

Abstract: The era of “Big Data” has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and many others are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing information from Twitter, Google, Verizon, 23andMe, Facebook, Wikipedia, and every space where large groups of people leave digital traces and deposit data. Significant questions emerge. Will large-scale analysis of DNA help cure diseases? Or will it usher in a new wave of medical inequality? Will data analytics help make people’s access to information more efficient and effective? Or will it be used to track protesters in the streets of major cities? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Some or all of the above?

Data & Society Research Institute 36 West 20th Street, 11th Floor
New York, NY 10011, Tel: 646.832.2038

Unless otherwise noted this site and its contents are licensed under a Creative Commons Attribution 3.0 Unported license.