Policy Brief
October 25, 2023

AI Red-Teaming Is Not a One-Stop Solution to AI Harms: Recommendations for Using Red-Teaming for AI Accountability

Sorelle Friedler
Ranjit Singh
Borhane Blili-Hamelin
Jacob Metcalf
Brian J. Chen

Red-teaming is a method where people — traditionally security engineers inside a company — interact with a system to try to make it produce undesired outcomes. The goal is to identify ways the system doesn’t work as intended, and then find fixes for the breaks.

Increasingly, red-teaming is being put forward as a solution to concerns about artificial intelligence — a way to pressure test AI systems and identify potential harms. What does that mean in practice? What can red-teaming do, and what are its limits? Answering those questions is the subject of this policy brief by Sorelle Friedler, Ranjit Singh, Borhane Blili-Hamelin, Jacob Metcalf, and Brian J. Chen. 

Based on ongoing fieldwork, interviews with diverse stakeholders, and secondary research, the authors find that red-teaming plays a very specific role in identifying risks and advancing AI accountability, but faces substantial limits in mitigating real-world harms and holistically assessing an AI system’s safety. The brief outlines the conditions under which AI red-teaming works well and those under which it does not, and argues that any use of red-teaming should be accompanied by additional forms of accountability, such as algorithmic impact assessments, external audits, and public consultation.

Supported by a grant from Omidyar Network