Red-teaming is a practice in which people, traditionally security engineers inside a company, interact with a system and try to make it produce undesired outcomes. The goal is to identify ways the system fails to work as intended and then to fix those failures.
Increasingly, red-teaming is being put forward as a solution to concerns about artificial intelligence: a way to pressure-test AI systems and identify potential harms. What does that mean in practice? What can red-teaming do, and what are its limits? Answering those questions is the subject of this policy brief by Sorelle Friedler, Ranjit Singh, Borhane Blili-Hamelin, Jacob Metcalf, and Brian J. Chen.
Based on ongoing fieldwork, interviews with diverse stakeholders, and secondary research, the authors find that red-teaming serves a specific role in identifying risks and advancing AI accountability, but faces substantial limits in mitigating real-world harms and in holistically assessing an AI system's safety. The brief outlines the conditions under which AI red-teaming works well and those under which it does not, and argues that any use of red-teaming should be accompanied by additional forms of accountability, such as algorithmic impact assessments, external audits, and public consultation.
Supported by a grant from Omidyar Network