Predictive policing systems are increasingly used by law enforcement to try to prevent crime before it occurs. But what happens when these systems are trained using biased data?
Machine learning algorithms are designed to learn and reproduce patterns in data, so if biased data is used to train predictive models, the models will reproduce, and in some cases amplify, those same biases. At best, this renders the predictive models ineffective. At worst, it results in discriminatory policing.
In this talk, Kristian elaborates on the concept of "bias in, bias out" in machine learning with a simple, non-technical example. She demonstrates how applying machine learning to police records can result in the over-policing of historically over-policed communities. Using a case study from Oakland, CA, she shows how predictive policing not only perpetuates the biases previously encoded in police data but, under some circumstances, actually amplifies them.
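To make the amplification mechanism concrete, here is a minimal toy sketch. It is not the model or data from the Oakland case study. It assumes two hypothetical neighborhoods, A and B, where crime actually occurs at the same rate. Police records start out skewed toward A, and each day patrols are sent to whichever neighborhood has the most recorded incidents. Because incidents are only recorded where police are present, the skew in the records grows even though true crime is equal. The function names, the greedy "patrol the top neighborhood" rule, and all numbers are illustrative assumptions.

```python
def simulate_feedback(initial_records, true_rate, days):
    """Toy 'predict, then patrol' loop (hypothetical, for illustration only).

    initial_records: dict of neighborhood -> historically recorded incidents (biased)
    true_rate: incidents per day that actually occur in *every* neighborhood
    Assumption: only incidents in the patrolled neighborhood get recorded.
    """
    records = dict(initial_records)
    for _ in range(days):
        # "Prediction": pick the neighborhood with the most recorded incidents so far.
        target = max(records, key=records.get)
        # Police observe and record incidents only where they patrol,
        # so the predicted "hot spot" is the only place whose count grows.
        records[target] += true_rate
    return records


def share(records, name):
    """Fraction of all recorded incidents attributed to one neighborhood."""
    return records[name] / sum(records.values())


history = {"A": 60, "B": 40}  # hypothetical biased starting records
final = simulate_feedback(history, true_rate=5, days=100)
print(f"A's share of records: before {share(history, 'A'):.2f}, after {share(final, 'A'):.2f}")
# prints: A's share of records: before 0.60, after 0.93
```

Even though both neighborhoods have identical true crime rates, A's share of recorded incidents rises from 60% to about 93%: the model's output determines where new data is collected, and that data then confirms the model's original bias.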
Kristian Lum is the Lead Statistician at HRDAG. Kristian's research has focused on furthering the statistical methodology most commonly used by HRDAG (population estimation, or multiple systems estimation), with a particular emphasis on Bayesian methods and model averaging. She is the primary author of the dga package, open-source software for population estimation in the R computing environment. More recently, her research has expanded to include agent-based modeling and simulation-based analysis, through which she has contributed to understanding incarceration in the United States as a contagion. She is currently leading the HRDAG project on policing in the United States and has contributed to HRDAG projects in Colombia, Guatemala, and Kosovo.
Kristian received an MS and a PhD from the Department of Statistical Science at Duke University and a BA in Mathematics and Statistics from Rice University.
Data & Society’s “Databites” speaker series presents timely conversations about the purpose and power of technology, bridging our interdisciplinary research with broader public conversations about the societal implications of data and automation.