FW-HTF-R: Human-Machine Teaming for Effective Data Work at Scale: Upskilling Defense Lawyers Working with Police and Court Process Data

$2,000,000 | FY2021 | CSE | NSF

University Of California-Berkeley, Berkeley CA

Investigators

Aditya G Parameswaran (contact), Erin M Kerrison, Joseph M Hellerstein, Niloufar Salehi, Sarah E Chasins

Abstract

This project will build tools to help defense attorneys do their work -- in particular, to help them use and understand the large quantities of data that they are now asked to handle. As more and more data about policing, courts, and individual cases becomes available, attorneys are finding that the evidence they need to advocate for their clients is locked in vast piles of messy, incomplete data. The relevant information is scattered across scans of hundreds of pages of paper forms or hours of audio and video, and defenders do not have the programming and data analysis skills they need to extract key information from the public and private data at their disposal. This leaves defense attorneys at a disadvantage, particularly public defenders, who have limited access to staff with data analysis expertise and who face high caseloads that leave them limited time to learn data analysis. To help address this gap, the project team will partner with legal associations and defense attorneys to develop data analysis methods and tools that do much of the work of collecting, organizing, and suggesting analyses of these messy police and court process data. Doing this will reduce the burden on defense attorneys, increase the value of data, and ultimately lead to fairer, better outcomes in criminal justice contexts.

This project's data platform will leverage three key underlying techniques that the project team will advance: (i) familiar no-code and low-code modalities like natural language search boxes and spreadsheet interfaces; (ii) program synthesis and machine learning to transform "fuzzy" queries in no-code interfaces into a space of possible interpretations (including improving predictions by generalizing from prior tool usage data); and (iii) interactive ambiguity resolution widgets that present visual representations of output data, allowing users to steer the tool towards their target programs or analyses by disambiguating between alternatives generated in (ii).
In developing this platform, the team will contribute advances in program synthesis and ML-aided program generation, including novel algorithms for synthesis; develop novel mechanisms and algorithms for learning from users' prior activity in the context of data work tools; and invent new program recommendation algorithms, especially for recommending plausible tweaks to existing data analysis programs. These techniques will be incorporated into a larger user-centered design process toward building tools and interfaces that meet public defenders’ needs and take into account the legal context and constraints in which they work. The tools will be iteratively developed and evaluated among an increasingly large set of users, starting with individual defenders and public defenders’ offices, with the goal of producing off-the-shelf solutions that can be adopted by a range of legal entities and organizations. Together, the work will contribute to knowledge of how to build no-code and low-code tools to democratize data access more broadly.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
