Separating the Privileged Wheat from the Chaff - Using Text Analytics and Machine Learning to Protect Attorney-Client Privilege

Abstract

The digital age has created unique challenges for parties that engage in large-scale litigation. Safeguarding the attorney-client privilege is a critical task for litigators during discovery, and it becomes more difficult and expensive every year. Document review is now responsible for the vast majority of costs in the average legal matter, and those costs are only rising. The volume of digitally stored data doubles roughly every two years, driving up discovery costs and increasing the risk of inadvertent disclosure of privileged information. As the digital world evolves, the legal community has sought to evolve with it, particularly in the document review process. Keyword searching has been the dominant method of identifying digitally stored, privileged documents for the last several decades, but attorneys have conducted little research into the most efficient ways to use this method. Most legal teams rely on a combination of intuition and conventional wisdom. To subject those intuitions to the rigor of scientific experiments, we used three data sets and search term lists from real legal matters to determine which search terms were effective in identifying privileged communications. The results of our study revealed that thoughtfully crafted keyword term lists do identify a significant portion of the privileged document population. What may surprise experienced practitioners is that many commonly used terms believed to be imprecise proved quite effective at identifying privileged documents while limiting the volume for review. Other popular terms proved ineffective. The study also compared keyword searching with predictive modeling and machine learning as means of identifying privileged communications. The insights provided in this article can, if implemented by practitioners, add client protections against the disclosure of privileged documents and make privilege review more defensible and less costly.

Recommended Citation

Robert Keeling, Nathaniel Huber-Fliflet, Jianping Zhang & Rishi P. Chhatwal, Separating the Privileged Wheat from the Chaff - Using Text Analytics and Machine Learning to Protect Attorney-Client Privilege, 25 Rich. J.L. & Tech. 1 (2019).
Available at: https://scholarship.richmond.edu/jolt/vol25/iss3/2
