Machine Learning Classification for Document Review
Thomas Barnett, Svetlana Godjevac, Caroline Privault, Jean-Michel Renders, John Schneider, Robert Wickstrom
Using keyword searches (and their variants) to identify potentially responsive data has become standard operating procedure in large-scale document reviews in litigation, regulatory inquiries and subpoena compliance. At the same time, within the legal community, there is growing skepticism as to the adequacy of such an approach. Developments in information retrieval and extraction technology have led to a number of more sophisticated approaches to meeting the challenges of isolating responsive material in complex litigation and regulatory matters. Initially met with resistance by judges, practicing attorneys and legal professionals, such approaches are garnering more serious analysis as the amount of potential source data expands and the costs of collection, processing and, most significantly, review strain corporate budgets. One of these new approaches is the subject of this paper. Specifically, it addresses the application of machine learning classification to the human decision-making process in litigation document reviews. The human (or manual) review phase of the e-discovery process typically involves large teams of attorney reviewers analyzing thousands of documents per day to identify and record (or "code") content that is responsive to document requests or regulatory subpoenas, or related to specific issues in the case. The currently accepted approach to the review process is costly, time-consuming, and prone to error. Accurately and efficiently assigning appropriate coding (e.g., responsive/non-responsive) is essential as the volumes of data continue to increase while parties remain subject to strict deadlines imposed by courts or regulatory bodies. This paper suggests that automatic textual classification can expedite this process in a number of ways. It can reduce the quantity of documents that require human review by identifying subsets of non-responsive documents, after which the remaining documents can be ordered by each document's likelihood of responsiveness.
Guided by this ranking, the review teams can prioritize manual review on selected subsets of documents most likely to be responsive. Further, machine learning textual classification can augment the ability to assess review accuracy by highlighting inconsistencies in reviewer decisions without requiring re-review by more senior attorneys.
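The three uses described above (culling likely non-responsive documents, ranking the remainder, and flagging reviewer decisions that disagree with the classifier) can be illustrated with a minimal sketch. The abstract does not name the classifier used in the paper, so the example below substitutes a small multinomial naive Bayes model written with the Python standard library only. All function names, the `cull_below` threshold and the `margin` parameter are hypothetical choices made for illustration, not values taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def train(labeled_docs):
    """Fit word counts from (text, is_responsive) pairs coded by reviewers.

    Assumes at least one responsive and one non-responsive example.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def responsiveness_score(model, text):
    """Return P(responsive | text) under naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_post = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        logp = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                logp += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        log_post[label] = logp
    # Normalize the two log scores into a probability.
    peak = max(log_post.values())
    p_true = math.exp(log_post[True] - peak)
    p_false = math.exp(log_post[False] - peak)
    return p_true / (p_true + p_false)


def rank_for_review(model, docs, cull_below=0.1):
    """Drop documents scoring below cull_below, then sort most likely first."""
    scored = [(responsiveness_score(model, d), d) for d in docs]
    kept = [pair for pair in scored if pair[0] >= cull_below]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


def flag_inconsistent(model, coded_docs, margin=0.4):
    """Return reviewer codings that the classifier confidently disagrees with."""
    flagged = []
    for text, label in coded_docs:
        score = responsiveness_score(model, text)
        if (label and score < 0.5 - margin) or (not label and score > 0.5 + margin):
            flagged.append((text, label, score))
    return flagged


if __name__ == "__main__":
    # Toy training set; real reviews would use many coded documents.
    seed = [
        ("merger pricing agreement draft", True),
        ("pricing schedule for merger", True),
        ("lunch menu friday", False),
        ("office party friday lunch", False),
    ]
    model = train(seed)
    for score, doc in rank_for_review(model, ["friday lunch plans", "merger pricing memo"]):
        print(f"{score:.2f}  {doc}")
```

In this sketch the flagged items would be routed to a second look rather than treated as errors, since a confident disagreement may reflect a classifier mistake as easily as a reviewer one.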
Workshop DESI at ICAIL 2009 (12th International Conference on Artificial Intelligence & Law), Barcelona, Spain, June 8, 2009.
Full paper available on DESI Website