Soft Labeling for Multi-Pass Document Review
Jianlin Cheng, Amanda Jones, Caroline Privault, Jean-Michel Renders
In this paper we examine the machine learning classifiers used in technology-assisted review (TAR) and, more specifically, the multi-pass manual coding process that supports the training and testing of these classifiers. In reviews conducted for litigation matters, manual document coding is known to be subject to error, misinterpretation, and disagreement. The accuracy and consistency of this coding also significantly affect both the performance and the evaluation of the resulting classifiers, because the classifiers use the coding as the basis for “learning by example.” Correspondingly, the value of rigorous quality control (QC) of the training and testing documents used for classifier development is well established. In a traditional approach, coding decisions made during QC review are assumed to be accurate and are used going forward without reference to earlier coding. We describe a method that integrates multiple coding judgments from multi-pass review into the construction of a document classifier, and we explain the benefit of this approach.
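As a rough illustration of what soft labeling can mean in a multi-pass review, the sketch below turns each document's coding judgments (for example, a first-pass code and a QC code) into a single target between 0 and 1, then trains a logistic-regression classifier on those soft targets. This is a generic sketch, not the method described in this paper. The per-pass weights, the function names `soft_label`, `train_soft_logreg`, and `predict`, and the toy feature vectors are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def soft_label(judgments: Sequence[int], weights: Sequence[float]) -> float:
    """Combine binary codes (1 = responsive, 0 = not) from several review
    passes into one target in [0, 1]. The weights are a hypothetical
    per-pass measure of trust, not values taken from the paper."""
    total = sum(weights)
    return sum(j * w for j, w in zip(judgments, weights)) / total


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that document x is responsive."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_soft_logreg(X: List[List[float]], targets: List[float],
                      lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on cross-entropy with soft targets.
    For target t and prediction p, the gradient with respect to the
    logit is (p - t), exactly as with hard 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, targets):
            err = predict(w, b, x) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data: each document has [first-pass code, QC code]; here the QC pass
# is (hypothetically) trusted twice as much as the first pass.
pass_weights = [1.0, 2.0]
docs = [
    ([1.0, 0.0], [1, 1]),  # both passes: responsive      -> target 1.0
    ([0.9, 0.1], [1, 0]),  # QC overturned first pass      -> target 1/3
    ([0.0, 1.0], [0, 0]),  # both passes: not responsive  -> target 0.0
    ([0.1, 0.9], [0, 0]),  # both passes: not responsive  -> target 0.0
]
X = [features for features, _ in docs]
targets = [soft_label(codes, pass_weights) for _, codes in docs]
w, b = train_soft_logreg(X, targets)
print(targets)
print(predict(w, b, [1.0, 0.0]), predict(w, b, [0.0, 1.0]))
```

In the traditional approach, the QC decision would simply replace the first-pass code and give the second document a hard target of 0. The soft target of 1/3 instead keeps a trace of the reviewers' disagreement. How the passes should be weighted is a modeling choice that this sketch does not settle.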
ICAIL 2013 International Conference on Artificial Intelligence and Law, Rome, Italy, 10-14 June 2013.