TAR 1.0 or TAR 2.0: Which method is best for you?
- March 15, 2020
- by Serge Kernisan
At one time or another, most lawyers involved in eDiscovery have felt the unique pressure of a slow-moving document review. That pressure to speed a review is one of the reasons that eDiscovery is effectively ground zero for today’s exploding use of artificial intelligence (AI) in law.
According to a recent Altman Weil survey, greater use of technology tools to aid speed and accuracy was the most frequently cited of 10 options for improving law department efficiency, chosen by 58% of respondents.1
AI, one of the key technologies for increasing speed and accuracy in workflows across just about every industry, has been part of the litigation discovery process for nearly a decade. It started in 2010 with technology-assisted review (TAR), also known as predictive coding. The TAR workflow is essentially an iterative process in which a subject matter expert (SME) reviews document samples and the computer then applies coding across the full document population based on what it learned from those samples. That process is repeated until accuracy levels meet acceptable standards.
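The iterative loop described above can be sketched in a few lines of Python. Everything here is illustrative: the keyword-weight "model," the sample size, and the accuracy target are toy stand-ins for the real machine-learning classifiers and statistical validation a TAR platform would use.

```python
import random

def train(labeled):
    # Toy model: a word seen in relevant docs scores +1, in non-relevant docs -1
    weights = {}
    for doc, relevant in labeled:
        for word in doc:
            weights[word] = weights.get(word, 0) + (1 if relevant else -1)
    return weights

def predict(weights, doc):
    # A document is predicted relevant if its words score positive overall
    return sum(weights.get(w, 0) for w in doc) > 0

def tar1_review(corpus, sme, sample_size=10, target=0.9, max_rounds=10):
    """TAR 1.0 loop: the SME codes random samples, the model is retrained,
    and the cycle repeats until accuracy meets the acceptable standard."""
    labeled = []
    for _ in range(max_rounds):
        sample = random.sample(corpus, sample_size)    # SME reviews a random sample
        labeled.extend((doc, sme(doc)) for doc in sample)
        weights = train(labeled)
        # Validate: how often does the model agree with the SME's coding so far?
        accuracy = sum(predict(weights, d) == r for d, r in labeled) / len(labeled)
        if accuracy >= target:
            return weights                             # acceptable standard reached
    return weights
```

Here each document is just a set of words and the "SME" is a function that returns True for relevant documents; a real system would train a proper classifier and size its validation samples statistically.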
While TAR has evolved considerably and steadily gained popularity, newer techniques such as active learning, or TAR 2.0, are now emerging. In TAR 2.0, the user can begin reviewing any set of documents and then use those tagging calls to predict tags for the remaining documents in the database. The more documents you review, the more accurate the results become. This eliminates the need for an SME to conduct multiple reviews of random sample sets.
To make this distinction clearer, let’s take a more detailed look at TAR 1.0 versus TAR 2.0.
The legal marketplace offers many tools, methods, and protocols that claim to be TAR, including predictive coding, assisted review, advanced analytics, concept search, and early case assessment. All of these practices have gained traction since the publication of the first federal opinion approving TAR use (Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012)). TAR in its original form, however, is a multi-step process. Depending on whether one is using simple active learning (SAL) or simple passive learning (SPL), TAR typically involves anywhere from 6 to 10 steps.2 These TAR 1.0 processes help cut the number of documents needed for review and thereby dramatically reduce the amount of time humans must spend on document review, which, of course, can drastically cut review costs in the form of attorneys’ billable time to clients. The process also makes review more efficient by ensuring that contract reviewers are generally looking at only the most relevant documents.
But traditional TAR 1.0 also has drawbacks, chief among them the need to review randomly selected sample document sets or seed sets, and to do so multiple times. These sample sets sometimes include documents with low value (e.g., vague text or text unrelated to any other documents), and that means you need to perform multiple iterations of these sample reviews to achieve the desired accuracy. The use of randomly selected samples also means that human reviewers don’t have the opportunity to provide feedback to the machine learning models by introducing documents of higher value that are outside of the prescribed samples. And in fact, a chief complaint from users of TAR 1.0 is that it takes too many iterations and, ultimately, too much time to reach targets for accuracy.
These stated drawbacks contribute to why TAR 2.0 is so promising.
TAR 2.0 is built on Continuous Active Learning (aka CAL), a more recent TAR protocol that enables a system to continuously analyze the machine learning results as humans review documents, without the need to begin by analyzing static, randomized samples. The result is a non-iterative implementation of TAR that continuously improves as the review progresses, re-ranking the entire data set with each new batch of coding decisions.
In Casepoint, for example, a user can begin a TAR 2.0 session by reviewing as few as 50 documents (although our recommended ranking threshold is every 100 documents), and at each ranking threshold, the model re-ranks the corpus automatically. Used in tandem with Casepoint’s Dynamic Batching feature, this ensures the user is always looking at the highest-ranked documents. It also strengthens your model faster, because TAR 2.0 will continue to present documents in batches until none of the documents presented is relevant. Another benefit of TAR 2.0 is the ability to run multiple sessions simultaneously, where each session represents a different legal topic or issue for which you are trying to find relevant documents. Being able to “bucket” groups of documents by relevant issues and have people dive into the review right away is a huge step forward. TAR 2.0 makes this possible, and it can be fully controlled by the user without the need for a project manager to oversee and facilitate the process.
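A CAL loop of this kind can be sketched as follows. This is an illustrative toy, not Casepoint's implementation: the word-weight scorer, the batch size, and the stopping rule (quit when an entire batch comes back non-relevant) are simplified stand-ins for the real product's ranking model and thresholds.

```python
def score(weights, doc):
    # Rank a document by the summed weights of its words
    return sum(weights.get(w, 0) for w in doc)

def cal_review(corpus, reviewer, batch_size=4):
    """CAL sketch: always review the top-ranked unreviewed documents,
    re-rank the corpus after every batch, and stop once a whole batch
    comes back non-relevant."""
    weights, labeled, unreviewed = {}, [], list(corpus)
    while unreviewed:
        # Re-rank everything not yet reviewed, highest model score first
        unreviewed.sort(key=lambda d: score(weights, d), reverse=True)
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        decisions = [(doc, reviewer(doc)) for doc in batch]  # human codes the batch
        labeled.extend(decisions)
        for doc, relevant in decisions:                      # update the toy model
            for word in doc:
                weights[word] = weights.get(word, 0) + (1 if relevant else -1)
        if not any(rel for _, rel in decisions):
            break                                            # stopping criterion
    return [doc for doc, rel in labeled if rel]
```

Note that the review can stop without ever touching the lowest-ranked documents, which is where the efficiency gain over reviewing random samples comes from.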
Also, the active learning protocol continually refines its understanding of which documents are most likely to be relevant based on the user’s feedback on the documents already presented.
Because a number of steps commonly associated with TAR are unnecessary in TAR 2.0 with active learning, users typically achieve superior results with less review effort.
So which is better?
Whether you are using TAR 1.0 or TAR 2.0 protocols, the goal for both is the same: increase accuracy and greatly reduce review time and the number of humans needed to review documents for discovery. Both kinds of TAR improve workflows, but neither protocol necessarily provides better information to develop legal arguments or strategy. The primary advantage of TAR 2.0 is that it gives the end-user more control over the process and the ability to run multiple sessions at once. For most organizations, this makes it the obvious choice.
This brochure takes an in-depth look at the role of artificial intelligence in eDiscovery, including technology-assisted review.
Advanced Analytics & AI
1 Altman Weil, 2017 Chief Legal Officer Survey
2 Maura R. Grossman & Gordon V. Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, Proceedings of the 37th Ann. Int’l ACM SIGIR Conf. on Research & Dev. in Info. Retrieval 153–62 (2014).