It is hard to talk about eDiscovery or document review in 2021 without mentioning technology-assisted review (TAR). In its broadest usage, TAR can refer to virtually any kind of technological support for review. In its narrower use, TAR refers to techniques that use technology to predict the decision a human expert would make about a document. In this narrower sense, TAR is often labeled with a version number – TAR 1.0, TAR 2.0, and more recently TAR 3.0.
While some advocate the superiority of a single approach, each version has its merits and its place. Choosing the right approach for a specific discovery need requires an understanding of the underlying process and technology. It is also important to know which variables to weigh when choosing the right TAR workflow for a particular matter.
The best solution to a discovery challenge can usually be determined by considering several factors and how they interact with the type of document set to be reviewed. These separate but interacting considerations include:
Time: Time should be taken into account both in terms of how long it will take to reach key milestones – starting the review, understanding the contents of the document set, and ultimately producing – and how long it will take to train a system, if training is necessary.
Costs: Aside from the hard costs of in-house staff, attorneys, vendors, and document reviewers, teams must consider the opportunity cost of pulling subject matter experts away from other tasks to train a predictive model. It is also worth checking whether the chosen approach provides an early estimate of the number of documents to be reviewed and the number of responsive documents expected, which helps the team plan the review as efficiently and cost-effectively as possible.
Knowledge of the matter: The degree to which the facts of a matter are known before document review begins can affect the ability to train a model, if training is necessary. Prior knowledge also affects how quickly a team needs access to “the right documents” in order to make both tactical and strategic decisions. Finally, how well the team knows the case or the contents of the document collection can affect its tolerance for finding surprises in the data relatively late in the review process.
Quality standards: As TAR becomes more prevalent in compliance and discovery, so does the practice of setting precision and recall targets for meeting discovery obligations. If minimum thresholds for acceptable quality are known, they can influence the choice of technology and workflow.
Facts about the document collection: The completeness of the collection is itself an important factor. Is all the data to be reviewed already available, or is the TAR solution expected to accommodate the ongoing ingestion of new data? Additionally, the richness or sparsity of responsive material in a document population can affect the performance of various technologies and workflows, and can significantly affect the time to completion.
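The precision and recall targets mentioned under quality standards can be made concrete with a small calculation. The following is a minimal sketch, not any particular product's method: the function name `precision_recall` and the sample data are hypothetical, and it assumes each document has both a predicted call and a human-verified call.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans.

    predicted[i] is the system's call for document i (True = responsive);
    actual[i] is the human-verified call for the same document.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)
    # Precision: share of documents flagged responsive that really are.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: share of truly responsive documents that were flagged.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical validation sample of five documents.
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
precision, recall = precision_recall(predicted, actual)
```

A team with, say, a minimum recall target could compare the `recall` value from a validation sample against that target before deciding whether review can stop.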
The TAR landscape
After evaluating the matter, the document set, and the other variables described above, teams can make an informed decision about which TAR solution fits best.
Predictive Coding (TAR 1.0) uses examples of relevant and non-relevant documents – the training set – to prepare a system for classifying documents. Typically, the training set is coded by a subject matter expert (SME) so that the system can replicate the expert’s knowledge. A hallmark of TAR 1.0 solutions is that training is a finite process that precedes the scoring or coding of all documents. The predictive model and its results are frozen once training is complete. Any change to the SME’s understanding of relevance, or to the set of documents to be assessed, therefore requires building a new model.
One advantage of TAR 1.0 solutions over traditional linear review is that responsive documents are front-loaded in the review process, making critical information available to teams as early as possible.
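The train-once, then-freeze pattern of TAR 1.0 can be illustrated with a toy model. This is only a sketch under stated assumptions: the term-weight scoring, the function names `train` and `classify`, and the sample documents are all hypothetical stand-ins for whatever classifier a real product uses.

```python
def train(training_set):
    """Build term weights once from SME-coded (text, responsive) pairs."""
    weights = {}
    for text, responsive in training_set:
        delta = 1.0 if responsive else -1.0
        for term in set(text.split()):
            weights[term] = weights.get(term, 0.0) + delta
    return weights


def classify(doc, weights, cutoff=0.0):
    """Score a document with the frozen weights; nothing is updated here."""
    doc_score = sum(weights.get(term, 0.0) for term in set(doc.split()))
    return doc_score > cutoff


# Hypothetical SME-coded training set.
training_set = [("merger price talks", True), ("lunch menu friday", False)]
frozen_model = train(training_set)

# The rest of the collection is classified without further learning.
remaining = ["merger term sheet", "friday parking notice"]
predictions = [classify(doc, frozen_model) for doc in remaining]
```

Because `classify` never changes `frozen_model`, a shift in what counts as relevant would mean calling `train` again on a recoded training set, which mirrors the need to build a new model.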
The second generation of technology-assisted review, TAR 2.0, applies continuous active learning (CAL), a technique adopted specifically to address the challenges posed by the one-time training of TAR 1.0. “Continuous” reflects that the predictive model is updated throughout the review based on all human coding decisions made so far; “active” refers to the system using the updated model to promote the documents most likely to be responsive to the front of the review queue.
With TAR 2.0 solutions, review can begin immediately without prior training of a model. While it is preferable to include SMEs in the early review, this is not a strict requirement, as the model will ultimately smooth out inconsistent decisions. The low upfront investment in training is seen as an advantage of TAR 2.0 over the TAR 1.0 process, especially because it reduces the burden and opportunity cost of having SMEs code documents as part of initial training.
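The continuous active learning loop described above can be sketched in a few lines. This is an illustrative toy, not a production algorithm: the term-weight model, the names `score`, `update`, and `cal_review`, and the sample data are hypothetical, and a real workflow would stop at a validated cutoff rather than review every document.

```python
def score(doc, weights):
    """Sum the learned weights of the distinct terms in a document."""
    return sum(weights.get(term, 0.0) for term in set(doc.split()))


def update(weights, doc, responsive, step=1.0):
    """Nudge term weights up after a responsive call, down otherwise."""
    delta = step if responsive else -step
    for term in set(doc.split()):
        weights[term] = weights.get(term, 0.0) + delta


def cal_review(docs, reviewer, batch_size=1):
    """Review all documents, re-ranking the unreviewed pool after each batch.

    Returns the order in which document indices were reviewed.
    """
    weights = {}  # no upfront training: the model starts empty
    unreviewed = set(range(len(docs)))
    order = []
    while unreviewed:
        # Highest-scoring documents first; ties broken by index.
        ranked = sorted(unreviewed, key=lambda i: (-score(docs[i], weights), i))
        for i in ranked[:batch_size]:
            update(weights, docs[i], reviewer(docs[i]))
            order.append(i)
            unreviewed.discard(i)
    return order


# Hypothetical collection; the "reviewer" stands in for a human coding call.
docs = [
    "merger price talks",
    "lunch menu friday",
    "merger term sheet",
    "parking lot notice",
    "price fixing merger",
]
order = cal_review(docs, reviewer=lambda doc: "merger" in doc)
```

In this toy run the three responsive documents are reviewed before the two non-responsive ones, because each coding decision immediately reshapes the queue.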
Now that TAR 2.0 is well established, innovators are pondering what the next generation of TAR might look like. Some believe that the best evolution of TAR is to put advanced tools in everyone's hands, but TAR 3.0 might better be seen as an opportunity to combine the benefits of continuous active learning with techniques that minimize the risk of two types of surprises: surprises in content and surprises in cost.
- To minimize content surprises, TAR 3.0 solutions should be designed so that the system has access to a large number of documents at the beginning of the process. Minimizing content surprises requires in-depth knowledge of the document population, which can be achieved through rigorous sampling and validation.
- To minimize cost surprises, TAR 3.0 solutions should include a richness estimation methodology that supports a principled review cutoff and lets teams predict total review volume so that they can staff the review efficiently.
The hallmark of TAR 3.0 solutions should be the enhancement of CAL with statistically sound methods that give early insight into the entire document population, including examples of responsive documents.
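One statistically grounded way to anticipate review volume is to estimate the prevalence (richness) of responsive documents from a random sample. The sketch below assumes a simple random sample and uses a normal-approximation confidence interval; the function name `estimate_richness` and the numbers are hypothetical, and other interval methods may be preferable for very low prevalence.

```python
import math


def estimate_richness(population_size, sample_labels, z=1.96):
    """Estimate the share of responsive documents from a coded random sample.

    sample_labels holds one boolean per sampled document (True = responsive).
    z=1.96 corresponds to an approximate 95% confidence level.
    """
    n = len(sample_labels)
    p = sum(sample_labels) / n
    # Normal-approximation margin of error for a proportion.
    margin = z * math.sqrt(p * (1 - p) / n)
    low, high = max(0.0, p - margin), min(1.0, p + margin)
    return {
        "richness": p,
        "interval": (low, high),
        "expected_responsive": round(p * population_size),
    }


# Hypothetical sample: 30 responsive documents out of 300 drawn at random
# from a collection of 100,000 documents.
sample_labels = [True] * 30 + [False] * 270
estimate = estimate_richness(100_000, sample_labels)
```

With an estimate like this in hand before review begins, a team can project roughly how many responsive documents to expect and plan staffing around that range.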
Over the past few decades, the ability to create and store documents digitally has resulted in an explosion of discoverable data. That growth has in turn produced a number of innovative tools for collecting, processing, and reviewing this data quickly and efficiently. Ever-changing technology now touches every stage of the discovery lifecycle, and nowhere is this more evident than in document review.
Every conversation today about discovery or document review involves technology-assisted review. It is therefore important to understand the variables and the solutions available under each TAR version when deciding which approach is best for a given matter.