March 5th

What is Technology Assisted Review?

George Socha

If we are going to discuss Technology Assisted Review, or TAR, we should start at the beginning.

"Electronically stored information", or "ESI", is information that is stored in electronic form. Electronic discovery, or eDiscovery, is about what we do with ESI.

Stepping back a little, a lawsuit or investigation is almost always the outgrowth of one or more events where one or more people did something. Lawyers and investigators need to locate - or "discover" - information about those events and those people. They need this information for three basic reasons. First, they use it to figure out what appears to have happened. Second, they use ESI to help build the stories they need to tell, their explanations of what happened, as well as to counter the stories told by others, such as opposing counsel. Third, they generally need to produce some portion of the ESI to others, such as other litigants or regulatory agencies.

Often, this process is described as "finding relevant documents". The information they need to discover takes one of four forms:

  • Information stored in electronic form. This is ESI. ESI is anything stored on a computer, on a mobile device, on a corporate network, or in the cloud. It covers all file types. Email typically accounts for the largest number of files, but eDiscovery processing tools typically work with hundreds of file types. Reveal Processing, for example, can process over 900 different file types.
  • Information in people's heads. This information is gathered by talking with people - chatting with them informally; interviewing them; or asking them questions during depositions, at hearings, at trials, or in other situations where they are under oath to tell the truth. Information gathered from people's heads has always mattered, and it still matters today.
  • Information stored on paper. Information stored on paper or similar media such as microfilm used to be one of the two main sources lawyers and investigators relied on to find out what happened (the other was, and still is, information in people's heads). Sometimes we only had to leaf through the contents of a single manila folder. Other times we had to scour the contents of entire warehouses. Today, however, information on paper generally makes up a minuscule portion of what we deal with.
  • Information stored in tangible objects. For some matters, you need to examine physical things. I used to work on all-terrain vehicle, snowmobile and motorcycle cases, and for those we almost always wanted to see the actual vehicle involved in the alleged event. In one notable lawsuit, the investigator called me to say, "You aren't going to believe this, but they sued the wrong company. Someone else's nameplate is on the gasoline tank!"

ESI changed everything. Even a small matter can involve a million messages. Large matters easily run to tens or even hundreds of millions of files. With this explosion in the volume of information we need to deal with, manual review alone was no longer enough.


Then Came Technology Assisted Review

Artificial intelligence came to the rescue.

"Technology Assisted Review", or "TAR", is one of several names the legal industry uses for a type of artificial intelligence called "supervised machine learning". TAR is, per the EDRM website, "a process of having computer software electronically classify documents based on input from expert reviewers, in an effort to expedite the organization and prioritization of the document collection."

Typically, TAR is deployed in lawsuits and investigations to help find ESI of interest or to cull out ESI that is deemed not worthy of further consideration. Other names for TAR include "predictive coding" and "computer assisted review".

TAR can be used in addition to, or to a certain extent in place of, "search terms" - specific words that, it is hoped, will identify responsive documents. It also can be used to reduce the amount of manual review required to get through a dataset by reducing the number of documents reviewers need to lay eyes on.

We discussed machine learning in greater detail in an earlier post, Legal AI Software: Taking Document Review to the Next Level. Modern machine learning dates back to the mid-twentieth century. It focuses “on developing programs that teach computers to change when exposed to new data and to grow. Its goal is to understand and follow the methods by using algorithms to do that task automatically without any human assistance.”

For our purposes today, machine learning takes two forms, unsupervised and supervised. Both are forms of categorization, ways to sort documents into buckets such as "relevant" and "not relevant".

Unsupervised machine learning is, essentially, an exercise in having computers “tell me something I don’t know.” Computer algorithms are pointed at datasets. The algorithms organize that data based on patterns, similarities, and differences. The algorithms work on their own; they do not rely on people to train them. They can, however, learn from their own past experience.
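
To make the idea concrete, here is a toy Python sketch of one simple unsupervised technique, single-pass clustering. It is an illustration only, not how Reveal or any other product works: the word-count representation, the `cluster` function, and the 0.3 similarity threshold are all assumptions chosen for the example. No one labels anything; the code groups documents solely by how many words they share.

```python
import math
from collections import Counter


def vectorize(text):
    # Assumed representation: a bag of lowercase words with their counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (0.0 = nothing shared).
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.3):
    """Single-pass clustering: each document joins the most similar existing
    group if it is similar enough, otherwise it starts a new group."""
    clusters = []  # each entry: {"members": [doc indexes], "centroid": Counter}
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # the group's profile grows as it absorbs documents
        else:
            clusters.append({"members": [i], "centroid": Counter(vec)})
    return [c["members"] for c in clusters]


docs = [
    "invoice payment wire transfer",
    "wire transfer payment approved",
    "lunch menu friday pizza",
    "friday lunch pizza order",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

The two payment messages end up in one group and the two lunch messages in another, without anyone telling the code what either group is about.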

Supervised machine learning is, as much as anything else, a methodology to “find more like this.” Users are actively involved in the process. They are presented with information, such as an email message, and asked to "classify", or make a binary choice about, that information. They might be asked to decide whether all or some part of a document is relevant, or privileged, or related to an issue in the matter. The machine learning system "learns" from that decision: if that document's contents suggested fraudulent activity, it says to itself, "What similar documents can I find that might also suggest fraudulent activity?"
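
Here is a minimal "find more like this" sketch in Python. It assumes a multinomial naive Bayes classifier with add-one smoothing, one standard supervised learning technique; commercial TAR engines may use quite different algorithms. The function names `train`, `log_odds` and `rank_uncoded` and the sample documents are hypothetical. A reviewer's yes/no decisions train the model, which then ranks uncoded documents by how strongly they resemble the ones coded relevant.

```python
import math
from collections import Counter


def tokens(text):
    # Very rough tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(coded):
    """Learn per-class word counts from (text, is_relevant) reviewer decisions."""
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter()
    for text, relevant in coded:
        counts[relevant].update(tokens(text))
        n_docs[relevant] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def log_odds(model, text):
    """Naive Bayes log-odds that text is relevant (positive = more like the relevant docs)."""
    counts, n_docs, vocab = model
    # Ratio of class priors, with add-one smoothing.
    score = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for label, sign in ((True, 1), (False, -1)):
        denom = sum(counts[label].values()) + len(vocab)
        for w in tokens(text):
            if w in vocab:  # words never seen during training are ignored
                score += sign * math.log((counts[label][w] + 1) / denom)
    return score


def rank_uncoded(model, uncoded):
    """Put the documents most like the ones coded relevant at the front."""
    return sorted(uncoded, key=lambda t: log_odds(model, t), reverse=True)


coded = [
    ("wire transfer offshore account hidden", True),
    ("offshore account hidden fees", True),
    ("team lunch friday", False),
    ("holiday party friday", False),
]
model = train(coded)
print(rank_uncoded(model, ["quarterly lunch menu", "hidden offshore wire"]))
# ['hidden offshore wire', 'quarterly lunch menu']
```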

For most practical purposes, TAR comes in two flavors: TAR 1.0 and TAR 2.0. With both flavors, the system presents batches of documents to review teams. Each reviewer goes through each batch one document at a time, classifying each document by deciding whether that document (or some part of it) meets a preset criterion. Often we describe this as the document's "responsiveness". Some systems limit reviewers to a binary choice: yes or no. Other systems also let a reviewer skip a document without deciding and move on to the next one in the batch.

The main difference between TAR 1.0 and TAR 2.0 is the workflow surrounding those decisions. TAR 1.0 systems are, essentially, ones designed to be trained and then set loose. TAR 1.0 starts with the creation of a specific set of documents (a "seed set" or a "control set") that reviewers will use to train the system. Reviewers go through that set until a predetermined threshold is met. Typically, arriving at that threshold is something akin to the system reaching a point where it says, in essence, "From the decisions you reviewers have made, I now know enough about what you are looking for that all by myself I can go through all the remaining documents and with a high degree of confidence find all the other documents you want." Reveal offers two flavors of TAR 1.0: "COSMIC Active Learning" in Reveal AI and "Predictive Coding" in Brainspace.
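
The train-then-set-loose workflow can be sketched in Python as follows. This is a simplified illustration under several assumptions, not a description of COSMIC Active Learning, Brainspace, or any other product: the word-vote scorer, the `tar1` function, the use of a separately coded control set to measure progress, and the 90% accuracy target are all hypothetical choices for the example. Reviewers code seed batches until the model agrees with the control set often enough; after that, the model classifies every remaining document by itself.

```python
from collections import Counter


def tokens(text):
    return set(text.lower().split())


def train(coded):
    """Toy scorer: +1 per relevant doc containing a word, -1 per non-relevant doc."""
    weights = Counter()
    for text, relevant in coded:
        for w in tokens(text):
            weights[w] += 1 if relevant else -1
    return weights


def predict(weights, text):
    # Counter returns 0 for unseen words, so they neither help nor hurt.
    return sum(weights[w] for w in tokens(text)) > 0


def tar1(seed_batches, control_set, remaining, target=0.9):
    """Train on seed batches until control-set accuracy meets target, then classify the rest."""
    coded = []
    for batch in seed_batches:
        coded.extend(batch)  # reviewers have coded this batch
        weights = train(coded)
        correct = sum(predict(weights, t) == label for t, label in control_set)
        if correct / len(control_set) >= target:
            break  # threshold reached: stop training
    else:
        raise RuntimeError("seed batches ran out before the target was reached")
    # From here on there is no human review: the model decides for every remaining document.
    return {t: predict(weights, t) for t in remaining}, len(coded)


seed_batches = [
    [("offshore wire transfer", True), ("friday lunch", False)],
    [("hidden account fees", True), ("holiday party", False)],
]
control_set = [("offshore account", True), ("hidden fees", True), ("party lunch", False)]
labels, n_trained = tar1(seed_batches, control_set,
                         ["wire to offshore account", "friday holiday lunch"])
print(n_trained, labels)
# 4 {'wire to offshore account': True, 'friday holiday lunch': False}
```

In this run the first seed batch is not enough (the model gets two of three control documents right), so training continues through the second batch before the model is set loose.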

TAR 2.0 systems are designed with a different goal in mind. They are meant to help push the most interesting documents to the front of the line. There is no need for a seed or control set. Each reviewer starts with a batch of documents, anywhere from 10 or so documents up to several hundred, depending on the system. Those batches might consist of randomly selected documents, or of documents assembled with any of a wide variety of themes in mind. After a reviewer codes all documents in the batch, that batch goes back to the system. The system looks at the coded information - the classifications. With that information, the system reevaluates the remaining documents. It places at the front of the line those documents most akin to the ones classified as responsive, privileged, or whatever the criterion was. Then it grabs the next batch of documents (10, 100, whatever the batch size might be) and feeds that to the reviewer. This process continues until someone decides it is time to stop.
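
Here is a similarly simplified Python sketch of that loop. The `reviewer` function stands in for a human coder, the word-vote scoring is a toy, and the stopping rule (stop after a batch with no responsive documents) is an assumption made for the example; in practice, as noted above, someone decides when it is time to stop. For simplicity, the first batch is just the first documents in the collection rather than a random or themed selection.

```python
from collections import Counter


def tokens(text):
    return set(text.lower().split())


def score(weights, text):
    # Higher score = more like the documents already coded responsive.
    return sum(weights[w] for w in tokens(text))


def tar2(collection, reviewer, batch_size=2):
    """Prioritized review: code a batch, re-rank what is left, repeat."""
    weights = Counter()
    remaining = list(collection)
    reviewed = []
    while remaining:
        # Re-rank: the most promising documents go to the front of the line.
        # (list.sort is stable, so ties keep their original order.)
        remaining.sort(key=lambda t: score(weights, t), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        hits = 0
        for text in batch:
            responsive = reviewer(text)  # the human decision
            reviewed.append((text, responsive))
            hits += responsive
            for w in tokens(text):
                weights[w] += 1 if responsive else -1
        if hits == 0:
            break  # assumed stopping rule: a batch with nothing responsive
    return reviewed


def reviewer(text):
    # Simulated human coder for the example.
    return "offshore" in text or "hidden" in text


collection = ["team lunch friday", "offshore wire transfer", "parking notice",
              "holiday party friday", "offshore account hidden", "hidden wire fees",
              "cafeteria hours", "office move"]
for text, responsive in tar2(collection, reviewer):
    print(responsive, text)
```

After the first batch, the two remaining responsive documents jump ahead of "parking notice" even though it comes earlier in the collection, and review stops after the third batch with "office move" and "holiday party friday" never reviewed.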


TAR Can Help

As discussed above, by using TAR you can more efficiently and reliably find content of interest.

With TAR 1.0 systems, you are able to work through a large volume of data more quickly and more consistently than would be the case if you relied on human reviewers following a traditional linear review process.

With TAR 2.0, you can greatly accelerate the review process without sacrificing quality, and possibly even enhancing it. That way, you can find the content of consequence very quickly.

You also can use both processes, TAR 1.0 and TAR 2.0, for quality control, for example to evaluate your review team's work.


TAR Too Has Its Limits

TAR is valuable in many ways, but it is not a tool for all seasons. Key considerations, when trying to determine whether TAR is right for you, include:

  • What you are trying to accomplish. If what TAR does fits with what you want to achieve, chances are good that TAR will be able to help.
  • What data you need to work with. TAR works with text. If the data you have is not text or cannot readily be converted to text, TAR probably won't be of much use. If you have audio files, TAR won't help. If, however, you convert the contents of those audio files to text, then TAR can become a valuable tool for working with that data.
  • The richness of the content. TAR systems are designed to work with larger blocks of text such as documents and paragraphs. If your population of ESI consists primarily of short messages, say 10 words or fewer, TAR may not be an effective tool.
  • Whether you have access to someone who knows how to work with TAR. TAR is a tool, and as with any tool, it is only as effective as the people using it. If you want to use TAR but don't know where to start, find someone knowledgeable who can help you. If you are at a law firm or in a legal department, you might have a person or even an entire department dedicated to litigation support or eDiscovery; go there. If you don't have internal resources, look outside, for example to legal support providers (LSPs), with whom your organization may already work.


TAR Will Replace Me!

"TAR will replace lawyers" is a variation of the oft-cited "John Henry versus the steam drill". As I discussed in Legal AI Software: Taking Document Review to the Next Level, TAR is not about human versus machine.

Rather, TAR offers us the modern-day equivalent of The Six Million Dollar Man, able to accomplish superhuman data analysis and review. If you prefer fact to fiction, AI software is the legal world’s equivalent of the tethered exoskeleton system being created by Hugh Herr’s Biomechatronics group at the MIT Media Lab.

So no, TAR is not about to replace us. Instead, it opens up new and powerful possibilities for making smarter, faster, and more informed decisions, letting you get to key content and develop essential insights quickly.



If your organization is interested in leveraging the power of legal AI software, contact Reveal to learn more. We’ll be happy to show you how our authentic artificial intelligence takes review to the next level, with our AI-powered, end-to-end document review platform.