October 15th

Legal AI Software: Taking Document Review to the Next Level

George Socha

Legal AI software is not man against machine, steel-driving John Henry versus the steam drill. Rather, today’s real-world AI solutions help attorneys and allied professionals become the modern-day equivalent of The Six Million Dollar Man, able to accomplish superhuman data analysis and review. Or, if you prefer fact to fiction, AI software is the legal world’s equivalent of the tethered exoskeleton system being created by Hugh Herr’s Biomechatronics group at the MIT Media Lab.

Just as Professor Herr’s team has developed a system that provides a subject’s biological legs with an unprecedented level of gait enhancement, legal AI software provides lawyers’ and their colleagues’ inquisitive and dogged minds with tools that help them develop an unprecedented level of insight into the lawsuits and investigations they conduct.

Through well-planned and well-executed use of AI software, legal teams can home in quickly on the data that matters most to them, while also identifying and setting aside potentially vast amounts of data that do not warrant further consideration.

Attorneys and paralegals can put unsupervised machine learning software to work from day one. Even as the legal professionals conduct their first interviews and evaluate the first documents they see, the technology can begin to ferret out concepts, documents, and communications that help bring shape and direction to the investigations and lawsuits. Some of the insights will confirm what the investigators suspect; others may tell them something important they did not know.

As they get farther along, counsel and allied professionals can call on supervised machine learning software to find more documents and communications similar to – or very different from – the ones they already know they care about.

At the same time, legal teams should be drawing on the capabilities offered by natural language processing: linguistic intelligence to analyze text from email and documents, emotional intelligence to perform sentiment analysis and detect fraud signals, and behavioral intelligence to conduct anomaly detection and social network analysis.

For those matters with still images and video, in-house and outside personnel should also consider using computer vision, a third type of AI (in addition to machine learning and natural language processing) that can recognize and label entities depicted in photos and other images.

Artificial Intelligence: Its History and Usage

In a 1950 paper, Alan Turing proposed to consider the question, “Can machines think?”, discarded that question in the first paragraph, and replaced it with “the imitation game” problem. With a 1947 lecture and then this paper, Turing launched the quest to define and deliver artificial intelligence (a phrase never actually used in the paper).

The term was first used in publication five years later, when Dartmouth professor John McCarthy wrote “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence,” in which he suggested that a two-month, ten-person study be conducted the following summer.

A lot has happened in the intervening decades: reasoning systems, expert systems, and then neural networks were all rolled out. We have moved from Turing’s and McCarthy’s early academic inquiries to actual and effective deployment of an array of AI software. Google searches seem to know what you are looking for before you do, Siri and Echo understand voice commands, pictures and faces posted to social media are instantly recognized and tagged, and Amazon and Netflix serve up buying and viewing recommendations.

In the legal space, especially in eDiscovery, components such as linguistic intelligence, content classification, emotional intelligence, and behavioral intelligence are now used by legal teams to mine data for patterns and anomalies, map custodian relationships and conversations, autonomously classify millions of documents, and target incidents at the source.
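
To make “map custodian relationships” a little more concrete, here is a minimal sketch that counts who sends messages to whom and lists each person’s most frequent correspondents. It assumes message metadata has already been extracted into simple dictionaries; the field names, function names, and sample addresses are hypothetical, and real platforms layer much richer analysis on top of counts like these.

```python
from collections import Counter, defaultdict

def build_comm_graph(messages):
    """Count directed sender -> recipient edges across a set of messages."""
    edges = Counter()
    for msg in messages:
        sender = msg["from"].lower()
        for rcpt in msg["to"]:
            edges[(sender, rcpt.lower())] += 1
    return edges

def top_contacts(edges, n=3):
    """For each person, list the n people they exchange the most messages
    with, counting traffic in both directions."""
    totals = defaultdict(Counter)
    for (a, b), count in edges.items():
        totals[a][b] += count
        totals[b][a] += count
    return {person: [p for p, _ in c.most_common(n)] for person, c in totals.items()}

# Hypothetical message metadata.
messages = [
    {"from": "alice@example.com", "to": ["bob@example.com"]},
    {"from": "bob@example.com", "to": ["alice@example.com", "carol@example.com"]},
    {"from": "alice@example.com", "to": ["bob@example.com"]},
]
edges = build_comm_graph(messages)
print(edges[("alice@example.com", "bob@example.com")])  # 2
print(top_contacts(edges)["bob@example.com"])  # ['alice@example.com', 'carol@example.com']
```

Counting both directions treats a relationship as a two-way conversation; keeping the directed `edges` as well lets a reviewer ask who initiates contact.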

Unsupervised machine learning identifies concepts, entities, and even images in documents and feeds that information to legal teams. Supervised machine learning, in the form of predictive coding or technology-assisted review (TAR), is widely available. Natural language processing is used in a host of ways, up to and including the delivery of a library of AI models that are used for purposes as varied as identifying privileged content and finding conversations suggesting threatening behavior.

The use of legal AI software goes beyond eDiscovery. Legal professionals at in-house legal departments, law firms, and alternative legal service providers (ALSPs) use AI tools to help with case outcome predictions, due diligence, contract review, contract management, and NDAs.

It is important to remember that AI remains “artificial” intelligence, not human intelligence. AI systems do not (yet) understand cause and effect, and situations they haven’t seen before continue to confound them. Even with advances such as sentiment analysis that draw on extra-linguistic or world knowledge to help better understand subtleties of language, the technology continues to have a long way to go before we get to a real John Henry v. steam drill moment.

Artificial Intelligence Defined

Artificial intelligence, according to John McCarthy, “is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.”

The Internet Encyclopedia of Philosophy says, “Artificial intelligence (AI) would be the possession of intelligence, or the exercise of thought, by machines such as computers.”

IBM offers a more concrete definition: “In computer science, the term artificial intelligence (AI) refers to any human-like intelligence exhibited by a computer, robot, or other machine. In popular usage, artificial intelligence refers to the ability of a computer or machine to mimic the capabilities of the human mind—learning from examples and experience, recognizing objects, understanding and responding to language, making decisions, solving problems—and combining these and other capabilities to perform functions a human might perform, such as greeting a hotel guest or driving a car.”

[Figure: Artificial intelligence map]

Legal AI software in eDiscovery and Document Review

What does artificial intelligence actually do, and what does it look like in practice? With some AI tools, users point the tools at a set of data and ask the programs to train themselves. Concept searching is one example from eDiscovery; email threading is another. With other AI solutions, humans train the system. When using predictive coding tools, for example, lawyers, paralegals, contract reviewers, and subject matter experts tag documents they deem to be on point, and the systems look for more like those.

Rather than treat AI as a monolithic bloc, let’s arrange AI capabilities into two levels of categories. The first level consists of machine learning (unsupervised and supervised), natural language processing (text and speech), computer vision (images and video), and robotics; the items in parentheses make up the second level.

Under the second level of categories, we will explore legal AI software that is available today, not vaporware or new technology that might (or might not) appear at some point in the future.


Machine Learning

Machine learning’s origins date to a 1642 mechanical adding machine designed by Blaise Pascal. Modern machine learning was pioneered by Arthur Samuel, an IBM researcher and later Stanford research professor, whose work in the 1940s through 1960s focused on making computers learn from their experience. Machine learning focuses “on developing programs that teach computers to change when exposed to new data and to grow. Its goal is to understand and follow the methods by using algorithms to do that task automatically without any human assistance.”

Two forms of machine learning widely used by the legal profession, particularly for eDiscovery, are unsupervised and supervised machine learning.

Unsupervised machine learning

Unsupervised machine learning is, essentially, an exercise in having computers “tell me something I don’t know.” Computer algorithms are pointed at data. The algorithms organize that data based on patterns, similarities, and differences. The algorithms work on their own; they do not rely on people to train them. They can, however, learn from their own past experience.

Unsupervised machine learning tools are especially useful for investigative activities. These include investigations, of course, but also lawsuits, particularly across the EDRM spectrum of eDiscovery activities, and any other situation where information needs to be unearthed, organized, and evaluated.

Types of unsupervised machine learning used for document review include clustering, email threading, pattern recognition, and categorization.

Clustering, for example, is used to group together similar pieces of information. Unsupervised machine learning software is pointed at a body of data, such as a collection of email messages. The software creates clusters of data it deems to be similar to each other. To accomplish this, the software uses criteria built into it or fed to it. The system assigns a number to each cluster, called a cluster ID, that can be used to facilitate further analysis and display of the data and the clusters.
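
As an illustration only, the sketch below uses one of the simplest clustering approaches, a single-pass “leader” algorithm: each document is turned into word counts, compared by cosine similarity with the first document of each existing cluster, and either joins the closest cluster or starts a new one. The similarity threshold plays the role of the “criteria built into it or fed to it.” The function names, the 0.5 threshold, and the sample messages are assumptions; commercial tools use more sophisticated methods.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (lowercased letters and digits only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.5):
    """Single-pass leader clustering. Each document joins the most similar
    existing cluster if similarity >= threshold, otherwise it starts a new
    cluster. Returns one cluster ID per document, in input order."""
    leaders = []  # vector of the first document placed in each cluster
    ids = []
    for doc in docs:
        vec = vectorize(doc)
        best_id, best_sim = None, 0.0
        for cid, leader in enumerate(leaders):
            sim = cosine(vec, leader)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= threshold:
            ids.append(best_id)
        else:
            leaders.append(vec)
            ids.append(len(leaders) - 1)
    return ids

docs = [
    "Q3 pricing proposal for the Acme contract",
    "Revised pricing proposal for Acme contract",
    "Lunch on Friday?",
    "Friday lunch plans",
]
print(cluster(docs))  # [0, 0, 1, 1]
```

The returned list is the cluster ID for each document, which a review platform could then use to group, filter, or visualize the collection.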

Email threading lets reviewers view email messages as conversations, rather than only in isolation. Mail clients such as Outlook and Gmail offer this as a conversation view. The capability is not necessarily available in every eDiscovery platform and review tool, since those tools must handle email from many different platforms at the same time.
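
One common way to rebuild conversations is to follow the standard Message-ID, In-Reply-To, and References headers that email messages carry. The sketch below, using only Python’s standard `email` module, links each message to its parent and groups messages under their earliest known ancestor. The function name and sample messages are hypothetical, it assumes every message has a Message-ID, and tools that merge mail from many platforms typically need further heuristics (for example, when headers are missing or rewritten).

```python
from email import message_from_string

def thread_messages(raw_messages):
    """Group raw RFC 5322 messages into conversations.
    Returns {root_message_id: [message_ids in input order]}."""
    parent = {}
    order = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = msg["Message-ID"].strip()  # assumes the header is present
        refs = (msg.get("References") or "").split()
        reply_to = (msg.get("In-Reply-To") or "").strip()
        # The last References entry is the direct parent; fall back to In-Reply-To.
        parent[mid] = refs[-1] if refs else (reply_to or None)
        order.append(mid)

    def root(mid):
        # Walk up the parent chain; stop at a message with no known parent.
        # The seen set guards against malformed, circular references.
        seen = set()
        while parent.get(mid) and mid not in seen:
            seen.add(mid)
            mid = parent[mid]
        return mid

    threads = {}
    for mid in order:
        threads.setdefault(root(mid), []).append(mid)
    return threads

raw = [
    "Message-ID: <1@example.com>\nSubject: Budget\n\nDraft attached.",
    "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\nReferences: <1@example.com>\nSubject: Re: Budget\n\nLooks fine.",
    "Message-ID: <3@example.com>\nSubject: Lunch\n\nFriday?",
    "Message-ID: <4@example.com>\nIn-Reply-To: <2@example.com>\nReferences: <1@example.com> <2@example.com>\nSubject: Re: Budget\n\nApproved.",
]
threads = thread_messages(raw)
```

Here the three “Budget” messages end up in one thread and the lunch message in its own. If a parent message was never collected, its ID still serves as the thread root, so replies to it stay grouped together.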

Supervised machine learning

Supervised machine learning is a way to “find more like this.” With supervised machine learning, users train the system. In eDiscovery systems that use supervised machine learning, users typically are presented with pieces of information such as individual email messages. For each piece, the user is asked to make a choice about how that piece of information should be identified, such as whether it should be tagged as relevant or not relevant. The system draws on these decisions to make recommendations for how additional content should be handled. When used for document review, variations of this approach go by names such as predictive coding, technology-assisted review (TAR), TAR 1.0, TAR 2.0, and continuous active learning (CAL).
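
To show the basic “find more like this” loop, here is a toy sketch: it learns from documents a reviewer has tagged relevant or not relevant, then ranks unreviewed documents by how much they resemble the relevant ones. It uses multinomial naive Bayes, one simple supervised technique chosen for illustration; it is not how any particular predictive coding, TAR, or CAL product works, and the function names and sample documents are made up.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train(tagged):
    """Learn per-class term counts from (text, is_relevant) pairs.
    Needs at least one relevant and one not-relevant example."""
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter()
    for text, is_relevant in tagged:
        counts[is_relevant].update(tokens(text))
        n_docs[is_relevant] += 1
    return counts, n_docs

def relevance_score(text, counts, n_docs):
    """Log-odds of relevance under multinomial naive Bayes with add-one
    smoothing. Positive means 'looks more like the relevant examples'."""
    vocab = set(counts[True]) | set(counts[False])
    totals = {c: sum(counts[c].values()) for c in (True, False)}
    score = math.log(n_docs[True] / n_docs[False])  # class prior
    for t in tokens(text):
        if t not in vocab:
            continue  # ignore words never seen in training
        p_rel = (counts[True][t] + 1) / (totals[True] + len(vocab))
        p_not = (counts[False][t] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_not)
    return score

# Hypothetical reviewer decisions and unreviewed documents.
tagged = [
    ("pricing agreement with Acme for Q3", True),
    ("Acme pricing discount approved", True),
    ("team lunch on Friday", False),
    ("fantasy football picks", False),
]
unreviewed = ["Friday football game", "Acme discount on Q3 pricing"]

counts, n_docs = train(tagged)
ranked = sorted(unreviewed, key=lambda d: relevance_score(d, counts, n_docs), reverse=True)
print(ranked[0])  # Acme discount on Q3 pricing
```

In a continuous active learning workflow, the model would be retrained and the ranking refreshed each time reviewers tag more documents, so the documents most likely to be relevant keep rising to the top of the queue.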

Varying forms of supervised machine learning are deployed for investigations and document review. These forms include classification, models, named entity recognition, custom-named entities, case outcome prediction, and contract clause detection.

Natural Language Processing (NLP)

“Natural language” refers to the languages people use, such as English, as opposed to the languages computers use. “Natural language processing” (NLP) refers to AI technology that analyzes and understands natural language.

NLP technology can be pointed at text of many kinds: email messages, social media posts, and other text-based communications, as well as legal documents, advertisements, instruction manuals, warning labels, and even web pages.

NLP can then work with that text in various ways. It can perform sentiment analysis, measure emotional signals, and summarize blocks of text. It can also be used to normalize names and detect topics.
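
Name normalization, in its simplest form, means collapsing the different ways one person appears in a collection into a single key. The sketch below handles a few common surface forms (“Last, First”, display names with addresses, and bare addresses) with plain rules. The function name and rules are hypothetical; real systems also resolve nicknames, aliases, and multiple addresses per person, which this does not attempt.

```python
import re

# Matches forms such as '"Smith, John" <john.smith@example.com>'.
ANGLE = re.compile(r'^\s*"?(?P<display>[^"<]*?)"?\s*<(?P<addr>[^>]+)>\s*$')

def normalize_name(raw):
    """Reduce a name or sender string to a lowercase 'first last' key."""
    m = ANGLE.match(raw)
    if m:
        name, addr = m.group("display").strip(), m.group("addr")
    elif "@" in raw:
        name, addr = "", raw.strip()
    else:
        name, addr = raw, ""
    if not name and addr:
        # No display name: fall back to the address's local part (john.smith -> john smith).
        name = re.sub(r"[._]+", " ", addr.split("@")[0])
    if "," in name:
        # Reorder 'Last, First' to 'First Last'.
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return " ".join(name.lower().split())

variants = [
    '"Smith, John" <john.smith@example.com>',
    "John Smith <jsmith@example.com>",
    "smith, john",
    "john.smith@example.com",
    "  JOHN   SMITH ",
]
print({normalize_name(v) for v in variants})  # {'john smith'}
```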

When combined with machine learning, natural language processing can be used to build a wide array of reusable models. These models can be used to find potentially privileged content; locate information that tends to support or refute asserted causes of action and defenses to those causes; and bring problems to light before they have become formal complaints, brought on investigations, or led to lawsuits. They also can be used to identify content unlikely to be of interest in an investigation or lawsuit, such as out-of-office messages and communications about sports, social activities, and family and personal plans.

Models built by Reveal include:

  • Privileged Content: To identify conversations involving requests for legal advice, legal advice itself, and documents used to prepare for depositions.
  • Asking for Advice: To identify communications where participants solicit or share advice.
  • Advertisements & Promotions: To identify communications containing advertisements, newsletters, and other forms of promotional material.
  • Gifts & Entertainment Kickbacks: To identify communications about gifts and forms of entertainment that might have monetary value such as tickets to sporting events, theater performances, and concerts.
  • Contracts: To identify contractual agreements, whether they appear as loose files, as documents attached to messages, or as excerpts included in emails.
  • Sexually Explicit Comments: To identify conversations containing descriptive language related to sexual acts or inappropriate behavior, including ones that are general in nature as well as ones that target a specific person or situation.
  • Hate & Discrimination: To identify communications disparaging or suggesting animosity toward an individual or a group due to race, color, national origin, sex, disability, religion, or sexual orientation.
  • Pricing & Fees: To identify conversations with colleagues, clients, or outside vendors about pricing or fees for a good or service.


Computer Vision

Computer vision is a third form of artificial intelligence used in document review, albeit not yet with any great frequency.

Using images and deep learning models, legal AI software is able to recognize and label entities depicted in photos. In Figure X, image recognition applied over a dozen labels to a streetscape. The AI was even astute enough to identify which regions of the image represented each entity. Some image labeling models can even recognize logos, ads, and other targeted types of images.

Image labeling technology can allow legal professionals to more quickly select images likely to be relevant while excluding or deprioritizing ones that are not, significantly reducing discovery cost and time.
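
Once an image-recognition model has labeled a collection, the prioritization step itself can be simple. The sketch below assumes labels and confidence scores have already been produced by some model; the dictionary format, label names, file names, and 0.7 threshold are all hypothetical. Images carrying a label of interest above the threshold go to the front of the queue, ordered by confidence, and the rest are deprioritized.

```python
def prioritize_images(labels_by_image, targets, min_conf=0.7):
    """Split images into (priority, deprioritized) lists.

    labels_by_image: {image_name: [(label, confidence), ...]}, as produced
    by some upstream image-labeling model (format assumed here).
    An image is prioritized if any label in `targets` has confidence
    >= min_conf; priority images are sorted by their best such confidence.
    """
    priority, rest = [], []
    for image, labels in labels_by_image.items():
        best = max(
            (conf for label, conf in labels if label in targets and conf >= min_conf),
            default=None,
        )
        if best is None:
            rest.append(image)
        else:
            priority.append((best, image))
    priority.sort(reverse=True)  # highest-confidence matches first
    return [image for _, image in priority], rest

# Hypothetical model output.
labels = {
    "IMG_001.jpg": [("car", 0.95), ("street", 0.90)],
    "IMG_002.jpg": [("cake", 0.88), ("person", 0.80)],
    "IMG_003.jpg": [("logo", 0.75), ("storefront", 0.85)],
}
priority, rest = prioritize_images(labels, targets={"car", "logo"})
print(priority)  # ['IMG_001.jpg', 'IMG_003.jpg']
print(rest)      # ['IMG_002.jpg']
```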

If your organization is interested in leveraging the power of legal AI software, contact Reveal to learn more. We’ll be happy to show you how our authentic artificial intelligence takes review to the next level, with our AI-powered, end-to-end document review platform.