April 5th

Using AI to Prepare Complaints: Part 2, The AI

George Socha

Artificial intelligence tools can help you get work done better and faster at every stage of a lawsuit. You can turn to AI as you prepare the initial complaint and continue to use it through trial and beyond.

Today, I’ll examine how you can use AI to help in preparing a complaint. As an example, I’ll work with the complaint from an action for trade secret misappropriation, patent infringement, and unfair competition, Waymo v. Uber, filed in the Northern District of California in 2017. I know nothing about the case beyond the assertions set forth in the complaint.

As you set out to prepare a complaint in a new matter, you have a duty to ascertain that the factual contentions you intend to set forth in the complaint have or are likely to have evidentiary support.

Evidentiary Support

Evidentiary support – or the lack of it – will matter for every element of a complaint. These include:

  • Initial statement of the case
  • Naming of the parties to the case
  • Basis for jurisdiction
  • Basis for venue
  • Statement of claims or facts
  • Assertion of causes of action
  • Demand or prayer for relief

Evidentiary support can come from four places:

  1. What the plaintiff and other individuals tell or show you,
  2. What you learn examining physical objects such as the treads of an escalator,
  3. What you unearth by going through paper and other analog media, and
  4. Most importantly for this discussion, what you discover by delving into electronically stored information, or ESI.

At this early stage of the lawsuit, you already should have access to ESI you can use.

You should have met at least once with your client, identified potentially relevant data, and made sure your client has taken reasonable steps to ensure preservation of that data. This data, or at least key portions of it, should be available to you to use as you craft the complaint.

If you have worked on previous matters for the same client, there may be ESI from those matters you can turn to.

If you have worked on similar matters for other clients, there may be resources from those matters you can draw upon.

The ESI you want to examine will vary depending on the nature of the matter. In most matters, you should look at communications. These could include email messages, text and chat messages in all their various forms, social media postings, audio files such as voice mail, and video recordings such as those from Zoom and other collaboration platforms. Depending on the client and the allegations, yet other forms of electronically stored communications also could be implicated.

You also may want to look at materials related to communications, such as attachments, calendar invitations and entries, and linked content.

Most likely, there will be other types of ESI you will need to include within the scope of content you explore. Among the more common types could be word processing, spreadsheet, and presentation files; pictures, photographs, and other images; audio and video files; and databases. Depending on the data available to you, you may need to work with hundreds of different types of files.

You might also need to work with content that is in other languages or in more than one language as well as with emojis or other non-text files used to convey emotions or similar concepts.

The Scenario

As mentioned above, the complaint I am using as an example comes from an action for trade secret misappropriation, patent infringement, and unfair competition.

In the “Introduction” portion of the complaint, the plaintiff alleged a series of key actions:

“3. Waymo was recently – and apparently inadvertently – copied on an email from one of its LiDAR component vendors. The email attached machine drawings of what purports to be an Uber LiDAR circuit board. This circuit board bears a striking resemblance to Waymo’s own highly confidential and proprietary design and reflects Waymo trade secrets….”

“4. Waymo has uncovered evidence that Anthony Levandowski, a former manager in Waymo’s self-driving car project – now leading the same effort for Uber – downloaded more than 14,000 highly confidential and proprietary files shortly before his resignation….”

“5. In the months leading to the mass download of files, Mr. Levandowski told colleagues that he had plans to set up a new, self-driving vehicle company. In fact, Mr. Levandowski appears to have taken multiple steps to maximize his profit and set up his own new venture – which eventually became Otto – before leaving Waymo in January 2016….”

“6. A number of Waymo employees subsequently also left to join Anthony Levandowski’s new business, downloading additional Waymo trade secrets in the days and hours prior to their departure. These secrets included confidential supplier lists, manufacturing details and statements of work with highly technical information.…”

The plaintiff expanded on these assertions in the “Factual Allegations” section of the complaint. In paragraphs 42, 48, and 58, for example, plaintiff wrote:

  • “42… And by January 2016, Mr. Levandowski had confided in some Waymo colleagues that he planned to ‘replicate’ Waymo’s technology at a Waymo competitor….”
  • “48. After downloading all of this confidential information regarding Waymo’s LiDAR systems and other technology and while still a Waymo employee, Waymo is informed and believes that Mr. Levandowski attended meetings with high-level executives at Uber’s headquarters in San Francisco on January 14, 2016.”
  • “58…. On December 13, Waymo received an email from one of its LiDAR-component vendors. The email, which a Waymo employee was copied on, was titled OTTO FILES and its recipients included an email alias indicating that the thread was a discussion among members of the vendor’s “Uber” team. Attached to the email was a machine drawing of what purported to be an Otto circuit board (the “Replicated Board”) that bore a striking resemblance to – and shared several unique characteristics with – Waymo’s highly confidential current-generation LiDAR circuit board, the design of which had been downloaded by Mr. Levandowski before his resignation.”

Opportunities to Draw on AI

Each of these paragraphs contains assertions amenable to testing with AI-driven capabilities. In the rest of this post, I will discuss nine AI-driven capabilities that could be used to search for information needed to frame an assertion, test its accuracy, or lend additional support to back up the assertion.


Dashboard

An obvious first step for an overview is to go to a dashboard. The dashboard should be one that can be customized. You should, for example, be able to have it show information about all the data available to you or just a subset of that data, such as the contents of one person’s mailbox. You also should be able to choose which widgets appear on the dashboard. In the example below, you see six widgets:

  1. A graphical depiction of the number of documents sent, displayed by year, with the ability to drill down by month, day, hour, or minute.
  2. The total number of documents after any active filters or searches have been applied, along with a visual breakdown of originals, near duplicates, and exact duplicates.
  3. A list of the top terms found in the data, based on the frequency of their occurrence in de-duplicated documents.
  4. The top document types in the data, with a histogram showing volume by type.
  5. The top custodians and volume of data by custodian.
  6. The predictive scores of AI models that have been run, with indicators of the comparative ratios of low scoring, middle range, and high scoring documents.

Ideally, as you change your focus, the dashboard changes the results displayed. Were we working with data from Levandowski’s mailbox, we could have the dashboard display information just from his mailbox in (1) through (6), giving us a quick overview of what we could expect to find there.
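To make the dashboard idea concrete, here is a minimal Python sketch of the kind of counting that could sit behind a few of those widgets. It is an illustration only, not how any particular platform works: the `documents` records, their field names, and the `summarize` function are all assumptions made up for this example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; custodian names and field names are assumptions.
documents = [
    {"custodian": "custodian_a", "sent": "2015-12-11T09:30:00",
     "type": "email", "text": "lidar design review"},
    {"custodian": "custodian_a", "sent": "2016-01-05T14:00:00",
     "type": "spreadsheet", "text": "supplier list lidar"},
    {"custodian": "custodian_b", "sent": "2016-01-20T08:15:00",
     "type": "email", "text": "lidar vendor drawing"},
]

def summarize(docs, custodian=None, top_n=5):
    """Return dashboard-style counts, optionally limited to one custodian."""
    if custodian is not None:
        docs = [d for d in docs if d["custodian"] == custodian]
    # Documents sent per year (widget 1).
    by_year = Counter(datetime.fromisoformat(d["sent"]).year for d in docs)
    # Volume by document type (widget 4) and by custodian (widget 5).
    by_type = Counter(d["type"] for d in docs)
    by_custodian = Counter(d["custodian"] for d in docs)
    # Most frequent terms (widget 3), using a naive whitespace split.
    terms = Counter(word for d in docs for word in d["text"].lower().split())
    return {
        "total": len(docs),
        "by_year": dict(by_year),
        "by_type": dict(by_type),
        "by_custodian": dict(by_custodian),
        "top_terms": terms.most_common(top_n),
    }

everything = summarize(documents)
one_mailbox = summarize(documents, custodian="custodian_a")
```

Calling `summarize` with a custodian narrows every count at once, which mirrors the way a dashboard refreshes all of its widgets when you change focus to a single mailbox.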



Communication Analysis

Some form of visual communications analysis can be especially useful to get a quick sense of who has been communicating with whom, when, and how much.

In the example below, we can see that Sally Beck communicated with Fernley Dyson, Mike Jordan, Ted Murphy, and especially Shona Wilson. We also can see that Dyson and Murphy communicated with each other, Murphy with Rick Buy, and Wilson with Scott Earnest.

Selecting the line of communications between Beck and Wilson, we can see details about those communications, such as how many messages Beck sent to Wilson (broken down by to, cc, and bcc) and how many Wilson sent to Beck, as well as the top terms that appear in their communications.

Were we looking at communications collected from the plaintiff’s email system, we could quickly get a sense of others with whom Levandowski exchanged email messages, both inside and outside the company, when they had these communications, and what they discussed. We might look in particular for communications he had with others outside the company in the months leading up to the mass download of files.

That examination might open other lines of inquiry, leading us to look more closely at individuals and organizations we did not previously suspect might have been involved in the trade secret misappropriation, patent infringement, and unfair competition at the heart of this case.
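The counts behind a communications map can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `messages` records, the example addresses, and the `count_links` and `between` helpers are hypothetical and do not describe any product's internals.

```python
from collections import Counter

# Hypothetical messages; addresses and field names are assumptions.
messages = [
    {"from": "a@example.com", "to": ["b@example.com"], "cc": ["c@example.com"], "bcc": []},
    {"from": "a@example.com", "to": ["b@example.com"], "cc": [], "bcc": []},
    {"from": "a@example.com", "to": ["c@example.com"], "cc": [], "bcc": ["b@example.com"]},
    {"from": "b@example.com", "to": ["a@example.com"], "cc": [], "bcc": []},
]

def count_links(msgs):
    """Count messages per (sender, recipient, field), where field is to, cc, or bcc."""
    links = Counter()
    for m in msgs:
        for field in ("to", "cc", "bcc"):
            for recipient in m[field]:
                links[(m["from"], recipient, field)] += 1
    return links

def between(links, sender, recipient):
    """Break down what one person sent to another by to, cc, and bcc."""
    return {field: links[(sender, recipient, field)] for field in ("to", "cc", "bcc")}

links = count_links(messages)
a_to_b = between(links, "a@example.com", "b@example.com")
b_to_a = between(links, "b@example.com", "a@example.com")
```

Keeping the direction and the to/cc/bcc field in the key is what lets you ask separately how much each person sent to the other, rather than seeing only a single combined total.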


Concept Search

When exploring ESI as we prepare a complaint, we look both for data we think might be out there (“find more like this”) and for data we might not even realize could matter (“tell me something I don’t know”). Concept search gives us the power to do both these operations with one set of tools.

In the example below, I searched Enron data for documents related to the term “ljm”, which I entered in the search box. “LJM” partnerships were vehicles used by Enron to conduct transactions off its books. In the Concept Search pane, we see the top ten related concepts. To the side, we can see yet other concepts returned by the search. We can see, as well, that we have not simply performed a keyword search, as many of the results do not contain the term “LJM”.

If you were using these capabilities as you prepared the Waymo complaint, you might have searched for phrases such as “machine drawing”, “circuit board”, or “Otto”.

To better understand the relationships between the results of a concept search, you might want to use a tool akin to Reveal’s Brain Explorer. With that, you can see which concepts lead to which others and follow the path from, say, “ljm” to “enron’s accountants” to “preserve documents”.

A second way of exploring concepts is with a cluster wheel, an interactive visualization designed to organize and present large volumes of unstructured data for analysis. Here, documents are grouped together based on their lexical similarity, that is, the vocabulary used within the records.

You can use those groupings, or clusters, to quickly understand the nature of what a set of documents contains without having to read individual documents.
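As a rough illustration of grouping documents by shared vocabulary, the Python sketch below uses Jaccard similarity on word sets with a simple greedy pass. This is a deliberately simplified stand-in chosen for this example, not the method any particular clustering tool uses; the sample texts, the `threshold` value, and the function names are assumptions.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words in a document."""
    return set(text.lower().split())

def jaccard(a, b):
    """Share of distinct words two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster(texts, threshold=0.3):
    """Greedy grouping: join the first cluster whose first document is similar enough."""
    clusters = []  # each cluster is a list of indexes into texts
    for i, text in enumerate(texts):
        for members in clusters:
            representative = texts[members[0]]
            if jaccard(tokens(text), tokens(representative)) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters

# Hypothetical sample texts for illustration only.
texts = [
    "download of lidar design files",
    "lidar design files download complete",
    "lunch menu for friday",
    "friday lunch menu update",
]
groups = cluster(texts)
```

With these four sample texts, the two messages about design files land in one group and the two about lunch in another, so a reviewer could label each group without opening every document.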

Had you been using these AI-driven capabilities – concept searching and document clustering – to help you as you prepared your complaint in Waymo, you might have explored concepts ranging from “download” to “trade secrets” as a way of better understanding what story the data could tell and what you could support.

Entity Extraction

A form of unsupervised machine learning, entity extraction can be used to pull together scattered pieces of information about an individual in ways that Boolean searching, say, probably never would enable you to do.

If, for example, you have been able to gather a key witness's emails, by using entity extraction you can have access in one location to:

  1. Email addresses used by that person,
  2. Pseudonyms associated with them,
  3. Positions they have held,
  4. Concepts contained in their data,
  5. People with whom they have communicated, and
  6. Other people who have engaged in similar discussions.

To better understand what communications Levandowski was having with others, you could use these capabilities to get a fuller picture of the various email addresses he used, the nicknames he used, whom he communicated with most frequently and what topics they discussed, and who else was communicating in a similar fashion and hence might have been involved with Levandowski’s activities.

Selfie Mode

The “selfie” mode allows you to see emails a person has sent to himself or herself.

Here you can see the people with whom the custodian, Vince Kaminski, communicated. You can see, for example, that he had 46 communications with Jeffrey Shankman, also a custodian (denoted by the “C” in a blue circle), and 200 communications with Stinson Gibner, not a custodian.

You also can see that Kaminski sent 6,545 messages to himself.
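The idea behind a view like this can be sketched in Python: given the set of addresses known to belong to one person, pick out the messages that person sent to himself or herself, and separately those that went from a work address to a non-work one. Everything here is a hypothetical illustration; the addresses, the `work.example` domain, and both function names are assumptions.

```python
# Hypothetical address list and messages; all values are illustrative assumptions.
addresses = {"a@work.example", "a@personal.example"}
messages = [
    {"from": "a@work.example", "to": ["a@personal.example"], "subject": "files"},
    {"from": "a@work.example", "to": ["b@work.example"], "subject": "meeting"},
    {"from": "a@personal.example", "to": ["a@personal.example"], "subject": "note to self"},
    {"from": "b@work.example", "to": ["a@work.example"], "subject": "re: meeting"},
]

def self_sent(msgs, own_addresses):
    """Messages whose sender and at least one recipient belong to the same person."""
    return [
        m for m in msgs
        if m["from"] in own_addresses
        and any(r in own_addresses for r in m["to"])
    ]

def work_to_personal(msgs, own_addresses, work_domain):
    """Self-sent messages that went from a work address to a non-work address."""
    suffix = "@" + work_domain
    return [
        m for m in self_sent(msgs, own_addresses)
        if m["from"].endswith(suffix)
        and any(r in own_addresses and not r.endswith(suffix) for r in m["to"])
    ]

to_self = self_sent(messages, addresses)
leaving_work = work_to_personal(messages, addresses, "work.example")
```

The sketch only works as well as the address list behind it, which is one reason it helps to first gather all the addresses and aliases a person used.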

If you were using a tool like this to explore data gathered from Levandowski’s mailbox, you most likely would want to pay special attention to:

  • The individuals with whom Levandowski communicated directly, looking not just at the most frequent communicators but also the least frequent.
  • Communications Levandowski sent to himself, especially ones sent from a work email address to a personal one.
  • Communications sent within a narrow time frame, as shown in the example below.

AI Models

As you look at available data during your initial investigation, you can use pre-built AI models to quickly identify data that can give you a better understanding of the situation.

Reveal makes extensive use of AI models. Generally, an AI model is a software program that has been trained on a set of data to perform specific tasks like recognizing certain patterns. Artificial intelligence models use decision-making algorithms to learn from the training data and apply that learning to achieve specific pre-defined objectives.

The picture above shows the results from running four AI models. For each, you can see the percentage of results with low scores (<30), those with medium scores (30-70), and those with high scores (>70).
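The score ranges described above translate directly into a small calculation. The Python sketch below buckets a list of scores using the same cutoffs (below 30, 30 through 70, above 70) and reports the percentage in each bucket. The sample scores and function names are made up for illustration.

```python
from collections import Counter

def bucket(score):
    """Assign a 0-100 model score to low (<30), medium (30-70), or high (>70)."""
    if score < 30:
        return "low"
    if score <= 70:
        return "medium"
    return "high"

def bucket_percentages(scores):
    """Percentage of documents falling in each bucket."""
    if not scores:
        return {"low": 0.0, "medium": 0.0, "high": 0.0}
    counts = Counter(bucket(s) for s in scores)
    return {name: 100 * counts[name] / len(scores) for name in ("low", "medium", "high")}

# Hypothetical scores for four documents.
percentages = bucket_percentages([10, 20, 50, 80])
```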

Reveal offers a Model Library which consists of a collection of pre-existing models you can use straight out of the box, extend or adapt to suit your specific needs, or stack and pack to achieve a larger objective.

You can use pre-built AI models to identify and set aside content that is not likely to contain anything useful to you.

To help more quickly find data that could help you prepare your Waymo complaint, you might start with models such as Advertisements & Promotions, Contracts, and ones focused on fraud.

You might also choose to start building a model of your own and train it as you go through the ESI available to you. This can be as simple as creating a new set of tags, like the responsiveness tags in the example below. As you look through and tag documents, the system learns from your decisions and regularly reprioritizes the remaining content, putting similar documents at the front of the queue.
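To give a feel for how tagging can reorder a review queue, here is a toy Python sketch. It is a deliberately naive stand-in, not the algorithm any real platform uses: each word gains a point when it appears in a document you tagged responsive and loses a point when it appears in one you tagged non-responsive, and untagged documents are then sorted by the sum of their word points. The sample texts and function names are assumptions.

```python
from collections import Counter

def term_weights(tagged):
    """Add 1 per term in a responsive document, subtract 1 per term in a non-responsive one."""
    weights = Counter()
    for text, responsive in tagged:
        for term in set(text.lower().split()):
            weights[term] += 1 if responsive else -1
    return weights

def reprioritize(queue, tagged):
    """Sort untagged documents so those most like the responsive ones come first."""
    weights = term_weights(tagged)

    def score(text):
        # Terms never seen in a tagged document contribute 0.
        return sum(weights[term] for term in set(text.lower().split()))

    return sorted(queue, key=score, reverse=True)

# Hypothetical reviewer decisions and remaining queue.
tagged = [
    ("lidar circuit board drawing", True),
    ("office party invitation", False),
]
queue = ["party schedule", "vendor email with circuit board", "quarterly newsletter"]
ordered = reprioritize(queue, tagged)
```

Rerunning `reprioritize` after each batch of tagging decisions is what moves similar documents toward the front of the queue over time.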

High Precision Classification (HPC) 

You also can take document classification to another level by using high precision classification. With this capability, instead of “liking” an entire document, you select a specific body of text – a phrase, perhaps, maybe a sentence – and train the system with that content. The system, in turn, returns documents with the specific potentially interesting language highlighted.

Sentiment Analysis

Sentiment analysis, another form of unsupervised machine learning, helps you understand the emotional significance of communications.

With sentiment analysis, we can explore the emotional significance of language used to communicate with others. Reveal AI looks for seven types of sentiment, as we covered in Getting Sentimental: Using Emotional Signals in eDiscovery:

  1. Intent: Looking at someone’s intention or purpose. “I will do this.”
  2. Opportunity: Looking for circumstances making it possible for someone to do something. “I can do this.”
  3. Pressure: Looking for attempts to persuade, influence, intimidate. “You must do this.”
  4. Rationalization: Looking for attempts to explain or justify. “It’s okay to do this because….”
  5. Sentiment Alternation: Looking for fluctuations in how people speak, detecting discomfort and tone changes, mainly to detect dishonesty.
  6. Positivity: Looking for content that tends to the positive and optimistic.
  7. Negativity: Looking for content where people discuss negative things, which can lead to discovering conversations about problems.

Working with the Levandowski content, you might filter for communications with high levels of intent and pressure.

Image Analysis

To find support for the assertions in paragraphs 3 and 58, it would help immensely to be able to search for drawings of circuit boards. To that end, it is valuable to have a system that can automatically identify content in pictures and populate the system with searchable tags describing that content.

Below you can see how such a system would operate, with the example of a driver’s license.


Learn More About AI

With the right AI-driven platform and even a starter set of data, you can make great strides toward accomplishing litigation tasks such as preparing a complaint in ways that previously were difficult if not impossible to achieve with the same level of precision and certainty.

If your organization is interested in learning more about making more effective use of AI across the life of a lawsuit and how Reveal uses AI as an integral part of its AI-powered end-to-end legal document review platform, please contact us.