May 12th

The eDiscovery Processing Software Buying Guide

George Socha

If you need to review, analyze, use, or produce data in litigation or investigations, chances are you need eDiscovery processing software. Maybe you're a law firm, legal team, government agency, or service provider that already uses a tool and is considering a change, or perhaps you are new to this. Possibly you want to process electronically stored information (ESI) in-house, or maybe you could be better served by outsourcing professional services to do the processing for you.

No matter what your starting point, here are fourteen factors to consider as you decide which eDiscovery platform and processing capabilities to use.

1. What Deployment Options Does My eDiscovery Processing Software Offer?

For this important step in the EDRM, make sure the litigation response software you are considering works in the environments that matter to you. Four leading deployment options are in the cloud; on your premises; as a cloud/on-prem hybrid; or as a mobile offering, installed on an appliance or laptop.

Some eDiscovery platforms make the choice for you; if you use their software, you have to process data in the location they have chosen, such as their cloud implementation. Others give you a choice. Some even let you mix and match for a cost-effective hybrid option.

For more, see Which eDiscovery Platform Deployment Options Are Right for You?

2. Do I Need Extra Software?

Is your eDiscovery processing platform self-contained or does it require additional software?

Some eDiscovery platforms require additional software in order to function, such as Microsoft SQL Server. Others need access to additional programs to accomplish certain tasks, such as redaction. Yet others give legal professionals a choice of whether to use their own functions or to unplug those capabilities and plug in outside tools.

While there are many potential configurations and no one answer that fits everyone's requirements, there is one need everyone shares: the need to know what additional tools, if any, they might want or have to have available.

3. Does My eDiscovery Software Require That I Preprocess Data?

Some litigation support systems require users to do some form of preprocessing prior to the ingestion of data. This preprocessing might include, for example, unzipping non-email archives, separating emails and efiles, and assigning data to a particular custodian.

Look for a platform capable of recursively going through subfolders and files to process an entire import. It should be able to unzip non-email archive file types, handle loose email file types as native files, extract attachments from email, identify and separate email from non-email file types, and allow flexible data assignment on import, folder, or file levels to one or more custodians.
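To make the checklist above concrete, here is a minimal Python sketch of that kind of import pass. It is an illustration under stated assumptions, not how any particular platform works: the function name expand_import, the EMAIL_EXTENSIONS set, and the record layout are all hypothetical, only .zip archives are expanded, and one custodian is assigned to the whole import.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption for this sketch: these extensions mark loose email files.
EMAIL_EXTENSIONS = {".eml", ".msg"}


def expand_import(root: Path, workdir: Path, custodian: str) -> list:
    """Walk root recursively, expand .zip archives, and tag every file."""
    items = []
    # Materialize the listing first so extraction cannot disturb the walk.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Check the suffix as well as the content: .docx files are also ZIP
        # containers and should stay intact as documents.
        if path.suffix.lower() == ".zip" and zipfile.is_zipfile(path):
            target = Path(tempfile.mkdtemp(dir=workdir))
            with zipfile.ZipFile(path) as archive:
                archive.extractall(target)
            # Recurse so that archives nested inside archives are expanded too.
            items.extend(expand_import(target, workdir, custodian))
        else:
            kind = "email" if path.suffix.lower() in EMAIL_EXTENSIONS else "efile"
            items.append({"path": path, "kind": kind, "custodian": custodian})
    return items
```

The workdir folder should sit outside the import folder so that extracted files are not picked up a second time. Assigning custodians per folder or per file would mean passing a mapping rather than a single name.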

4. What Data Does My Processing Tool Extract?

Every end-to-end eDiscovery platform should have a processing program that can extract data from files, but they don't all extract the same data from the same files in the same way.

Questions to ask include whether a tool extracts all metadata in a file or only selected fields; whether it extracts all text in a file or only a subset, such as a limited number of characters; from how many types of files it can extract data; and whether additional applications are required to allow it to extract metadata and text from certain file types.

5. What Are My Processing Program's Storage Needs?

Data processing software can be I/O (input/output) intensive, so speed and space matter in your discovery projects. The faster the storage, the better the performance. If you have more than one storage location, the faster the connections between the storage locations, the better the performance.

If a storage location runs out of space, the processing system may hang on the current operation or even crash the entire system. To account for data expansion and additional files created during import, you should have adequate free space available on the storage devices you are using.
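A pre-import free-space check can be sketched in a few lines of Python. The function name and the expansion_factor default are assumptions made for illustration; the real expansion ratio depends on the data and the tool, so ask your vendor what allowance they recommend.

```python
import shutil


def has_room_for_import(storage_path: str, import_bytes: int,
                        expansion_factor: float = 3.0) -> bool:
    """Return True if free space covers the import after assumed expansion.

    expansion_factor is a placeholder, not a vendor-recommended value.
    """
    free_bytes = shutil.disk_usage(storage_path).free
    return free_bytes >= import_bytes * expansion_factor
```

For example, has_room_for_import("/data/processing", 50 * 1024**3) asks whether a 50 GiB import fits with room to expand; the path here is hypothetical.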

6. Does My Processing Software Support ECA?

The ability to perform early case assessment (ECA) or early data assessment (EDA) can give you a jump start on understanding what data you have and what you might want to do with that data.

Some eDiscovery processing platforms have ECA/EDA capabilities built in, others allow for ECA/EDA capabilities to be bolted on, and yet others treat ECA/EDA as a separate workflow. Finding out where your processing software fits on that spectrum can help you understand whether, when, and how you can deploy ECA/EDA.

For more, see Can Early Case Assessment Tools Reduce Review Costs?

7. How Does My Electronic Discovery Software Recognize File Types?

Look for eDiscovery processing software that identifies a file's type using the file's signature, not just its extension. A file extension can be altered, not so much a file signature.

A file extension is the set of characters at the end of a file name, such as ".docx" in "file.docx". A file signature is a distinctive set of identifying bytes written to the header of a standardized file type. For example, the file signature "50 4B 03 04" identifies a ZIP container, the format used for MS Office Open XML documents such as .docx files.
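As a rough illustration of signature-based identification, the Python sketch below compares a file's leading bytes against a small table of well-known signatures. The table is deliberately tiny and the function name is hypothetical; production tools rely on much larger signature databases and often inspect more than the first few bytes.

```python
# A handful of well-known signatures ("magic numbers"); real tools know many more.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container (includes .docx, .xlsx, .pptx)",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
}


def identify_by_signature(header: bytes) -> str:
    """Return a label for the first signature that matches the header bytes."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"
```

Renaming "report.pdf" to "report.txt" changes the extension, but the file still begins with the bytes "%PDF-", so a check like this still reports a PDF.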

8. How Does My Processing Tool Hash Files?

eDiscovery solutions typically hash files as they import them. Hash values can be used to identify identical content. They can be used to determine whether two copies of a file are the same. They might be used to help create email threads. You want eDiscovery tools capable of doing that.

You also want to understand just what hashing algorithms your processing software uses, what content the software hashes, how it hashes that content, and what it does with the hash values it gets.
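For readers who want to see what hashing looks like in practice, here is a minimal Python sketch using the standard hashlib module. The function names are hypothetical, and SHA-256 is simply this sketch's default; your processing software may use a different algorithm or hash only selected content, such as particular email fields.

```python
import hashlib


def hash_bytes(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def hash_file(path: str, algorithm: str = "sha256",
              chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes produce the same digest no matter what they are named, and changing a single byte produces a different digest.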

9. Can I Filter Out Unwanted Files?

Not all files teed up to be processed really warrant processing. Identifying unnecessary files and filtering them out before heavy-lifting processing and later data analytics can be an effective way to identify relevant documents and streamline the eDiscovery process.

There are various methodologies used by processing software to identify and filter out unwanted files. These include using the NIST List, a set of hash values maintained by the National Institute of Standards and Technology (NIST); filters based on file type; and filters based on file dates.
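The three filter types just described can be combined in a few lines of Python. This is a sketch under stated assumptions: FileRecord, keep_for_processing, and all parameter names are hypothetical, and known_hashes stands in for a known-file hash list such as the NIST List, loaded elsewhere.

```python
import os
from dataclasses import dataclass
from datetime import date


@dataclass
class FileRecord:
    name: str
    sha1: str        # hex digest computed at import
    modified: date   # last-modified date taken from file metadata


def keep_for_processing(record: FileRecord, known_hashes: set,
                        excluded_extensions: set,
                        earliest: date, latest: date) -> bool:
    """Return False if any filter marks the file as unwanted."""
    if record.sha1 in known_hashes:                 # known system/application file
        return False
    extension = os.path.splitext(record.name)[1].lower()
    if extension in excluded_extensions:            # file-type filter
        return False
    return earliest <= record.modified <= latest    # date-range filter
```

A platform might record which filter excluded each file so the decision can be explained later; this sketch returns only a yes or no.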

10. How Does My Processing Software Handle Deduplication?

eDiscovery processing software should be able to identify and isolate duplicate files.

There is no detailed industry standard for deduplication, but it is still an essential component of data management. You should find out how the software defines duplicates, what criteria are used to identify duplicates, how duplicates are handled, and what is done to track information both about duplicates and the deduplication process.

Inquire, as well, whether the platform addresses both exact duplicates and near duplicates.
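One common way to find exact duplicates is to group files by content hash. The Python sketch below does that and tracks which copy each duplicate points back to. It is only an illustration: the function name and return shape are hypothetical, the choice of SHA-256 and "first copy seen wins" are assumptions, and near-duplicate detection needs a different technique that is not shown here.

```python
import hashlib
from collections import defaultdict


def deduplicate(documents: dict) -> tuple:
    """Split documents (id -> bytes) into unique ids and tracked duplicates.

    Returns (unique_ids, duplicates_of), where duplicates_of maps each
    retained id to the ids of later documents with identical content.
    """
    first_seen = {}                    # digest -> id of the first copy seen
    duplicates_of = defaultdict(list)  # retained id -> ids of its duplicates
    for doc_id, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            duplicates_of[first_seen[digest]].append(doc_id)
        else:
            first_seen[digest] = doc_id
    return list(first_seen.values()), dict(duplicates_of)
```

Keeping the duplicates_of mapping, rather than discarding duplicates outright, is what lets you report later on where every copy of a document was found.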

11. How Does My eDiscovery Software Handle OCR?

OCR, or optical character recognition, can be used to generate text from document images.

When evaluating OCR, try to find out what technologies are used to perform OCR, as speed, accuracy, and capabilities can vary greatly. Factors to consider include whether the system deploys auto-language detection, how many languages it supports, stated and actual accuracy rates, and any measures in place to improve accuracy.

Ask, as well, what the platform does with OCR text. Find out, for example, whether it writes OCR text over extracted text, writes extracted text over OCR text, or makes both sets available for use later on.

12. How Does My Platform Handle Foreign Language Content?

Today, content comes in many languages, not just English. Even if your organization is only conducting a legal hold or internal investigation, it's possible to come across foreign language documents. That means legal departments need systems capable of handling a plethora of languages instead of relying on Google Translate.

Your processing software should be able to translate that content from one language to another, not just from a limited number of languages to English. It should be able, as well, to produce documents with foreign language content in ways that allow it to be used by artificial intelligence tools such as active learning (or TAR, predictive coding or continuous active learning - whatever you want to call it).

For more, see Are You Spending Too Much For Foreign Language Document Review?

13. Can My eDiscovery Suite Process Pictures?

Not all electronic documents contain searchable text, but those that don't might still hold relevant information. Images such as photographs, advertisements, and driver's licenses are in many of the document stores we use for data collection these days. Before you can search for images, you need to process them in a way that delivers searchable content.

Find out what your processing software is able to do with images. Does it detect their presence? Does it deliver searchable metadata, meaning metadata already associated with the files such as file name, creation date, and location coordinates? Can it go further, identifying content in pictures and delivering searchable descriptions of the content?

For more, see Image Recognition and Classification During Legal Review.

14. What Does My Processing Software Do with Audio Files?

Audio files have been part of eDiscovery content since the 1980s, but that does not necessarily mean your processing software is able to convert audio content into something you can search and analyze.

Ask how your processing software handles audio content. If the software can convert audio to text, ask how it performs the conversion, how it delivers the converted text, and what it does to maintain a connection between the audio content and the converted text to allow for more effective searching. Find out what the processing software does to deliver acceptable accuracy rates. And inquire, as well, about what the system does to allow AI to be applied to audio content or converted text.

Back to the Big Picture

When Reveal acquired Mindseye in 2019, it added best-in-class enterprise-grade eDiscovery processing capabilities to its suite of offerings. Reveal has continued to enhance its processing tools. In Processing 10.1, for example, Reveal added features such as the ability to create separate instances within the same SQL environment and the ability to run processing agents as a service, and delivered more than 60 additional enhancements and corrections.

Reveal's commitment to delivering the best processing technology is part of its mission to lead the space with the overall best eDiscovery software solution - a mission further supported with the 2020 acquisition of NexLP and the 2021 merger with Brainspace.