Search in Reveal 11: Metadata Filtering

George Socha

This Search in Reveal 11 post looks at using metadata filters to search ESI.

In earlier posts in this series, I wrote about combining keyword and concept searches (Search in Reveal 11: Keyword, Concept, or Both), searching with just a few clicks (Search in Reveal 11: What a Click Can Do), and using the new Term List Search function in Reveal 11 (Search in Reveal 11: Term List Search). Today’s post focuses on another form of search: filtering on metadata.

Electronic files basically consist of two parts, content and metadata. They contain content such as the text of an email message, the slides in a PowerPoint file, and the sound in an electronic audio recording.

Files also contain metadata. The short and not so helpful definition of metadata is “data about data”. In the world of electronic discovery, metadata is information that defines and describes the electronic files we work with. (There is another broader definition of metadata, used by researchers, librarians, and others outside our field. If you want to read up on that, you might start with the TechTarget page on metadata by Garry Kranz.)

For us, metadata includes information from files, such as the file’s type (msg, pdf, doc, and so on). It includes information about where the file came from, such as the name of the file’s custodian. It also includes information about what has been done with the file, such as an issue tag or a tag for responsiveness applied during review.

Metadata is rich in potentially valuable information. A single email message might have 40, 100, or more fields of metadata that can help you understand who communicated with whom, when, and about what. Standard Office file metadata can provide insight into who created or worked on a file, and custom metadata can reveal much more. Audio and video metadata might indicate who created an AV file, where and when the file was created, with what equipment, and where it was loaded or stored. And, of course, other types of files will have yet more types of metadata.

With access to metadata and the ability to work quickly and easily with that information, you have an opportunity to build, test, and refine your understanding about events critical to a lawsuit or investigation in ways you cannot achieve with content alone.

With Reveal 11, we give you access to metadata and ways to use it that you have not had before. When you open Reveal 11, you see three default options under FILTERS: Formats, Custodians, and Tags. You can also dig deeper using Advanced Search. I will discuss both approaches.

Filtering on Formats

When I click on the Formats item, displayed under Formats are the top five document types in my data along with counts of documents for each of those document types. Document types are expressed by their common Windows file extensions. Also displayed is View all…, which I will discuss below:
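The counts behind that display amount to a simple tally by extension. As a minimal sketch outside Reveal (this is not Reveal's implementation or API; the sample data and the `top_formats` helper are hypothetical), a top-five list could be computed like this in Python:

```python
from collections import Counter

# Hypothetical sample: one file extension per document.
extensions = (
    ["msg"] * 6 + ["xls"] * 5 + ["doc"] * 4 + ["pdf"] * 3
    + ["jpg"] * 2 + ["gif"] * 1
)


def top_formats(exts, limit=5):
    """Return (extension, count) pairs, largest count first."""
    return Counter(e.lower() for e in exts).most_common(limit)


top_five = top_formats(extensions)
for ext, count in top_five:
    print(f"{ext}: {count}")
```

In this sample, gif falls outside the top five, which is the kind of format you would reach through View all….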



Filtering on a Single Format

Under Formats, I can opt to select a single format. Here, I selected the xls filter. When I did that, the Dashboard was updated to display information about the 59,116 Excel files in the dataset of 984,054 Enron files. Under Formats, only xls is shown; the others, such as msg, are no longer displayed. At the same time, the search bar now includes a pill that says Extension: xls.

Because I filtered on xls files, I also updated the date type used for the Timeline. The default date is Date Sent, but spreadsheets do not have that information as part of their metadata and hence there would be no timeline to graph. To address that, I changed the date type to Master Date, as seen in the screen capture below:


Finding More Formats and Filtering on More than One Format

If the file type which I want to filter on is not listed in the top five formats, I can click on View all…:

Clicking on View all… opens an option box labeled Extension. The option box shows extensions and file counts for those extensions. By default, the option box displays 10 extensions at a time; you can change that to 25, 50, or 100. You can use the option box to search for and select one or more extension types:

Here, I selected two file extensions, jpg and gif, to search for:

Below are the results of the search. Under FILTERS, you can see two formats listed, jpg and gif, along with document counts for the file types. In the Document Types pane, you can see visual representations of the numbers of each of those file types that were found by the search. At the top, you can see a pill saying Advanced:


If I click on the Advanced pill, that opens the Advanced Search pane, where I can see details of the search I have created:

Searching for Formats

If I don’t see the format I want under FILTERS or in the first page of the Extension option box, I can do a search in the option box. Here, I entered mp and got nine results:

I might choose to select mpg, mp3, mpeg, mpe, mpa, and mp4 – file types containing audio content – and search for them:

Here is what that search looks like in Advanced Search:

From here, I could modify my search. I could, for example, add or remove an extension, change IS to NOT, or change OR to AND. I could preview results. And, of course, I could start evaluating the results of my search.
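To make the effect of those switches concrete, here is a minimal Python sketch of the underlying Boolean logic. It is an illustration only, not Reveal's query engine or API: the `Condition` class, the `search` function, and the sample documents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One hypothetical search condition on a metadata field."""
    field_name: str
    value: str
    negate: bool = False  # False models IS, True models NOT

    def matches(self, doc):
        hit = str(doc.get(self.field_name, "")).lower() == self.value.lower()
        return not hit if self.negate else hit


def search(docs, conditions, operator="OR"):
    """Keep documents matching any (OR) or all (AND) conditions."""
    combine = any if operator == "OR" else all
    return [d for d in docs if combine(c.matches(d) for c in conditions)]


# Hypothetical documents, each with a single extension.
docs = [
    {"id": 1, "extension": "mp3"},
    {"id": 2, "extension": "mpp"},
    {"id": 3, "extension": "mp4"},
    {"id": 4, "extension": "xls"},
]

wanted = ["mpg", "mp3", "mpeg", "mpe", "mpa", "mp4"]
conditions = [Condition("extension", ext) for ext in wanted]

or_hits = search(docs, conditions, "OR")    # any listed extension
and_hits = search(docs, conditions, "AND")  # every listed extension at once
not_xls = search(docs, [Condition("extension", "xls", negate=True)])

print([d["id"] for d in or_hits])
print([d["id"] for d in and_hits])
print([d["id"] for d in not_xls])
```

Because each document in this sketch carries only one extension, switching OR to AND across several extensions returns nothing, while switching IS to NOT inverts the selection.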

Filtering on Custodians

As with Formats, when I click on Custodians, the five custodians with the largest number of files are displayed along with View all…:

If I select a single custodian, my Dashboard is updated to display that custodian’s data. Grid, Clusters, and AI-Driven Batches are updated as well:

If I click on View all…, the option box is opened, this time displaying Custodian information, with the same capabilities as are available for the Extension option box:


Filtering on Tags

The third default under FILTERS is Tags. The tags displayed will vary. In this example, there are two sets of tags available. For this exercise, five tags have been set up. There is a Responsiveness tag, where for a single document a first-level reviewer can choose one of four options: Responsive, Non-Responsive, Further Review Required, and Tech Issue. There are also four Issues tags – CHEWCO, JEDI, LJM, and RAPTOR – where for a single document a first-level reviewer can choose any combination of the four issues:

When I expand Tags under FILTERS, I see those same options:

If I click on the Responsive tag, that adds a pill to the search bar and updates my view, showing information about the 11 documents tagged as responsive:

I can refine my Responsiveness tag search. One way to do this is to expand the search bar to show Advanced Search and, from there, select the tags to search for. I also can designate whether to use AND or OR for the search:

I could, for example, modify my search to look for all documents that have been tagged as both Non-Responsive and Responsive:

As no single document should be coded both responsive and non-responsive, the result of the search should be zero. And that is what I get:

I can do similar searches with the Issues tags. I also can construct searches that include both Responsiveness and Issues tags. Here, I put together a search to find all documents tagged as Responsive that also have been tagged as either CHEWCO or JEDI:
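The logic of these tag searches can be sketched in a few lines of Python. This is a hypothetical model, not Reveal's data model or API: it assumes Responsiveness holds a single choice per document and Issues holds a set of choices, as described above, and the sample documents are invented.

```python
# Hypothetical coded documents: one Responsiveness choice, any number of Issues.
docs = [
    {"id": 1, "responsiveness": "Responsive", "issues": {"CHEWCO"}},
    {"id": 2, "responsiveness": "Responsive", "issues": {"LJM"}},
    {"id": 3, "responsiveness": "Non-Responsive", "issues": {"JEDI"}},
    {"id": 4, "responsiveness": "Responsive", "issues": {"JEDI", "RAPTOR"}},
    {"id": 5, "responsiveness": None, "issues": set()},
]


def responsive_with_any_issue(doc, wanted_issues):
    """Responsive AND (at least one of the wanted issues)."""
    return (
        doc["responsiveness"] == "Responsive"
        and bool(doc["issues"] & wanted_issues)
    )


hits = [d["id"] for d in docs if responsive_with_any_issue(d, {"CHEWCO", "JEDI"})]

# A single-choice field cannot hold two values at once, so requiring both
# Responsive AND Non-Responsive matches no document in this model.
both = [
    d["id"] for d in docs
    if d["responsiveness"] == "Responsive"
    and d["responsiveness"] == "Non-Responsive"
]

print(hits)
print(both)
```

Document 3 is excluded even though it carries the JEDI issue tag, because the Responsive condition is joined with AND.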

Filtering on Additional Metadata

In addition to using the quick filters (Formats, Custodians, and Tags), you can filter on any available fields of data. Open Advanced Search, select Add Condition, and click on Fields to get a full list (in the data I am working with, there are nearly 300 fields listed):

Different types of fields have different options associated with them. Here are the more prominent types:

Fields with Lists

Some fields have searchable lists associated with them. When selected, these fields display lists of items along with counts for those items. One example is the Detected Languages field. Other examples from the Enron data include Author, Batch ID, Begin Number, Collection Location, Collection Source, Color Brightness fields, Company, Custodian, Custodian Title, Document Author, Duplicate Document Path, Email Item Type, Email Priority, Exception Type, Exported File Extension, Extension, Family Relationship, Foreign Languages, From, Has Alert, Hidden Content, Image Labels, Pattern Name, Sent on Behalf of, Time Zone, Type of Document, and Type of Parent Document:

Date Fields

When selected, date fields present options for searching for items with a specific date, ones that fall in a range of dates, or ones before or after a specific date. Examples include Application Created Date, Application Last Saved Date, Appointment Begin Date, Appointment End Date, Date, Date Received, Date Sent, Email Date Received, Last Date Printed, Master Date, OS Creation Date, OS Last Access Date, and OS Saved Date:
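As a rough illustration of what those date options do, here is a minimal Python sketch. It is not Reveal's implementation: the helper names, the `master_date` and `date_sent` keys, and the sample documents are hypothetical, and treating a range as inclusive of both end dates is an assumption made for this example.

```python
from datetime import date

# Hypothetical documents; document 5 has no Date Sent, as with a spreadsheet.
docs = [
    {"id": 1, "master_date": date(2000, 12, 31), "date_sent": date(2000, 12, 31)},
    {"id": 2, "master_date": date(2001, 6, 15), "date_sent": date(2001, 6, 15)},
    {"id": 3, "master_date": date(2001, 10, 16), "date_sent": date(2001, 10, 16)},
    {"id": 4, "master_date": date(2002, 1, 2), "date_sent": date(2002, 1, 2)},
    {"id": 5, "master_date": date(2001, 3, 1), "date_sent": None},
]


def _dated(docs, field_name):
    """Yield only documents that actually have a value for the field."""
    return (d for d in docs if d.get(field_name) is not None)


def on(docs, field_name, day):
    return [d["id"] for d in _dated(docs, field_name) if d[field_name] == day]


def before(docs, field_name, day):
    return [d["id"] for d in _dated(docs, field_name) if d[field_name] < day]


def after(docs, field_name, day):
    return [d["id"] for d in _dated(docs, field_name) if d[field_name] > day]


def between(docs, field_name, start, end):
    # Assumption: both end dates are included in the range.
    return [d["id"] for d in _dated(docs, field_name) if start <= d[field_name] <= end]


in_2001_by_master = between(docs, "master_date", date(2001, 1, 1), date(2001, 12, 31))
in_2001_by_sent = between(docs, "date_sent", date(2001, 1, 1), date(2001, 12, 31))
print(in_2001_by_master)
print(in_2001_by_sent)
```

The two results differ only by document 5, which has no Date Sent value and therefore can be found by Master Date but not by Date Sent.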

Time Fields

Time fields are similar, with Before, After, and Between options. Examples include Application Created Time, Application Last Saved Time, Appointment Begin Time, Appointment End Time, Email Time Received, Last Time Printed, Master Time, OS Creation Time, OS Last Access Time, OS Saved Time, Time, Time Received, and Time Sent:

Number Fields

Number fields are ones that contain a single number. With these, you can search for an exact number or for a number that falls in a selected range. Examples include Attachment Count, Decision Engine Closest Concept, Decision Engine Email Threads, Discovery Manager Dupe ID, EXIF Latitude, EXIF Longitude, File Size fields, Item Id, Page Count, Parent ID, Production Page Count, QC Notes, Recipient Count, Reveal AI Score fields, Reveal AI Status fields, Short Message Conversation ID, and UTC Offset:

And There’s More

What I showed today are a few ways that metadata filtering in Reveal 11 lets you search your data, helping you enjoy the highest quality speed to insight in the industry. In posts to come, I’ll continue exploring the myriad ways you can use Reveal 11 and its greatly enhanced search capabilities.

For more information about how Reveal can empower your organization, contact us for a demo.