
Four Ways to Get to Documents: Part 4 – Searches

George Socha



A key function of any eDiscovery platform is to let you quickly and easily get to documents you care about. Reveal offers many ways for you to do that. In earlier posts, I discussed using work folders, document folders, and assignments.

In today’s post, I examine yet more ways to search for content.

Basic Search

At the top of the Reveal 11 window is the basic search bar:

If you start typing in the search bar, you are given the option of performing a keyword search or a concept search.

Keyword

Keyword searching is what we tend to be most familiar with and comfortable using.

At its most basic, keyword searching allows you to search for a single string of characters, such as cause, and get back a set of documents containing that specific string. With this most basic form of keyword searching, you would not get back a document that (1) contained caused, causing, or because and (2) did not contain cause. You also could not search for a document that contained two different words, such as cause and action, or perform any more complex search involving wildcards, proximity, or other connectors or qualifiers.
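Here is a minimal Python sketch of that most basic form of keyword matching. It is an illustration only, not how Reveal's index works, and it assumes a simple whole-word tokenizer: a document matches only if it contains the exact token cause, so caused, causing, and because do not count.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def basic_keyword_match(text: str, keyword: str) -> bool:
    """True only when the keyword appears as a whole word (no stemming, no wildcards)."""
    return keyword.lower() in tokenize(text)


# Hypothetical sample documents.
docs = {
    "DOC-A": "The cause of the delay is unclear.",
    "DOC-B": "What caused the delay?",
    "DOC-C": "We stopped because of the storm.",
}

hits = [doc_id for doc_id, text in docs.items() if basic_keyword_match(text, "cause")]
print(hits)  # ['DOC-A']
```

A plain substring test would have matched DOC-C as well, since because contains the characters cause; splitting the text into words first avoids that.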

Generally, when we discuss keyword searching in eDiscovery, we mean something that includes that more extensive set of capabilities. We expect to be able to search for phrases, not just single words; use connectors such as AND and NOT; and run proximity searches that find one word or phrase within a specified number of characters or words of another word or phrase.

In Reveal, the simplest way to start a keyword search is to begin typing your search in the platform’s search bar. Here, I typed the word raptor and was given the option to continue with a search that looks for the keyword raptor or to opt for a search that looks for documents containing the concept raptor:

By selecting the first option, All documents with keyword raptor, I created a search that will return 1,957 documents containing the keyword raptor:

I can do all the usual types of keyword search. Here are three examples:

  • Search for the phrase cause of action by enclosing it in quotation marks, narrowing my results to 5,760 documents:

  • Search for cause within three words of action, getting 6,267 documents:

  • Search for variations of cause within three words of variations of action, returning 11,688 documents:
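The proximity and variation searches above can be sketched in a few lines of Python. This is a simplified illustration, not Reveal's query engine or query syntax. It assumes that "within three words" means the two words' positions differ by at most three (so cause of action qualifies), and it approximates "variations" with a hypothetical prefix match similar to a trailing wildcard. Platforms differ in how they count intervening words.

```python
import re
from typing import Callable

Matcher = Callable[[str], bool]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def exact(word: str) -> Matcher:
    """Match one exact word."""
    return lambda token: token == word


def prefix(stem: str) -> Matcher:
    """Match any word starting with `stem`, a stand-in for 'variations' (hypothetical)."""
    return lambda token: token.startswith(stem)


def within_n_words(text: str, match_a: Matcher, match_b: Matcher, n: int) -> bool:
    """True if a token matching A and a token matching B are at most n word positions apart."""
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if match_a(tok)]
    pos_b = [i for i, tok in enumerate(tokens) if match_b(tok)]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


text = "Plaintiffs allege the losses were caused by the actions of management."
print(within_n_words(text, exact("cause"), exact("action"), 3))   # False
print(within_n_words(text, prefix("caus"), prefix("action"), 3))  # True
```

The exact search misses this sentence because it contains neither cause nor action as written; the variation search finds caused and actions three positions apart. A prefix match is cruder than real stemming (caus would also match causeway), which is one reason variation searches return more documents.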

Concept

Concept searching goes beyond simply trying to find a document containing a keyword.

With concept searching as implemented in Reveal’s platform, you start by entering a term, phrase, or paragraph. The platform analyzes its index to find documents containing patterns that it infers are relevant to the information you entered. It returns documents containing related concepts and presents them to you for analysis.

A quick way to initiate a concept search is to enter an initial term in the search bar and select the Discover concepts option:

This brings up the Concept Search modal, which shows the top 10 concepts related to raptor that the platform found in your data along with a list of an additional 90 concepts:

Each top concept has an adjustable weight:

The weights are:

  • Excluded: 0.0
  • Low: 0.33
  • Medium: 0.66
  • High: 1
  • Required: 1
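To show how weights like these could shape results, here is a deliberately simple Python model. It is hypothetical and is not Reveal's ranking algorithm: it assumes a document scores the weighted share of query concepts it contains, that a missing Required concept rules the document out, and that an Excluded concept simply contributes 0.0. The weight values are the ones listed above.

```python
# Weight values as listed above; the scoring rule below is hypothetical.
WEIGHTS = {"excluded": 0.0, "low": 0.33, "medium": 0.66, "high": 1.0, "required": 1.0}


def concept_score(doc_concepts: set[str], query: dict[str, str]) -> float:
    """Weighted share of query concepts present in a document (illustrative model only)."""
    # Assumption: a missing Required concept rules the document out.
    if any(level == "required" and concept not in doc_concepts for concept, level in query.items()):
        return 0.0
    total = sum(WEIGHTS[level] for level in query.values())
    if total == 0.0:
        return 0.0
    matched = sum(WEIGHTS[level] for concept, level in query.items() if concept in doc_concepts)
    return matched / total


query = {"raptor": "required", "raptor ii": "high", "chewco": "high", "raptor 1": "medium"}
print(round(concept_score({"raptor", "chewco"}, query), 2))  # 0.55
print(concept_score({"chewco", "raptor ii"}, query))         # 0.0 (no "raptor")
```

In this model, raising a concept from Low to High makes documents containing it score noticeably higher, while Required acts as a gate rather than a bigger weight, which is one way to read why Required and High share the value 1.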

You also can add concepts to the list of Top Concepts by clicking on concepts in the Additional Concepts pane on the right. Here, I added the concepts raptor ii and chewco, changed their weights to high, and am hovering over the concept raptor 1 in anticipation of adding it and changing its weight:

To see the weights assigned to top concepts, click the scales-of-justice icon in the lower left corner of the modal:

To see your concept query in Boolean form, click the angled-brackets icon in the lower left corner of the modal:

Advanced Search

If you click on the three dots at the right end of the search bar, you open Advanced Search:

From there, you can select +Add and then choose Term List, Keyword, Concept, or Folders:

Keyword and Concept searches are addressed above. Folders searches, which include Work Folders, Document Folders, and Assignments, were discussed in Four Ways to Get to Documents: Part 1 – Work Folders, Four Ways to Get to Documents: Part 2 – Document Folders, and Four Ways to Get to Documents: Part 3 – Assignments.

+Add Term List

Using Reveal’s Term List Search function, you can enter up to 1,000 lines of terms, display the list in Plain Text or Table form, get document and family counts for each line of terms and for the total, and modify those terms as needed:

For more information on using the Term List Search function, see Search in Reveal 11: Term List Search.
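Reveal computes these counts for you; the Python sketch below only illustrates what a per-line document count, a family count, and a de-duplicated total mean. It assumes each line holds a single word, that a family is a parent email plus its attachments sharing a family ID, and that a family count includes every member of any family with a hit. The field names id, family, and text are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def term_list_counts(docs: list[dict], terms: list[str]) -> list[tuple[str, int, int]]:
    """Return (term, document hits, hits expanded to full families) per line, plus a TOTAL row."""
    family_members = defaultdict(set)
    for doc in docs:
        family_members[doc["family"]].add(doc["id"])
    family_of = {doc["id"]: doc["family"] for doc in docs}
    tokens = {doc["id"]: tokenize(doc["text"]) for doc in docs}

    def with_families(hit_ids: set[str]) -> set[str]:
        # Pull in every member of each family that contains at least one hit.
        return set().union(*(family_members[family_of[i]] for i in hit_ids))

    rows, all_hits = [], set()
    for term in terms:
        hits = {doc_id for doc_id, toks in tokens.items() if term.lower() in toks}
        all_hits |= hits
        rows.append((term, len(hits), len(with_families(hits))))
    rows.append(("TOTAL", len(all_hits), len(with_families(all_hits))))
    return rows


# Hypothetical documents: EMAIL-1 and ATT-1 form one family.
docs = [
    {"id": "EMAIL-1", "family": "F1", "text": "Raptor structure review"},
    {"id": "ATT-1", "family": "F1", "text": "Spreadsheet of hedges"},
    {"id": "EMAIL-2", "family": "F2", "text": "Chewco and Raptor update"},
    {"id": "EMAIL-3", "family": "F3", "text": "Lunch plans"},
]
print(term_list_counts(docs, ["raptor", "chewco"]))
# [('raptor', 2, 3), ('chewco', 1, 1), ('TOTAL', 2, 3)]
```

Note that the total is not the sum of the lines: EMAIL-2 hits both terms but is counted once.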

The Dashboard

The Dashboard typically is the first screen you see when you select a project in Reveal 11:

The Dashboard is your control center, providing an overview of the project you are in. It contains numerous clickable visual elements, all of which can be used to search data. By default, these include the Timeline; the Candy Bar; three selectable bar charts that default to Extension, All Entities, and Custodian; Documents by Predictive Scores, Senders + Recipients, and Domains bar charts; and the Emotional Intelligence bar chart.

The Timeline

At the top of the Dashboard is the Timeline (discussed in Search in Reveal 11: What a Click Can Do). The Timeline is an interactive chronological chart of the data you are looking at, showing document counts in columns for each date period that contains data (years, in the example below). The columns are plotted on a logarithmic scale so that a wide range of values fits in a compact display. The default date type is Master Date, which for email is the date the parent email was sent and for other files is the last date the file was modified:
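Why a logarithmic scale helps can be seen with a few made-up yearly counts. The post does not specify Reveal's exact scaling, so this Python sketch assumes log10(count + 1), a common choice that keeps zero-count columns at zero, and compares it with a linear scale.

```python
import math


def log_column_heights(counts: dict[str, int], max_height: float = 100.0) -> dict[str, float]:
    """Scale each column by log10(count + 1), relative to the tallest column."""
    top = max(math.log10(c + 1) for c in counts.values())
    if top == 0:
        return {label: 0.0 for label in counts}
    return {label: max_height * math.log10(c + 1) / top for label, c in counts.items()}


# Made-up yearly document counts, for illustration only.
counts = {"1999": 9, "2000": 99, "2001": 999_999}
for year, height in log_column_heights(counts).items():
    linear = 100.0 * counts[year] / max(counts.values())
    print(f"{year}: log {height:6.2f}  linear {linear:8.4f}")
```

On the log scale the 1999 column is about one-sixth the height of the 2001 column; on a linear scale it would be under 0.001 percent of it and effectively invisible.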

The Candy Bar

The Candy Bar is an interactive data visualization that gives you the ability to filter documents based on duplicate type: Originals, Near Duplicates, Exact Duplicates, and Not Analyzed:

Clicking on any of the type segments adds that segment as a search and refreshes the page:

Selectable Bar Charts

Next on the page is a set of three bar charts that default to Extension, All Entities, and Custodian:

Extension

The first bar chart, Extension, shows file extensions for the documents in your database, in descending order. To do this, it draws on data in the metadata field for file extensions, FILE_EXTENSION, which is displayed as Extension.

In this example, there are 232 different file extensions. At the top of the list are 925,647 .msg files, 132,404 .doc files, and 59,188 .xls files. Clicking on any of the bars adds it to your search:

If you click on View All, a File Extension modal opens:

You can use this to search for specific extensions by typing in the Quick Search… bar:

You can select specific extensions to use, as I did with .doc, .xls, and .ppt in this example:

You also can select a different metadata field to use for this chart by clicking on Extension and then searching for or scrolling to the field you want to use:
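Conceptually, each of these bar charts counts the values of one metadata field, sorts them in descending order, and turns a click into a filter. The Python sketch below illustrates that idea on hypothetical records; FILE_EXTENSION comes from the post, while DOCID and CUSTODIAN are hypothetical field names, and none of this is Reveal's code.

```python
from collections import Counter


def bar_chart(records: list[dict], field: str) -> list[tuple[str, int]]:
    """Count each value of a metadata field, most frequent first."""
    return Counter(rec.get(field, "(blank)") for rec in records).most_common()


def select_values(records: list[dict], field: str, values: set[str]) -> list[dict]:
    """Keep only records whose field matches one of the selected values."""
    return [rec for rec in records if rec.get(field) in values]


# Hypothetical records; DOCID and CUSTODIAN are illustrative field names.
records = [
    {"DOCID": "1", "FILE_EXTENSION": ".msg", "CUSTODIAN": "Kim"},
    {"DOCID": "2", "FILE_EXTENSION": ".doc", "CUSTODIAN": "Lee"},
    {"DOCID": "3", "FILE_EXTENSION": ".msg", "CUSTODIAN": "Kim"},
    {"DOCID": "4", "FILE_EXTENSION": ".xls", "CUSTODIAN": "Kim"},
    {"DOCID": "5", "FILE_EXTENSION": ".ppt", "CUSTODIAN": "Lee"},
    {"DOCID": "6", "FILE_EXTENSION": ".msg", "CUSTODIAN": "Pat"},
]
print(bar_chart(records, "FILE_EXTENSION"))
print([r["DOCID"] for r in select_values(records, "FILE_EXTENSION", {".doc", ".xls", ".ppt"})])
```

Because the field name is a parameter, the same two functions cover switching the chart to another field, such as a custodian field.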

All Entities

The second bar chart, All Entities, lists entities found in your data, displayed in descending order by count.

An entity is a piece of data, such as a proper name, that Reveal identifies in your documents. An entity can be a person, place, product, or even formatted data such as a credit card number. Standard entity types in Reveal include:

  • All Entities (also Brs Has Entity Type) – the default entity setting, covering all entity types found in your dataset.
  • Entity Credit Card Num – sequences of numbers that are detected as patterns used by major credit card providers.
  • Entity Email – sequences of characters that appear to be legitimate email addresses.
  • Entity Location – cities, states, countries, regions, or other locations that contain both a population and a government; geographical places such as bodies of water, mountains, parks, or addresses; and structures such as buildings or monuments.
  • Entity Money – sequences of symbols, numbers, and/or words that are detected as referencing money.
  • Entity Nationality – references to a country or region of origin, such as American or Swiss.
  • Entity Organization – corporations, institutions, governmental agencies, or other groups of people defined by established organizational structures.
  • Entity Person – humans identified by names, nicknames, or aliases.
  • Entity Personal ID Num – series of digits in patterns detected as national or well-known personal identifiers.
  • Entity Phone Number – series of digits in patterns detected as phone numbers.
  • Entity Product – references to commercially available products.
  • Entity Religion – references to organized religions or theologies as well as their followers.
  • Entity Title – appellations associated with people by virtue of occupation, office, birth, or as an honorific.
  • Entity URL – web addresses.
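The post does not say how Reveal detects patterned entities such as credit card numbers. One widely used technique, shown here as an assumption rather than Reveal's method, is to find 13-to-19-digit sequences and keep only those that pass the Luhn checksum, which card numbers from the major issuers satisfy. A minimal Python sketch:

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, then test the sum mod 10."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass the Luhn check."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


text = "Card on file: 4111 1111 1111 1111. Typo copy: 4111 1111 1111 1112."
print(find_card_numbers(text))  # ['4111111111111111']
```

The checksum step is what separates a plausible card number from an arbitrary long number; the second sequence has the right shape but fails the check.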

Here is a view of all entity types in the test data I use, showing each type and the number of documents containing that type of entity, organized in descending order by entity count:

If you click on the bar for one of the entity types, that is added to your search and the window’s contents are updated. Here, I clicked on the entity type nationality to add it to my search and then switched to the Grid view where I can see nationalities identified in specific documents:

By clicking on All Entities, you can switch to a different entity type:

You also can click on View All for more options, just as you can with the other bar charts:

Custodian

The third bar chart, Custodian, uses the Custodian metadata field to display custodians in descending order and provides the same functionality as the other two bar charts:

Documents by Predictive Scores

Next on the page are three specialized bar charts. The first is Documents by Predictive Scores:

Predictive scores are used for active learning classifiers and AI Models. For more about classifiers, see Building Classifiers: Part 1 – The Framework, Building Classifiers: Part 2 – Design and Build, and Building Classifiers: Part 3 - Evaluate and Refine. For more about AI Models, see What Is An AI Model?

Senders + Recipients

The next specialized bar chart is Senders + Recipients. This chart focuses on email communications. For each email entity displayed, it shows the number of emails sent and the number received:

Domains

Next is the Domains bar chart. This chart displays email domains. For each domain, it shows the number of unique documents associated with that domain, the number sent to that domain, and the number received from that domain:
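The three per-domain counts can be illustrated with a short Python sketch. Reveal's exact counting rules are not described in the post, so this assumes "received from" counts emails whose sender is at the domain, "sent to" counts emails with at least one recipient there, and "unique documents" counts each email once per domain. The field names id, from, and to and the example domains are hypothetical.

```python
from collections import defaultdict


def domain_of(address: str) -> str:
    """Return the lowercase part of an email address after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def domain_stats(emails: list[dict]) -> dict[str, dict[str, int]]:
    """Per domain: unique documents, emails sent to it, and emails received from it."""
    documents = defaultdict(set)
    sent_to = defaultdict(int)
    received_from = defaultdict(int)
    for email in emails:
        sender = domain_of(email["from"])
        received_from[sender] += 1
        documents[sender].add(email["id"])
        # Count each recipient domain once per email, however many addressees it has.
        for domain in {domain_of(addr) for addr in email["to"]}:
            sent_to[domain] += 1
            documents[domain].add(email["id"])
    return {
        domain: {
            "documents": len(ids),
            "sent_to": sent_to[domain],
            "received_from": received_from[domain],
        }
        for domain, ids in documents.items()
    }


# Hypothetical emails.
emails = [
    {"id": "DOC-1", "from": "kim@alpha.example", "to": ["lee@beta.example", "pat@alpha.example"]},
    {"id": "DOC-2", "from": "lee@beta.example", "to": ["kim@alpha.example"]},
    {"id": "DOC-3", "from": "kim@alpha.example", "to": ["sam@gamma.example"]},
]
for domain, row in domain_stats(emails).items():
    print(domain, row)
```

An internal email such as DOC-1 counts toward both "sent to" and "received from" for alpha.example but only once toward its unique documents, which is why the three numbers need not add up.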

Emotional Intelligence

The last bar chart on the page is for Emotional Intelligence, a topic I covered most recently in Getting Sentimental: Using Emotional Signals in Reveal 11.

And More

There are many more ways to search content in Reveal 11, including the Grid, Clusters, and the Heatmap.

Combining Searches

Searches can also be combined in various ways.

Here, I combined a keyword search for the term raptor (1,957 documents) with a concept search for the same term (2,681 documents), asking the platform to return all documents that contained the concept raptor but that did not contain the exact word raptor – 724 documents:
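Combining searches amounts to set operations on result sets. The Python sketch below uses a handful of hypothetical document IDs to show AND, OR, and AND NOT; it illustrates the logic, not Reveal's implementation. The post's numbers also fit this view: the size of C minus K equals |C| minus |C ∩ K|, so 2,681 − 1,957 = 724 means every document containing the word raptor was also in the concept results.

```python
# Hypothetical document IDs standing in for the two result sets.
keyword_raptor = {"DOC-001", "DOC-002", "DOC-003"}
concept_raptor = {"DOC-001", "DOC-002", "DOC-003", "DOC-004", "DOC-005"}

concept_not_keyword = concept_raptor - keyword_raptor  # concept AND NOT keyword
either = concept_raptor | keyword_raptor               # concept OR keyword
both = concept_raptor & keyword_raptor                 # concept AND keyword

print(sorted(concept_not_keyword))  # ['DOC-004', 'DOC-005']
print(len(either), len(both))       # 5 3
```

The AND NOT result is the interesting one for review: documents the concept search judged related to raptor even though none of them uses the word itself.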

Learn More

As you can see, Reveal 11 offers a wide array of ways to search for content, many of them driven by artificial intelligence. If your organization is interested in learning more about how Reveal uses AI as an integral part of its end-to-end legal document review platform, contact us.
