Document Review AI
November 11th

How Many Hurdles are in Your Foreign Language Document Review Process?

George Socha

Foreign language document review poses hurdles. Modern lawsuits and investigations pull in content from many languages, not just English. Attorneys, investigators, and allied professionals need to work with English and non-English content with equal facility. They need platforms that allow them to conduct foreign language document review, apply predictive coding and other analytical capabilities regardless of the languages involved, and even convert their user interface into other languages. And because they are dealing with ever-increasing volumes of increasingly complex and diverse data, they need legal technology powered by artificial intelligence so that they can get their work done quickly, reliably, and cost-effectively.

These are realistic goals. Today’s AI-driven technology is capable of identifying numerous languages, translating content from many languages into many others, and even converting the user interface itself into other languages.

Let’s step through the hurdles of foreign language document review and how they can be addressed.


Will Our Document Review Include Non-English Language Documents?

You have just embarked on a new document review project. It’s for a case in a United States court, maybe federal but more likely state. All the parties are located in the US. All events of consequence took place in the US. As a result, everyone on your trial and review teams automatically assumes that all the content to be reviewed will be in English.

Right?

Wrong. Or, at least, probably wrong. Let’s start with some numbers.

I live in St. Paul, Minnesota, deep in the middle of the country. Outsiders do not see us as teeming with a diverse, multinational population like, say, New York or Washington, DC. But our students speak 113 different languages, according to the Saint Paul Public Schools Office of Multilingual Learning. And although Minnesota shares a border with Canada, Spanish is the second most common language spoken at home.

It’s not just St. Paul schools and Minnesota households; the US workplace is replete with languages. According to a 2018 survey conducted by Ipsos Public Affairs for the American Council on the Teaching of Foreign Languages, nine out of ten US employers rely on US-based employees with language skills other than English.

And it’s the courts, too. More than 3,600 interpreters are registered in the United States Courts National Court Interpreter Database, according to a 2017 report. These interpreters have expertise in over 180 languages, more than 120 of which are used regularly in court proceedings. In 2016, for example, the top interpreted language was Spanish, in 254,736 proceedings.

At Reveal, we routinely encounter different foreign languages. Our software, which automatically detects more than 160 languages, finds languages other than or in addition to English in the majority of our projects. Per a recent count, during processing our software has identified at least 129 languages. These languages range alphabetically from Afar, spoken mainly in Ethiopia, to Wolof, a West African language. The languages span the globe geographically. They include all the usual European suspects, such as Italian and Portuguese, but also less well-known ones like Basque and Estonian. Arabic and Hebrew are there, as are Asian languages such as Mandarin and Traditional Chinese, Japanese, and Korean as well as Bengali, Hindi, and Javanese. The Americas are included as well, with languages such as Cherokee, Guarani, and Inuktitut.

So, whether you know it or not, you most likely do have documents containing content in languages other than English. As a result, it’s not enough just to be ready for review. You need to be ready for multi-lingual review.

(* In this post, I will use the terms “document” and “file” interchangeably. There are significant differences in meaning between the two terms, but that is a discussion for another time.)


We Have Non-English Documents. Now What?

If you suspect you have non-English documents, you will want certain capabilities at your disposal. In this context, “you” is very much plural; you may want to put multi-lingual review capabilities into the hands of a wide variety of people. These include lawyers; paralegals; everyone on your review team, including document reviewers, whether a law firm’s document review attorneys or contract attorneys working freelance or hired through a staffing agency; and foreign language reviewers.

You might decide that you want to use people to identify and translate documents. Or you might choose, instead, to rely on software for some portion of those tasks.

If you opt for the software route, you will want an eDiscovery platform that can:

  • Identify the languages that appear in your content;
  • Translate content from one language to another; and
  • For document reviewers working with languages other than English, allow them to change their user interface to other languages.

Some review platforms have all these options built in. Others have none. Yet others offer some capabilities, such as language identification, but require that translation be performed using other resources, such as additional software or human translators.

Chances are, you also will want contract attorneys or other reviewers fluent in those languages. You may also find that you need translation services.


Can We Use Our eDiscovery Platform?

Some eDiscovery tools have language identification and translation capabilities built in. Others do not. Yet others offer some capabilities, such as language identification, but require that other capabilities, such as translation, be performed using additional software or human translators.

While no one approach fits all needs, it is important to know what your options are before you begin a project involving foreign language document review.


Can We Identify Documents Containing More than One Language?

While some document reviews only involve one language, others contain files with content in multiple languages. It helps to have a platform that can identify the languages contained in the files loaded into it.

Reveal’s Discovery Manager can identify up to three different languages within a given file. When the project setting “Language Identification” is selected, the tool will read through the entire extracted and/or OCR text available for a given file and identify the top three languages found within that file, along with their corresponding percentages and character counts.

In the example below, the highlighted column, “Detected Languages”, shows the languages that Reveal’s platform detected in individual documents.

[Image: document list with the “Detected Languages” column highlighted]
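To make the “top three languages, with percentages and character counts” output more concrete, here is a minimal Python sketch of how such a per-file summary could be computed. It is not Reveal’s implementation. It assumes some upstream language detector has already labeled each run of text with a language (the `segments` list below is hypothetical), that whitespace is excluded from character counts, and that percentages are shares of all counted characters; `LanguageShare` and `top_languages` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LanguageShare:
    language: str
    char_count: int
    percentage: float  # share of all counted characters, 0-100


def top_languages(segments, limit=3):
    """Summarize (language, text) segments into the top `limit` languages.

    Assumption: `segments` comes from a hypothetical upstream detector that has
    already labeled each run of text. Whitespace is excluded from the counts.
    """
    counts = Counter()
    for language, text in segments:
        n = sum(1 for ch in text if not ch.isspace())
        if n:
            counts[language] += n
    total = sum(counts.values())
    if total == 0:
        return []  # no countable text, e.g. an image-only file without OCR text
    return [
        LanguageShare(language, n, round(100 * n / total, 1))
        for language, n in counts.most_common(limit)
    ]


# Hypothetical detector output for one email that mixes three languages.
segments = [
    ("English", "Please review the attached contract."),
    ("Spanish", "Por favor revise el contrato adjunto."),
    ("English", "Thanks."),
    ("French", "Merci."),
]
for share in top_languages(segments):
    print(share.language, share.char_count, f"{share.percentage}%")
```

Counting characters rather than segments keeps a short sign-off in one language from outweighing a long passage in another, which is one plausible reason to report character counts alongside percentages.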


Can We Translate Content?

Ideally, your review tool will offer options for translating content from one language to another. You will want the ability to perform translations programmatically. You also will want to be able to translate content on the fly.

In the example below, I have selected three documents for translation: a Microsoft Outlook message file, a UTF-8 file, and an HTML file.

[Image: three documents selected for translation]

Having selected the documents to translate, I next choose:

  • Source Text Set: the source of the text to be translated;
  • Source Language: the language found in the source document or documents, which I want translated;
  • Destination Text Set: where I want the translated text to go (if desired, I can create a new destination text set); and
  • Destination Language: the language into which I want the text translated.

[Image: translation options]

Finally, to perform the translation, I click the “Translate” button.
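The four options above can be thought of as a small request record that should be checked before a translation job starts. The Python sketch below is purely illustrative and is not Reveal’s API: `TranslationRequest`, `validate`, and the text set names are hypothetical, and the rule that the destination text set must differ from the source is an assumption, chosen so that the original text is never overwritten by its translation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationRequest:
    """Hypothetical model of the four translation options described above."""
    source_text_set: str       # where the text to translate comes from
    source_language: str       # language found in the source document(s)
    destination_text_set: str  # where the translated text should go
    destination_language: str  # language to translate into


def validate(request, existing_text_sets, create_destination=False):
    """Return a list of problems; an empty list means the request looks usable."""
    problems = []
    if request.source_text_set not in existing_text_sets:
        problems.append(f"unknown source text set: {request.source_text_set}")
    if request.destination_text_set == request.source_text_set:
        # Assumption: never overwrite the original text with its translation.
        problems.append("destination text set must differ from the source")
    elif (request.destination_text_set not in existing_text_sets
          and not create_destination):
        problems.append(
            f"destination text set does not exist: {request.destination_text_set}")
    if request.source_language == request.destination_language:
        problems.append("source and destination languages are the same")
    return problems


text_sets = {"Extracted", "OCR/Loaded"}
ok = TranslationRequest("Extracted", "Spanish", "Translated", "English")
print(validate(ok, text_sets, create_destination=True))  # []
print(validate(ok, text_sets))  # one problem: destination does not exist yet
```

The `create_destination` flag mirrors the option of creating a new destination text set; without it, pointing at a text set that does not yet exist is reported rather than silently created.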


Why Not Just Use Google Translate?

Free tools such as Google Translate and Microsoft Bing Translator can seem appealing. They are free. They are easy to access; all you need is a browser and an internet connection. They are easy to use; just paste in the text you want translated and they return the results.

There are questions, not always clearly answered, about the degree to which using those tools may expose one’s data, or one’s client’s data, to others. For Google Translate, see Google’s privacy policy and its terms of service. For Bing Translator, see the Translator FAQ and Confidentiality pages and the Microsoft Privacy Statement.

The general consensus appears to be to proceed with caution, if proceed you must, or better yet find a more clearly protected option.


What About Human Translators?

There is, and assuredly will long continue to be, a need for translation performed by humans rather than by machine.

As the American Translators Association notes, style, tone, the intent of the text, and differences in culture and dialect can come into play. The need to work with highly specialized or arcane technical language in areas such as medicine, banking, and science can be another factor tipping the balance in favor of human translation.

Sometimes, there is no choice. The Code of Federal Regulations, for example, requires in 8 CFR § 103.2 (Submission and adjudication of benefit requests) that any document containing foreign language submitted to U.S. Citizenship and Immigration Services “shall be accompanied by a full English language translation which the translator has certified as complete and accurate, and by the translator’s certification that he or she is competent to translate from the foreign language into English.”

You should weigh the costs and benefits of human translation versus machine translation to determine which path makes the most sense in a particular situation.


How Can Our Review Attorneys Work with Translated Content?

Once content has been translated, you want to be able to work with that content. Following is an example of a document where some of the content was identified as Spanish and then was translated from Spanish to English. The example shows four views of the file: Native/HTML, Extracted, OCR/Loaded, and Translated. In the first three views, you can see text in both English and Spanish. In the fourth, you can see the English version of the Spanish text.

[Image: Native/HTML, Extracted, OCR/Loaded, and Translated views of a document translated from Spanish to English]

Much of the translation done in Reveal is from other languages into English, but not all of it. We also see data translated from English into other languages, as well as from one non-English language to another. A sampling of recent Reveal jobs includes translations from Arabic, Chinese, Dutch, French, German, Hebrew, Japanese, and Spanish into English; from English into Chinese, Dutch, French, German, Japanese, and Spanish; and from Polish into Dutch.



Can Our Review Attorneys Have a Translated User Interface as Well?

Not every review is conducted in English, nor is every reviewer proficient in English. When that happens, you can provide a substantially better review experience by allowing users to change the platform’s interface into a language with which they are comfortable.

With Reveal’s platform, users can choose to display the user interface in any of 95 languages:

[Image: list of available user interface languages]

The steps to accomplish this are simple. First, go to “Settings”:

[Image: “Settings”]

In the “Options” box, go down to the “Language” option and click on the language displayed there (in this example, “English”):

[Image: “Language” option in the “Options” box]

Scroll to the language in which you want the user interface displayed, click on that language, and select “UPDATE”:

[Image: language list with the “UPDATE” button]

Most elements of the user interface will now be shown in the selected language:

[Image: user interface displayed in the selected language]


How Much More Will This Cost?

What you are likely to pay for translation capabilities will vary depending on what approach you take. With some eDiscovery platforms, identification and translation costs are included in the base price. For others, you pay extra. If you use translators, you will have to pay for them as well.

As a result, before you commit to a course of action, make sure you ask for, receive, and evaluate pricing information. On the software side, questions to ask include:

  • Does the cost to use your software include the cost for tools needed to identify languages? If it does not, what additional software is needed and what does the use of that software cost?
  • Does the cost to use your software include the cost for tools needed to translate content from one language to another? If it does not, what additional software is needed and what does the use of that software cost?
  • Does the cost to use your software include the cost to perform analytics work on data in more than one language? If it does not, what additional software is needed and what does the use of that software cost?

On the services side, questions to ask include the cost to translate data and the cost of any additional capabilities needed to perform the translation, such as data transfer, processing, or hosting.
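One way to keep the answers to these questions comparable across vendors is to record them in a simple structure and total them. The Python sketch below is a hypothetical worksheet, not a pricing model from Reveal or any other vendor: `Quote`, `CAPABILITIES`, and every dollar figure are placeholders invented for illustration.

```python
from dataclasses import dataclass, field

# The three software questions from the list above, as capability keys.
CAPABILITIES = ("language identification", "translation", "multilingual analytics")


@dataclass
class Quote:
    vendor: str
    base_price: float
    # Capability -> extra cost; 0.0 means "included in the base price".
    software_add_ons: dict = field(default_factory=dict)
    # Service -> cost, e.g. translation, data transfer, processing, hosting.
    services: dict = field(default_factory=dict)

    def unanswered(self):
        """Capabilities the vendor has not yet priced either way."""
        return [c for c in CAPABILITIES if c not in self.software_add_ons]

    def total(self):
        """Base price plus all software add-ons and services quoted so far."""
        return (self.base_price
                + sum(self.software_add_ons.values())
                + sum(self.services.values()))


# Placeholder figures for illustration only, not real pricing.
quotes = [
    Quote("Vendor A", 10_000.0,
          {"language identification": 0.0, "translation": 0.0,
           "multilingual analytics": 0.0}),
    Quote("Vendor B", 8_000.0,
          {"language identification": 0.0, "translation": 1_500.0},
          {"translation services": 2_000.0}),
]
for q in quotes:
    print(q.vendor, q.total(), "still to ask about:", q.unanswered())
```

In this made-up comparison, the vendor with the lower base price costs more once add-ons and services are counted, and it has not yet answered the multilingual analytics question, which is exactly the kind of gap the questions above are meant to surface.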


If your organization is interested in leveraging the power of legal AI software to work with foreign language ESI, contact Reveal to learn more. We’ll be happy to show you how our authentic artificial intelligence takes review to the next level, with our AI-powered, end-to-end document review platform.