Stylometry and the Fraud Triangle
Today’s post is about “Roundabout Style”, a capability Reveal will be releasing in late fall. This new capability is meant to help you find information about situations where someone is being evasive or attempting to avoid the truth.
This capability is a “stylometric feature”, a tool from computational linguistics that uses statistical methods to analyze the style in which text is written.
To help explain all this, we will start with a specific example, the challenge of finding indications of fraud, bearing in mind that stylometric features and Roundabout Style can be applied to many other topics as well.
An On-Going Challenge
An on-going challenge for those of us who handle litigation and investigations is to cost-effectively find documents containing indicia of fraud. Fraud is “[a]n intentional perversion of truth for the purpose of inducing another in reliance upon it to part with some valuable thing belonging to him or to surrender a legal right,” according to Black’s Law Dictionary, Fifth Edition.
Fraud appears as a cause of action in many different types of cases. Over 40 different types of fraud claims are listed by the SALI Alliance in their Legal Matter Standard Specification (LMSS) 2.0 Release Candidate, ranging from “Fraudulent Concealment by Fiduciary” to “Real Estate Broker’s Liability to Purchaser for Intentional Nondisclosure of Material Facts”. Health care fraud, mortgage fraud, bank fraud, procurement fraud, grant fraud, and other fraud in connection with federal monies and spending allocated under various stimulus and economic recovery legislation all are civil frauds investigated and prosecuted by the Civil Fraud Unit of the Civil Division of the United States Attorney’s Office for the Southern District of New York. Still other types include insurance fraud, identity theft, and scams.
Fraud is an issue we need to deal with regularly. Recent searches for “fraud*” in the legal research platform Fastcase yielded about 901,000 judicial opinions and approximately 5.2 million underlying documents such as briefs, pleadings, motions, and orders.
The Fraud Triangle and Diamond
Two conceptual frameworks widely used when looking for communications containing indicia of fraud are the Fraud Triangle and the Fraud Diamond. Both models describe factors that might cause someone to commit fraud. The Fraud Triangle, first put forth by Donald Cressey in 1953, looks to three factors. The Fraud Diamond adds a fourth.
Cressey proposed the Fraud Triangle in his 1953 book, Other People's Money: A Study in the Social Psychology of Embezzlement. Based on data gathered from interviews with 133 incarcerated white-collar embezzlers, he identified three elements necessary for white-collar crime: opportunity, pressure, and rationalization. The Fraud Diamond, a newer theory from David Wolfe and Dana Hermanson, adds a fourth element, capability. Wolfe and Hermanson describe the four elements this way:
• Pressure: “I want to, or have a need to, commit fraud.”
• Opportunity: “There is a weakness in the system that the right person could exploit. Fraud is possible.”
• Rationalization: “I have convinced myself that this fraudulent behavior is worth the risks.”
• Capability: “I have the necessary traits and abilities to be the right person to pull it off. I have recognized this particular fraud opportunity and can turn it into reality.”
Rarely does one who is setting out to commit fraud write, “I am setting out to commit fraud.” To the contrary, fraudsters are far more likely to use language that is intentionally vague. This can mean that key word searches are of limited value. To find information indicative of fraud, we typically need to look more obliquely. We need to sidle up to that content, if you will, using peripheral vision to notice that which a direct gaze does not reveal.
Sidling up to the content means looking for indicators such as those identified in the Fraud Triangle and the Fraud Diamond. These indicators might be emotional signals suggesting that someone was under pressure, presented with an opportunity, and attempting to rationalize their decision. With Reveal AI, you can search for these emotional signals today.
We also might want to look for circumstantial evidence. One way to accomplish that is via the use of stylometry and stylometric features.
A branch of computational linguistics, stylometry uses statistical methods to analyze the style in which text is written, looking at specific features of the language used. Applied to deception detection, it is a way of assessing features of linguistic style to help distinguish between true stories and false ones.
The technique dates back at least to 1439-1440, when a 15th-century priest and scholar, Lorenzo Valla, proved that The Donation of Constantine, supposedly a document dating to the 4th century, actually was an 8th century forgery. Valla accomplished this by evaluating features of the document’s language such as the quality of the Latin used (too poor for a Roman text from the 4th century), similarities to a 5th century text, and anachronisms such as the use of the word “satrap”.
Stylometry has been deployed for a variety of uses. These include identifying authors, resolving disputes about authorship, identifying changes in authors’ writing styles, detecting plagiarism, and detecting contract cheating.
Systems using stylometry take advantage of natural language processing (NLP) to find and extract literary features, improving your chances of detecting deception. These stylometric features can include:
• Character frequency
• Word frequency
• Word character length
• Vocabulary richness
• Sentence length
• Sentence structure
• Sentence complexity
• Spelling errors
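Several of the features listed above can be computed with very little code. The following is a minimal sketch in Python, not Reveal's implementation: the function name stylometric_features, the regex-based sentence and word splitting, and the use of a type-token ratio for vocabulary richness are all assumptions made for illustration. Production systems typically rely on proper NLP tokenizers and cover harder features such as sentence structure and spelling errors.

```python
import re
from collections import Counter


def stylometric_features(text: str) -> dict:
    """Compute a few simple stylometric features for one piece of text."""
    # Naive sentence split on ., !, or ? (an assumption; real systems use NLP tokenizers).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters and apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    letters = [c for c in text.lower() if c.isalpha()]

    return {
        "char_frequency": Counter(letters),
        "word_frequency": Counter(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        # Type-token ratio: distinct words divided by total words.
        "vocabulary_richness": len(set(words)) / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


# Made-up sample text, for illustration only.
sample = "We moved it. We moved it again, because the books had to balance."
features = stylometric_features(sample)
print(features["avg_sentence_length"])       # words per sentence
print(features["word_frequency"].most_common(3))
```

Each value is a plain number or counter, so the features from two documents, or from one author over time, can be compared side by side.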
Reveal's Roundabout Writing Style
In the late fall 2021 release of Reveal AI 3.0, Reveal will introduce “Roundabout Writing Style”, a new emotional intelligence score based on stylometric analysis of written content. This will appear in Reveal AI under the “Emotions” tab, as “Roundabout Style”.
The newest arrow in Reveal’s quiver, Roundabout Style will analyze writing style, focusing on:
• Causal words
• 3rd person personal pronouns
Roundabout Style will score each of these features, and then combine those stylometric feature scores into one overall score. That score will be presented in the same way as other emotional scores such as those for Opportunity, Pressure, and the like (a topic discussed in Getting Sentimental: Using Emotional Signals in eDiscovery).
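To make the idea of per-feature scores rolled into one overall score concrete, here is a rough sketch in Python. It is not how Reveal computes Roundabout Style. The word lists, the function names feature_rate and roundabout_sketch, and the unweighted mean used to combine the two rates are all hypothetical, and the sketch makes no claim about whether higher or lower rates point to evasiveness.

```python
import re

# Hypothetical word lists, chosen for illustration only.
CAUSAL_WORDS = {"because", "since", "therefore", "hence", "thus", "so"}
THIRD_PERSON_PRONOUNS = {
    "he", "she", "they", "him", "her", "them", "his", "hers", "their", "theirs",
}


def feature_rate(words, vocabulary):
    """Share of words that belong to the given vocabulary (0.0 to 1.0)."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in vocabulary) / len(words)


def roundabout_sketch(text):
    """Score two stylometric features and combine them into one number."""
    words = re.findall(r"[a-z']+", text.lower())
    causal = feature_rate(words, CAUSAL_WORDS)
    third_person = feature_rate(words, THIRD_PERSON_PRONOUNS)
    # Assumption: combine the feature scores with an unweighted mean.
    overall = (causal + third_person) / 2
    return {"causal": causal, "third_person": third_person, "overall": overall}


# Made-up message, for illustration only.
scores = roundabout_sketch("They said it was fine because he approved it.")
print(scores)
```

A real scoring model would more likely weight the features and calibrate them against a reference corpus, so that scores are comparable across documents of different lengths and types.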
With Roundabout Style, you will be able to detect conversations that are unusually wordy and that avoid addressing the point directly. Similarly, you will be able to find text that contains signals suggesting untruthfulness or unease about specific conversations.
By combining Roundabout Style with sentiment analysis, high precision active learning, anomaly detection and other Reveal capabilities, you will be able to hone your research even further.
Examples help, so here is a string of four messages that demonstrate what you can find in the Enron data when you use Roundabout Style along with other emotional scores such as Pressure and Opportunity:
Roundabout Style is useful for more than just detecting indicia of fraud. It can be used any time you are trying to find evidence suggesting that someone has been evasive or was attempting to avoid the truth. It can be used to search for indications that a CPA was trying to cover up accounting issues, that an attorney had knowingly diverted moneys meant to be deposited into a lawyers’ trust account, or that a senior executive attempted to conceal research about serious negative side effects of a new drug.
Consider adding this new arrow to your quiver as well, so that you too can take advantage of one more way to get to the heart of the story and the truth of the matter.
Getting Sentimental: Using Emotional Signals in eDiscovery. Reveal (2021).
Legal Document Review's New BFF: High Precision Active Learning. Reveal (2021).
The Exquisite eDiscovery Magic of Data Anomaly Detection. Reveal (2021).
Donald R. Cressey, Other People's Money: A Study in the Social Psychology of Embezzlement. Free Press (1953).
Karl F. Schuessler, Book Review – Other People's Money: A Study in the Social Psychology of Embezzlement. American Journal of Sociology, Vol. 59, No. 6 (1954): 604.
Frederick Mosteller and David L. Wallace, Inference and Disputed Authorship. CSLI Publications (1964).
Matthew L. Newman, James W. Pennebaker, Diane S. Berry, Jane M. Richards, Lying Words: Predicting Deception From Linguistic Styles. PSPB, Vol. 29 No. 5 (2003): 665-675.
David T. Wolfe and Dana R. Hermanson, The Fraud Diamond: Considering the Four Elements of Fraud. CPA Journal 74.12 (2004): 38-42.
Carole E. Chaski, Author Identification In The Forensic Setting, The Oxford Handbook of Language and Law. Oxford University Press (2012).
Alexander Schuchter and Michael Levi, The Fraud Triangle revisited. Security Journal (2013).
Markus Krause, Stylometry-based Fraud and Plagiarism Detection for Learning at Scale. 5th KSS Workshop (2015).
Alex Wermer-Colan, Stylometry Methods and Practices. Temple University Libraries (2018).
Helena Gómez-Adorno, Juan-Pablo Posadas-Duran, Germán Ríos-Toledo, Grigori Sidorov, and Gerardo Sierra, Stylometry-based Approach for Detecting Writing Style Changes in Literary Texts. Comp. y Sist. vol.22 no.1 Ciudad de México ene (2018).
Massimo Poesio, Tommaso Fornaciari, Detecting deception in text using NLP methods. SIGNAL (2018).
Joshua J. Mark, Donation of Constantine. World History Encyclopedia (2019).
Ksenia Lagutina, Nadezhda Lagutina, Elena Boychuk, Inna Vorontsova, Elena Shliakhtina, Olga Belyaeva, and Ilya Paramonov, A Survey on Stylometric Text Features, Proceedings of the 25th Conference of Fruct Association (2019).
David C. Ison, Detection of Online Contract Cheating Through Stylometry: A Pilot Study. Online Learning Journal, Volume 24 Issue 2 (2020).
U.S. District Courts - Civil Cases Filed, by Jurisdiction, Nature of Suit, and District. Administrative Office of the U.S. Courts (2021).
What to Watch For
We plan to release Roundabout Writing Style in the late fall, along with HPC and other enhancements and updates, as part of Reveal AI 3.0.
If your organization is interested in learning more about stylometric features and how Reveal uses AI as an integral part of its end-to-end legal document review platform, contact us.