What Is An AI Model?
Reveal makes extensive use of AI models. Generally, an AI model is a software program that has been trained on a set of data to perform specific tasks, such as recognizing certain patterns. Artificial intelligence models use decision-making algorithms to learn from the training data and apply that learning to achieve specific pre-defined objectives.
Reveal offers a Model Library which consists of a collection of pre-existing models you can use straight out of the box, extend or adapt to suit your specific needs, or stack and pack to achieve a larger objective. We give you the ability to create your own AI models, which you can use for your own purposes as well as make available to others via our Model Marketplace. We also will work with you to create custom models such as the ones that drive DLA Piper's Aiscension and those used by Epiq in its new AI Model Library program.
Types of AI Models
There are various types of AI models, defined by the means used to create them. Three approaches used in data science are supervised learning, unsupervised learning, and semi-supervised learning models.
Supervised Machine Learning Models
AI models can be built using supervised machine learning. These models are trained by people, often ones with specific subject matter expertise, typically referred to as subject matter experts or SMEs. SMEs review new data points and label them. As they train a model, they might mark a document as "responsive" or "non-responsive". They might tag it as relating to any number of issues, such as "Contains offensive language" or "Privileged". Models learn in real time from the training the SMEs provide, and use that learning to find more similar content.
AI models built with supervised machine learning often are used to perform predictive analyses. They look to the past, assessing decisions made by the SMEs about the documents they have reviewed. Using artificial neural networks loosely modeled on the human brain, the models use those assessments to attempt to predict the future, forecasting the decisions SMEs might make about the remaining documents. A common use of supervised machine learning in eDiscovery is TAR, or technology-assisted review.
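To make the idea concrete, here is a minimal sketch of supervised learning on coded documents. It is an illustration only, not how Reveal's models work: it swaps in a small multinomial Naive Bayes classifier (not a neural network) because that fits in a few lines, and the sample documents, labels, and function names are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real system would do far more."""
    return text.lower().split()


def train(labeled_docs):
    """Count word frequencies per label from (text, label) pairs coded by an SME."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of documents with that label
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict_proba(model, text):
    """Return {label: probability} using Naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Turn log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: v / norm for label, v in exp_scores.items()}


# Hypothetical SME coding decisions (the "past").
coded = [
    ("please review the attached contract pricing", "responsive"),
    ("contract pricing terms for the supplier", "responsive"),
    ("lunch on friday anyone", "non-responsive"),
    ("office party friday afternoon", "non-responsive"),
]
model = train(coded)

# Forecast the decision for a document no one has reviewed yet.
scores = predict_proba(model, "updated supplier contract pricing")
predicted = max(scores, key=scores.get)
```

The shape is the same whatever algorithm sits underneath: people supply labels, the model learns from them, and the model then scores documents that have not been reviewed.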
Unsupervised Machine Learning Models
AI models also can be developed with the help of unsupervised machine learning, an approach that incorporates more automation. These models are trained by software, sometimes using a process that mimics the training provided by people. They categorize your input data or identify patterns or trends without the need for initial human training.
AI models built using unsupervised machine learning typically are turned to for descriptive analyses. They might be used to summarize content. They could be used to classify content, with the classified content then displayed using a visual tool such as a cluster wheel. They could be used to extract rules about content.
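As a rough illustration of grouping content without any human labels, the sketch below clusters documents by word overlap. It assumes a very simple technique (greedy single-pass clustering on Jaccard similarity) chosen for brevity; it is not a description of Reveal's or Brainspace's clustering, and the documents and the 0.3 threshold are made up.

```python
def tokens(text):
    """Represent a document as the set of its lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document (its first
    member) is similar enough; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of indexes into docs
    for i, doc in enumerate(docs):
        for members in clusters:
            seed = docs[members[0]]
            if jaccard(tokens(doc), tokens(seed)) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical unlabeled documents.
docs = [
    "quarterly revenue forecast for the board",
    "fantasy football picks this week",
    "revised quarterly revenue forecast",
    "football picks for week two",
]
groups = cluster(docs)
# groups == [[0, 2], [1, 3]]: the finance documents and the football documents
```

No one told the program what "finance" or "football" means; the groups fall out of the data itself, which is what makes this kind of analysis descriptive rather than predictive.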
Semi-Supervised Machine Learning Models
This subset of machine learning is often described as a middle ground between supervised and unsupervised machine learning, combining aspects of the two. SMEs label a small amount of data to start training a model. That partially trained model is then pointed at a larger body of data, which the model labels itself, a process referred to as "pseudo-labelling".
The SME-labeled data and the pseudo-labeled data are then combined and used to create a model that might be used for descriptive or predictive purposes.
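The pseudo-labelling loop can be sketched in a few lines. This is a simplified illustration under stated assumptions: the word-count "model", the 0.8 confidence cutoff, and the sample documents are all hypothetical, and a real system would use a far stronger classifier.

```python
from collections import Counter


def train(labeled):
    """Build a word-count profile per label from (text, label) pairs."""
    profiles = {}
    for text, label in labeled:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def predict(profiles, text):
    """Return (best_label, confidence).

    Confidence is the best label's share of all matching word counts;
    (None, 0.0) means no word in the document was seen during training.
    """
    words = text.lower().split()
    hits = {label: sum(counts[w] for w in words) for label, counts in profiles.items()}
    total = sum(hits.values())
    if total == 0:
        return None, 0.0
    best = max(hits, key=hits.get)
    return best, hits[best] / total


def pseudo_label(seed, unlabeled, min_confidence=0.8):
    """Train on the seed set, label confident documents, retrain on both."""
    profiles = train(seed)
    pseudo = []
    for text in unlabeled:
        label, confidence = predict(profiles, text)
        if label is not None and confidence >= min_confidence:
            pseudo.append((text, label))
    # Combine the SME labels with the model's own labels.
    return train(seed + pseudo), pseudo


# A small SME-labeled seed set and a larger unlabeled pool (both hypothetical).
seed = [
    ("legal advice from counsel", "privileged"),
    ("weekly sales numbers", "not privileged"),
]
unlabeled = [
    "counsel sent legal advice on the merger",
    "sales numbers for the merger",
    "see you at lunch",
]
final_model, pseudo = pseudo_label(seed, unlabeled)
```

Here the first two unlabeled documents receive pseudo-labels and the third is left alone because the seed model has nothing to go on. The confidence cutoff matters: set it too low and the model starts teaching itself its own mistakes.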
The Ways We Use AI Models Generally
Real-world examples of AI models are all around us. In fact, we encounter them in many aspects of our lives.
In the healthcare industry, AI models help doctors and other medical staff diagnose pediatric diseases. They let companies identify opinion spam on e-commerce, social media, and similar sites. They are used to summarize communications, write articles, and detect credit card fraud.
Google uses AI algorithms to match riders with carpool drivers; detect and filter out email spam; and assist with composing messages. Microsoft's AI Builder provides AI models used to process forms, extract insight from product reviews, and automate inventory taking. Amazon SageMaker is a managed AWS service that provides developers and data scientists with the ability to build, train, and deploy models for uses such as predictive maintenance, computer vision, and predicting consumer behavior. And Netflix, Amazon, and YouTube use AI systems to optimize their services, applying machine learning to big data to provide more relevant content recommendations.
AI Models at Reveal
With Reveal, AI models serve several functions, such as bringing anomalies to light, analyzing data, detecting patterns, recognizing images, and identifying positive or negative tones via sentiment analysis and natural language processing. And the list goes on.
You can start with pre-trained COSMIC and Entity models available from Reveal's AI Model Library and Marketplace.
You also can create and reuse your own AI models. You can do this entirely within the Review component of Reveal's platform or, if you prefer, you can create portable models in Brainspace.
Reveal's Model Library and Marketplace
Reveal maintains a growing number of pre-trained AI models built by our team of data scientists. These models are ready, out of the box, to apply to your data. We augment the list of models as we release new versions of Reveal AI. You can use these models as they are, or you can adapt and augment them. For more about Reveal's AI Model Marketplace, read Like Netflix for Legal AI Models: Reveal’s AI Model Marketplace Goes Deep.
As you create your own Reveal AI models (see below), you can publish those models to your library. Once you've published models to your library, you can re-use, adapt, and re-save them for similar use-cases.
The marketplace contains two forms of AI models, COSMIC models and Entity models.
COSMIC AI Models
COSMIC, or Cognitive Machine Coding, is a form of supervised machine learning from Reveal. Reveal's Marketplace contains pre-trained COSMIC models and you also can create your own models.
When you apply a COSMIC AI Model to your dataset, it returns documents based on the probability that they match the model. Each document gets a relevance probability number between 0 and 100. For convenience, scores are grouped into three ranges: low (0 to 40), medium (40 to 60), and high (60 to 100).
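The grouping of scores into ranges can be expressed as a small function. This is a sketch, not Reveal's implementation. The published ranges share their endpoints (40 and 60), so the code assumes a boundary score belongs to the higher range; that choice, the function name, and the document IDs are assumptions made for illustration.

```python
def cosmic_band(score):
    """Map a 0-100 score to 'low', 'medium', or 'high'.

    Assumption: a score of exactly 40 counts as medium and exactly 60 as high.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 60:
        return "high"
    if score >= 40:
        return "medium"
    return "low"


# Hypothetical document IDs and scores.
scores = {"DOC-001": 12, "DOC-002": 47, "DOC-003": 93, "DOC-004": 60}
bands = {doc_id: cosmic_band(score) for doc_id, score in scores.items()}

# Keep only the documents scored medium or high.
to_review = [doc_id for doc_id, band in bands.items() if band in ("medium", "high")]
```

Filtering on the medium and high ranges together is the code equivalent of checking both boxes in a score filter.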
For example, I might want to look for privileged documents. With that end in mind, I could start with the "Privileged Content V.10" COSMIC AI Model, a model designed to help identify documents containing privileged content. It looks for, among other things, conversations involving requests for legal advice, legal advice itself, and documents prepared for depositions.
In the example below, I ran the Privileged Content V.10 model against a test dataset, "2.90 Test". Going to the statistics for that exercise, I can see that of the approximately 15,000 documents in the dataset, 383 were classified by the model as high, 175 as medium, and 14,617 as low.
From there, I might want to look at documents falling within a certain score range. I could do this by selecting "COSMIC score" and then checking "Medium probability (40 - 60)" and "High probability (60 - 100)".
Using this single search with my test dataset, I reduced the population to 1,148 documents. Had I searched only for high probability documents (scores from 60 to 100), I would have gotten back 190 threads containing 735 documents. If I had narrowed my search even more to scores between 90 and 100, I would have had 127 threads with 302 documents.
Entity AI Models
An entity is a piece of data extracted by Reveal AI. Entities offer an efficient way to search or filter for people, places, and things. Reveal identifies over 21 entity types such as topics, places, people, organizations, and categories. In the example below, you can see a few of the domain names found by Reveal AI in the Enron data.
If desired, you can enhance entity types already built into the platform. You also can create custom entities using Reveal's training tools. You can use custom entities to quickly find sensitive data that pattern-based searches such as RegEx typically miss, such as addresses for PII, medical conditions for PHI, educational records for FERPA, or proprietary intellectual property. You can use custom entities to teach the platform about specific competitors, partners, products, or government officials, and with that knowledge uncover potential areas of compliance risk or problematic behavior. You also can use custom entities to make supervised machine learning algorithms more effective, helping to find additional relevant documents.
Creating AI Models in Review
You can create AI models directly in Review. If you have the appropriate rights, you can add coding tags. When you add them, you can choose to enable prediction. Enabling prediction turns on AI. In the example below, I added five issue codes: "Design Defect", "Failure to Warn", "Manufacturing Flaw", "Fraud", and "Conspiracy". I enabled prediction for each of those codes.
Each of those five issue tags powers a separate AI model. If you select one of the tags, you train the associated AI model. In the example below, I selected "Design Defect" and "Failure to Warn". By checking those two issues, I told the "Design Defect" and "Failure to Warn" AI models that I thought the document I was examining matched those two issues. Had I tagged the document as "Responsive" or "Non-Responsive", I would have trained yet another AI model.
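One way to picture "one tag, one model" is as a separate pile of training examples per tag. The sketch below is purely conceptual: the `TagModels` class, its methods, and the document ID are hypothetical and say nothing about how Review stores or trains its models.

```python
class TagModels:
    """Keeps a separate list of training examples for each prediction-enabled tag."""

    def __init__(self, tags):
        self.examples = {tag: [] for tag in tags}

    def code_document(self, doc_id, selected_tags):
        """Record one reviewer decision.

        Only the models behind the selected tags receive a new example;
        the other tags' models are left untouched.
        """
        for tag in selected_tags:
            if tag not in self.examples:
                raise KeyError(f"prediction is not enabled for tag {tag!r}")
        for tag in selected_tags:
            self.examples[tag].append(doc_id)
        return list(selected_tags)


# Five issue codes, each with prediction enabled.
models = TagModels(
    ["Design Defect", "Failure to Warn", "Manufacturing Flaw", "Fraud", "Conspiracy"]
)

# Checking two issues on one document trains two of the five models.
trained = models.code_document("DOC-001", ["Design Defect", "Failure to Warn"])
```

After this single coding decision, the "Design Defect" and "Failure to Warn" models each have one new example, while "Manufacturing Flaw", "Fraud", and "Conspiracy" have none.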
Beyond Individual Models
As mentioned before, with Reveal you can deploy individual pre-existing models. You also can do much more. You can extend existing models, prolonging their lifecycle by tailoring them to suit your particular circumstances and needs. You can build models of your own. You can stack and pack models, combining different individual models to create a whole that is greater than the sum of its individual parts. You can even place your models on our marketplace, making them available to others.
For more on supervised and unsupervised machine learning, see Legal AI Software: Taking Document Review to the Next Level; What is Technology Assisted Review?; and 5 Things You Can Do with Supervised Machine Learning.