Building Classifiers: Part 2 – Design and Build
For this three-post series, I designed, built, and evaluated a specialized active learning classifier to help me find board of directors meeting minutes and related documents.
To recap, for my classifier I focused on what I hoped would be a small and manageable issue. I wanted to find Enron board of directors meeting minutes. Various keyword, concept, and other searches had all delivered overly broad results. I hoped that with an active learning classifier, I could get a better set of results.
Create AI Tag
The first step in the process of building a classifier is to create an AI tag. (For more about AI tags, see What are AI Tags?) To create an AI tag, you need administrative rights. Because I have those rights, I created an Enron BOD Minutes AI tag:
- I went to a project in Reveal, then selected Project Admin:
- I selected Tags and then Add Tag and Choices:
- I gave the tag a name, Enron BOD Minutes; chose a type, Mutually Exclusive; added two choices, Responsive and Nonresponsive; and enabled prediction for both choices, Positive for Responsive and Negative for Nonresponsive; and assigned myself access to the AI tag:
Create New Classifier
When I created my new AI tag, Reveal created a new corresponding classifier, also called Enron BOD Minutes.
Reveal will use the new classifier to train the platform. During training, I will code documents using my AI tag. If I think a document is a set of minutes from a meeting of the Enron board of directors, I will code it as Responsive. This will tell the classifier that documents like this are the ones I am looking for. If I think a document does not meet those criteria, I will code it as Nonresponsive. This will let the classifier know that documents like this are not interesting to me.
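To make the effect of the two choices concrete, here is a minimal Python sketch of the tag as I configured it. It is only an illustration: the `Choice` and `AITag` classes and the `training_label` method are hypothetical names of my own, not part of any Reveal API. The sketch assumes that a Positive choice becomes a positive training example (1) and a Negative choice becomes a negative one (0).

```python
from dataclasses import dataclass

# Hypothetical model of the tag described above; these names are for
# illustration only and are not part of any Reveal API.
@dataclass(frozen=True)
class Choice:
    name: str
    prediction: str  # "positive" or "negative"

@dataclass(frozen=True)
class AITag:
    name: str
    mutually_exclusive: bool
    choices: tuple

    def training_label(self, choice_name: str) -> int:
        """Return 1 for a positive choice and 0 for a negative one."""
        for choice in self.choices:
            if choice.name == choice_name:
                return 1 if choice.prediction == "positive" else 0
        raise ValueError(f"{choice_name!r} is not a choice on tag {self.name!r}")

ENRON_BOD_MINUTES = AITag(
    name="Enron BOD Minutes",
    mutually_exclusive=True,
    choices=(
        Choice("Responsive", "positive"),
        Choice("Nonresponsive", "negative"),
    ),
)
```

Calling `ENRON_BOD_MINUTES.training_label("Responsive")` gives the label a document coded Responsive would carry in this simplified picture.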
Create Tag Profile and Add AI Tag to Profile
The next step for me is to create a tag profile that includes my new AI tag. (For more about tag profiles, see What are AI Tags?)
For this exercise, I created a new tag profile that I called Enron BOD Minutes:
- I went to Project Admin and selected Tags.
- I clicked on Add new profile:
- I gave the profile a name, Enron BOD Minutes, and assigned it to myself:
- I added a pane and called it Responsiveness:
- I added my Enron BOD Minutes tag to the Responsiveness pane:
I intend to use a function in Reveal called AI-Driven Batches to deliver prioritized documents for me to review.
Create a Search
To start the process, I need to tag at least four documents as positive (responsive) and one document as negative (nonresponsive).
For that, I created a search to find documents that met these criteria:
- Containing the following weighted concepts:
  - minutes, 0.333
  - board of directors meeting, 1
  - board meeting, 1
  - board book, 0.333
  - board of directors, 0.666
  - board committees, 0.333
  - being no further business, 0.333
  - notice of meeting, 0.333
  - doc, 0.333
  - quorum, 0.333
  - meeting minutes, 0.333;
- Where the file extension is not msg; and
- Where the document is an original (rather than a near duplicate or an exact duplicate).
That search brought back 3,145 documents.
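The search criteria above can be written down as plain data, which makes the weights easier to compare at a glance. The Python sketch below is a toy stand-in, not how Reveal's concept search works: it assumes simple whole-phrase matching and sums the weights of the phrases found. The document fields (`extension`, `is_original`) and both function names are hypothetical.

```python
import re

# Concepts and weights copied from the search described above.
WEIGHTED_CONCEPTS = {
    "minutes": 0.333,
    "board of directors meeting": 1.0,
    "board meeting": 1.0,
    "board book": 0.333,
    "board of directors": 0.666,
    "board committees": 0.333,
    "being no further business": 0.333,
    "notice of meeting": 0.333,
    "doc": 0.333,
    "quorum": 0.333,
    "meeting minutes": 0.333,
}

def passes_filters(doc: dict) -> bool:
    """Keep originals whose file extension is not msg (hypothetical fields)."""
    return doc["extension"].lower() != "msg" and doc["is_original"]

def concept_score(text: str, concepts: dict = WEIGHTED_CONCEPTS) -> float:
    """Toy scoring: add the weight of every concept phrase found in the text.

    This is an assumption made for illustration; a real concept search
    matches related terms, not just literal phrases.
    """
    lowered = text.lower()
    return sum(
        weight
        for phrase, weight in concepts.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    )
```

In this simplified picture, a document that mentions "board of directors meeting" also matches "board of directors", so the heavier-weighted phrases dominate the score.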
This is what the search looks like in Advanced Search:
This is what the Concept Search component looks like (to get to this, I clicked on the Concept pill in Advanced Search):
If I click on the scales of justice icon in the lower left corner, that displays all the concepts in my search along with their weights:
Tag Documents Until Reaching Four Positive and One Negative
I began coding those documents from the Grid:
To build a classifier, I had to tag documents either as responsive or nonresponsive until I had at least four positive documents and one negative document. The mechanics of that process were easy – look at a document, evaluate it against the criteria I had set, and give it a thumbs up or a thumbs down:
As I tagged documents, initially I took a narrow view of what was responsive. I tagged as nonresponsive, for example:
- Minutes of meetings of SK-Enron Co., Ltd., Enron Federal Credit Union, and other similar entities;
- Documents that discussed Enron board of director meetings but were not, themselves, minutes (and that did not fit into the categories I subsequently added, as noted below).
As I went along, I decided to expand my scope to include the following categories of documents as responsive. I did this because those documents seemed to contain information that informed me about actions considered or taken by the board:
- Notices of meetings of the board;
- Board resolutions;
- Minutes of board committees such as the Compensation and Management Development Committee along with notices of their meetings and their resolutions.
Because I modified my approach once I started reviewing documents, I ended up tagging 172 documents – 12 as positive and 160 as negative.
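The threshold itself is simple enough to express as a short check. The Python sketch below uses the minimums stated earlier (four positive, one negative) and my tallies at this point; the function name and the list-of-strings input are hypothetical choices for illustration.

```python
from collections import Counter

MIN_POSITIVE = 4  # at least four Responsive documents
MIN_NEGATIVE = 1  # at least one Nonresponsive document

def ready_for_ai_batches(codes: list) -> bool:
    """Return True once the coding decisions meet both minimums."""
    counts = Counter(codes)
    return (
        counts["Responsive"] >= MIN_POSITIVE
        and counts["Nonresponsive"] >= MIN_NEGATIVE
    )

# My tallies after the first pass: 12 positive and 160 negative.
my_codes = ["Responsive"] * 12 + ["Nonresponsive"] * 160
print(ready_for_ai_batches(my_codes))
```

With 12 positive and 160 negative documents, both minimums are comfortably met.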
At this point, I should have had enough tagged documents to start using AI-Driven Batches. To verify, I went to Supervised Learning and began typing enron in the search box. That brought up the Classifier Model Card for my classifier, ts23~Enron BOD Minutes. On the card, I could see that my new classifier was ready to use:
To get further information, I clicked on View Details, which brought up tagging and scoring information:
Create AI-Driven Batch
At the top of the page, I clicked on AI-Driven Batches. In the modal, I selected the Classifier I wanted to use, Enron BOD Minutes, and the Tag Profile I wanted to use, Enron BOD Minutes. Finally, I clicked the Select button next to the AI-Driven queue option:
The page updated itself to show that a batch had been successfully created:
I also received an email message confirming that a batch containing 20 documents had been created:
At the same time, a new folder was added to Assignments in my Sidebar:
I clicked on the assigned batch to bring up the 20 documents in that batch, and from there I clicked on the first document in the batch to start my first round of training with AI-Driven Batches:
With each document, I can see the score given it by the platform and can designate it either as Responsive or Nonresponsive:
When I reached the end of my batch of 20 documents, a notice popped up on the screen. The notice asked if I wanted another batch or would prefer to stop reviewing:
I chose to continue, and received an email notice in reply:
Checking my progress after completing my first AI-driven batch, I could see that I was up to 22 positive and 171 negative documents:
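The review cycle described above (get a batch of 20, code each document, ask for another) can be sketched as a small loop. This Python sketch is an assumption-laden illustration: I am assuming that a batch is made up of the highest-scored documents not yet reviewed, and I have left out the retraining and rescoring that would happen between rounds. The function names and the `reviewer` callback are hypothetical.

```python
BATCH_SIZE = 20  # each AI-driven batch I received held 20 documents

def next_batch(scores: dict, labels: dict, batch_size: int = BATCH_SIZE) -> list:
    """Pick the highest-scored documents that have not been coded yet.

    Highest-score-first is an assumption for illustration only.
    """
    candidates = [doc_id for doc_id in scores if doc_id not in labels]
    candidates.sort(key=lambda doc_id: scores[doc_id], reverse=True)
    return candidates[:batch_size]

def review_round(scores: dict, labels: dict, reviewer) -> list:
    """Code one batch; `reviewer` returns "Responsive" or "Nonresponsive"."""
    batch = next_batch(scores, labels)
    for doc_id in batch:
        labels[doc_id] = reviewer(doc_id)
    return batch
```

Each call to `review_round` adds up to 20 new coding decisions to `labels`, which is the running total I check after every batch.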
The first post in this series discussed the framework I used for designing, building, and evaluating an active learning classifier.
This post covered how I designed and built the classifier.
The next post will focus on evaluating and refining the classifier.