eDiscovery Leaders Live

March 8th

eDiscovery Leaders Live: Damon Goduto of Lineal

Each week on eDiscovery Leaders Live, I chat with a leader in eDiscovery or related areas. Our guest on March 5 was Damon Goduto, a Partner at Lineal.

Damon and I had a discussion ranging from Mount Rushmore to Brazil. We started with Damon’s suggestions for the four busts to put up on the Mount Rushmore of eDiscovery. We pivoted to how weird data got during the five years he took off from eDiscovery, and to ChatCraft, Lineal’s response to the difficulties of working with chat messages, which includes setting flexible chat boundaries and producing chat data in a useful form. Damon talked about the endless upside of using AI in eDiscovery, focusing in particular on Lineal’s automated bot detector, AI-driven threading, and AI to enhance privilege review. Damon discussed, as well, the benefits of reusable models, the value of pattern recognition, ways to use anomaly detection, and the use of models outside litigation. Damon closed with thoughts on Brazil’s burgeoning legal technology market – a first for these discussions.

Recorded live on March 5, 2021 | Transcription below

Note: This content has been edited and condensed for clarity.

George Socha:

Welcome to eDiscovery Leaders Live, hosted by ACEDS, and sponsored by Reveal. I am George Socha, Senior Vice President of Brand Awareness at Reveal. Each Friday morning at 11 am Eastern, I host an episode of eDiscovery Leaders Live where I get a chance to chat with luminaries in eDiscovery and related areas.

Past episodes are available on the Reveal website: go to revealdata.com, select “Resources”, then select “eDiscovery Leaders Live Cast”.

My guest this week is Damon Goduto. He is a partner at Lineal. He has been in the eDiscovery space for quite some time. Before Lineal, he was at NexLP as a Senior Vice President, VP of Sales and Marketing at a place called ThreadKM, and before that at Iris Data Services, Epiq, Xact and others. Damon has a JD from the Widener University School of Law as well as a bachelor's from West Chester University of Pennsylvania. Damon, welcome to the show.

Damon Goduto:

George, thank you so much for having me. It's nice to know that you're interviewing luminaries and since none could make it today, I'm glad I could fill in.

George Socha:

I hear that every week and I never believe it. I know better.

Damon Goduto:

I feel like to be a luminary, you should know what the word means.

George Socha:

Well, that's okay.

Damon Goduto:

So I’m getting my dictionary out right now.

George Socha:

Merriam-Webster Online. You can find out the meaning of it. I'd like to start this week with a slightly different question, which is from you, what would your Mount Rushmore of eDiscovery people look like?

Damon’s Mount Rushmore of eDiscovery

Damon Goduto:

Yeah, so I’m glad you asked because I suggested this one. And the reason I did is I looked at some of your past guests, who were very impressive people. I’ve got to call out to my good friend, Ron Best, and I've always told Ron, if anyone is ever building the Mount Rushmore of eDiscovery, Ron Best has to be on that list. From the late nineties, early two thousands, he essentially brought in a whole class of technology-driven lawyers that are really the modern day eDiscovery attorney, which didn't exist before, and he assigned them to every matter that came through the firm. So, Ron Best definitely on that list.

But backing up a little bit, I think you've got to have Dan Willoughby. Dan Willoughby, who was a partner at King & Spalding, built the King & Spalding Discovery Center and really started figuring out, how do we handle this massive amount of data coming in with the big tobacco stuff.

Next on my list, Tess Blair. Tess pioneered the idea of, “Law firms do discovery, how do we take this forward as a business?” and created the incredible Morgan Lewis resources team over there.

And then the fourth spot. And this is why it's so interesting. I love turning eDiscovery into sports talk radio. And this is the fun for me. The fourth spot has to be LeBron James; sorry, wrong list. The fourth spot... I think it could be George Socha, inventor of the EDRM model. I think you could very easily be on that spot.

George Socha:

But if you don't put me up there, who's going to go up there instead?

Damon Goduto:

Right. So then I'm thinking maybe you’ve got to go service provider and I thought, do we go Andrew Sieja, do we go Mark Hawn, John Davenport Jr., maybe somebody from the services departments. Again, this list is not about who did things first, it's about who made the most impact, and I think that's what the Mount Rushmore is usually about. I’m going to build the Mount Rushmore someday. And I'm going to bring it to Legal Tech and ILTA, it'll probably be papier-mache or something, but I'm going to do this. So anybody who's got input on someone we missed, I'd love to hear it.

George Socha:

For that number four slot, do you just have a tie? Is it Andrew and Mark and John and you just can't decide, so you put them all as a four slot to be determined?

Damon Goduto:

None of those guys ever invited me to be a luminary. So I think it's going to be you.

George Socha:

Okay, well, we'll go with that. That sounds good to me. So I'm waiting to see that Mount Rushmore. I don’t know, you can ship it to me. You know what my address is. I'll put it up on my new background, maybe.

Damon Goduto:

Perfect.

Responding to How Weird Data Has Gotten: ChatCraft

George Socha:

You've been in this industry for a long time, you've been handling data of many types in many ways for quite some time, but the world has changed a lot as well. And there've been a lot of changes since your time at say, Iris and Epiq, with the data we deal with, the data streams. What have you seen?

Damon Goduto:

For those that don't know, I was maybe asked to pause from the eDiscovery services business for five years as part of a transaction we had. When we came back into the space to do this Lineal venture, I think to me the biggest change was how weird data has gotten. The chat data of course is super important, right? Microsoft Teams and Google Chat, Slack data, obviously the mobile device data. And how do you come up with solutions to process, review and produce that kind of data set? That was a big focus of ours: “Cool, we're back in business, let's go to work.” And it's like, “Oh, how do you deal with these things?”

There didn't seem to be a lot of great solutions, so we developed something called Lineal ChatCraft, which allowed us to collect chat data in a certain way and preserve it through processing in a certain way so that we could deliver it into our review platforms as individual messages and carry over the coding palette. Whether it's on an individual message basis, one day's worth of chat, or the entire chat as one document, the goal is to provide some flexibility for our clients to help them review that information and ultimately produce it, because you don't want to have 10,000 messages in the chat where you need to produce one and redact 9,999 others. Right? That was one of the big challenges that we faced.

That's really what we're about at Lineal. It's, give us the tough problems, let's try to solve them. We've got an amazing team of dedicated people who love solving problems. And that's the fun. We rely on our software providers, of course. We love Reveal, we love Relativity, we love NUIX, we love all of our software providers, certainly NexLP. But we try to fill the gaps a little bit where we can and help out along the way and deliver outcomes for our clients.

And I'm realizing, as I'm saying this, this is not nearly as fun as the Mount Rushmore conversation.

Setting Flexible Chat Boundaries

"For us it's about choices. Over the years, the tools that have been successful in our space are the tools that are flexible enough to support different case teams’ requirements."

George Socha:

One of the things you were mentioning - you alluded to this with chats - is that it's really hard to tell where the boundaries should be. Do you just declare that everything in a stream, whatever it is, is the equivalent of a document? Do you deal with it just one item at a time? Do you give people choices? How do you all approach that?

Damon Goduto:

For us it's about choices. Over the years, the tools that have been successful in our space are the tools that are flexible enough to support different case teams’ requirements. You can't say, “This is the way you need to do it”. Even if it's the right way, certainly if I think it's the right way, that doesn't matter. Even if most people think it's the right way, there are going to be some case teams that say, “Yeah, I don't care if that's the right way, I need to do it my way for this particular matter”. You've got to have the flexibility.

What we settled on was one chat per document, one day's worth of chat per document, or each individual message as an individual document. Now we can tweak that, we can set that one day to six hours or 12 hours, or what have you, but you've got to provide that flexibility. And I think that's not only true of how to handle chat data, but probably true of how you want to handle most problems in life, right?
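The three boundary options Damon describes can be illustrated with a minimal sketch. This is not ChatCraft's actual implementation, which the conversation does not detail; the function name, the message fields (`chat_id`, `sent`, `text`), and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def group_messages(messages, mode="window", window_hours=24):
    """Group chat messages into review documents (illustrative only).

    messages: list of dicts with hypothetical keys 'chat_id', 'sent', 'text'.
    mode: 'chat'    -> the entire chat is one document
          'window'  -> one document per fixed time window within each chat
          'message' -> every message is its own document
    """
    if mode not in ("chat", "window", "message"):
        raise ValueError(f"unknown mode: {mode}")
    ordered = sorted(messages, key=lambda m: (m["chat_id"], m["sent"]))
    if mode == "message":
        return [[m] for m in ordered]
    epoch = datetime(1970, 1, 1)
    window = timedelta(hours=window_hours)
    groups = defaultdict(list)
    for m in ordered:
        # Bucket 0 for whole-chat mode; otherwise the index of the time window.
        bucket = 0 if mode == "chat" else (m["sent"] - epoch) // window
        groups[(m["chat_id"], bucket)].append(m)
    return list(groups.values())

# Hypothetical sample data: two chats spread over two days.
sample = [
    {"chat_id": "general", "sent": datetime(2021, 3, 1, 9, 0), "text": "morning"},
    {"chat_id": "general", "sent": datetime(2021, 3, 1, 16, 30), "text": "update?"},
    {"chat_id": "general", "sent": datetime(2021, 3, 2, 8, 15), "text": "done"},
    {"chat_id": "deals", "sent": datetime(2021, 3, 1, 10, 0), "text": "pricing"},
]
```

Changing `window_hours` from 24 to 6 or 12 corresponds to the tweak Damon mentions. Because each message is kept as its own record, the grouping can be redone later without recollecting the data.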

George Socha:

Is that a decision that needs to be made at project setup, you've got three options, choose your option for that project?

Damon Goduto:

No, we can come in at any time before we're loading into a review application. It can be during processing. Obviously it's better to have those conversations a little sooner in the process; you can make sure everybody's on the same page and you set proper expectations. All of that's really important. But sometimes the decisions change. I know it’s unusual here in this business, but sometimes a decision is made and as we start getting into the data, everything changes. The course of the litigation or investigation changes, and all of those things can shift.

Producing Chat Data

George Socha:

One of the perennial challenges with chat data is how you go about producing it. And it sounds like you've given some thought to that and offer some different ways of dealing with it. How do you handle that part of it?

Damon Goduto:

It's all about the preservation. If you don't preserve the data in such a way when you're doing the collection or preservation efforts, you might lose a lot of that downstream flexibility. So we're preserving it in such a way, using certain applications to help us collect the data and preserve it. And then as you're processing, you've got to preserve the data and keep it formatted in individual messages. And then of course you're able to present it later. So, when that production happens, do you want to just produce this entire massive file, massive chat? Do you want to produce a small slice? How much do you want to redact? Is it just an individual message? All of those things have to be factored in, but like with most things in our business, you've got to think about them earlier on in the process.

Leveraging AI: The Upside Is Endless

George Socha:

One of the topics of discussion of late, and it's expanded and expanded and is of course of particular interest from a Reveal perspective, is the use of AI in litigation, particularly with respect to eDiscovery. What all are you doing on that front?

Damon Goduto:

Obviously we're big fans, right? Like you mentioned at the outset, I was at NexLP for a few years, I got to meet and listen to Dr. Dan Roth, a leading AI scientist in the world. It was super informative to be able to bend his ear. We've got a guy named Kit Mackie at Lineal, who's fantastic. Kit, employee number one at Relativity, wrote a lot of the code over there. Leaves to form NexLP, builds that program, that application. Having Kit aboard has really given us an advantage in how we're leveraging AI.

One of the things that I learned during my time at NexLP was that the upside to using AI is endless. There are so many advantages that can be gained along the way and I can talk about some of those advantages that we've worked on. But you've got to make it somewhat simple for the case teams and the lawyers. I was thinking of it this way. You don't want to make the show about you. They've got a job to do. They've got judges to listen to and orders to follow, opposing counsel to fight. We don't want to take over and be like, “Let's just take this great dive into AI and sit back and learn a bunch of stuff”, because that's not productive. It's great if you have time in an academic sense.

Lineal’s Automated Bot Detector

We try to lift a lot of what we're leveraging out of NexLP and deliver it into our review application. We've got an automated bot detector that's leveraging machine learning to analyze whether a message was generated by a bot. This goes above and beyond something like domain analysis, where we can all say, let's take Facebook and ESPN, Wells Fargo, and get those domains out of there because those domains can be important, right?

We start with those lists of domains, but then we measure the ratio between how many communications are coming in from those domains and then what happens downstream. Is anybody replying to those domains and those messages? Is anybody forwarding them around? Are they being discussed? Based on that, we deliver a score of the likelihood of each one of those communications being generated by a bot. We post that in the review applications. It's a scoring mechanism. We could say, “Hey, it's there if you want to use that, great. If not, we totally understand and we can explain the reason behind it, but we're saying these documents are highly likely to be generated by a bot and probably have low value in terms of reviewing information.” It's those little gains like that.
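The ratio idea described above, inbound volume from a domain versus how often anyone replies to or forwards those messages, can be sketched as follows. Lineal's detector uses machine learning and scores individual communications; this is only a fixed-ratio heuristic at the domain level, and the function name, the inputs, the 0 to 100 scale, and the example domains are assumptions made for illustration.

```python
def bot_likelihood(received, replies, forwards):
    """Return a 0-100 score; higher means more likely bot-generated.

    A simple stand-in heuristic, not a trained model: the less downstream
    engagement (replies plus forwards) per message received, the higher
    the score.
    """
    if received <= 0:
        raise ValueError("received must be positive")
    engagement = min(1.0, (replies + forwards) / received)
    return round(100 * (1.0 - engagement))

# Hypothetical per-domain counts: (received, replies, forwards).
domain_stats = {
    "alerts.examplebank.test": (400, 0, 4),
    "client.example.test": (120, 45, 27),
}

scores = {
    domain: bot_likelihood(*counts) for domain, counts in domain_stats.items()
}
```

As in the approach Damon describes, the score is posted alongside the documents rather than used to remove them, so the case team decides whether to act on it.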

George Socha:

How much of a gain is that? What percentage of bot-generated messages are you seeing in document populations?

Damon Goduto:

We're seeing pretty consistently about 10 or 15% of documents, communication data, being dropped from that specific analysis.

AI-Driven Threading

Similarly threading, doing advanced threading with a higher performing algorithm that's used inside NexLP and grabbing more inclusive documents. If regular threading pulls maybe 12, 13, 14% suppression, maybe we're getting that number to 20%. Some people might say, “Oh, seven, eight percent, is that a big difference?” But obviously we're dealing with millions of documents per dataset. You do that over and over again.
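To put that difference in concrete terms, here is a small worked calculation. The one-million-document dataset is a hypothetical figure; the 13% and 20% suppression rates are taken from the ranges Damon quotes.

```python
def documents_after_suppression(total_docs, suppression_pct):
    """Documents left to review after threading suppresses a percentage."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_docs * (100 - suppression_pct) // 100

total = 1_000_000  # hypothetical dataset size

regular = documents_after_suppression(total, 13)   # regular threading
advanced = documents_after_suppression(total, 20)  # advanced threading
saved = regular - advanced  # additional documents removed from review
```

Under these assumptions, the seven-point gain removes a further 70,000 documents from review in a single dataset.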

Incremental Continuous Improvement

That's part of what makes us successful. It's not always about, “Let's just find the Holy Grail here and deliver it and we're done.” It's about incremental improvements. It's about small bites at the apple. Do everything a little bit better each step of the way. And if you can take those small bites of the apple successfully, they end up being huge gains. It's one of our core values at Lineal, continuous improvement. Let's provide continuous improvement, always focus on how we can be a little bit better every step of the way.

Using AI for Privilege Analysis

George Socha:

So much of the discussion I have heard over the years about eDiscovery, about lit support generally, has focused on how to find the smoking gun, that one file that's going to change everything. In the 16 years that I practiced, and the hundreds of, who knows how many, lawsuits and investigations I worked on, I remember one where we really did have a smoking gun, two pages that changed everything. But that was only one case. What we really did when we worked up matters, it was a question of accretion, one little detail on top of another, until you get somewhere meaningful. It sounds to me like that latter is the approach you're taking, building up bit by bit by bit until you get to something meaningful, your Mount Rushmore of information in the case, if you will. Am I reading that correctly?

Damon Goduto:

I think that's exactly right. We do a lot of work with privilege analysis. Being able to leverage machine learning, to start with the communication analysis. Look at what law firm domains this information is generated from; what's the role of the sender, as opposed to just did it come from a law firm domain, yes or no. What's the role of the sender? Is it a lawyer? Is it somebody in marketing? Is it somebody in IT, et cetera? Who are the parties who are adverse, who are the parties who are aligned, who's the in-house counsel?

You're starting with some communication analysis. We're using a bot detector, that kind of analysis, as part of that priv analysis. And then, doing a back and forth consultative approach with the lawyers and saying, “Fill in the blanks for us. Are there any other law firm domains we're missing? Who are the parties involved?” It's a step-by-step iterative process to just deliver something better. Then in our review platforms, we're saying, “Zero to one hundred, here are the documents that are likely classified as privileged.”

If you just want to prioritize your priv review and use that, great. If you want to ignore that and do what you typically do and run your search terms, fine. But we're providing that analysis because we think it's helpful and we think it drives efficiency. And that's how we view our role.

There's a lot of data in the world. I think the stat is every 18 months, the amount of data being generated in the world doubles. Our function in this whole big pie of the legal vertical is, “How do we start with this and help our clients get down to this?” And you're right, generally, it's not a smoking gun, but hopefully we don't spend a lot of time and resources on having the most costly piece of this equation, lawyers, very smart lawyers, looking at documents that just have no need to be looked at. Updates from your bank. Hey, your checking account is overdrawn or whatever it is.

George Socha:

With the privilege analysis that you do, have you been able to quantify the effectiveness of that in some fashion?

Damon Goduto:

We've definitely got some case studies that our clients have published that show the gains. I don't know the numbers off hand, but they appear to be pretty dramatic. If you start with, had we not had any kind of technology used, traditional technology, we're running search terms, we're doing some domain culling, we do this review, maybe you're reviewing 500,000, 100,000, 50,000 documents. But using our technology, maybe that 500,000 goes down to 200,000, maybe the 100,000 goes down to 30,000. Those gains again, incremental improvements that get you where you need to be faster.

And maybe sometimes it works in the reverse which is, we've run our priv review, we think these documents are privileged. And we say, okay, we found these documents that scored very highly. Here's the reason we think these might be privileged. Maybe it's just a QC mechanism, so that something doesn't slip through the cracks.

George Socha:

What sort of adoption rates have you seen with your approach to priv? Are people looking at it, looking skeptically and moving on, or are they trying it out? Are they trying it out and being enthusiastic users of it?

Damon Goduto:

I think it's near a hundred percent for our customers because again, there's not much they need to do. If they work with us a little bit to say, “Which parties are involved? What are we looking for here? Who's in-house counsel?”, all of those things, the results are there in the review application. Again, there's not much to do there other than look at the score and the reason. We are providing that reason in the fielded line item of, “Here's the reason we think this is privileged”. Once they see that it's possible and what's available to them, there's almost no reason not to use something like that.

Using Reusable AI Models

George Socha:

One of the things that can be done within Reveal NexLP - Reveal AI as we're calling it now - is to create and then reuse models. And you're doing some of that, a lot of that. Right?

Damon Goduto:

Absolutely. Sure.

George Socha:

Talk about that a bit.

Damon Goduto:

We're big believers in this. When you look at the difference, for me it's always the difference between what's the status quo? And the status quo is we're going to run search terms, maybe we're going to recycle search terms from a previous matter because we've had some success with that.

Context

But models are just a more intelligent way of searching and categorizing your data. Some of the examples I love to give are, if I were to send an email out that said, “Listen, we've got to get together on our pricing that we're offering to our law firm clients, we’re pricing Reveal at this, let's make sure we're all pricing Reveal in the same range.” The text of that email is totally fine if I'm sending that to other members of my sales team, because that's just good business. We want to get together on pricing. We don't want different customers to have different experiences. We want to make sure we're aligned on pricing. But if I send that exact same message to a competitor of mine at a different provider, or to multiple competitors, well, that's price fixing, collusion. It's illegal, it's problematic. And it's something you don't want to do, right?

But if you're running search terms, you can't tell the difference. What those models allow you to do is look at things like, “What are the domains involved? Are these people competitors? What are their roles?” That gives some richness and that machine learning kicks in and says, “This is something you might want to look at. It's not going to be a hit on your search term, but it is something you might want to look at.”

Another example is if I'm sending an email that says, “Hey, what are you doing and who are you with? I need to know.” And I send it again and again, then you see a bunch of those emails. Well, it might not be a big deal if I'm sending that at two o'clock in the afternoon and I've got some sort of rogue employee who's not being responsive. But if I'm sending that at three o'clock in the morning and it's a female employee and these are unwanted advances, well, it's creepy. I don't know what the legal term is, but it's creepy. It might be harassment or whatever. Those are the kinds of things, again, that you might miss if you're relying just on searching keywords, where the models can go, “It's not just about the words. What time of day was it sent? What's the role between these two people?” It gives richness, much like an experienced lawyer.

Pattern Recognition

Things like pattern recognition are incredibly important. To me, it's one of the weakest areas that we have in reviewing information inside of legal, because if you've got one or two lawyers and they've got 500 or a thousand documents in a database, they know intimately what's going on in all of those documents. They see, they were talking about this, then the conversation shifted to this and it's all pretty easy to track. But if you've got 100,000, 500,000, 4,000,000 documents in a database and you've got 50 or 100 lawyers, each one of those lawyers, he or she can do a nice job looking at what's inside their own individual vertical, what's inside their own silo of information.

But no one's tracking things that rise and fall, which matters if you're trying to spot things like codified language. One of the great examples we had, something we worked on at NexLP, was a sales team that just up and left in the middle of the night. They just packed up and said, we're out of here. And the corporate client said, there must've been signs. There must've been signals that these people were planning something. Nobody just quits like that at once. So they brought in a law firm, they ran a bunch of search terms, and after a few weeks of review they didn’t find anything.

Then sure enough, within minutes of using pattern recognition, they said, “Wait, these people talked about these same 10 subjects, pretty much exclusively for five years, but about 90 days before they left, they started talking about fruit.” And there's no reason for them to talk about fruit, it wasn't any part of their job description, it’s not what they were selling, but they were saying things like, “Did you get those apples? I got those apples. Those are some tasty apples.” And they were substituting the fruit names for product lists and price sheets and those kinds of things.

Again, if you're just relying on search terms, you're probably going to miss information. If you're leveraging something that's available to you and readily available inside of Reveal, you're going to be able to find these things much faster and more efficiently and more accurately.

Anomaly Detection and Story Cards

George Socha:

A lot of what people do with artificial intelligence in eDiscovery is to focus on one part in particular, call it TAR, call it predictive coding, whatever you want to call it, the technology that lets you find more like this. Which is fine if you know what you are looking for. Jay Leib and Irina Matveeva from NexLP, now Reveal, talk about anomalies and the importance of locating anomalies, those things that are different from what was going on. That sounds like a lot of what you were just discussing and I gather it's a large part of how you look at things, how to approach things. What might those anomalies be? How do we figure out if they're even there? What do we do if we don't even know what anomalies to look for?

Damon Goduto:

Yeah, no, it's a great point. The feature I love in NexLP is the story cards. It's essentially your anomaly detection. It's measuring, I don't know, hundreds of different baselines. And it says, “This is normal for these people, this usually happens about this number of times, then this happens.” Whether that's useful to you or not, who knows? The technology doesn't know, it just knows that this is something that's incredibly unusual that's happening in the dataset and you may want to see it. And if not, you ignore it, you move on to the next card. Especially in an investigation sense, where you don't know exactly what you're looking for, you think something's going on, leveraging that kind of technology can be an incredible time savings.
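The general idea of measuring a baseline and flagging departures from it can be shown with a minimal sketch. How NexLP's story cards work internally is not described in this conversation; the example below uses a plain z-score test on a single count series, and the function name, threshold, and weekly counts are hypothetical.

```python
from statistics import mean, stdev

def unusual_periods(counts, baseline_len, threshold=3.0):
    """Return indexes of periods that deviate sharply from the baseline.

    counts: per-period counts of some behavior (e.g. weekly mentions of a topic).
    baseline_len: how many leading periods define "normal" (at least 2).
    threshold: how many standard deviations from the mean counts as unusual.
    """
    baseline = counts[:baseline_len]
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for i, count in enumerate(counts[baseline_len:], start=baseline_len):
        if sigma == 0:
            # A perfectly flat baseline: any change at all is unusual.
            is_unusual = count != mu
        else:
            is_unusual = abs(count - mu) / sigma > threshold
        if is_unusual:
            flagged.append(i)
    return flagged

# Hypothetical weekly counts of a topic that is rare, then suddenly common.
weekly_mentions = [0, 1, 0, 0, 1, 0, 0, 1, 0, 9, 12]
flagged_weeks = unusual_periods(weekly_mentions, baseline_len=8)
```

As Damon notes, a flag like this says only that something is unusual for these people, not that it matters; a reviewer still decides whether to look or move on.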

Reusing Models

George Socha:

So you create a model, you're using AI active learning of some sort - predictive coding, TAR, whatever you want to call it - to create that model, then you are able to use those models on other matters, right?

Damon Goduto:

Yes. One of the questions that we would get over and over again at NexLP was, “Can I leverage one client's data over here on another matter on another client's data?” I think our response, and I think this makes sense, is, sure. If we went to a law firm today and said, draft a contract, how many times is the lawyer going to sit down with a blank slate and go “The party of the first part and the party of the second part, and let's get together on this day and agree that…” No, you're going to start with the template that you've created over and over again. What's important to know is the proper names are scrubbed out of those models. If I'm harassing someone, it doesn't say, “Damon harassed Billy”. It just says, “Manager harassed employee”. You don't get the specifics there. And of course, they're fully encrypted, can't be hacked into, all of those things. It's the thinking behind those decisions that gets carried over from one matter to another, not “This person did something and we know who this person is, let's look for that person again”. That's a totally different kind of deal.

Models Outside Litigation

George Socha:

You’re using models across clients. You're using them in the context of lawsuits. But you're using them in other arenas as well, aren't you?

Damon Goduto:

Yes. We get into the monitoring space once these models are built. I think this is the future. I thought this was the future of NexLP. I know Jay Leib thinks this is the future. We'll see, I'm usually wrong about all of these things. I also am the one who said, “Privacy, that GDPR stuff, never going to fly.” Now we've done a 180 on that, we've opened up our São Paulo office and they've got the LGPD, which is their version of GDPR. We are incredibly active in solving privacy challenges, but I was completely wrong. I tend to listen to my customers and the market really decides who's right on these things.

In my mind, it's about whether you can leverage some of the past knowledge and train on data sets to spot issues. If you've got a harassment model, that example I gave earlier, somebody reaching out to somebody late at night, “What are you doing? Who are you with?”, creepy behavior, you can build a model around that, one that now crawls the network and spots that in real time. You can start putting out the match before it turns into a forest fire. You can start counseling that employee: Why are you sending the emails like that? That seems silly.

To me, I think it moves the lawyers from a reactive mode where they've been for, you know… lawyers are part of a guild, right? A 2,000-year-old guild. So you move them from back where they've been in reactive mode, “Something happened, let's dig in and see what's going on”. Move them a little closer to the front where “Something is happening. Let's see if we can step in and really help our clients.” I think it makes the lawyers a little more sticky with their corporate clients as well.

Brazil

George Socha:

You mentioned almost in passing, São Paulo, and you'd mentioned to me earlier that you have spent more than a little time down in Brazil. Why Brazil? What's going on there?

Damon Goduto:

Oh, it's the Caipirinhas, mostly. They're delicious. I just can't get enough of the Caipirinhas.

No, but I mentioned, you've got the LGPD… The thing that most people overlook about Brazil is that Brazil has the second-highest number of lawyers, not lawyers per capita, the second most lawyers in the world are in Brazil. A few years ago when I was an investor over at ThreadKM, we were selling collaboration licenses to lawyers. So I said, well, where are the lawyers in the world? And of course, the U.S. is number one. I think it was like 1.2 million lawyers in the U.S. Brazil was number two and it was a solid number two. I think they had like 600,000 lawyers in Brazil. And then it's like the U.K. with 300,000, and Germany with like 80. But Brazil was solidly second. And I went, wow, I never would have guessed that.

So I said, if you have that many lawyers and they're working on huge matters, it's a thriving, legal economy down there. I bet they have the same needs as our customers in the US and the UK would have. I started going down there a lot for ThreadKM. We had some great clients down there. It's always stuck in my mind as that market is traditionally very underserved by legal technology. I've got a great friend Jose Graciotti, he's been to, I don't know, 30 of the last 31 ILTAs. He's a wonderful gentleman and he really piqued my interest down there and brought me down there and introduced me to a lot of people.

Especially with the passing of their privacy law, that creates… Anytime there's seismic changes where the law starts to impact business, that certainly creates some opportunity and some confusion too, about how do we get our hands around this, how do we find PII and PHI, specifically in Portuguese? And those are just problems we'd like to solve.

George Socha:

Great. Well, Damon, thank you very much. Damon Goduto is a Partner at Lineal.

I am George Socha. This has been eDiscovery Leaders Live, hosted by ACEDS, and sponsored by Reveal. Thank you all for joining us today. Next Friday, March 12th, our guest will be Megan Lopp Mathias of Lopp Mathias Law. Damon, once again, thank you very much.

Damon Goduto:

Thank you, George. Thank you everybody.
