eDiscovery Leaders Live: Adam Bown and David Fisk of Salient Discovery
Adam and David started by letting us know the different paths that brought them to eDiscovery and gave an overview of their organization, Salient Discovery, and its background. We turned to “state capture” in South Africa, a form of investigation most likely not familiar to most of our audience. David and Adam discussed how they apply AI tools such as NexLP to reduce the size and scope of problems early, as well as the use of those types of tools proactively. From there, we moved to defensible deletion as well as building and reusing AI models, and closed with their thoughts on the ideal eDiscovery and investigations platform.
Recorded live on February 5, 2021 | Transcription below
Note: This content has been edited and condensed for clarity.
Welcome to eDiscovery Leaders Live, hosted by ACEDS, and sponsored by Reveal. I am George Socha, Senior Vice President of Brand Awareness at Reveal. Each Friday morning at 11 am Eastern, I host an episode of eDiscovery Leaders Live where I get a chance to chat with luminaries in eDiscovery and related areas.
I have two guests here with me this week, both from Salient Discovery. Adam Bown is the Head of Cloud Forensics and Data Analytics at Salient Discovery in South Africa. David Fisk is the Managing Director of Salient UK. Adam, David, welcome.
Hello there, George, and thank you very much.
Glad to have you.
Hi, morning. Thanks, George.
I'd like to start with each of you giving a brief background and telling me something about the motivation - how you got into this and why you're doing this. David, let's start with you.
What Brought David to eDiscovery
Thanks, George. It's interesting, in your introduction you talk about luminaries, and I feel a little bit of an impostor here in some respects because I don't necessarily consider myself to be a luminary in the space of eDiscovery. But my journey to where I am today is really around observing the convergence of various technologies that I’ve been involved with throughout my career. Maybe if I give a little bit of background to that, it might put it into context and explain why. The first was the idea of archiving. I first came into the eDiscovery world with a company called Zantaz, which I'm sure you're familiar with, George, out in Pleasanton, California. Archives are a key source for eDiscovery, as I'm sure everyone is aware, and that was a great experience. Frankly, I think Zantaz was quite ahead of its time in some respects. They had in their Digital Safe products what you might consider in today’s terms to be a sort of cloud archive, that kind of compliance-based journal archive of emails and other content. On top of that they also built various sorts of supervisory methods and sampling techniques and so on to help people stay abreast of and compliant with the various regulations. And of course, there was the Introspect piece they acquired from Steelpoint, I think it was. That was, in a lot of ways, the end-to-end journey that a lot of people are trying to promote. That was the first component.
The second was a natural progression from that, as Zantaz was acquired by Autonomy. I’m sure everyone knows that, and you know that they were big in the search world. What is eDiscovery if it is not search? That led me into what they termed “meaning-based computing”.
From there I moved on to a new company called RAVN Systems. They brought some great ideas to the market, combining the basics of full text search with object storage models and graphing technologies to provide a model which frankly would serve the eDiscovery world extremely well, although we never took it in that direction, to be fair. This is the sort of technology that only today is becoming mainstream through products like the Microsoft Cortex project and so on.
The final component, which was also a second theme of RAVN, was the idea of extraction of entities from unstructured content for legal enrichment, primarily in those early days for due diligence purposes, but the whole concept of extracting information from unstructured data I think plays very, very well. The combination of those areas led me by, I don’t know, luck or fate to a position where I wanted to be applying that to the world of eDiscovery. That’s my backstory to how I'm here and involved.
What Brought Adam to eDiscovery
Okay. Adam, how about you?
Hi, thanks George. My backstory is, if I go way back, I was actually a surveyor and found my way into IT and moved from there into infrastructure analytics, and then joined forensic IT in two of the Big Four firms and spent a fair amount of time doing forensic analytics and running eDiscovery projects.
In those days I used to find that those sorts of analytics were run quite separately. The unstructured data would be done in eDiscovery and the structured data done in analytics tools. Over time you see the convergence of those tools, the overlaying of structured data in eDiscovery platforms, and the visualization of data.
I’ve been through that journey, but I think what really excites me at the moment is the maturation of technologies like natural language processing and AI modeling and how that’s driving the advances in both of these spaces. I refer back to your webcast with Udi Hershkovich, where he was saying that this is moving to the left in the EDRM spectrum. I find that particularly exciting.
You are, both of you now, today at Salient Discovery. You have operations at least in the UK and South Africa. Tell us a little bit about Salient.
Salient is, as you say, a provider of eDiscovery and what we refer to as Cognitive Analytic Services, and we look to deploy that into both litigation and investigation type environments. Our heritage is South African in that we're involved at scale with very large investigations of fraudulent activity from the corporate level right up to the state level. It's our desire to take that skill and ability and experience that we’ve gained over the years, now into markets up in the northern hemisphere as well. There's a concept around that that we call “state capture” and I know Adam wanted to expand upon that a bit, being a local to it.
Please do, I was about to head there, because I think that's very different from anything that our listeners in the US deal with.
Yeah, the term “state capture” in our context here in South Africa refers to some scandals where a politically connected family and their affiliates were close to the former president and had been given the inside track in terms of potentially appointing ministers and people of influence in what are called state-owned entities. These are semi-privatized entities that are majority state-owned. Therefore if you have influence in those entities, you could win tenders and the like. There are a number of investigations taking place with various law enforcement agencies. Because that state capture was so widespread across so many entities, there are many point investigations and commissions taking place.
It’s our passion to see, as I mentioned earlier, how some of those technologies can be used to establish fact patterns across many disparate sets of data and investigations, because there are common law players and connections in that data, and to bring those to the surface earlier on and raise persons of interest who are maybe not on the radar or well known. It’s very much in the forensic and investigators’ space, the sort of tools we are using.
Intelligently Applying AI Tools like NexLP to Reduce Size and Scope of Problems Early
Well, that gives me a great lead-in. What sort of tools are you using, and in what fashions in that space, to help accomplish the objectives you need to take on?
I don’t think it’s any surprise that we came to Reveal through the NexLP solution. We were, as Adam was saying, looking for practical applications of AI and NLP to apply to these vast volumes of content.
I think the AI angle has perhaps got a bad rep after the first phase of AI technologies that were released, where perhaps the capabilities were overstated relative to what was delivered in some cases. It's the ability to apply them as tools, practical tools. I don't see them as a silver bullet. I see them as a tool. They are only as good as the craftsperson who is wielding the tool - as with anything, woodwork or whatever. I quite often use the term the “human in the loop”: how a human interacts with the tools, which frankly do what computers do well but not what humans do well. You have to always remember to play to the strengths of each. A human is fantastic at lateral thinking and following what’s up here [in our heads]: “Now, what's going on? I've got a hunch.” That's not something that's very easy to program. Whereas computers, obviously, you can apply at massive scale to do the kind of stuff which would frankly take people years to do.
We found the NexLP technology to be something that would complement our activities in the space: to be able to raise the sentiments that were there, to identify entities, and so on and so forth. It was a logical step to take that and build it into our approach.
As Adam was saying earlier, the whole idea of applying those earlier on in the process, further to the left… Most of the AI technology to date has been applied more at the review stage, CAL and predictive coding and so on, very much applied, in my mind, a bit too late, particularly in investigatory situations. Yes, it's great to do and I'm not disparaging the capabilities, however if you can intelligently apply those technologies to reduce the scope and size of the problem which you then ultimately pass to review, then surely that's a good thing in terms of getting the job done quicker. And in the case of state-sponsored and state-paid-for activities, helping the public purse as well. Those are key messages that we try to get across.
Using AI Proactively
Let's assume you are at the beginning of a hypothetical version of one of those investigations, way to the left of the EDRM, with some of the particular, and perhaps specific, needs that are in place with state capture. What would some of the first things you do look like, and what would some of the first uses of AI that you’d put in place look like at those early steps?
Adam, do you want to speak to that?
Yeah. I think a lot of eDiscovery practitioners find it a challenge that when you are called in, or an investigation starts, you’re really reactive. If you start moving to the left, and if you look at the EDRM model on the left, there’s this big circle of data governance. The first step is trying to get that right.
AI and some of the natural language processing could also be used in that space. I see it as proactive eDiscovery, where it is initially maybe just promoting the preparedness and proactivity of getting your house in order and understanding what you have and how it's managed, and therefore making sure you do the reactive work well.
That could even be moved further to the left, as you say, where you can take the identification into a supervision type of activity, using some of the technologies almost in a supervisory context and therefore mitigating risk earlier on. Our view is prevention is better than cure. In answer to your question, that's going far left, but what we could do, after maybe doing the initial collection, is run natural language processing to identify sentiments, organizations, and people involved, looking at the timelines and connections between people to identify persons of interest and other organizations that we might not have considered, as well as grouping threads of emails together, based on a number of vectors and factors that a human would not really be able to bring into play. I don’t know if you wish to add anything, David.
No, I think that covers it. From our perspective, the application of the technology is something which we're passionate about and it plays very nicely into our overall thinking about our direction of travel as a business. Yes, we’ve got this experience in large scale fraud investigation, but in terms of our general direction we see there being a very significant convergence between the worlds of data governance and eDiscovery. We have a sister company, which is big into the ideas of providing advisory services around data governance and journeys to the cloud and the security and the compliance and all the kind of funky stuff that you do in that space. At the end of the day, when we look at one of our target markets being the corporate market, there is also a continuing trend of legal services being brought back in-house in those corporates, and as such they're going to need some technology to support that drive. But also, why not try to apply the same technology that you need for a reactive environment to the proactive side of things as well?
When I talk about that convergence, at the end of the day you might be a bit controversial and say that the eDiscovery world is perhaps parasitic. It lives off of problems that have occurred and as such it's all about trying to help people resolve them as quickly as possible. Ultimately, the corporates themselves are or should be in control of their own destinies in terms of managing the sheer size and the scope of the problem insofar as it's data related, and quite often it is. That could be everything from a data leakage problem to noncompliance in the context of the General Data Protection Regulation or the various other issues around personally identifiable information. That's big for everyone now, as well as obviously the regulated industries who need to manage their data. I think that sort of convergence is important and we see that applying the same technology proactively is a great way of differentiating ourselves in some respects and saying that we want to bring this to you as a vendor, to help you help yourselves. That's the message that we're trying to promote.
Now, on the reactive eDiscovery side of things, the traditional eDiscovery side of things, you get brought in at the beginning of the lawsuit, early on in the investigation. Similarly, you might help with efforts to respond to a request to be forgotten or something like that under GDPR. Moving over to the proactive side, it's a bit different. How do you help the people on the proactive side of things?
I think there are a whole number of aspects that you need to consider there, not least of which is the regulatory environment within which a particular organization works. It's going to be different for everyone. Everyone has to be responsive to the GDPR or equivalents in whichever region they are in, but beyond that, other industries will be highly regulated: the financial services industry, for example, has to be transparent on trades, as in Dodd-Frank-type requirements and so on and so forth.
At the board level, you have to work out where the risk of retaining and not disposing of content outweighs the risk of being called out for not managing it correctly. We are a service provider at the end of the day and we also partner with a number of organizations in the legal markets and we call in law firms to assist us in some of that advisory capacity; we’re frankly not qualified to actually provide legal advice.
Our approach is to sit with people and take them through a journey; like a discovery workshop where we can work out who they are, what their challenges are, where they've previously fallen afoul of regulations and to understand what their journey is, what their processes are, what their resources are, how they've approached things, how well it's worked, how well it hasn't worked. And to take all of that and sort of almost to guide them - a facilitated journey from where they currently are to maybe where they want to be based around the technology sets that we can bring to the table. Adam, did you want to add to that at all or have I stolen all your thunder?
Thanks, David. As you say, it is a journey and the issue is about getting your house in order. That requires a good understanding and facilitating a process and asking people, I suppose, some of the questions that they don’t ask themselves as a corporate from an eDiscovery perspective. Our experience has always been that when there is a large discovery matter that needs to be done, it's quite disruptive. Operational resources are taken away from doing things they would normally do to go and restore mountains of backups, so it’s about a good understanding of what you have and where it is and how it’s managed.
Circling back to the supervision aspect, you could put in place some of the natural language processing platforms that do that, like NexLP, for example, and do some sampling on corporate data in high risk areas or in response to maybe a whistleblowing allegation in your environment. Obviously, there are concerns when doing that in today's world, such as whether you are profiling people or targeting people, and from a privacy perspective, so there are some compliance matters that would have to be dealt with in that way.
It also depends on your maturity in the organization around those, but generally if you're going to be doing automated processing of data, you would need to have transparency with the employees and make sure they've given their consent to that. That's where I see those technologies being used more and more.
One of the data governance and risk mitigation challenges that corporations have faced for a long time and are going to continue to face for a long time I would think, is how to appropriately identify the data they no longer need or want to keep, dispose of that in a fashion that is responsible and not careless and in a way that they can defend and justify if challenged later on. Do you have any pointers for folks on that front?
Well, I think you hit the nail on the head there. I think there has been for a long time a tendency to sit on the stuff because it's too difficult to get your head around what you need to do about it, coupled with a fear of perhaps chucking the baby out with the bathwater if you think there is perceived value in what you are throwing away.
In our experience to date, aged data deteriorates in value and is probably overtaken by its inherent risk the longer you retain it. But in terms of the guidance, in our sister company we do a lot of work which is based around trying to do things like categorize content. Unless you've filed stuff intelligently from the get go, which you probably haven't, as the data governance world has matured, you probably don't know what you’re sitting on. That’s a part of the challenge. Yes, you might know physically where it is, and you might not even know that, but getting that data map is important. Beyond that, it's about understanding the specifics of what it is, so maybe you’re retrospectively going through and doing some classification of this. That can be achieved through a whole range of different technologies, not least some of those in the Reveal camp, but dare I say it, the Microsoft world has a whole range of capabilities now.
We and our sister company are co-partners in that and spend a lot of time helping organizations to retrospectively categorize their content and indeed to apply that going forward so they're not compounding the problem. Once you've got a good understanding of what you're sitting on, then you can start to make some intelligent decisions. But of course you then need to consider, is there any legal hold on this content and so on and so forth. Ultimately once you've identified it and been through that exercise and taken appropriate advice, made sure you’re compliant with your regulated retention period and all that good stuff, then and only then are you in a position to start pressing the button on the delete. That's our strategy.
I mentioned the convergence. The convergence is not only what we see between the compliance and the eDiscovery worlds, but as two businesses within our group, we also see a lot of crossover, an ability for one side to benefit the other and vice versa.
I think one of the challenges, to add to what you're saying, David, in classifying that information is how to do it in place, without doing a shift and move to analyze it and so on. We find with a lot of clients that everyone talks about retention policies, when actually we feel you should turn it on its head. It’s less about retention, it’s more about defensible deletion. We find companies sitting on mountains of data that they have no visibility of and don't know what to do with. They have aging hardware and aging systems. It becomes quite a problem and then an expensive issue to migrate that, analyze it, classify it, categorize it. I suppose things like machine learning models to identify PII or privileged information and the like, which could be run in place on those sorts of archives, would assist greatly. David was talking about some of the Microsoft technologies and some of the classifiers that are available in Microsoft 365. They have a long way to go, I think, they’re fairly immature, but they are moving in that space. Insider risk management and communication compliance type technologies will go a long way, I think, in helping in that space.
Building and Reusing AI Models
I’d like to add to that, George, if I may, the idea of applying some of the technologies retrospectively to try and build machine learned models. I imagine a scenario where a law firm has run a particular matter on behalf of a client in whatever solution they've used to do that. One of the services we can now offer is the ability to take the outcome of that, the markups that were done, and to reverse engineer a machine learned model that can then be applied going forward on a similar matter. If a firm often deals in, I don’t know, certain blue collar crime or whatever, they can build some models around that which they can then add to their estate and then apply going forward. I think that's another area which is very interesting from our perspective and that we're very keen to do more of.
The Ideal eDiscovery and Investigations Platform
For my final question to each of you, I'd like you to put your thinking caps on and give me some thoughts or ideas about what an ideal technology platform would look like for investigations, dealing with state capture, but proactive work as well. Assume everything is possible, assume no limitations whatsoever, anything and everything is possible. What would your dream platform look like?
That’s a good question….
I can go. I’m scared you might take all my ideas and I’d have nothing left, because we think similarly. If you look at a dream platform, I think it would be able to effectively do early data assessment in terms of - I'm just looking at this from an investigative perspective for eDiscovery - early data assessment: get rid of the rubbish. Then do some enrichment using these AI technologies we’re talking about, adding additional metadata, enhancing the data. For example, before you even got to review, you’d be able to say “We understand that these documents have a high probability of involving gifts and kickbacks,” and you could rate them on a percentage scale and have that metadata available. Then it needs to do the hygiene factors of what an eDiscovery platform would do in terms of early case assessment, document review, productions and TAR, CAL and the like.
As for the challenges in the proactive space, as I mentioned with some of the technologies, it needs to be enterprise-ready and able to do things at scale, and that’s hard to do when you have to shift and move data. It needs to be able to operate inline through connectors or APIs or other fabric into line-of-business systems or wherever massive archives or repositories of information might be, like Office 365 or archive platforms. That will then allow you to move further into the proactive space and potentially use the technologies more in a supervisory capacity.
Thanks Adam. David?
I'm very disappointed I didn't go first, because Adam has taken 95 percent of what I was going to say. I was hoping he wasn’t going to say the piece about at-scale, because my addition was going to be “And being able to do that directly into line-of-business applications so that you can become more proactive from the outset.” That's a bit disappointing that I missed my chance to go first there.
We’ve talked about it too much together.
Indeed. But no, I think that pretty much covers it really. I guess the other point is for it to be very open in terms of its interoperability with third party solutions. Clearly, there are a whole host of applications and solutions out there in the marketplace. Being able to provide a methodology for simply and automatically moving content from A to B to C, depending on the components that are involved, would be a great addition to the suite, as would being able to simply and easily fit the relevant components of the process into any third party's existing estate.
Thank you, Adam and David. Adam Bown is Head of Cloud Forensics and Data Analytics at Salient Discovery, David Fisk is the Managing Director of Salient UK, joining us respectively from South Africa and from the UK. Thank you very much for your time with us today. I am George Socha, this has been eDiscovery Leaders Live, hosted by ACEDS, and sponsored by Reveal.
Next week, joining us as our guest will be Brad Wilson, Director at Berkeley Research Group or BRG. David, Adam, thanks again.
Thank you very much.
Thank you very much George. Have a good day.