eDiscovery Leaders Live: Julia Hasenzahl of ProSearch
Each week on eDiscovery Leaders Live, I chat with a leader in eDiscovery or related areas. Our guest on April 16 was Julia Hasenzahl, CEO and Co-Founder of ProSearch.
Julia and I had a wide-ranging discussion. We started with Julia talking about ProSearch’s innovative enterprise model, what’s different about it, and why it has been so successful for them. Two crucial parts have been pricing and metrics, both of which Julia covered before turning to effective data reuse and repositories of data about data. On TAR, Julia borrowed a line from Nike – “Just do it” – and offered pointers on how to actually just do it. She talked about the challenges of data from collaboration tools and Office 365, but suggested advantages to be found in those different data types as well. We then discussed data at rest and at the source before closing with the cost of insuring and securing data.
Recorded live on April 16, 2021 | Transcription below
Note: This content has been edited and condensed for clarity.
Welcome to eDiscovery Leaders Live, hosted by ACEDS, and sponsored by Reveal. I am George Socha, Senior Vice President of Brand Awareness at Reveal. Each Friday morning at 11 am Eastern, I host an episode of eDiscovery Leaders Live where I get a chance to chat with luminaries in eDiscovery and related areas.
Our guest this week is Julia Hasenzahl, CEO and co-founder of ProSearch. Julia launched ProSearch in 2006, along with co-founder Trevor Allen at a time when, if we think back, data sizes were increasing rapidly and eDiscovery had become a huge bottleneck, especially dealing with high volume processing.
They invested in significant processing resources and IT infrastructure. (They didn't know it at the time, but I was watching from the sidelines.) Able to consistently process large volumes of data, they began processing data for many eDiscovery providers while continuing to invest in infrastructure.
Then in 2009, with an eye to meeting the corporate need to manage discovery as a business process, which was really starting to come into its own at that point, Julia and Trevor rolled out their first fixed fee managed service offering: no burst rates, no overages.
Managing an entire portfolio of matters that could be measured and improved created the freedom to focus on doing really great work. I know that as well because I had a client who was making use of just those capabilities. It allowed ProSearch to innovate, introduce automation, analytics, and TAR, and use these technologies consistently, without the budgetary constraints confronting people who faced X number of dollars per gigabyte times an unknown number of gigabytes. Julia and Trevor demonstrated the value of concentrating discovery work with a single provider, enticing other corporations to follow suit and become ProSearch clients.
In 2018 they made the decision to expand their offering to include law firms and corporations with discovery portfolios that were more suited to subscription or transactional price models. In doing so, they were able to offer many innovative solutions, technologies, and applications created for the corporate roster.
Today, ProSearch manages 4.5 petabytes of active case data and has grown to 225 employees, headquartered in LA with offices in Dublin and Hyderabad. I don’t know, maybe your Dublin offices are just down the street from ours.
Before starting ProSearch, Julia was COO of BMC Group, a legal and compliance support service provider. Before that, she spent 9 years at EY focusing on enterprise shared services. Julia, welcome.
Glad you were able to join us today.
ProSearch’s Model: An Enterprise Partner for the Enterprise
And I hope you're not too embarrassed by the introduction. I’d like to turn to the first topic, which is in part why I went through that introduction: ProSearch Enterprise and what I think is the really unique value you are offering to the corporate legal community. Could you go into that in greater detail?
Sure. Well, as you covered in your introduction, more than 10 years ago we introduced enterprise level services for a fixed fee. We’d worked through some other alternative fee arrangements, and the decision was kind of selfish. We wanted to be able to use technology, we wanted to be able to offer services with Trevor and myself doing the work without having to sell them. And this was at the time when threading was just emerging and we were using Equivio and Equivio threading and predictive coding. And so, we thought, well with the fixed fee model we will be able to use these tools and deliver these services on all matters without having to sell each one of those services within a matter. And so that really created an opportunity for us to do cool work, and that was our goal, was to do the interesting work.
Working closely with corporations we were able to develop relationships where everybody was invested in that model and that model being successful. We were closely tied to the corporation and their business tools. I think it's interesting now, looking at say CLOC and you see eDiscovery at the mature end of the model. We've been fortunate over the years to work with corporations who started moving their eDiscovery into a business process and treating it like a business process that can be managed very early. We've had the advantage of learning from that over the years, developing services around it, and more closely aligning what we're doing to their business objectives.
Now as we’ve pivoted a bit to transactional work and subscription models, we’re able to deliver some of that value, that same value in those models that we were delivering in our fixed fee model, really by focusing on business process and business process improvement and improvement from matter to matter. Ideas like reuse of data and reuse of work product and reuse of the same processes, the same business process, I think is something that we've become experts in and to enforcing those processes and standards with each matter and with each law firm and helping our corporate clients, not only establish, but maintain their standards and evolve their standards over time.
As we see a shift now in discovery and corporations wanting to take more control, not only of their data but their entire eDiscovery process, the role of a vendor changes as well and that is to help them do that, to be a business process partner. We're able to do that at an enterprise level. I think that that experience has really prepared us for this shift to be working alongside corporations as they re-imagine their own discovery process as a business process.
Creating a Viable Fixed Fee Model: A Conversation, Not Just a Price List
I'd like to break that down into a few parts and the first part is the pricing model itself. Certainly in 2009 and even today the consistent objection I have heard from so many organizations on the provider side and on the law firm side, has been: “Sure we understand you'd like one number for the year but we can't do that, it’s not practical to do it, so let us tell you our pricing model” and it boils down one way or another to X number of dollars per unit - gigabyte, user, whatever - times an unknown number of units.
Of course, on the corporation side, they go, “Well, yeah, but we don't know the number of units and we won't know until we have to deal with that, but we've got to do a budget now and figure out what we want to do in the coming year”.
How did you manage to come up with a viable fixed fee model for corporations to use?
Oh, I think I have to credit our clients as much as ourselves. Part of it came from a shared commitment to creating and making that model work. Certainly the first year we're working with a client, they might not understand their total volume and so we're taking a risk, they’re taking a risk.
I think that shared risk brings us together rather than pulling us apart, and that's really what we've relied on. I’ll be honest, there are years when we've received twice as much data as we were expecting and years when we received less data than we were expecting. The nature of our relationships lets us look at it as a kind of rolling average: How have the last 3 years been? We've learned about each other’s businesses, we understand their portfolio, and we've gotten better at managing that risk. But I think there is some risk there, and we start with a conversation about not just how much data you have - I think that data and gigabytes used to be the marker - but what you want to do with that data, what the objective is, and how many matters that data represents.
The calculation is a little bit different, but certainly it creates a relationship where we can go back to our corporate partner and say, “Hey, we have a system, we have a business process, and we have a law firm that's not following that process, and it's making this work expensive for you and expensive for us”. Or, “You have a business unit that's not following this process”. I think it's really that shared objective of keeping a model that's successful for both of us viable. It's something we do together, it's something we manage together. It's a conversation about the pricing, not just a price list. I think that the relationships are essential, and our corporate partners are essential to our ability to be successful doing that.
I think it's also important that we try not to make that the problem of the law firms or other partners that they're working with, and really make it work between ourselves and the corporate legal department. Together we're supporting their discovery process and supporting outside counsel. It really requires us to work very closely together.
I would guess that good metrics, and effective use of those metrics, have to be key to all of this. This is me guessing, but I would think you would want to track information at a very granular level and be able to feed the analysis of that information up to your internal people as well as to your clients, in a way that lets them get at it so they can better understand what they're doing, what they're consuming, and where they fit, and you the same. Am I correct in my guess there?
Absolutely. I think the advantage we have, though, is we have that data. If we have most of a client’s data, then we have their metrics, and so that reporting becomes more meaningful, particularly over time. It's important when we're talking about the volume of work and how work has changed. I remember, many years ago now, working with a corporate client where the work was really shifting and the nature of their litigation was shifting, and the way we expressed that, and the way they expressed it internally to their legal department, was really us reporting on actions that were created, not just our own but also outside counsel’s. Being able to talk about not just that the data had increased, but that in this particular matter it's so complex; your outside counsel is performing 15,000 searches, they're performing X number, we're creating 25,000 searches a year to support them, etc. That information really has been able to inform how the work is changing and the nature of the work we're doing, not just for ourselves but also for all the other partners working with them. It also allows them to benchmark, to see what kind of work is being done on different types of matters between different firms, different managed review vendors, etc. Having that information actually benefits everybody in that little ecosystem, to be able to provide metrics like that across so many matters for a single client.
Effective Data Reuse
The metrics touch on a second area you mentioned in the beginning, and that's the effective reuse of data. I've been hearing people talk about reusing data for as long as I've been dealing with data and eDiscovery, and even before that, with just paper data. I've seen very few people actually make effective reuse of their discovery data or the data about what they do with that discovery data. How do you help people with this?
Not sure it's going to help my case here, but I have to give more credit to our clients again. I think what we have is a process where they have tight controls on their collections and discovery and managing discovery internally and have a very strong understanding of the data that comes to us - how it was collected, date ranges, all the information about that collection process - and we keep it. When it comes time to reuse data, the information we have about that data is of a quality that allows us to make the decision to reuse it. Then we don't have to have concern about whether or not it was complete or how it was collected in the first place. It gives people confidence in reusing data and knowing what additional collections have to take place.
Again, the management of the data throughout the process is what makes that possible and viable. We have clients for whom easily 25 percent or more of the data we ingest into a new workspace each year is data we already have that's not being recollected. That's an important part of our model, and it doesn't just come from having a key custodian database. It comes from really having good record keeping and tracking around all the data that we have, in order to be able to identify it and reuse it: both reprocessing collected data at the source into a workspace, and identifying it in other matters and moving it, oftentimes with quite a bit of work product, to another matter.
It's the confidence everyone has in the systems around it, the tracking of it, and the automation around it that creates the ability to reuse it. Any uncertainty about it, even if we have uncertainty, leads to recollecting data. Being very transparent about that process makes it easier to reuse that data.
A Repository of Data About Data
So I’ll go out on a limb here, but I think it's a pretty thick limb and I don't think I'm going very far, and hazard the guess that you're not managing all of this just with Excel.
No, absolutely not. Certainly we've built tools that... I would name them, but we're very good at building tools and automating the process, and we're very bad at naming our tools. The tool we use to manage data through the process is called “cockpit”: a chain-of-custody tracking power tool. From the moment we ingest data, we're collecting everything from the manifest, all the information about the data, and tracking it all the way through a production. Maintaining a data warehouse of all the activities we perform against that data is what gives us reporting analytics and also confidence in that data to reuse it in that process. It's the data we can use to present to our clients to understand the work that we're doing. This data warehouse of data about the discovery we do is a very important component of what we're offering. Having that data repository that reflects years’ worth of matters, thousands of matters for a client, becomes a valuable resource.
TAR: Just Do It
One technology that has gotten a lot of airtime over the years and has had some very strong proponents, me among them, is TAR, or predictive coding, or active learning, or whatever you care to call it. We're not going to go into the debate about the name. What we do see, though, from the reports we can get generally, is that adoption levels of TAR are nowhere near where many of us thought they would be when we were looking at this, say, a decade ago. What advice do you have for people who aren't using TAR?
Not to oversimplify, or to steal a phrase from Nike, but “just do it”. We've been using TAR technology for over a decade, probably starting in 2008. One thing about our fixed fee, again: it gave us the opportunity to use this on more matters than people probably had the ability to do in a transactional sense. And we had the partnership with the corporation to say, “Yes, we want to use this technology”, and so that allowed us to experiment more and use more technology. Over the years we've used most of the commercial products. We've tested all the commercial products. We’ve found how to manage the quality of it and get the results people want.
But what we confront is really a lack … I think it's shopping. First of all, there's a lot of shopping for TAR and not a lot of doing of TAR. I was talking to Gina Taranto, our Head of Applied Sciences, yesterday about this very topic, and I think, well, what are people waiting for? Is it more proof, is it a new tool, or is it some sign from above?
I think the proof of TAR is really in the doing of it. We can remove the obstacles to doing it, and that's really what we've focused on. For some attorneys it's the interface, and some interfaces are better than others. Others are never going to see the interface, and really explaining the results to them is how you're going to prove it. Sometimes it's doing it alongside a traditional model and showing them what they missed out on. Either way, we're not going to do more of it if we don't start doing some of it. I think that's what we really emphasize: let's just get started someplace, let's start with one matter.
With one of our corporate clients, when we first started using TAR, we had a coordinated effort between us and the internal legal department to pitch it to every internal litigator. We built a success story and told the next internal litigator, and within maybe 6 or 9 months we got to the point where we set a goal: every single matter with more than X documents in review, or potential review, was required to use TAR. Now it's not a question, because we proved the value of it to them by doing it. I really think that's the key to advancing it and accelerating its use: using it, proving it, and using that proof to get the next project and do the next project.
And also to keep those expectations realistic. Sometimes the sales pitch is complicated and people don't understand: “How am I going to interface with this? What's my role in this TAR process?” Where we can step in and say, “We'll run the machines for you, we'll make the technology go, we'll show you the results”, that also helps with the adoption. We're explaining results; we're not explaining how the tool works. When we switch to the business of explaining results, we get more traction than trying to explain how it's going to work. They might not ever understand how it works, but they understand the output. We have to be better explainers as well in order to encourage that adoption.
Collaboration Tools and Office 365
Shifting gears (and hoping the dog doesn't bark over the top of me), collaboration tools and Office 365 are posing challenges to our industry and probably especially with the changes we've seen over the last year with COVID. How are you approaching that?
As a company we've decided to be part of the problem and we've dumped everything we do into Teams, we’ve shifted out of email. We look at ourselves as an example of what's going to happen.
Using Different Data Differently
A lot of the focus has been on chat and messages. The growing challenge will be not chat and messages but a whole different way of communicating and storing information that requires a different way of thinking about discovery. The focus right now is on the obvious challenge of collecting this data and processing this data. In the scheme of things that's the simple part. There are challenges there. Certainly there is no standard; it’s not like email. We have to figure out how to process each type of data and there are unique processing profiles for them and those are changing and evolving, but still that's just the data and we can get our hands around the data.
I think the more interesting thing will be, now that we have it, what are we going to do with it? How is it going to change review? How is it going to change TAR? What is TAR against chats and emojis and a lack of sentences? The patterns of data will be different. How we review will be different. Right now, everybody is still trying to take that and cram it back into this email-and-attachment metaphor. You see all this data going into EML formats. The format and how they look might be improving but your ability to manage the data, search the data, organize it, and make it into meaningful review sets is what's going to be valuable a year from now and two years from now.
As we see more of this data, we’ll go back to the same problem we had with email, and that's: how do I eliminate it? I have too much of it. How do I target those review populations better? How do I target my productions in a way that I don't have to redact all of the content that isn't meaningful?
Bringing Those Conversations Together to Tell a New Story
The thing we see, that I see just in the way I use these tools, is: how do we bring all those conversations together to tell a new story? If I'm in a meeting in Teams, I am most assuredly texting people who are in that meeting and carrying on three conversations. Pushing that data into an EML format isn't going to tell that story. You're going to need to ask: what was I saying during these two hours on this day, across all the platforms where I communicate? What was I doing? How was I exchanging information? I think that's the bigger challenge we have, and the more exciting challenge, but it's in the organizing of the data, making it meaningful in review, and creating meaningful review sets and a way of producing them that is not as burdensome as that work is right now.
I'm kind of excited to get past this first part of, can you process this data? Sure, we can figure out how to process that data. After we do, what are you going to do with it? I think that'll be the fun part, if there is a fun part of this: reinventing how we review that data and present that data in this discovery process.
Data At Rest and Data At the Source
I know we're running short of time, but there are a couple more things I wanted to get to. And the first is, also looking to the future and what we're potentially going to be able to do or at least the challenge we face, addressing data at rest and data at the source. What do you think can be done and what do you wish could be done?
That’s the Holy Grail: to not move all of this data. I think it will be a challenge for the industry because, as you mentioned at the very beginning, this has been an industry based on gigabytes and transferring data, moving data around. In talking to our clients, they want to know how we can do a better job of culling at the source, and I think the next step is, how can I process it at the source? How can I access it, process it, do my value-added processing, and take the data about your data that we need for this discovery process and move it forward? I think the cloud is the middle step that will eventually lead to our ability to access that data directly.
When that happens, it's going to change a lot about our industry, for us and for everyone. First and foremost, the value proposition of what we do will not be about how much data you are sending me, but what activities I am performing against your data, what the value of those activities is, and where I am storing this data about your data.
Right now it probably seems like a threat, but I think with the increasing size of data it will be an imperative for all of us, because there'll be too much data to handle if we don't figure out a way to process it at the source and deal with it at the source, just because of the time it takes to move data, and to move data in large volumes.
I think our conversations about the cloud, our conversations about hosting data and the size of the hosted data, and whether or not to have a hosting repository and how you're charging for it, I think that's the middle part. The next part will be, how do you not move any of my data, how do you keep it where it is, take what you need, add value, but not take all that data with you? I think that will emerge in the next 3 to 5 years. A lot of that's going to be contingent on security and access and the security controls that are required. And we're already seeing some of that now with just solutions around Microsoft Office 365. It requires access to that very data that you're trying to protect in order to accomplish that. Securing that data and being comfortable about the security of data will be essential to that concept of processing data in place at the source. I think security is the first thing that we'll have to solve to get to that possibility.
The Cost of Insuring and Securing Data
Which takes me to my final question about the cost of insuring and securing data. Any closing thoughts on that?
I think that the whole industry, the whole world... just this week, more talk about SolarWinds. I think the challenge here is in securing data, the cost of securing data, and ensuring the security of that data, and we have not yet begun to see the kinds of attacks we can expect from other countries, from hackers, wherever. I think there will have to be a shift in how we view security: not only more security, but also a shift in our understanding of the risk and the liability around it. The liability and risk are enormous, and at some point securing data is as expensive as storing data. I think there needs to be a lot of progress with respect to security in order to manage the risks and the cost of those risks, and I don't think we know the answers to those questions yet. But I think they're growing, and with each breach, not in our industry but in general, that concern increases.
On that perhaps not-so-upbeat note, I’d like to bring it to a close. Thank you, Julia. Julia Hasenzahl is CEO and co-founder of ProSearch. I am George Socha; this has been eDiscovery Leaders Live, hosted by ACEDS and sponsored by Reveal. Thank you all for joining us today, and please join us again next Friday, April 23rd, at 11 a.m. Eastern. Thanks, Julia.