Slack Attacks to Zoom Booms, How Weird Data is Impacting eDiscovery
The last year has seen a proliferation of atypical data types in business communication and in the litigation and investigations that arise from it. The way that humans communicate personally and professionally has not seen such a seismic shift since the advent of email. In 2020 alone, over 218 billion apps were downloaded, and increasingly those applications are designed to support business needs and not just fill idle time. For legal practitioners, this app explosion is creating quite a bit of confusion but at the same time poses an amazing opportunity.
Newer methods of communication that have swept across the business realms like a digital tsunami have more in common with personal applications than traditional business communication. Many newer forms of communication are shorter in format, incorporate multiple data types and frequently contain a visual component. What does this mean for a legal professional? The old way of approaching digital evidence in eDiscovery simply may not work effectively when facing a sea of memes, gifs and emojis. Every step of the information governance and eDiscovery lifecycle should incorporate emerging ESI sources today.
Flavors of Weird Data
Even before the massive work from home (WFH) revolution, a major shift in the way people communicate personally and professionally was well underway. Long gone were the days of pen and paper correspondence in favor of email, but even that traditional business staple was starting to see a surge of newer, more informal, and rapid forms of communication. For legal practitioners, the universe of potentially relevant electronically stored information (ESI) in scope for a potential eDiscovery matter was already beginning to expand.
But the massive workforce shift from the office to remote work dramatically accelerated the reliance on and relevance of newer app-based methods of communication. Use of social media, app-based short format communication and video conferencing technology increased dramatically. As a result, the volume of potentially relevant ESI from these sources has soared. For lawyers and legal technologists alike, it is important to understand what these new methods of communication are, what benefits and limitations these data sources offer and where the heck to find them!
At the outset of the pandemic, social media engagement saw a massive surge, increasing by over 60%. People are engaged for longer periods of time across more social media platforms because of the requirements for social distancing. The surge was not solely in personal usage. Organizations and individuals alike used the social media goliaths to bridge the social distance, relying on LinkedIn, Twitter, Facebook, and Instagram to interact directly with potential clients, raise brand awareness, and answer customer complaints. This vast digital social footprint at the organizational and individual employee level is all potentially discoverable if deemed relevant to a case.
The main platforms that users engage with today are also seeing a shift. Earlier in my career the big dogs of social engagement were Facebook, Twitter, and a dash of Instagram. Today there are new entrants hitting the market all the time, with some gaining followers in the billions in a fraction of the time it took their predecessors. Today’s social media platforms fall into a few categories:
Old School Social Networks
Facebook, the preeminent traditional social network, is a combination of short format posts and images depicting personal lives. LinkedIn, the business networking analogue to Facebook, is more professional in nature but still comprised of posts, images or video and connections. These platforms can prove quite useful in personal injury cases if a post provides evidence contrary to a claim of injury, or in cases where a person’s wealth or lack thereof is at issue. Twitter is a shorter format combination of the professional and private aspects of the other two platforms. Recent events have highlighted the use of these tools as evidence in major world events, including instances of coordinating illegal activity and harassment.
New Social Giants
The newer entrants to social networking come in a few flavors but all share one thing in common: rapid adoption at a scale that makes legacy platforms look like child's play! Among direct messaging applications, international messaging apps have seen massive adoption. WeChat and WhatsApp are the leaders with over a billion users each, but other international apps like QQ are gaining market dominance.
Short format video platforms have also seen a massive resurgence, with TikTok usurping Snapchat and Vine at a breakneck pace. As I recently wrote, TikTok has taken the U.S. by storm, growing to a staggering 1.1 billion users as of 2021.
There has also been a surge in alternative social media platforms in the last 2 years because of concerns around content curation and/or censorship. Depending on the nature of your matter and the style of evidence or ESI relevant to the issues in a case, platforms like Parler, Minds, and various Discord servers should potentially be included in your electronic discovery scoping and review.
Platforms designed to facilitate real time collaboration of remote teams, like Slack and Microsoft Teams, were adopted en masse as a direct result of the WFH revolution. With the mass adoption of these collaboration tools, the potential to use their data or metadata as a key source of evidence has increased dramatically. The tools themselves function in a different manner than email, offering the ability to share files and to run multiple asynchronous channels of short format, rapid fire communication rife with gifs and emojis. The rapid fire, short format nature of the communication allows for an increase of informality and can lead to a wealth of potentially discoverable information if the challenges of discovery on that ESI are overcome.
While these platforms can offer substantial efficiency and benefit to an enterprise, as shown by their rapid adoption across top organizations, they also pose unique challenges in terms of eDiscovery. Collaboration tools were not designed with eDiscovery in mind, and often that means limited or challenging data export, hard to review content and ESI that does not fit neatly into a "document" structure during the eDiscovery document review process. Data exported directly from Slack and other tools arrives in a file format called JSON, which is nearly undecipherable in its raw state. The format is hard to read without a platform that can effectively parse and render it.
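To make the parsing problem concrete, here is a minimal sketch in Python of turning raw export JSON into a readable transcript. It assumes each message is a JSON object with "user", "text", "ts" and optional "reactions" fields, which is how channel exports are commonly laid out; the sample messages, the user IDs and the USERS lookup table are all invented for illustration, and a real review platform does far more than this.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample shaped like a channel export: a JSON array of message
# objects. The field names ("user", "text", "ts", "reactions") are assumptions.
RAW_EXPORT = """
[
  {"type": "message", "user": "U01ALICE", "text": "did you see the Q3 numbers?",
   "ts": "1610000000.000100"},
  {"type": "message", "user": "U02BOB", "text": "yes :grimacing:",
   "ts": "1610000060.000200",
   "reactions": [{"name": "eyes", "users": ["U01ALICE"], "count": 1}]}
]
"""

# Hypothetical lookup from user ID to display name.
USERS = {"U01ALICE": "Alice", "U02BOB": "Bob"}


def render_transcript(raw_json, users):
    """Turn raw export JSON into one readable line per message, oldest first."""
    lines = []
    messages = json.loads(raw_json)
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        # "ts" is treated as seconds since the Unix epoch, stored as a string.
        sent = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        author = users.get(msg.get("user"), msg.get("user", "unknown"))
        line = f"[{sent:%Y-%m-%d %H:%M:%S} UTC] {author}: {msg.get('text', '')}"
        reactions = [f":{r['name']}: x{r['count']}" for r in msg.get("reactions", [])]
        if reactions:
            line += " (reactions: " + ", ".join(reactions) + ")"
        lines.append(line)
    return lines


transcript = render_transcript(RAW_EXPORT, USERS)
for line in transcript:
    print(line)
```

Even this toy version shows why rendering matters: the timestamp, the speaker and the emoji reaction only become reviewable once they are pulled out of the raw structure and put back in conversational order.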
Ephemeral messaging, sometimes referred to as self-destructing messaging, is a form of digital communication that lasts a very short time before encryption or deletion. Ephemeral messages are app-based, short format communications hosted on mobile devices that disappear from the recipient’s screen after the message has been viewed. The self-destruct function is carried out in several ways, including a programmatic self-destruct function, a specific trigger event (e.g., opening or closing a message), or the expiration of a pre-defined time frame. Deletion happens concurrently on the receiver’s device, the sender’s device, and on the system servers.
Increasingly, people from all walks of life are relying on ephemeral messaging apps like Telegram, Wickr and Snapchat to communicate without leaving a digital trail. Given the ability to encrypt or potentially destroy messages, many people play fast and loose on these applications, and they can, as a result, contain a wealth of information. In some cases, parties have successfully subpoenaed and recovered ephemeral messaging data, and in others the mere use of this type of communication tool has been used to secure an adverse inference. While not necessarily fully destroyed, material on these applications can be very challenging and costly to retrieve, so practitioners must validate relevance.
Few things are as ubiquitous to the WFH revolution as Zoom and the many other video conferencing technologies organizations have relied upon to bridge the gaps caused by social distancing. Legacy tools like GoToMeeting, Teams, Skype, FaceTime, and a myriad of other app-based video conferencing tools have also seen rapid increases in adoption. Email and in-person meetings have been supplanted by collaboration tools and video conferencing tools with chat functions, all rife with potentially relevant information for the deluge of cases facing practitioners.
And this technology also poses challenges to legal teams beyond the preliminary collection of evidence, as more courts are turning to video conferencing to support the need for socially distant court proceedings. This raises the potential that Zoom data may be relevant in appeals and motion practice as well as ESI discovery.
For practitioners, it is important to ensure that their eDiscovery technology can support image detection, categorization and search as well as effectively present and parse the chat components of the video-conferencing platforms many professionals are now spending upwards of 7 hours a day on.
SMS or text messages are not necessarily new kids on the block in terms of usage in personal life, but the use of texting for business purposes has also skyrocketed in the last 18 months. Today most datasets at issue and ESI subject to retention will include at least some text messages.
The instantaneous and informal nature of text messages can lead to a greater richness of information not seen in more formal email exchanges. By now most people think of email as a business record and are less likely to share incriminating communication there than perhaps in a text. To gather a complete picture of a custodian’s activity, especially of the nefarious variety, digging into this sort of data can provide more insight than traditional sources.
With the increasing relevance of short form and mobile data there are some key considerations in a discovery effort, including extraction of all necessary metadata fields and parsing the data into a readily reviewable format.
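As a rough illustration of what "extracting metadata and parsing into a reviewable format" can look like, the Python sketch below flattens raw text message records into rows with consistent fields and writes them out as CSV. The input keys ("thread", "timestamp", "from", "to", "device_owner", "attachments") and the sample messages are hypothetical; real mobile extraction tools each have their own output layout, so treat this as a sketch of the idea rather than any vendor's format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass
class ReviewRecord:
    """One text message flattened into fields a reviewer can sort and filter."""
    thread_id: str
    sent_utc: str
    sender: str
    recipients: str
    direction: str
    body: str
    has_attachment: bool


def normalize(raw):
    """Map one raw extraction record (hypothetical keys) to a ReviewRecord."""
    sent = datetime.fromtimestamp(raw["timestamp"], tz=timezone.utc)
    return ReviewRecord(
        thread_id=raw["thread"],
        sent_utc=sent.isoformat(),
        sender=raw["from"],
        recipients="; ".join(raw["to"]),
        # Direction is relative to the custodian whose device was collected.
        direction="outgoing" if raw["from"] == raw["device_owner"] else "incoming",
        body=raw["body"],
        has_attachment=bool(raw.get("attachments")),
    )


def to_csv(records):
    """Serialize records to CSV text with one column per metadata field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ReviewRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Invented sample data standing in for a device extraction.
RAW_MESSAGES = [
    {"thread": "t-1", "timestamp": 1610000000, "from": "+15550001",
     "to": ["+15550002"], "device_owner": "+15550001", "body": "call me, not email"},
    {"thread": "t-1", "timestamp": 1610000090, "from": "+15550002",
     "to": ["+15550001"], "device_owner": "+15550001", "body": "ok",
     "attachments": ["IMG_0042.jpg"]},
]

records = [normalize(raw) for raw in RAW_MESSAGES]
print(to_csv(records))
```

Keeping the thread ID, direction and a single time zone on every row is what lets a reviewer read a conversation in order instead of as a pile of disconnected fragments.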
Smart Stuff (IoT)
IoT refers to the network of physical objects (things) that are embedded with sensors, software, and other technologies to connect and exchange data with other devices or systems via the internet. The IoT ecosystem consists of over 35 billion web-enabled things (from smartwatches to doorbell cameras to smart cars) that are constantly performing tasks, collecting data, and sharing it via the web. Smart devices possess a wealth of information about human behavior and movement, and many could contain data dispositive to a case or investigation.
The IoT spans a variety of applications in personal and professional life including:
• Consumer IoT: Stuff a private person might use directly like smart cars, fitness wearables, and home automation
• Commercial IoT: Organizational-level devices in smart healthcare, transportation, logistics, and building automation
• Industrial IoT: Smart devices relating to industrial applications like agriculture and manufacturing
• Infrastructure IoT: Smart cities with operations of sustainable urban and rural infrastructures, monitoring energy and environmental impact
• Military IoT: Smart devices for reconnaissance, surveillance, and other combat-related objectives
Several cases have been decided based upon information harvested from IoT devices, and that number is likely to grow significantly in the coming years. Examples range from a murder in Florida that Amazon Alexa "witnessed" to heart rate data from a murder victim's Fitbit, and even trucking logistics trackers have been used in personal injury and property damage disputes. This category more than any other poses significant concerns from a GDPR and data privacy standpoint because of the wealth of Personally Identifiable Information (PII) and/or medical information these devices contain. Practitioners also should be aware that certain smart devices are more susceptible to data breach and other cybersecurity concerns.
Considerations when facing weird ESI
As weird becomes the new normal in uncovering key evidence in a potential litigation or investigation, it is of paramount importance that law firms and general counsel alike ensure that they are including the atypical in their scoping and taking the quirks and nuances of each data type into account when planning and budgeting for electronic discovery. The new eDiscovery workflow, when taking atypical data into account, differs from the brute force linear approach and creates many opportunities to leverage artificial intelligence (AI) to make connections across varied data sets. At every step of the eDiscovery lifecycle, from legal hold to production, there are nuances that practitioners must consider as they embrace atypical types of data.
ESI File Type
Since the 2006 amendments to the Federal Rules of Civil Procedure, lawyers and legal technologists alike have understood the need to include ESI in scoping and undertaking eDiscovery. But the types of data remained fairly limited to documents, spreadsheets, email and perhaps some text messages if the case team was feeling ambitious. Today there are a variety of new data formats that pose challenges for effectively extracting both the face of a document and its associated metadata as well as visually presenting the data in a review platform in a manner that allows the case team to quickly understand and categorize it.
From JSON files in Slack, which I like to say look like code that threw up more code, to MP4 video content from web-based video platforms or even recordings from video conferencing platforms, it is imperative that case teams understand the format of the data export from atypical data sources and the capabilities or limitations of their platform to support it.
With many of the newer data types residing in applications and not necessarily behind the firewalls of a corporation, it is increasingly important to understand where your ESI resides and what limitations there might be to fully exporting it. Practitioners must be sure to scope on-premises and private cloud hosted enterprise data, applications that store data on a physical device, and cloud hosted application data which may or may not be backed up by something like Apple iCloud. The location of data may impact ease of collection and the type of consent and passwords necessary to extract the atypical data and all the relevant metadata associated with it.
What Tech to Use
Once you have started the workflow of your eDiscovery matter with broad scoping of the technology and an understanding of the nuances of file format and location, the next step is understanding whether your legal technology can support the file types and formats. Not all legal technology is created equal in this regard, so I always recommend requesting an example of the specific data type you anticipate engaging with as it appears in the eDiscovery document review platform you plan to work with.
Working with the right tool can be the difference between a painful, slow process and quickly making connections and uncovering key evidence. In cases where multiple data formats and types are present, atypical and traditional alike, using artificial intelligence to make connections across disparate data sources can make a massive difference. Machine learning and AI can uncover concepts across multiple communication threads and types and identify who is speaking to whom, and with what frequency, even across many differing applications. Relying on an algorithm to improve insights is especially impactful when dealing with large data volumes and a variety of data sets.
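The "who is speaking to whom, and how often" idea can be sketched in a few lines of Python. To be clear, plain counting stands in here for the machine learning a real platform would apply, and the event list, names and platform labels are invented; the sketch also assumes every source has already been normalized into (platform, sender, recipient) records.

```python
from collections import Counter

# Hypothetical, already-normalized events: (platform, sender, recipient).
EVENTS = [
    ("email", "alice", "bob"),
    ("slack", "alice", "bob"),
    ("slack", "bob", "alice"),
    ("sms", "alice", "carol"),
    ("zoom_chat", "bob", "alice"),
    ("sms", "carol", "alice"),
    ("slack", "alice", "bob"),
]


def pair_frequencies(events):
    """Count messages per pair of people, overall and broken down by platform."""
    totals = Counter()
    by_platform = {}
    for platform, sender, recipient in events:
        # Sort the names so alice->bob and bob->alice count as the same pair.
        pair = tuple(sorted((sender, recipient)))
        totals[pair] += 1
        by_platform.setdefault(pair, Counter())[platform] += 1
    return totals, by_platform


totals, by_platform = pair_frequencies(EVENTS)
for pair, count in totals.most_common():
    print(pair, count, dict(by_platform[pair]))
```

In this toy data, the busiest relationship spans email, Slack and Zoom chat, which is exactly the kind of connection that is easy to miss when each source is reviewed in isolation.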
Legal technology that is supercharged with artificial intelligence can offer additional support in whittling down the universe of data far earlier and surfacing key evidence. Attempting to use brute force human cognition to make connections across the many threads and disparate communications in atypical data is time-consuming and costly. AI-powered legal technology solutions and workflows help case teams prioritize data sources based upon how frequently key custodians use them and recognize that email may not be the first data source investigated. Use insights from each investigated data source to zero in on key periods of time, concepts, and data ranges to ensure that you are not boiling the ocean but rather taking a precise approach to mining for relevant information.
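Here is a small Python sketch of that prioritization step, under the assumption that messages from every source have already been reduced to (custodian, source, date) entries. The log below is invented, simple counts stand in for whatever scoring an AI-powered tool would use, and the function names are my own.

```python
from collections import Counter
from datetime import date

# Hypothetical message log: (custodian, source, date sent).
LOG = [
    ("alice", "slack", date(2021, 3, 1)),
    ("alice", "slack", date(2021, 3, 2)),
    ("alice", "email", date(2021, 3, 2)),
    ("alice", "sms", date(2021, 3, 9)),
    ("alice", "slack", date(2021, 3, 3)),
    ("bob", "email", date(2021, 3, 2)),
]


def rank_sources(log, custodian):
    """List a custodian's data sources from most used to least used."""
    counts = Counter(source for who, source, _ in log if who == custodian)
    return [source for source, _ in counts.most_common()]


def busiest_week(log, custodian):
    """Return the (ISO year, ISO week) in which the custodian sent the most messages."""
    weeks = Counter(
        tuple(day.isocalendar()[:2]) for who, _, day in log if who == custodian
    )
    (year, week), _ = weeks.most_common(1)[0]
    return year, week


print("Review order for alice:", rank_sources(LOG, "alice"))
print("Busiest ISO week for alice:", busiest_week(LOG, "alice"))
```

For this custodian the ranking puts Slack ahead of email, which is the practical point: start with the source the custodian actually lives in, then narrow the time window before pulling anything else.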
So, What is a Legal Eagle to do with weird ESI?
Simply put? Use it! There is a wealth of potentially relevant information living within the ones and zeros of atypical data sources. The service providers, law firms and corporate counsel managing eDiscovery today would be well served to include atypical data types in their information governance policies, ESI scoping for eDiscovery and risk calculations for cybersecurity risk mitigation. From the Pandora's box of application hosting in Apple's iOS to Slack data or Zoom recordings, stakeholders in all areas of a case team should have a plan that factors in weird data, because it is becoming the rule and not the exception in modern eDiscovery.