Meet Brett Tarr. As Head of eDiscovery at Caesars Entertainment Inc., he’s helped steer the casino and hospitality giant into the data-intensive world of eDiscovery. As information management around governance, privacy, and eDiscovery has become increasingly unwieldy in the big data landscape, Tarr’s focus has become proactive rather than reactive. We caught up with Tarr to get his take on the greatest challenges facing heads of eDiscovery, and how the flood of data will continue to affect the discovery process.
How is the growth of data affecting eDiscovery?
In the future, we are going to see more data, which means more people navigating eDiscovery and its increasing challenges. The first challenge is getting a handle on the high volume of data presented in a multitude of formats. In the early days, it was just email. Now it’s Dropbox, Yammer, Slack, and others. These different tools that transfer data require us to develop new technologies to capture that data. The second challenge is identifying and locating the data. Data repositories are rarely designed with eDiscovery in mind, so locating the data is both time consuming and expensive. The last challenge deals with setting the scope of initial discovery requests in litigation. In many instances, opposing parties cannot agree on discovery plans and resort to involving the courts in determining what data is discoverable. Therefore, not only the volume and type of data, but also the means by which the discoverable data is identified, significantly slows down the overall process.
Is privilege review a problem for your team?
Yes. Privilege review requires a great deal of manual effort and is very time consuming. The first pass review for responsive documents can take weeks of human review hours spread over dozens of reviewers. Once that’s completed, privilege review continues with yet another long process of human review by contract attorneys that requires even more effort and time. This process contains the inherent risk of dozens of different human minds interpreting documents and the definition of privilege itself differently. Not to mention the impact of fatigue and distractions. It is only after all these long human hours that outside counsel gets their hands on the privileged documents to do the final pass review and prepare the privilege log.
Predictive coding has been discussed as a technology that can deal with these data challenges. What are the pros and cons of predictive coding?
Predictive coding attempts to use technology and statistics to limit the volume of documents that human reviewers have to sift through. It requires an attorney with subject-matter expertise to train the machine on a small percentage of documents, with the goal of capturing a representative population. It then extrapolates those decisions on responsive documents across the entire document population. When successful, it is more consistent and faster than human review. However, a highly specialized attorney has to train the machine, taking time away from the high-level work they can be better utilized for. This sometimes leads to delegation to attorneys without the same level of expertise, who cannot adequately train the machine. Moreover, the training set is only a small portion of a potentially huge data set. As a result, the knowledge getting plugged into the algorithm is likely less accurate across the board while also being overinclusive.
Even once predictive coding is deployed, it only flags responsive documents – it doesn’t analyze and tag for issue codes. So if you have a complex case with multiple claims, you still need individual reviewers to go through all of the overinclusive documents for issue code tagging.
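The workflow Tarr describes – an attorney labels a small seed set of documents, and the system extrapolates those decisions across the full population – can be sketched in a few lines. The word-scoring approach below is a deliberately simplified stand-in for the statistical classifiers real predictive-coding platforms use, and the documents, labels, and threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical attorney-reviewed seed set: (document text, responsive?)
seed_set = [
    ("quarterly revenue forecast for the resort expansion", True),
    ("revenue projections and expansion budget approval", True),
    ("lunch menu for the cafeteria next week", False),
    ("office holiday party schedule", False),
]

def train(seed):
    """Count how often each word appears in responsive vs. non-responsive docs."""
    responsive, non_responsive = Counter(), Counter()
    for text, is_responsive in seed:
        (responsive if is_responsive else non_responsive).update(text.lower().split())
    return responsive, non_responsive

def score(text, responsive, non_responsive):
    """Net weight of responsive-leaning words minus non-responsive-leaning words."""
    return sum(responsive[w] - non_responsive[w] for w in text.lower().split())

def predict(corpus, seed, threshold=0):
    """Extrapolate the seed decisions across the full (unreviewed) population."""
    responsive, non_responsive = train(seed)
    return [doc for doc in corpus if score(doc, responsive, non_responsive) > threshold]

corpus = [
    "draft revenue forecast attached",  # overlaps the responsive seed vocabulary
    "cafeteria menu update",            # overlaps the non-responsive seed vocabulary
]
flagged = predict(corpus, seed_set)  # only the first document is flagged
```

The sketch also shows why seed quality matters so much: the model can only generalize patterns that the training attorney's labels put into it, which is why delegating the training to less expert reviewers degrades the whole extrapolation.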
Have you tried predictive coding for privilege review?
No. It is not designed to measure the nuanced determination of privilege. We have used predictive coding for responsiveness, but not for privilege review because it is too ineffective when dealing with high-stakes data.
What does “high-stakes” data mean to you?
In any matter, there is going to be a broad collection of documents that will potentially be relevant. The discovery process is overinclusive, so the reviewer has to find a way to refine and limit the results to what seems responsive to the discovery requests. High-stakes refers to the sensitive and confidential data that remains after the non-relevant data has been put aside. The challenge is to parse through all the documents to get to this small set of crucial data with as much efficiency and accuracy as possible.
In today’s litigation landscape, high-stakes data can be found almost anywhere. Attorney-client privilege and work product are high-stakes in that once this information is seen by opposing counsel, it cannot be unseen, even with clawback procedures. High-stakes information also includes information that could potentially harm the organization or put it at risk. For example, if there has been a security incident where private information has been compromised – say, SSN or driver’s license info mishandled by internal or external parties – then this leaked information opens the company up to legal liability and reputational damage.