Why the ESI Explosion Demands New Tools for Document Review

Part 1 of a series about the role of AI in a world that’s leaving TAR behind.

1978 was the year that should have changed everything.  A massive antitrust lawsuit filed against CBS marked an unprecedented change in the scale of discovery.  Previous suits, even big ones, had dealt with small pools of typewritten documents and the occasional fax.  But CBS, quick to adopt new technology like early forms of email, had a pool of over 6 million documents, each of which had to be reviewed for discovery.  An army of attorneys sat in a warehouse stuffed with printouts and read each document from beginning to end, setting aside the few that were relevant to the case.  The endeavor took months. The cost was over $7.4 million in today’s dollars.

In the decade that followed, personal computers shrank and the internet was born, changing the face of communication and causing document populations to balloon.  Yet still, as industry after industry adopted digital communication and discovery demands grew with the new digital universe, the process for finding key documents remained exactly the same as it was in 1978.

“Weeks upon weeks in either a warehouse or a conference room flipping through bankers boxes and reading paper documents.”  That’s how Shannon Capone Kirk, now eDiscovery counsel with Ropes & Gray, describes her work as a document reviewer in the 1990s.  “If we found something that was relevant…we would tag it with Post-it notes.  That was how archaic it was.”

Kirk’s experience was typical in an industry that is stubbornly resistant to technological change.  By the mid-90s, it was clear a crisis was coming. The staggering size of the CBS document review, a shocking outlier in 1978, had become the norm just two decades later.  The law was slow to respond: document review clung to armies of reviewers with boxes of printouts until the late 90s. Early technological solutions were simple.  Documents were scanned and “petrified” as static image files; they could be sorted primitively into categories based on date or type, but finding key documents still meant looking manually at each one, like searching for old newspaper headlines on microfiche.

TAR promises to change the game

Technology-Assisted Review – or TAR – promised an end to the technological stagnation.  In its earliest forms, TAR simply added Boolean search, allowing reviewers to pull relevant documents by keyword.  Soon, however, TAR leveraged burgeoning machine learning technology to take efficiency gains to the next level. A 2005 study by Anne Kershaw found that reviewers using TAR were able to identify key documents with 95% accuracy, compared with 51% accuracy when reviewing documents by hand.

The game-changing innovation was predictive coding.  Rather than grouping documents primitively by topic, predictive coding begins with a seed set of responsive documents, hand-coded by a subject matter expert.  An algorithm searches for patterns within that seed set and attempts to apply them to the entire document population, pulling potentially responsive documents for human reviewers to confirm or reject.  Each round of reviewer decisions is fed back into the algorithm, so over multiple review stages the process becomes gradually more accurate until the computer can identify the relevant documents across an entire population.
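The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any commercial TAR product works: the smoothed word log-odds scoring, the function names (`train`, `score`, `predictive_coding`), the toy documents, and the `reviewer` function standing in for a human reviewer are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Fit smoothed per-word log-odds from (text, is_responsive) pairs."""
    pos, neg = Counter(), Counter()
    for text, responsive in labeled:
        (pos if responsive else neg).update(set(tokenize(text)))
    n_pos = sum(1 for _, responsive in labeled if responsive)
    n_neg = len(labeled) - n_pos
    return {
        word: math.log((pos[word] + 1) / (n_pos + 2))
        - math.log((neg[word] + 1) / (n_neg + 2))
        for word in set(pos) | set(neg)
    }


def score(weights, text):
    """Higher scores mean the text looks more like the responsive examples."""
    return sum(weights.get(word, 0.0) for word in set(tokenize(text)))


def predictive_coding(seed, unlabeled, reviewer, rounds=3, batch=2):
    """Train on the seed set, then repeatedly send top-ranked documents to review."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        weights = train(labeled)
        # Rank the unreviewed pool by predicted responsiveness.
        pool.sort(key=lambda doc: score(weights, doc), reverse=True)
        to_review, pool = pool[:batch], pool[batch:]
        # The reviewer's decisions become new training data for the next round.
        labeled.extend((doc, reviewer(doc)) for doc in to_review)
    return train(labeled), labeled, pool


# Hypothetical seed set coded by a subject matter expert.
seed = [("merger pricing agreement", True), ("lunch menu friday", False)]
unlabeled = [
    "pricing agreement draft",
    "friday lunch order",
    "merger timeline pricing",
    "parking garage notice",
]


def reviewer(doc):
    # Stand-in for a human decision; a real review would not be a keyword rule.
    return "pricing" in doc or "merger" in doc


weights, labeled, remaining = predictive_coding(seed, unlabeled, reviewer)
```

After the final round, `score(weights, text)` can rank documents nobody has read. Note that the score depends only on the words inside each document, which is the limitation the rest of this article turns on.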

The 2012 endorsement of TAR from the bench in Da Silva Moore v. Publicis Groupe seemed to indicate that the legal industry was finally catching up, technologically speaking.  By 2015, new federal guidelines had been implemented for the use of TAR in discovery, and litigators who opted not to use it faced penalties from the court.  In 2016, the defendants in Wallace v. Tesoro Corp were penalized for having “utilized an ESI search methodology that was virtually guaranteed to avoid finding relevant ESI” – in other words, for having tried to manipulate the case by pointedly ignoring the capabilities of TAR.  Technology-Assisted Review was clearly the way of the future.

A connected world redefines the problem

As fast as TAR grew into widespread use, ESI – Electronically Stored Information – grew faster.  A connected world means more communication: as of 2018, a staggering 281 billion emails are sent every day, to say nothing of texts, IMs, and other forms of digital chatter.  In a large civil suit, which could involve years’ worth of communications from hundreds of people, the number of emails alone can climb into the millions.  Greater processing power and faster CPUs have increased the workload that TAR can handle, but the problems of the new digital world go well beyond simple scale.

In the early days of digital communication, discussion of a topic was fairly linear.  Two people might exchange emails about a subject; if they discussed it in person, someone might take typed minutes.  Now, however, a conversation can wind through ten different formats, fragmenting along the way as someone starts a conversation over email, continues via text, and follows up in a private Slack channel.

The more a conversation fragments like this, the more crucial context becomes, and that’s the type of problem for which TAR wasn’t designed.  For all its power and speed in document retrieval, predictive coding can’t look beyond the four corners of a document. If an email reads “Jess says hold off on the phone call,” no increase in CPU power can enable a predictive coding system to draw meaningful conclusions about whether “Jess” is Jessica Findlay, attorney, or Jesse Weathers, delivery driver.  At best, the algorithm spits out the email for a human to review. At worst, it ignores the document completely.
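A small Python sketch makes the “four corners” limitation concrete. It assumes a bag-of-words representation, which is a simplification chosen for illustration rather than a description of any particular TAR system; the `features` function, the two emails, and their thread names are hypothetical.

```python
from collections import Counter


def features(text):
    """Bag-of-words features built from the document text alone."""
    cleaned = text.lower().replace(",", "").replace(".", "")
    return Counter(cleaned.split())


# Two hypothetical emails: identical wording, entirely different context.
email_a = {
    "thread": "litigation-strategy",
    "body": "Jess says hold off on the phone call.",
}
email_b = {
    "thread": "office-deliveries",
    "body": "Jess says hold off on the phone call.",
}

# The thread each email belongs to never reaches the feature extractor,
# so both documents look exactly alike to the model.
same_features = features(email_a["body"]) == features(email_b["body"])
```

Because the two feature sets are equal, any model that scores documents from those features alone must give both emails the same score, no matter which “Jess” each one refers to or how much computing power is applied.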

Even if predictive coding pulls up such a document for human review, it still might not be recognized.  A reviewer, looking at “Jess says hold off on the phone call,” might have absorbed enough information from the documents she has already read to know that Jessica Findlay was planning a case-relevant call.  But, since even the fastest reviewer can’t read more than a fraction of the millions of documents in a large-scale review, there’s no guarantee that a reviewer will have read the right documents to make that connection.

The implications of these limitations are significant.  In November 2018, Apple frantically tried to claw back hundreds of accidentally produced privileged documents in its patent suit with Qualcomm. The judge refused, on the basis that Apple had not taken sufficient steps to prevent their production in the first place.  Given that Apple – a company with considerable resources – certainly used expert reviewers and up-to-the-minute TAR solutions, it’s hard to imagine what steps Apple didn’t take in its privilege review.  Apple didn’t fail to use the right tools; the problem was just more complex than the “right” tools could manage.

Life after TAR

The problems of a connected and prodigiously communicative world are only going to keep expanding beyond predictive coding’s capabilities.  The communication explosion shows no signs of slowing. ESI continues to accumulate at an astonishing rate, and current solutions continue to apply a brute-force approach to a problem that needs complexity and intelligence.  New discovery regulations, intended to streamline the process, are only compounding the problem by requiring value judgements about the “proportionality” and “reasonable burden” of a review.

TAR was an astonishing breakthrough in the problem of large-scale document review, but the nature of discovery has evolved beyond what predictive coding’s pioneers could ever have anticipated.  The solution to the explosion of ESI isn’t in improvements to TAR: it’s in an entirely new approach, using technology as complex as the problem it needs to solve.