Head of Growth
At Text IQ, we developed our solution because we found many stages in the privilege review process where current methods were inefficient and inherently risky. These stages represent choke points that our technology—AI for Sensitive Information—can overcome.
Choke point 1: Normalizing all variations of a person
This figure shows an example in which one attorney, Sara Shackleton, appears in the dataset under many variations: S, Sara, Sarah, SSHAKL, and so on. There is only one Sara Shackleton, but she appears under a nickname, “S,” that as a search term would return every single document, along with a handful of other variations such as email addresses, misspellings, and other nicknames and colloquial forms.
It is hard enough to keep up with a complete list of the attorneys in a dataset. This example shows that it can be impossible to keep up with all the variations of even a single attorney.
Text IQ has developed a technique for entity normalization that’s unique in the marketplace. Existing methods for normalization rely on complex Boolean searches and techniques like fuzzy string matching. They might normalize “Sarah” to Sara Shackleton, but they’ll struggle immensely with the other variations shown here, like the nickname “S” or the LDAP identifier “SSHAKL.”
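To make the limitation concrete, here is a minimal sketch of fuzzy string matching using Python's standard difflib module. The 0.8 threshold and the variant list are assumptions chosen for illustration; this shows the general technique, not any particular vendor's implementation.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 character-level similarity score, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

CANONICAL = "Sara Shackleton"
THRESHOLD = 0.8  # hypothetical cutoff for treating two strings as the same name

# Variants taken from the example above; "Sarah Shackleton" is a misspelling.
variants = ["Sarah Shackleton", "S", "SSHAKL"]

matches = {v: similarity(v, CANONICAL) >= THRESHOLD for v in variants}

for variant, matched in matches.items():
    print(f"{variant!r}: {'matched' if matched else 'missed'}")
```

Under these assumptions, only the near-identical misspelling clears the threshold. The single-letter nickname and the abbreviated login share too few characters with the full name, and lowering the threshold far enough to catch them would also match many unrelated names.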
Our innovation, the Social Linguistic Hypergraph, allows for a superior form of entity normalization that rests on two assumptions: if “S” and Sara Shackleton are the same person, they’ll use the same language and they’ll share the same communication networks. We combine the language signals and the social signals in the dataset to accurately determine whether two references point to the same person.
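The two assumptions can be illustrated with a toy score that blends a language signal with a social signal. This is only a sketch: the cosine and Jaccard measures, the equal weighting, and all names and data below are assumptions made for the example, and the Social Linguistic Hypergraph itself is not described in enough detail here to reproduce.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of correspondents."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def same_person_score(words_a, contacts_a, words_b, contacts_b, w_lang=0.5):
    """Blend the language signal and the social signal (weights are hypothetical)."""
    language = cosine(Counter(words_a), Counter(words_b))
    social = jaccard(set(contacts_a), set(contacts_b))
    return w_lang * language + (1 - w_lang) * social

# Toy data: vocabulary and correspondents observed for each alias (all invented).
s_words, s_contacts = ["isda", "swap", "agreement", "draft"], {"alice", "bob", "carol"}
sara_words, sara_contacts = ["isda", "swap", "agreement", "credit"], {"alice", "bob", "dave"}
other_words, other_contacts = ["lunch", "golf", "weekend"], {"erin"}

score_sara = same_person_score(s_words, s_contacts, sara_words, sara_contacts)
score_other = same_person_score(s_words, s_contacts, other_words, other_contacts)
print(score_sara > score_other)
```

In this toy data, “S” shares both vocabulary and correspondents with Sara Shackleton and neither with the unrelated alias, so the blended score separates them even though the strings themselves have almost nothing in common.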
Choke point 2: Finding unknown attorneys
Most likely, your first step in analyzing an email, before looking at the content, is to look at the people in the metadata. The first questions are: Who is the sender? Who is the receiver? Are they attorneys? Are they conveying legal advice? Even if the body of the email doesn’t seem privileged, if one of the communicators is playing a legal role, the email will get scrutiny.
The simplicity of this fact belies the complexity of a central challenge. It can be impossible to determine whether a person is playing a legal role by looking at their job title, because job titles often don’t accurately reflect a person’s organizational role. For example, the email shown above does not have an attorney as the sender or the receiver. However, it turns out that Marie is not just an assistant, but an assistant to an attorney who transcribes emails on the attorney’s behalf. This makes Marie’s inbox a treasure trove of potential privilege.
The bottom line is this: titles are misleading. Marie’s email, where a non-lawyer conveys potential legal advice under the direction of an attorney, is one such example. Another example is when an attorney plays a dual role that includes giving business advice.
How do we automatically determine the true roles that people play in an organization, despite their often misleading titles? Using the Social Linguistic Hypergraph, we analyze an individual’s language and social networks to infer the role that the title obscures. If that role is a privilege-conferring one, such as an attorney or an employee acting under the direction of an attorney, we identify it as such.
Choke point 3: Identifying attorneys mentioned using first name
In a communication like this, where there is no attorney in the metadata and no obvious legal language in the body, the document would typically get missed. Our innovation has been to combine signals from the social network with linguistic analysis of the context of the email. The Social Linguistic Hypergraph can automatically infer which John, out of hundreds of Johns in an organization, is the “John” being mentioned. If it turns out to be John West, an attorney, we flag the document as potentially privileged for second-level review to determine whether it should be withheld and logged or produced.
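As a rough illustration of the idea, the sketch below picks among several people named John by counting how many of the email's participants each candidate normally corresponds with and how many of the email's terms match topics each candidate writes about. The directory records, field names, and additive scoring are all hypothetical; a production system would presumably use far richer signals.

```python
def resolve_first_name(mention, participants, context_terms, directory):
    """Return the directory entry that best fits a first-name mention, or None."""
    candidates = [
        person for person in directory
        if person["name"].split()[0].lower() == mention.lower()
    ]

    def score(person):
        social = len(participants & person["contacts"])    # shared correspondents
        linguistic = len(context_terms & person["topics"])  # shared subject matter
        return social + linguistic

    return max(candidates, key=score, default=None)

# Hypothetical directory with two Johns; only one is an attorney.
directory = [
    {"name": "John West", "is_attorney": True,
     "contacts": {"maria", "deepak"}, "topics": {"indemnity", "contract"}},
    {"name": "John Smith", "is_attorney": False,
     "contacts": {"li"}, "topics": {"pipeline", "outage"}},
]

# An email between Maria and Deepak that says "check with John on the contract".
resolved = resolve_first_name(
    "John",
    participants={"maria", "deepak"},
    context_terms={"contract", "review"},
    directory=directory,
)
flag_for_review = bool(resolved and resolved["is_attorney"])
print(resolved["name"], flag_for_review)
```

Here the mention resolves to John West because both participants are among his usual correspondents and the email's subject matter overlaps with his, so the document would be flagged even though no attorney appears in the metadata.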
Choke point 4: Understanding imprecise search terms
People will typically search for generic terms like “legal,” “attorney,” and “privileged,” because these search terms do catch privileged information missed by other methods. The problem is that these search terms, being imprecise, also catch a lot of junk and require review teams to put eyes on potentially tens of thousands of false hits.
For example, the top email above shows one of many non-privileged emails that will hit the search term “legal.” In general, about 90% of “legal” instances are not privileged. Text IQ has developed technology that can tease apart the different semantic uses of the same word. Understanding the context allows us to cast a tighter net, catching the uses of the word that make a document potentially privileged and routing the false hits to the not-privileged set for production.
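A deliberately simple sketch of context-sensitive matching follows. Instead of treating every occurrence of “legal” as a hit, it looks at the words immediately around each occurrence. The cue list and window size are assumptions for illustration only; real semantic disambiguation would rely on a trained model rather than a hand-written list.

```python
import re

# Hypothetical cue words suggesting "legal" refers to advice from counsel.
PRIVILEGE_CUES = {"advice", "counsel", "opinion"}

def classify_legal_hits(text: str, window: int = 3) -> list:
    """Label each occurrence of 'legal' using the words within `window` tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    labels = []
    for i, token in enumerate(tokens):
        if token != "legal":
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        if context & PRIVILEGE_CUES:
            labels.append("potentially privileged")
        else:
            labels.append("likely not privileged")
    return labels

advice_hit = classify_legal_hits("Please get legal advice from counsel before signing.")
footer_hit = classify_legal_hits("This email and its legal disclaimer are automatically appended.")
print(advice_hit, footer_hit)
```

Both sentences hit the search term, but only the first uses the word in a sense that bears on privilege. Separating the two by context is what lets the review set shrink without losing the documents that matter.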
Choke point 5: Automating the Privilege Log
In a second request, parties are required to create a privilege log, which records why documents were withheld under a claim of privilege. Crafting this log requires nuance. Give too detailed a reason for why a document is privileged and you risk waiving privilege, but you still need to say enough about what makes it privileged to satisfy the obligation.
Because there is a real risk that apparently tenuous privilege claims will be challenged, a lot of money gets spent on privilege logs: the people best equipped to make the nuanced analyses and craft the best entries are expensive.
Text IQ provides natural-language Reasons for every document that it identifies as potentially privileged. And, because our solution automatically normalizes entities, it also creates consistent references to every person in the data. We provide the starting point for the privilege log, where attorneys need only modify the existing Reasons rather than creating them from scratch.
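The sketch below shows how normalized entities might feed a draft log entry that an attorney then edits. The alias table, the entry fields, and the wording of the generated reason are all hypothetical and are not Text IQ's actual output format.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    doc_id: str
    author: str
    recipients: list
    reason: str

# Hypothetical alias table, of the kind entity normalization might produce.
ALIASES = {
    "s": "Sara Shackleton",
    "sarah": "Sara Shackleton",
    "sshakl": "Sara Shackleton",
}

def normalize(name: str) -> str:
    """Map a known alias to its canonical name; leave unknown names unchanged."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)

def draft_entry(doc_id: str, author: str, recipients: list, attorney: str) -> LogEntry:
    """Build a starting-point entry with consistent names and a generic reason."""
    reason = (
        f"Communication involving {normalize(attorney)}, an attorney, "
        "flagged as potentially reflecting legal advice."
    )
    return LogEntry(doc_id, normalize(author), [normalize(r) for r in recipients], reason)

entry = draft_entry("DOC-001", author="Marie", recipients=["S"], attorney="SSHAKL")
print(entry.reason)
```

Because every alias resolves to the same canonical name, the attorney is referred to identically in each entry, and the reviewing attorney only has to refine the generic reason rather than write it from scratch.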
Dethroning the privilege screen
The result of our technology is that we achieve a drastic reduction in risk, time, and cost. When we have tested our technology in completed matters with more than a million documents, we have invariably cast a tighter net: coding a fraction of the documents as Potentially Privileged, while also surfacing thousands of privileged documents that the traditional process missed.