On December 3, as part of The Inevitable 2020 Series, Text IQ presented the webinar “How Email Lets Us Understand Social Networks Within Organizations,” featuring guest speaker Owen Rambow.
Owen Rambow, who has worked in natural language processing (NLP) for more than 30 years, is an IACS Endowed Professor of Linguistics at Stony Brook University. His research focuses on NLP and computational linguistics. He has worked at AT&T Labs, was a research scientist at Columbia University for 15 years, and in 2017 joined the startup Elemental Cognition LLC as a Senior Research Scientist to develop software for deep language understanding. Owen is also currently a technical advisor to Text IQ. He received a Ph.D. in Computer and Information Sciences from the University of Pennsylvania.
Those who have been around e-discovery for a while know how central the Enron corpus was to testing the search and retrieval efficacy of document repositories in the early years of e-discovery. Interestingly, Owen was the first recipient of a National Science Foundation grant given to academia for the specific purpose of analyzing an email corpus. The goal was to understand how people interact through email, to identify language that may imply power relationships, and to see whether organizational hierarchy and other dynamics can be discerned from the language used in correspondence. The email corpus used was Enron’s.
Discerning Social Networks and Their Meanings
“People who are writing and then reading documents are in social networks. They are ‘nodes’ in a larger social network... and the social network is actually very complex...” ―Rambow
The inputs to NLP are documents. Those who work with technology-assisted review (TAR) will recognize “applications such as summarization, translation into other languages, or finding named entities like person names, place names, company names, or categorization of documents,” notes Owen. And there can be all sorts of categories.
“Documents can be categorized as legal or not legal, emails can be categorized as personal or professional. Newspaper articles can be categorized as politics, arts and sport, and the documents that serve as inputs to these algorithms can be either single large documents...or collection of documents like a collection of emails.” And, of course, documents don't exist in isolation.
They are written by people who exist within social networks. “All sorts of possible relations contribute to a social network. It's not just a binary thing such as friends/not friends,” notes Rambow. And this is where it gets really interesting: social networks affect language use. They affect linguistic choices and communication.
Understanding Social Networks Increases Accuracy of Analysis
“When I'm writing something, either a legal document or an email, I take into account whether the addressee knows me or not because it affects the kind of tone I choose...Hierarchical relationships entail a particular choice of language. If I'm a subordinate, I will typically show more deferential language than the other way around. And as we know, even though English does not have a very elaborated politeness code...like French or German...even in English we communicate very differently.” ―Rambow
The implications are profound for the e-discovery, regulatory, and compliance communities, where accuracy is critical, particularly when identifying sensitive and personally identifiable information (PII) or analyzing the language of communications to flag potential compliance issues.
“NLP should use knowledge about the social network because it should be able to contribute to increasing the accuracy,” says Owen. “If we know that two people are friends, then we know the kind of language they'll be using, and we can interpret it accordingly. And, if we don't know that much about the social network, we can look at the documents they're writing and exchanging and infer the social network facts from the documents. So it's a two-way information flow.”
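The second direction of that information flow, inferring facts about the social network from the documents themselves, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy messages, the names, the short list of deferential cues, and the scoring rule are hypothetical and are not the method or data presented in the webinar.

```python
from collections import Counter, defaultdict

# Hypothetical toy messages: (sender, recipient, text). Not taken from any real corpus.
EMAILS = [
    ("alice", "bob", "Please send me the report by Friday."),
    ("bob", "alice", "Of course, I would be happy to. Would Thursday work?"),
    ("alice", "bob", "Thursday is fine."),
    ("carol", "alice", "Could you possibly review this when you have a moment?"),
]

# Illustrative cue list (an assumption); a real system would learn such signals from data.
DEFERENTIAL_CUES = ("please", "would", "could", "happy to", "possibly")


def build_network(messages):
    """Count messages on each directed (sender, recipient) edge."""
    edges = Counter()
    for sender, recipient, _text in messages:
        edges[(sender, recipient)] += 1
    return edges


def deference_scores(messages):
    """Average number of deferential cues per message on each directed edge."""
    cue_totals = defaultdict(int)
    message_counts = defaultdict(int)
    for sender, recipient, text in messages:
        lowered = text.lower()
        cue_totals[(sender, recipient)] += sum(lowered.count(cue) for cue in DEFERENTIAL_CUES)
        message_counts[(sender, recipient)] += 1
    return {edge: cue_totals[edge] / message_counts[edge] for edge in cue_totals}


if __name__ == "__main__":
    network = build_network(EMAILS)
    scores = deference_scores(EMAILS)
    for edge, count in network.items():
        print(edge, "messages:", count, "avg deferential cues:", scores[edge])
```

In this toy data, the bob-to-alice edge carries more deferential cues per message than the alice-to-bob edge, which a downstream model could treat as weak evidence about the direction of a hierarchical relationship. Simple substring counting is far too crude for real use; it only shows the shape of the idea.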
This is not mere speculation. The value of using social network cues in enhancing accuracy in privilege reviews has been proven. You can learn more about how this is used in practice here.
Separating the Hype from Reality
“This is one of the key values that our community appreciates about Text IQ. That we bring in academics from top universities to cut out the hype and noise and really talk about the actual results. What AI can and cannot do in today's world.” ―Apoorv Agarwal, Text IQ Co-Founder and CEO
This presentation is a must-listen for anyone interested in the reality of artificial intelligence and the use of NLP in practice, as opposed to the marketing hyperbole that pervades the industry. And Owen Rambow is uniquely positioned to distinguish the two. Not only is he a leading academic who has produced seminal work in NLP and computational linguistics, but his work in the private sector, including as technical advisor to Text IQ, imbues his insights with a much-needed emphasis on real-world results.
Professor Rambow presents four detailed case studies, including accuracy measures, that cover both large and small corpora of documents. These studies demonstrate the ability to discern not just individual attributes but also power dynamics, professional hierarchies, and other relationships. Most importantly, they demonstrate the superiority of this approach over the “bag-of-words” approach, even when the focus on words is combined with metadata analysis.
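To make the contrast with a bag-of-words representation concrete, here is a small Python sketch. It assumes a hypothetical two-person hierarchy and an invented feature name; it is not the feature set used in the case studies. It shows only that word counts alone cannot tell apart two messages with identical text sent in opposite directions, while a single feature derived from the social network can.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Represent a text as word counts only, ignoring word order and who wrote to whom."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def with_network_feature(message, rank):
    """Bag-of-words counts plus one hypothetical feature taken from the social network."""
    features = dict(bag_of_words(message["text"]))
    # Assumed feature: 1 if the sender outranks the recipient, else 0.
    features["__sender_outranks_recipient__"] = int(
        rank[message["sender"]] > rank[message["recipient"]]
    )
    return features


# Hypothetical hierarchy and messages, invented for illustration.
RANK = {"manager": 2, "analyst": 1}
MSG_DOWN = {"sender": "manager", "recipient": "analyst", "text": "Send me the numbers today."}
MSG_UP = {"sender": "analyst", "recipient": "manager", "text": "Send me the numbers today."}

if __name__ == "__main__":
    # Identical word counts: bag of words cannot distinguish the two messages.
    print(bag_of_words(MSG_DOWN["text"]) == bag_of_words(MSG_UP["text"]))
    # With the network feature, the two representations differ.
    print(with_network_feature(MSG_DOWN, RANK) == with_network_feature(MSG_UP, RANK))
```

The same sentence reads as a routine instruction when sent down the hierarchy and as unusually blunt when sent up it, which is the kind of distinction that knowledge of the social network makes available to a classifier.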