When Accurate Identification of Personal Information is Critical, Text IQ is the Best

Proven best in direct comparisons, Text IQ gives enterprises a clear accuracy advantage in identifying and protecting consumer data



The Dilemma: It’s not if but when for a data breach. When the worst happens, it pays to have the best AI technology

When a client suffered a security incident tied to their CFO’s email, which included thousands of potentially compromised messages and attachments, they immediately engaged the law firm Seyfarth Shaw and its noted cybersecurity and digital forensics partner, Richard Lutkus. Lutkus is a pioneer in law and technology with significant success in helping clients avoid or clean up costly cybersecurity incidents. He knew he needed a software solution to accurately and efficiently identify all of the compromised PI from the incident.

AI put to the test
Lutkus, who encounters companies struggling with PI identification in the data breach context daily, decided to test the industry’s leading tools for his client. The dataset consisted of 195,000 documents, mostly emails and attachments. Lutkus ran the document set through both Text IQ’s AI solution and other leading solutions to detect PI for human assessment.

From this test, 12,287 documents were identified, after deduplication, as containing PI. “Text IQ’s AI was so much better at finding the true PI than the other approaches. Ultimately, we completed our review using Text IQ’s results after our sampling and evaluation,” Lutkus said.

Clearly, Text IQ’s AI solution was superior to the alternative options, but what about other providers of PI identification that use AI technology? In pursuit of continuous improvement, Text IQ put its AI for PI identification up against PI identification solutions from Microsoft Azure, AWS, and Google. While these cloud providers do not specialize in finding sensitive information as Text IQ does, their APIs are widely available and extremely popular.

Richard Lutkus of Seyfarth assisted with this test by obtaining the client’s approval to utilize the subset of 12,287 documents that had been recently reviewed as a result of a privacy assessment following a security incident. Each of the documents in this subset had undergone both AI and human review and was known to include PI.

"Text IQ’s AI was so much better at finding the true PI than the other approaches."


PI Identification Champion


2x More Accurate


Socio-Linguistic Hypergraph


The Results: Text IQ identifies PI 30–300% better than the other solutions

In this comparison of out-of-the-box PI detection solutions, Text IQ scored at least 30% better than every alternative, and in one case more than 300% better.

With an F-score of 0.65, Text IQ scored about 30% higher than AWS Comprehend and Azure (with F-scores of 0.50 and 0.49, respectively), about 50% higher than AWS Macie (0.43), and more than three times better than Google (0.14).
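To make the arithmetic behind these percentages easy to check, here is a minimal Python sketch. It assumes the reported F-score is the usual F1 score (the harmonic mean of precision and recall), which the case study does not state explicitly. The function names `f1_score` and `relative_improvement` are illustrative only and are not part of any vendor's API. The scores are the ones quoted above.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall (assumed metric)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(score: float, baseline: float) -> float:
    """How much higher `score` is than `baseline`, as a percentage."""
    return (score - baseline) / baseline * 100.0


# F-scores as reported in the case study.
TEXT_IQ = 0.65
OTHERS = {
    "AWS Comprehend": 0.50,
    "Azure": 0.49,
    "AWS Macie": 0.43,
    "Google": 0.14,
}

improvements = {
    name: relative_improvement(TEXT_IQ, score) for name, score in OTHERS.items()
}

for name, pct in improvements.items():
    print(f"Text IQ vs {name}: {pct:.0f}% higher F-score")
```

Under these assumptions, the relative gaps work out to roughly 30%, 33%, 51%, and 364%, so the quoted percentages are rounded figures.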

Actual F-score comparison:

[Figure: F-score comparison on the data breach document set]

The analysis demonstrated how difficult it is to reliably identify personal information in unstructured and structured sources, including emails, patient lists, employment documents, and health records.

While Text IQ was the clear winner in this comparison of out-of-the-box APIs with no prior familiarity with the dataset, the nature of its socio-linguistic hypergraph technology would allow it to continue to improve with additional use and familiarity within an organization. Basically, the more it’s used, the more accurate and efficient it becomes.




