Insights From the Inevitable: What’s Hiding in That Massive, Unstructured Data Set?

Your data could be putting your enterprise at risk

What is Unstructured Data?

Unstructured data, by definition, is information that lacks a pre-defined data model or organization. Ryan Zilm, Director of Information and Life Cycle Management at USAA, expands on this definition, noting that unstructured data “can live on your files, shares, your personal drives. It could be [found in an] inbox, potentially SharePoint, all of those other types of repositories.”

Unstructured, or semi-structured, data is growing exponentially within enterprises. In fact, up to 80% of a company’s data is unstructured. During a recent Inevitable Series discussion, “The Elephant in the Room: What’s Hiding in That Massive Unstructured Data Set?”, Justin Van Alstyne, Senior Corporate Counsel at T-Mobile, added that enterprise data “sits on a spectrum. And, it isn't a one or zero sort of situation. You don't have one side structured and on the other side, unstructured.” This only adds to the complexity of the “elephant in the room.”

Panelists clockwise from top left: Justin Van Alstyne, host Daniel Chapman, Russell Densmore, and Ryan Zilm.

An Expanding Problem

With 2.5 quintillion bytes of data created every day, data governance is becoming increasingly challenging for enterprises. When polled, 57% of webinar attendees answered that storing too much unneeded data, finding what they need in their data when they need it, and organizing their data are all top challenges at their companies. 

Russell Densmore, Global Data Protection Leader at Raytheon Technologies, agrees. “With almost zero governance around it, finding sensitive information in data repositories during litigation can be very challenging.” And just because data is “technically structured that doesn't necessarily mean that you have an idea of what you have or even they have an idea of what they have,” adds Van Alstyne. Relying on institutional knowledge of what’s in your data is no longer sufficient or acceptable.

Unstructured data brings with it a myriad of unknowns. Without good data governance or classification, it is almost impossible to know what data your company is holding on to. Additionally, creating unstructured data is very simple. It is all too easy for employees to add unnecessary data fields to surveys or marketing campaigns, for example, leaving behind a trail of unused, sensitive consumer data. Densmore points out that “data minimization is huge. Why are you collecting data if you don't need it?”

Regulatory compliance adds to the data complexity. That massive unstructured data set? It’s still subject to CPRA, GDPR, and other privacy regulations. Without any classification or minimization guidelines, complying with regulations in a timely fashion can be a tall order for many companies.

The global pandemic just served to emphasize the need for information governance practices when it created brand new data challenges for enterprises. Van Alstyne shared his experience at T-Mobile: 

“The pandemic [has] driven a lot of behaviors that created new data challenges. For example, recording meetings. That used to be something that happened, at least in my world, once in a blue moon. And now you see it all the time because people are in different time zones, dealing with their kids’ school, or whatever, and it's becoming a much more common thing. So it's like, that's the definition of unstructured data. How do you deal with that? The aftermath of that.”

Weapons In Your Arsenal

So how should enterprises slay their massive unstructured data sets and bring them down to a manageable size? Our webinar panelists offered their advice: 

  • Triage your data. Ryan Zilm suggests tackling your higher-risk data sets first. “When you can start focusing on that higher risk, you're going to reduce the risk across the organization. You’ve got to triage it. Ask: do you need to classify it? Is it sensitive data? Is there PI or other things that are embedded in there? You really have to understand that.” 
  • Know the boundaries of your data. “When you're dealing with a second request and one of the first things, in my experience, they're going to ask, is for a systems inventory,” shared Van Alstyne. “It’s heartburn-inducing to tell the DOJ that this is the universe of databases that my company uses because it's a very difficult thing to pin down.” Adding to that, Densmore explained that in his experience, most large enterprises don’t know the true boundaries of their networks, which is truly problematic. 
  • Start small. “You go for your high-risk stuff first, right? You take small bites, just start taking bites. Don't try to eat the whole elephant,” advised Densmore. Look for any existing classification, and check for metadata or markers.
  • Create processes. Data governance should start with clear processes. Set retention limits for each platform. Don’t give people an option to hoard data. “You can start at some level by implementing those platform-based retentions to manage the content,” Zilm advised. “And, at least you have something consistent, and that makes it a little bit more defensible.”

    Van Alstyne agreed: “If you can go platform-specific, it puts a timer on people and that tends to drive action in my experience.”
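
To make the triage and platform-based retention advice above a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the platform names, the retention limits in `RETENTION_DAYS`, the two rough patterns used to spot personal information, and the `Item` record are hypothetical and do not come from the panelists. A real program would rely on proper classification tooling and on retention periods set with legal and records teams.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention limits per platform, in days (illustrative values only).
RETENTION_DAYS = {"email": 365, "file_share": 730, "meeting_recordings": 90}

# Very rough markers of personal information; real classification needs far more.
PI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]


@dataclass
class Item:
    """A hypothetical record describing one piece of unstructured content."""
    platform: str
    text: str
    last_modified: date


def contains_pi(text: str) -> bool:
    """Return True if any of the rough PI patterns matches the text."""
    return any(pattern.search(text) for pattern in PI_PATTERNS)


def is_expired(item: Item, today: date) -> bool:
    """Return True if the item is older than its platform's retention limit."""
    limit = RETENTION_DAYS.get(item.platform)
    if limit is None:
        return False  # no policy for this platform: leave it for manual review
    return today - item.last_modified > timedelta(days=limit)


def triage(items: list[Item], today: date) -> list[Item]:
    """Order items so that likely-sensitive and expired content comes first."""
    def risk(item: Item) -> tuple[bool, bool]:
        return (contains_pi(item.text), is_expired(item, today))
    return sorted(items, key=risk, reverse=True)
```

Calling `triage(items, date.today())` on a list of `Item` records returns the same records with content that appears to contain personal information first, followed by content that has outlived its platform's retention limit. That ordering mirrors the "high-risk stuff first" advice, and the per-platform limits are one simple way to put a timer on each repository.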

While not an exhaustive list, these tips will get you started in exploring and governing your unstructured data. Our panelists agree, however, that it is a huge undertaking, and they warn that the process can take a couple of years or more.


Technology to the Rescue

While technology like AI can be an effective tool in gaining control of your unstructured data, our panelists stressed that it is still just a tool and needs to be used alongside data governance principles and protocols. This should include regularly updating your data map and purging and decommissioning data as part of its normal life cycle. 

Our viewer poll found that a lack of executive sponsorship (27%) and the need for a proof of concept or pilot (36%) were the two biggest hurdles to adopting AI or machine learning at attendees’ companies. However, according to a 2020 Fortune survey, 57% of companies have AI pilots underway or full-scale deployments, which suggests these hurdles are not insurmountable.

To gain an executive sponsor, Van Alstyne recommends, “making sure that you get in front of them and explain the situation and try to get them to understand the risk associated with these large data sets.” Zilm adds that it’s important to bring to the conversation the ROI, the roadmap, and your plan. Cutting to the chase, Zilm shares a method he has found to be effective: “I like to take a lot of case law and say, ‘Hey, here's the case law. Here's how much it cost this company. This is one of our competitors. Do you want to have a fine of $550 million? No, you don't. So give me at least five and I'll do X with it.’” 

Van Alstyne looks at an investment in AI almost as insurance to reduce risk in enterprise data. For instance, with a data breach, “just notification plus incident response, not even the claims that are going to come, or could potentially come out of it. So even with that AI is a very small investment compared to the size of the risk that you're talking about.”


Secrets Unveiled

Panelists from the webinar “The Elephant in the Room: What’s Hiding in That Massive Unstructured Data Set?” did answer the title question: within enterprise data lie risk, challenges, questions, nuances, confusion, and complexity. It is not the most reassuring answer, but drawing on their experiences, they also provided solutions to help tame and add structure to those “massive, unstructured data sets.” You can catch their entire discussion on demand here.