Demystifying AI Series: Unstructured Data

“Every block of stone has a statue inside it, and it is the task of the sculptor to discover it.” In this way, Michelangelo explained the creative process that led him to turn rough chunks of marble into some of the world’s most prized sculptures. Rather than transforming the raw materials into something new, the artist subtracted the unnecessary rock to reveal that which lay hidden inside.

In much the same way, artificial intelligence systems chip away at unstructured data to uncover the insights within. AI works upon data as a sculptor works on stone. At first glance, the data seems messy and confusing, just as a rock seems only a rock. However, by skillful application of hammer and chisel, AI can cut away at unstructured data to discover new, valuable truths.

To better understand unstructured data, let’s compare it to structured data. The latter is easy to deal with; AI algorithms readily consume it and mine it for insights. Machines are already great at understanding this highly organized data type: SQL (Structured Query Language), which IBM researchers developed in the 1970s, has been the standard way to query relational databases ever since. Essentially, these relational databases contain phone numbers, zip codes, names, dates, and many of the other fields you’re used to seeing in Excel spreadsheets. Since they’re already organized, machines can immediately get to work on them, as if a sculptor could put together a human figure out of prefabricated body parts.
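To make "already organized" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are invented for illustration; the point is only that when every value sits in a named field, a one-line query is enough to find it.

```python
import sqlite3


def find_contacts_by_zip(rows, zip_code):
    """Load (name, phone, zip_code) rows into an in-memory table and query by zip code."""
    conn = sqlite3.connect(":memory:")
    # The schema tells the machine exactly where each kind of value lives.
    conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT, zip_code TEXT)")
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM contacts WHERE zip_code = ? ORDER BY name",
        (zip_code,),
    )
    names = [row[0] for row in cursor.fetchall()]
    conn.close()
    return names


# Hypothetical sample data, not taken from any real system.
sample_rows = [
    ("Ada", "555-0100", "94105"),
    ("Grace", "555-0101", "10001"),
    ("Alan", "555-0102", "94105"),
]

print(find_contacts_by_zip(sample_rows, "94105"))
```

The query never has to guess what a zip code looks like; the schema has already done that work.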

Carving David out of a solid block of marble over 14 feet high, on the other hand, requires more effort and greater genius. Similarly, unstructured data is a bit trickier for computers to parse because it takes more work before an AI algorithm can consume it and provide meaningful insights. Since the data doesn’t have any predefined data models or schemata, the machine simply doesn’t know where to look. Here are some examples of unstructured data:

  • Text documents
  • Emails
  • Instant messages
  • Photos
  • Audio and video files

People are great at understanding these things, but machines are not. However, since an estimated 80 to 90 percent of all organizational data is unstructured, there simply aren’t enough hours in the day for people to look through everything for insights.

If we’re going to mine these vast quantities of data for golden nuggets, we’re going to need artificial intelligence to make sense of it for us.

 

Structure for Understanding

When we look closely at our own understanding, we find that structure is vital. Take, for example, these very words. Our understanding relies on their adherence to grammatical structures. Even something as small. as a misplaced period can lead to temporary confusion. Syntax has a huge effect on meaning, as it’s responsible for the difference between ‘walking the dog’ and ‘the walking dog’.

Or consider driving a car. The rules of the road, alongside sensory cues ranging from stop lights to yield signs to a beeping horn, let us navigate this complexity. When new elements emerge (a merging car, an animal running into the road), our brains rapidly assimilate this information and let us react in real time.

So, while both language and driving rely on structure, neither is technically considered structured data. This brings us to our first key takeaway: unstructured data can (and often does) have structure. We call this “underlying structure.”

What we need to keep in mind is that we’re talking about structure in two different senses of the word. “Structured” and “unstructured” are technical terms that refer to the way data is stored on a computer. Structure in the mind, on the other hand, is one of our sharpest tools for making sense of the world.

Our goal, then, is to build a machine that can sift through unstructured data, recognize patterns, and uncover the kind of structures that make understanding possible.

 

Your Organization on AI with Unstructured Data

As organizations look for more ways to convert data into value, we’re left with no choice but to turn to unstructured data. Although there are still opportunities to refine analytical techniques for structured data, the real wealth sits locked away in the unstructured data.

Unstructured data use cases range from marketing intelligence to customer analytics, but we want to take a moment to talk about privacy. With strict regulations like GDPR and CCPA entering the picture, enterprises face an intractable problem. They have tons of data containing personally identifiable information (PII), but, since most of it is unstructured, they can’t run queries on it or search for keywords. This means that organizations face difficulties when it comes to both utilizing that data and protecting individual privacy.
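To see why fixed rules only go so far on free text, consider a minimal sketch of a pattern-based baseline in Python. The message, the function name, and the two regular expressions are invented for illustration, and the patterns are deliberately simplified; this is not a description of how any particular product detects PII.

```python
import re

# Simplified, illustrative patterns; real-world formats vary far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def find_pattern_pii(text):
    """Return the email addresses and US-style phone numbers found in free text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


# Hypothetical message, written for this example.
message = (
    "Hi, it's Maria from accounting. Call me at 415-555-0132 or write to "
    "maria.lopez@example.com before I leave for my surgery on Friday."
)

found = find_pattern_pii(message)
print(found)
```

The patterns catch the phone number and the email address because both follow a predictable format. They say nothing about the name or the mention of surgery, which a human reader would immediately recognize as personal.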

Now imagine that we’re able to build a machine that understands PII at a fundamental level. It’s not looking for specific tags; just as you and I recognize PII when we see it, the machine can figure it out and sort the data accordingly. The result is faster, more efficient, and more reliable compliance.

And that’s our mission at Text IQ. We add a layer of intelligence to human-generated unstructured data so that you can unlock the structures hidden within your emails, chats, and the like. Just as a sculptor discovers the statue within the stone, our technology sculpts unstructured data into powerful insights. To find out more about how our AI brings these structures to light, contact us.