How does data become “unstructured”? With the correct processes, policies, and procedures in place, one would assume that data would always be “structured”. This, however, is often not the case. So how does data lose its structure, despite careful planning? Simple: your business is full of humans. Regardless of the instructions they are given, and the procedures and policies you have in place, people will instinctively store data according to their own logic. With different priorities and interests, and varying levels of care and comprehension regarding the data, there seems to be no way to stop them from organising it according to their own judgment.
Welcome to the reality of Unstructured Data
Unstructured data is a well-known complication. In an attempt to combat it, management systems such as OpenText and SharePoint are often implemented. These systems, although somewhat helpful, do not address the elusive element of human logic. If the person who creates the data does not comprehend its importance, how can they classify or structure it?
The solution is to optimise this unstructured data by giving it an organised structure. Structured data is data that, at the time of its ingestion, is aligned to a schema (of logic and characteristics) which identifies the data by its various fields and attributes. This ensures that the data is stored in the tables and columns where you want it to live.
The perfect analogy
Take a cat photo, for instance. What are the attributes of a cat photo? You could classify it under photo, cats, personal, you, your department, non-classified, animal, and much more. The cat photo is then ingested into the database according to the schema and in alignment with those attributes. This means that when the database is queried for any of those attributes, it will locate the cat photo. This is how a database can be structured in alignment with a schema.
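To make the analogy concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names, the `ingest` and `find_by_attribute` helpers, and the sample file names are assumptions made for illustration; they are not taken from any particular product.

```python
import sqlite3

# Hypothetical schema: one table for files, one for their attributes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE attributes (
        file_id   INTEGER NOT NULL REFERENCES files(id),
        attribute TEXT NOT NULL
    );
""")


def ingest(name, attributes):
    """Store a file record together with the attributes that describe it."""
    cur = conn.execute("INSERT INTO files (name) VALUES (?)", (name,))
    file_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attributes (file_id, attribute) VALUES (?, ?)",
        [(file_id, attribute) for attribute in attributes],
    )
    return file_id


def find_by_attribute(attribute):
    """Return the names of all files tagged with the given attribute."""
    rows = conn.execute(
        "SELECT f.name FROM files f "
        "JOIN attributes a ON a.file_id = f.id "
        "WHERE a.attribute = ? ORDER BY f.name",
        (attribute,),
    )
    return [name for (name,) in rows]


# The cat photo is ingested in alignment with its attributes.
ingest("cat.jpg", ["photo", "cats", "personal", "non-classified", "animal"])
ingest("q3-report.pdf", ["document", "finance"])
```

Querying for any one of the attributes, for example `find_by_attribute("animal")`, locates the cat photo, because the attributes were recorded at ingestion rather than left to whoever saved the file.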
Metadata
Okay, so what’s next? Organisation, despite human logic, through data descriptions. This can be done by allocating data to specific storage locations in the database, based upon the uploader’s identity. Hypothetically, there could be three folder options: personal, private, and classified. A receptionist’s data wouldn’t be allocated a place in the classified folder, and classified workers’ data would not be sorted into a personal or private folder, because that data is not trivial or personal, like a cat photo.
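As a rough sketch of allocation by uploader identity, the Python snippet below maps a role to the folders it may use. The role names, the `ALLOWED_FOLDERS` mapping, and the fallback rule are hypothetical; a real deployment would take them from its own directory service and policies.

```python
# Hypothetical mapping from an uploader's role to the folders it may use.
ALLOWED_FOLDERS = {
    "receptionist": {"personal", "private"},
    "classified_worker": {"classified"},
}


def allocate_folder(role, requested):
    """Return the folder where data from this role should be stored.

    The requested folder is honoured only if the role is allowed to use it;
    otherwise the data is redirected to a folder the role may use
    (an assumed policy: the alphabetically first allowed folder).
    """
    allowed = ALLOWED_FOLDERS.get(role)
    if allowed is None:
        raise ValueError(f"unknown role: {role}")
    if requested in allowed:
        return requested
    return sorted(allowed)[0]
```

With this mapping, a receptionist asking for the classified folder is redirected to a personal one, while a classified worker's data always lands in the classified folder, whatever the person saving it had in mind.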
What we are discussing here is “metadata”. This is not the data itself, but rather the data that describes the data. Metadata can record attributes such as who saved the data, when, and why. For example, at 9 AM on a Monday morning, thousands of MP3 files are saved onto the user network drive after someone bought a new phone. It could safely be deduced that these files are neither classified nor business related. It is most likely someone’s iTunes syncing with the network and backing itself up. Should these files, therefore, be saved for long-term archive, or backed up permanently? No. By collecting and reading this metadata, and deducing from it how important the files are, the data can be accurately structured and stored within the database.
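The MP3 example can be expressed as a simple rule over metadata. The Python sketch below is illustrative only: the `FileMeta` fields, the threshold of 1,000 files, and the one-hour window are assumptions, not fixed recommendations.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileMeta:
    """Metadata only: where the file lives, who saved it, and when."""
    path: str
    owner: str
    saved_at: datetime


MEDIA_EXTENSIONS = {".mp3"}   # assumed list of personal-media file types
BURST_THRESHOLD = 1000        # assumed: this many files per owner per hour


def archive_decisions(records, threshold=BURST_THRESHOLD):
    """Map each path to 'skip_archive' or 'review' using metadata alone."""

    def is_media(record):
        return PurePosixPath(record.path).suffix.lower() in MEDIA_EXTENSIONS

    def bucket(record):
        # Group by owner and by the hour in which the file was saved.
        hour = record.saved_at.replace(minute=0, second=0, microsecond=0)
        return (record.owner, hour)

    counts = Counter(bucket(r) for r in records if is_media(r))

    decisions = {}
    for record in records:
        if is_media(record) and counts[bucket(record)] >= threshold:
            # A large burst of media files from one person in one hour
            # looks like a device sync, not business data.
            decisions[record.path] = "skip_archive"
        else:
            decisions[record.path] = "review"
    return decisions
```

Nothing in this rule opens the files themselves. The decision rests entirely on who saved them, when, and what type they are, which is exactly the information that metadata provides.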
At Digital Sense, we can help you discern the “where, when, and why” aspects of your unstructured data challenges.
For tips on how you can optimise your unstructured data, read our latest blog: How to optimise your unstructured data?
To set up a free workshop with our team, contact us today!