Overcoming the Challenge of Unstructured Information
According to AIIM research, 75% of the organizations we surveyed view digital transformation as “important” or “very important.” Survey respondents point to techniques like advanced data capture, machine learning, and process automation as having powerful potential to reengineer and improve core business processes.
The trouble, however, is that the majority of information capture and content management solutions on the market have been built to work with highly structured, predetermined information and workflows. Feedback from our AIIM community of practitioners tells us that working with unstructured information is one of the biggest barriers to digital transformation.
Structured and Unstructured Information
So how can you begin to overcome the challenge? One place to start is to differentiate clearly between structured and unstructured information. We can do this in several key ways, and we need to, because how we manage information is significantly influenced by whether it is structured or unstructured.
Structured information has a fixed structure, hence the name. It consists fundamentally of columns and rows of data in a table, or spread across several linked tables. A spreadsheet is a simple example of this. A form could also be considered structured insofar as the purpose of most forms is to gather information that is then put into this sort of structure. Most structured data is stored and managed in a database. In fact, most information repositories are a combination of this sort of structured data and somewhere to store the binary files associated with it.
In contrast, unstructured information is much more variable both in format and in content. Consider a contract, a project initiation document, or a personnel review. Each of these simple examples might be created or captured in a variety of formats. While each might have some rules that guide its content, all of these documents will vary greatly in terms of their form, format, content, and context to the business.
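To make the idea of linked tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values are hypothetical and chosen only for illustration; the file_path column stands in for the pointer to a binary file stored elsewhere.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE review ("
        "id INTEGER PRIMARY KEY, "
        "employee_id INTEGER NOT NULL REFERENCES employee(id), "
        "review_year INTEGER NOT NULL, "
        "file_path TEXT)"  # pointer to the binary file kept outside the table
    )
    conn.execute("INSERT INTO employee (id, name) VALUES (1, 'A. Example')")
    conn.execute(
        "INSERT INTO review (employee_id, review_year, file_path) "
        "VALUES (1, 2023, 'reviews/1-2023.pdf')"
    )
    return conn


def reviews_for(conn, name):
    """Join the two tables and return (year, file_path) rows for one employee."""
    rows = conn.execute(
        "SELECT r.review_year, r.file_path FROM review r "
        "JOIN employee e ON e.id = r.employee_id "
        "WHERE e.name = ? ORDER BY r.review_year",
        (name,),
    )
    return rows.fetchall()
```

Because every row follows the same fixed columns, a single query can answer questions across the whole repository, which is what makes this kind of information comparatively easy to manage.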
Capturing Structured and Unstructured Information
Capturing structured information is accomplished in several ways. Data can be input manually or extracted from structured forms. It can also be extracted through some sort of structured output from another system – for example, an HR or accounting application. There may be some requirements to transform the data from one syntax to another, but structured applications in the form of databases are designed to ingest structured content and apply appropriate access controls, business rules and logic, and lifecycle management.
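As a rough illustration of transforming data from one syntax to another, the sketch below converts a CSV export into JSON using only the Python standard library. The column names (employee_id, name, start_date) are assumptions made for the example and do not come from any particular HR or accounting product.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, converting fields to typed values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            "employee_id": int(row["employee_id"]),  # text -> integer
            "name": row["name"].strip(),             # trim stray whitespace
            "start_date": row["start_date"],         # kept as text, unchanged
        })
    return records


def records_to_json(records):
    """Serialize the records to JSON, the target syntax in this sketch."""
    return json.dumps(records, indent=2)
```

A call such as records_to_json(csv_to_records(exported_text)) would chain the two steps. Real integrations typically add validation and error handling for missing or malformed fields.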
But capturing unstructured information is more challenging. A common example is email. Email may appear to have structure and context, as it is addressed to people and sits in an inbox, or maybe in a filing category in the inbox or private folders. But the emails in a user’s email system are not controlled. There are no rules governing the retention and disposition of the information over time.
Current practice is usually to send emails to those who need them and, more often than not, also to those who may only be interested in the content. This creates many copies and makes the information harder to control.
Effective information management provides a clear policy and structure, and the ability to capture and save all types of unstructured information so it can be protected, retained, and searched.
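One way to picture the capture step for email is the sketch below, which uses Python's standard email package to separate the predictable header fields from the free-text body. The function name split_email and the choice of fields are assumptions for illustration; a real capture process would also handle attachments, HTML bodies, and retention metadata.

```python
from email import message_from_string, policy


def split_email(raw):
    """Split a raw email into structured header fields and an unstructured body."""
    msg = message_from_string(raw, policy=policy.default)
    # The headers are the part of an email that behaves like structured data.
    headers = {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
    }
    # The body is free text: no fixed fields, so it has to be indexed and searched.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return headers, body
```

The headers could be stored as rows in a table, while the body would need full-text indexing to be searchable, which reflects why email only appears to be structured.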
Facing File Formats
How we capture and manage unstructured digital information is closely tied to the file format used to store it. Most organizations don’t give much thought to the file formats used to store their information – and this can cause problems in the short and long term. Many file formats are highly proprietary and can only be manipulated using a specific software application, or even a specific version of that application. When formats are less proprietary, such that more applications can interact with them, the resulting files may not be 100% compatible with each other and with every application.
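A first step toward giving file formats more thought is simply finding out which formats a repository already holds. The sketch below guesses a format from the leading bytes of a file rather than trusting its extension. The signature list is deliberately small and the labels are illustrative; it is not a complete or authoritative format registry.

```python
# Well-known leading byte signatures ("magic numbers") for a few common formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container (for example DOCX, XLSX, ODT)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
     "OLE2 compound file (for example legacy DOC, XLS)"),
]


def sniff_format(data):
    """Return a label for the first signature that matches the start of data."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


def sniff_file(path):
    """Read only the first bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

Note that several distinct office formats share the same container signature, so a survey like this narrows the possibilities rather than identifying the exact application or version that produced a file.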
A better approach is to determine the appropriate file formats for creating and/or capturing information based on a number of factors. Who is the intended audience? Are there any specific regulatory requirements to maintain information in a certain format, or in a non‐proprietary, open, or standard format? And perhaps most importantly, what’s the value of the information over time?
We believe that every organization should be on a digital transformation journey. Getting to your destination of innovation, efficiency, and process improvement will require thoughtful strategies to better understand and manage unstructured data. It will take strong executive support as well as focused technical expertise to get it done. Look for providers and partners that have the right mix of capability, experience, and vision to help you make the most of your efforts.