4 Steps to Solve the Unstructured Data Problem


In 1998, Merrill Lynch stated that most data stored in an enterprise is unstructured, estimating the share to be as high as 80%. The figure was somewhat anecdotal at the time, and only a few parties accepted it unequivocally. Although it remained unverified, some sources suggested that the actual share may indeed be close to 80%.

Fast forward to 2020. IDC and Dell EMC predicted that by this year there would be an increase of 40 zettabytes of data. Furthermore, IDC and Seagate reported that by 2025 the global datasphere will grow to 163 zettabytes, and that most of this data will be unstructured.

What do we observe from the above figures? Before drawing a conclusion, we need to clarify what ‘unstructured data’ means in the context of an enterprise. Unstructured data has no predefined structure and is usually written and presented in a free-flowing manner. It can include documents such as employee information, insurance policies, travel papers, legal contracts, agreements, and invoices.

Making sense of this stack of information to bring out themes and trends requires time and a huge effort on the part of the organization. Most of this data arrives as text, where the language is ambiguous and key messages are buried and not easy to discern or process. Moreover, because the real value lies in combining text data with structured data in decision-making contexts, the analysis of unstructured data remains a challenge.


Vic Gupta

Senior Vice President – Digital & AI

Coforge Limited

