With more organizations looking for ways to make the most of unstructured text, there’s an abundance of tools and solutions for handling this not-so-neatly packaged data. Are there different aspects to consider when evaluating software for unstructured versus structured data? Yes and no. Here are the similarities and differences to keep in mind when choosing software.
Software for both types of data
Several similarities between technologies for the two types of data relate to the flexibility and transparency of the analysis: answering one question typically raises new, and often more complex, ones. To make a sound investment, make sure the technology accepts a variety of data inputs so that you get the best results.
For structured data, inputs can come from different storage repositories, operational systems and sensors, and can include data stored in matrices or itemized transaction data. The technology should also be able to transpose data from one of these layouts to another.
For text data, inputs can include documents stored in different formats, RSS feeds, content crawled from internal file systems or social media sites, and both full documents and short snippets of text from notes or claims. For both types of data, the technology needs to be configurable to accommodate new or changing sources, adaptable to changes in the hardware environment, and extensible enough to handle ever-increasing data volumes.
Every analysis begins with exploring and interacting with the data, often filtering it or refining it by creating new fields or terms, so this capability is key to any analytic technology purchase. Interactive visualization and exploration of both structured and unstructured data, before analysis and when assessing results, is what separates extensible technologies from niche ones. Data quality always matters too, and nowhere more visibly than in evaluations of social media data (as in the paper ‘Sifting Through the Noise of Social Media’). The ability to filter out irrelevant content, and thereby ensure a strong analytical result, should be integrated into the analytic technology as part of standard processing.
For effective analysis of both structured and unstructured data, any technology investment needs to align broadly with the organization’s vision and overall strategy, while also being viable for the specific projects defined to demonstrate tangible value. If you are adding text analytics capabilities to your organization, see the tips outlined in the paper ‘How to Successfully Choose, Develop and Implement a Semantic Strategy’.
Differences in technology for structured and unstructured data
Some differences that affect a technology purchase stem from how each type of data was originally designed for its purpose. As the chart indicates, structured data is machine-readable from the outset, actively structured by the operational system or issuing appliance.
Unstructured text data, by contrast, is written in a human-readable form that is largely opportunistic, with little or no structure. Analyzing it requires specialized technology that learns how to read and interpret human language. Because of the nuances of different languages, writing styles and localizations, text analytics technology must let users define detailed options in order to produce the best answers.
To get the most from your investment, look for a single, integrated platform that combines discovery algorithms with user-defined specifications.
A bonus IT outcome from text analytics
Analysis of both structured and unstructured data produces models that can be used to score new (naïve) data for further insight. Text analytics can also produce well-defined metadata for applications traditionally managed by IT rather than by the business.
Your initial text projects may be driven by the need to improve the customer experience, price products better, improve the triage of claims for auditors or reduce the number of fraudulent charges. These business-driven initiatives are all best addressed by analyzing structured and unstructured data together. Even so, the same technology can deliver measurable value to your IT department, which benefits from having more and better metadata.
With better metadata, IT can improve search and retrieval systems by removing the need for retrospective indexing of content. IT can also improve the relevance of content surfaced to web users and better manage federated data stores. As a result, text analytics technology needs to suit the IT user as well as the business user, not only for deployment activities but also for model development.
For a deeper look at how to select vendor technology that addresses unstructured data with text analytics, read the white paper ‘Finding the Right Fit: How to Evaluate Text Analytics Software’.