Unstructured data is just about everywhere you turn: emails, Web and social content, survey responses, customer call center recordings, and the list goes on. Organizations make sense of this data in various ways, including the tireless job of reading through it or hiring consultants whose job is to decipher its meaning. Others are applying a particularly innovative element of business analytics, text analytics, to validate the structured data that is often perceived as gospel. In several recent studies, we’ve found that the text and the structured data don’t match.
This is best illustrated through a couple of examples. An online payment company sought to analyze the sentiment of its survey data. The company had concluded that password issues were among the top 10 problems identified in the survey every month. Customer service representatives (CSRs) were encouraged to categorize the top issue described in each survey, and they had approximately 300 “reason codes” to choose from. A CSR could choose only one reason code, even when the survey described several issues. For the evaluation period, the reason codes identified “password” issues in 2.62 percent of surveys.
The company then used text analytics, specifically content categorization, to identify password issues described in the call center notes. Specific mentions of passwords (or abbreviations or synonyms for password), along with password-type functions (reset, change, update, retrieve, set up, setup, forgot, etc.), were automatically detected.
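A rule of this kind can be approximated in a few lines of Python. This is only a minimal sketch and not the company's actual content categorization model: the abbreviations and synonyms (`pwd`, `pw`, `passcode`) are assumed for illustration, the function name is hypothetical, and the rule simply requires a password term and a password-type function to appear in the same note.

```python
import re

# Assumed vocabulary: the article lists the function words but not the
# abbreviations or synonyms, so "pwd", "pw" and "passcode" are guesses.
PASSWORD_TERMS = re.compile(r"\b(?:passwords?|passcodes?|pwd|pw)\b", re.IGNORECASE)

# Password-type functions named in the article, with simple inflections.
ACTION_TERMS = re.compile(
    r"\b(?:reset\w*|chang\w*|updat\w*|retriev\w*|set\s?up|forg[eo]t\w*)\b",
    re.IGNORECASE,
)


def mentions_password_issue(text: str) -> bool:
    """Return True when a note contains both a password term and an action term."""
    return bool(PASSWORD_TERMS.search(text)) and bool(ACTION_TERMS.search(text))


if __name__ == "__main__":
    notes = [
        "I forgot my password and could not log in",
        "Cannot reset PWD",
        "The transfer fee was too high",
    ]
    for note in notes:
        print(mentions_password_issue(note), "-", note)
```

A production rule set would usually go further, for example by requiring the two terms to appear near each other or by handling negation, but the co-occurrence check is enough to show the idea.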
The automatic classification found that more than 50 percent of the surveys coded by the CSRs as password issues were incorrectly classified. It also found specific password problems in other surveys that had not originally been classified as password issues. The total percentage of surveys that actually discussed password issues was only 1.14 percent.
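The comparison behind these figures amounts to a cross-check between the CSR's reason code and the text-derived label. The sketch below shows one way to compute it; the `Survey` record, the `audit` function and the ten-row sample are all invented for illustration and do not reproduce the company's data or its 2.62 and 1.14 percent results.

```python
from dataclasses import dataclass


@dataclass
class Survey:
    reason_code: str  # the single code chosen by the CSR
    text_flag: bool   # True if the text rule found a password issue


def audit(surveys, code="password"):
    """Compare CSR reason codes with text-derived labels for one issue."""
    coded = [s for s in surveys if s.reason_code == code]
    flagged = [s for s in surveys if s.text_flag]
    miscoded = [s for s in coded if not s.text_flag]
    missed = [s for s in flagged if s.reason_code != code]
    total = len(surveys)
    return {
        "coded_rate": len(coded) / total,    # share coded as the issue by CSRs
        "text_rate": len(flagged) / total,   # share the text actually supports
        "miscoded_share": len(miscoded) / len(coded) if coded else 0.0,
        "missed": len(missed),               # issue in text, other code chosen
    }


# Toy sample (made up): 4 surveys coded "password", only 1 confirmed by text,
# plus 1 password issue hiding under a different reason code.
sample = (
    [Survey("password", True)]
    + [Survey("password", False)] * 3
    + [Survey("fees", True)]
    + [Survey("fees", False)] * 5
)

if __name__ == "__main__":
    print(audit(sample))
```

Reporting both the miscoded share and the missed count matters because the two errors pull in opposite directions: the first inflates the structured figure, while the second hides real issues under other codes.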
Choosing from roughly 300 reason codes was almost certainly a contributing factor to the incorrect categorization. Perhaps CSRs were just selecting “passwords” rather than sifting through the entire list to find the best reason code.
At 1.14 percent, passwords weren’t a top 10 issue at all. The original objective of this analysis was to prioritize resources and take action on the top issues (those likely to be associated with negative sentiment). The structured data did not give the company a true picture of those issues, and further analysis based on incorrectly categorized data would likely have led to improper conclusions. Deriving the real issues from the text proved to be a better strategy than relying on the reason codes entered by the CSRs.
A second example involved an auto manufacturer that sought to identify problems with a car model component. It used content categorization to analyze the data, and it immediately developed a rule for crashes and fatalities. When the manufacturer cross-referenced this data with the National Highway Traffic Safety Administration (NHTSA) reports, however, it discovered a significant disconnect. The auto manufacturer’s number of crashes and fatalities was only 10 percent of that reported by NHTSA – and it turns out that the manufacturer’s number was more accurate.
The manufacturer explained that when consumers fill out reports with NHTSA, they complete a freeform text box about their situation, but they can also select any of four check boxes: fire, fatality, crash and injury. Many consumers check all four boxes even if none of those events actually happened, because they think a record that looks severe will get more attention. Those deceptions add up, inflating the numbers that NHTSA reports. Once more, the text provided a better means to uncover the true content within the data.
Text analytics can add value to all sorts of traditional structured data analysis by providing factual information about why and how something occurred. An effective way to bring this analytic capability into your organization is to use it to validate (or invalidate) existing structured information sources. The old saying “garbage in, garbage out” continues to hold true. Text analytics can be a low-cost, effective means of examining the validity of your data, helping organizations focus and align resources on the proper issues for business improvement.
Read more about text analytics in the paper, What Are People Saying About Your Company, Your Products, or Your Brand?