When Joan Morgan got a delinquent tax notice for her Wake County property taxes, she was surprised. She had an escrow account with her mortgage company, and they were supposed to pay the tax bill from the escrow.
“After many calls to my mortgage company, a lot of time on hold and being transferred from one person to the next, I finally got someone who told me that ‘the funds were disbursed.’ I was relieved. Until I got my second delinquent tax notice and had to start the process all over again.”
Unfortunately, the mortgage company continued to put Morgan off. Her final recourse was to submit a complaint with the Consumer Financial Protection Bureau (CFPB), a government agency set up to protect consumers from unfair, deceptive, or abusive practices and take action against companies that break the law. The CFPB reached out to Morgan’s mortgage company on her behalf -- and got the issue resolved.
The CFPB sends thousands of consumers’ complaints about financial products and services to companies for response. Since the agency was established in 2011, it has handled more than 1.2 million complaints and is responsible for $11.9 billion in recompense to individuals who’ve submitted complaints.
Download free SAS Global Forum paper to learn more
For an in-depth description of how applying text analytics and machine learning can help assess consumer financial complaints, download this paper from SAS Global Forum.
Keeping up – and doing more with all that data
But given the exponential increase in complaints year over year, how can the CFPB not only keep up, but help more people with the wealth of data they have? For instance, is there a way for them to quantitatively assess the data for various trends? Is there a better way to discover the areas of greatest concern for consumers and help address those problems on a macro-level, before they become unmanageable?
“Adding more readers for a manual analysis of the text is not the answer,” says SAS’ Tom Sabo, who has explored the problem at length. “First, unless very specific standards are adopted, the method that one reader uses to address and tag a complaint can be quite different from the method a second reader uses. Scale this difference up to many readers, and you have many different, qualitative interpretations of the textual data.”
Reader fatigue is also a problem, points out Sabo. If you’re responsible for reviewing over 100 complaints a day, the last 10 you assess are not necessarily going to receive the same attention to detail as the first 10.
“And suppose a trend is uncovered, and readers are told to go back and retag all the data from the past year with this new trend,” says Sabo. “This is a case where manual analysis doesn’t scale, and a simple search operation for a trend pattern won’t be sufficient.”
“Adding more readers for a manual analysis of the text is not the answer.” Tom Sabo, Principal Solutions Architect, SAS
Applying text analytics and machine learning
Sabo had a solution: Apply SAS technology to assess consumer complaints. He applied text analytics to publicly available CFPB data to explore sentiment in consumer complaints, and he used machine learning to model the natural language available in each free-form complaint.
The benefits are many. Each record has already been tagged with a disposition code, denoting the action taken by the organization against which the complaint was filed. “Applying machine learning against the free-form comments associated to these disposition codes can semi-automatically generate a taxonomy highlighting key issues that the CFPB deals with,” explains Sabo.
“Interactive reporting allows the analyst to explore the pre-existing data for complaints enhanced with the sentiment and rules that are generated from text analytics,” says Sabo. That allows the analyst to sub-divide and prioritize avenues for exploration, guided by the relative levels of sentiment toward each of the categories. This drilldown report also includes a time series line chart so that analysts can observe trends over time.
“With this information, we can uncover trends surrounding the actions taken. For example, what were the defining characteristics of complaints where the organization in question paid monetary compensation to the individuals filing the complaints vs. complaints that were simply closed with an explanation?” says Sabo.
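As a rough illustration of what learning from disposition codes can look like, here is a minimal sketch in Python. It is not Sabo's SAS workflow: it swaps in a small multinomial Naive Bayes classifier written with the standard library only, and the four training narratives are invented for the example. The function names (`tokenize`, `train_naive_bayes`, `predict`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a complaint narrative and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_naive_bayes(records):
    """Fit a multinomial Naive Bayes model on (narrative, disposition) pairs."""
    doc_counts = Counter()              # complaints per disposition code
    word_counts = defaultdict(Counter)  # token counts per disposition code
    vocab = set()
    for narrative, disposition in records:
        doc_counts[disposition] += 1
        for token in tokenize(narrative):
            word_counts[disposition][token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab

def predict(model, narrative):
    """Return the disposition code with the highest log-probability."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(narrative):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented examples; real narratives would come from the public CFPB data.
TRAINING = [
    ("fees on my good faith estimate did not match closing costs",
     "Closed with monetary relief"),
    ("charged extra fees not listed on the good faith estimate",
     "Closed with monetary relief"),
    ("the company never returned my calls about my escrow account",
     "Closed with explanation"),
    ("i was transferred between agents and got no answer about escrow",
     "Closed with explanation"),
]

model = train_naive_bayes(TRAINING)
prediction = predict(model, "closing fees were higher than the good faith estimate")
print(prediction)
```

With only four toy narratives the model is far too small to be meaningful. The point is the shape of the approach: the disposition code already attached to each complaint serves as the training label, so no analyst has to write rules by hand.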
How this analysis could have saved us all a lot of time and money
To demonstrate, Sabo pulled a segment of complaints from March 2015 to October 2015 and ended up with roughly 37,000 complaints for his analysis. Using text analytics, he specified his category as ‘company response to consumer.’
“I wanted to see if there were specific phrases and terms that, when put together, differentiated those complaints that received monetary relief,” says Sabo. “As opposed to a method where I go in and write down the rules after some kind of manual review, this is where machine learning, in a very short amount of time, does its magic and identifies patterns without the input of business rules.” The approach leverages the subject matter expertise inherent in the way each complaint was encoded via the disposition code.
Sabo’s high-level results for complaints that received monetary relief uncovered frequent use of the term “good faith estimate” (GFE). A GFE is a document that breaks down the approximate payments due upon the closing of a mortgage loan.
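One simple way to surface phrases like this is to compare how often each two-word phrase appears in complaints that received monetary relief versus all other complaints. The sketch below does that with a smoothed log-odds score. It is an assumption-laden stand-in for the SAS rule generation described above, not a reproduction of it, and the sample narratives and function names (`bigrams`, `distinguishing_phrases`) are invented for illustration.

```python
import math
import re
from collections import Counter

def bigrams(text):
    """Return adjacent word pairs from a lowercased complaint narrative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def distinguishing_phrases(target_docs, other_docs, top_n=3):
    """Rank bigrams by smoothed log-odds of appearing in target vs. other docs."""
    # Count each phrase at most once per complaint (document frequency).
    target = Counter(p for doc in target_docs for p in set(bigrams(doc)))
    other = Counter(p for doc in other_docs for p in set(bigrams(doc)))
    scores = {}
    for phrase in set(target) | set(other):
        # Add-one smoothing keeps the odds finite for one-sided phrases.
        p_target = (target[phrase] + 1) / (len(target_docs) + 2)
        p_other = (other[phrase] + 1) / (len(other_docs) + 2)
        scores[phrase] = (
            math.log(p_target / (1 - p_target))
            - math.log(p_other / (1 - p_other))
        )
    # Sort by score (descending), then alphabetically so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Invented narratives standing in for real complaint text.
MONETARY_RELIEF = [
    "the good faith estimate left out several fees",
    "my good faith estimate did not match the closing costs",
    "lender changed the good faith estimate at closing",
]
OTHER = [
    "the servicer lost my escrow payment",
    "nobody returned my calls about the escrow payment",
    "my escrow payment was applied late",
]

top_phrases = distinguishing_phrases(MONETARY_RELIEF, OTHER)
for phrase, score in top_phrases:
    print(f"{phrase}: {score:.2f}")
```

On this toy data the phrases "faith estimate" and "good faith" rank highest, because they occur in every monetary-relief narrative and in none of the others. On real data, an analyst would still need to read the underlying complaints before drawing conclusions from a high-scoring phrase.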
From this, an analyst could infer that lending organizations might be taking advantage of the complexity of good faith estimates and using them to misrepresent or hide fees.
“In cases like this, text analytics can quantitatively depict that there’s a practice by lending institutions that’s likely being abused or misused, and provide the opportunity for an overseeing organization like the CFPB to take action,” says Sabo.
When Sabo drilled down into the complaint data over time, he saw that in September 2015, the GFE complaints started to flatten out.
“In October of 2015, Congress directed the CFPB to revise the GFE so that costs were more transparent to the consumer, and therefore more difficult for financial organizations to misuse,” says Sabo. “The question remains, if analytics had been part of the assessment, could this ultimate decision to protect the consumer have been reached sooner? Regardless, a quantitative analysis like this one backs up the decision of Congress and the CFPB to do away with the GFE.”
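The drilldown over time described above comes down to counting, month by month, how many complaints mention a given phrase. A minimal sketch of that counting step follows, assuming each complaint is available as a (date received, narrative) pair. The records and the function names (`monthly_mentions`, `month_over_month_change`) are invented for illustration and do not reflect actual CFPB figures.

```python
from collections import Counter
from datetime import date

def monthly_mentions(complaints, phrase):
    """Count complaints per calendar month whose narrative mentions the phrase."""
    counts = Counter()
    for received, narrative in complaints:
        if phrase.lower() in narrative.lower():
            counts[(received.year, received.month)] += 1
    # Months with no mentions are simply absent from the result.
    return dict(sorted(counts.items()))

def month_over_month_change(series):
    """Return the change in count between each month and the one listed before it."""
    months = list(series)
    return {m: series[m] - series[prev] for prev, m in zip(months, months[1:])}

# Invented records; the counts are not real CFPB numbers.
COMPLAINTS = [
    (date(2015, 7, 14), "My Good Faith Estimate left out the origination fee."),
    (date(2015, 8, 3), "Closing costs were higher than the good faith estimate."),
    (date(2015, 8, 21), "The good faith estimate changed twice before closing."),
    (date(2015, 9, 9), "Fees were missing from my good faith estimate."),
    (date(2015, 9, 30), "The lender ignored the good faith estimate."),
    (date(2015, 9, 12), "My escrow payment was applied late."),
]

series = monthly_mentions(COMPLAINTS, "good faith estimate")
changes = month_over_month_change(series)
print(series)
print(changes)
```

A change of zero from one month to the next is the kind of flattening an analyst would look for in a time series chart, although a real analysis would also account for total complaint volume in each month.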
While text analytics and machine learning would certainly help an overseeing agency like the CFPB account for growing volumes of data and better protect consumers, there are interesting broader applications.
“I’ve applied this methodology to tip line data, to surveys, to medical encounters -- for instance, after a hurricane, there are a number of stand-up clinics (in the absence of functioning hospitals) that provide assistance and record what people are experiencing post natural disaster. There’s a lot that we can do with that data to determine the type and quantity of materials needed at the clinics to ensure medical needs are met for survivors,” says Sabo.
“Similar analysis could also improve the ability of epidemiologists to catch and fight infectious disease outbreaks early on, and of public health researchers to identify prescription drug users at-risk of overdose,” says Sabo. “The possibilities are endless.”