Don’t let data management issues derail your insurance fraud analytics project
By Dan Donovan, Principal Solutions Architect, SAS Security Intelligence Practice
Access to high-quality data is a major prerequisite for any insurer looking to undertake a successful insurance fraud analytics project. You should begin the process by evaluating the amount of data you have available, the quality of the available data and your ability to access that data.
A common concern is: How do we access and provision data of a sufficient quality to create predictive models and use additional analytic methods, such as anomaly detection and social network analysis?
Many insurers are still operating in an environment of legacy applications supported by multiple, siloed databases that don’t integrate with one another. Useful claims and investigation data may be captured in spreadsheets and Microsoft Word documents stored on shared servers, where the data’s existence is unknown to most and the important data is usable by only a few. Valuable third-party data sources, such as medical billing data and industry watch list data, may not be captured and stored by the insurer in a format usable for analytics.
Challenge No. 1: Entity resolution
Data access and integration are the first challenges in insurance fraud analytics. Once a company has accessed all of its data, how does it determine what entities the data relates to? Insureds, claimants and providers may have separate profiles within the multitude of databases an insurance company maintains. It’s highly likely that there will be inconsistencies between the data in these multiple systems. For example, names may be spelled incorrectly, personal identifying information may be entered erroneously or several different addresses might be entered for the same individual. Being able to identify an individual across all of an insurer’s available data – and resolve what appear on the surface to be multiple entities into the one true entity that all of this data represents – is the second part of the data challenge.
Challenge No. 2: Incorporating unstructured data
Finally, there is the issue of unstructured text data. This is a valuable data source for analytics that most insurance companies have in abundance in the form of loss description notes and claims notes. Some estimates indicate that as much as 80 percent of an insurance company’s data is in unstructured text. If insurance companies are not tapping into this rich data source, they are leaving a vast amount of important data out of the analytics process. The challenge is accessing this data in a way that allows it to be used effectively for analytics – and that is not a simple process.
So how does an insurance company overcome these data challenges? By applying the following four best practices, insurers can position themselves to have accessible, clean, high-quality data they can use for both fraud and claims analytics.
- Integrate data silos. To effectively apply analytics, insurance companies need to integrate data that resides in multiple internal-source systems as well as external third-party databases. That doesn’t mean a company has to create a centralized data warehouse environment to capture this data, but it should have the tools and processes in place for capturing the required data from the source systems and pulling it together for analysis.
- Implement an entity resolution process. When an insurer has an individual with profiles across multiple systems, it needs to be able to identify that individual as the same person and resolve the different data variations into a single entity. There are two options insurers should explore for accomplishing this. The first is business rules, which resolve an entity by matching on a set of consistent variables such as last name, SSN and address. While this approach is effective and easy to implement, entities will not match as consistently as they would under a more statistical approach. The second and more effective (but more complex) option is probabilistic matching, which uses statistical analysis to resolve entities based on all available data variables.
- Improve data quality. Insurance data used for fraud analytics often has problems with both accuracy and completeness. Insurers can achieve significant improvements in data quality by standardizing data formats, automating data entry through predefined drop-down selections, and implementing standard work processes and quality controls for how data is entered into source systems. If you improve the quality and accuracy of the data going into your source systems, you will reduce the data cleansing effort required when extracting it for use in fraud analytics.
- Develop a process to take advantage of unstructured text. As mentioned earlier, insurance companies could be leaving as much as 80 percent of their data out of the analytics process if they don’t find a way to use unstructured text data. Utilizing this data can significantly improve fraud detection rates for insurance claims. This can be a complex process and is unfamiliar territory for most insurers, so the recommendation here is to start slowly with some simple business cases. One of the most basic approaches is to use watch lists of known fraudulent service providers and search for their names in the claim notes. From here you can progress to more sophisticated techniques that involve not just looking for keywords, but also the context in which those keywords are being used.
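The first best practice – integrating data silos without necessarily building a central warehouse – can be illustrated with a minimal sketch. The source names, fields and formats here are all hypothetical; the point is simply pulling exports from two siloed systems into one tagged analysis set.

```python
# Minimal sketch: combine claims from two hypothetical siloed sources into one
# analysis set, tagging each record with its origin system. The system names,
# formats and fields are assumptions for illustration only.
import csv
import io
import json

# Assume one legacy system exports CSV and another exports JSON.
policy_system_csv = "claim_id,amount\nC-1001,5200\nC-1002,800\n"
claims_system_json = '[{"claim_id": "C-1003", "amount": 14000}]'

combined = []

# Normalize both exports into a common record shape for analytics.
for row in csv.DictReader(io.StringIO(policy_system_csv)):
    combined.append({"claim_id": row["claim_id"],
                     "amount": float(row["amount"]),
                     "source": "policy_system"})

for row in json.loads(claims_system_json):
    combined.append({"claim_id": row["claim_id"],
                     "amount": float(row["amount"]),
                     "source": "claims_system"})

print(len(combined))  # 3
```

In practice an ETL or data virtualization tool would replace the hand-written parsing, but the principle is the same: map each source system into a common record shape before analysis.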
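The two entity resolution options described in the second best practice can be sketched as follows. This is a simplified illustration with hypothetical records; production probabilistic matching uses far richer statistical models, but a basic string-similarity average captures the idea.

```python
# Sketch of the two entity resolution approaches on hypothetical records.
# Records are assumed to be dicts with "last_name", "ssn" and "address" fields.
from difflib import SequenceMatcher

def rule_match(a, b):
    """Business-rules approach: exact match on a fixed set of variables."""
    keys = ("last_name", "ssn", "address")
    return all(a[k] == b[k] for k in keys)

def probabilistic_match(a, b, threshold=0.85):
    """Simplified stand-in for probabilistic matching: average string
    similarity across all shared fields, matched if above a threshold."""
    scores = [SequenceMatcher(None, str(a[k]), str(b[k])).ratio()
              for k in a if k in b]
    return sum(scores) / len(scores) >= threshold

rec1 = {"last_name": "Donovan", "ssn": "123-45-6789", "address": "12 Oak St"}
rec2 = {"last_name": "Donavan", "ssn": "123-45-6789", "address": "12 Oak Street"}

# The rigid rules approach misses the "Donavan" typo; the similarity-based
# approach still resolves the two records to one entity.
print(rule_match(rec1, rec2))           # False
print(probabilistic_match(rec1, rec2))  # True
```

This illustrates the trade-off noted above: the rules approach is simple but brittle against typos and variant formats, while the statistical approach tolerates them at the cost of tuning a threshold.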
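The data format standardization mentioned in the third best practice can be sketched with a small address normalizer. The abbreviation table and rules here are assumptions; real standardization typically relies on dedicated data quality tooling and postal reference data.

```python
# Hypothetical sketch of standardizing an address field before it enters
# source systems, so downstream matching and cleansing work is reduced.
import re

# Assumed suffix mapping; a real implementation would use postal standards.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}

def standardize_address(raw: str) -> str:
    """Uppercase, strip punctuation, collapse whitespace, normalize suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", raw).upper()
    tokens = [ABBREVIATIONS.get(t, t) for t in cleaned.split()]
    return " ".join(tokens)

# Two variant entries of the same address standardize to one canonical form.
print(standardize_address("12 oak street."))  # 12 OAK ST
print(standardize_address("12  Oak St"))      # 12 OAK ST
```

Standardizing at the point of entry like this is what makes the later entity resolution step far more reliable: two variant spellings of one address collapse to a single canonical form.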
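The basic watch-list approach described in the fourth best practice – searching claim notes for known fraudulent provider names – can be sketched in a few lines. The provider names and claim notes below are invented for illustration.

```python
# Minimal sketch: flag claims whose free-text notes mention a provider on a
# watch list of known fraudulent service providers. All names are hypothetical.
watch_list = {"acme rehab clinic", "smith chiropractic"}

claims = [
    {"claim_id": "C-1001",
     "notes": "Patient referred to Acme Rehab Clinic for ongoing therapy."},
    {"claim_id": "C-1002",
     "notes": "Routine windshield replacement, no injuries reported."},
]

def flag_claims(claims, watch_list):
    """Return IDs of claims whose notes contain a watch-listed name."""
    flagged = []
    for claim in claims:
        notes = claim["notes"].lower()
        if any(name in notes for name in watch_list):
            flagged.append(claim["claim_id"])
    return flagged

print(flag_claims(claims, watch_list))  # ['C-1001']
```

This is the simple starting point recommended above; the more sophisticated techniques that follow it – examining the context in which keywords appear – call for text analytics tooling rather than plain substring search.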
It’s been estimated that 50 percent or more of the work involved in a fraud analytics implementation goes into the data management activities outlined above. By committing to and implementing the right data management plan, you greatly increase the likelihood of a successful insurance fraud analytics project. You don’t need to solve all of your data issues in advance of undertaking your project, but by focusing on the four best practices I’ve discussed, you can improve the access to and quality of your high-priority data feeds. This will allow you to successfully build and deploy fraud models that improve fraud detection rates, decrease false positives and lead to better investigation outcomes.
Dan Donovan is a Principal Solutions Architect in the Security Intelligence Practice at SAS. He provides pre-sales and business consulting support for software sales activities focused on fraud detection, claims analytics and investigation management solutions for the property, casualty, life and disability insurance markets in the Americas.