Using analytics to predict fraud

Irish Tax and Customs tax officer describes how data mining fits in

Today, governments and their public sector agencies everywhere are under pressure to perform more efficiently and effectively: essentially, to do better with less. Tax and customs authorities are no exception. Many are dealing with decreased resources and ever-increasing risks, often in difficult economic circumstances.

Traditional methods for addressing risk have served many authorities well, but there is now a need to use more advanced methods to combat fraud, error and waste. To arm themselves for this battle, more and more tax and customs authorities have turned to data mining and analytics to improve their business processes, resulting in better compliance with new rules and regulations and better customer service.

Duncan Cleary
Senior Statistician in Revenue for Irish Tax and Customs

So where does data mining fit into the risk analysis toolkit of a tax authority? Business rules aimed at detecting risk – and the intelligence gathered from differing channels – have their place and can be effective. Add data mining to the mix and you've got a powerful combination to prevent and detect fraud and error.

Data mining can be defined as the application of the scientific method, including statistical analyses, to large amounts of data to uncover valuable information from that data. It can often detect patterns in data that cannot be recognized manually, as well as make predictive estimates of outcomes of interest, such as the likelihood of a tax return containing errors.

Broadly speaking, there are three types of data mining that can be used to combat fraud:

  1. Supervised techniques (also known as predictive analytics), where a target is predicted.
  2. Semi-supervised techniques, where some business knowledge can direct the analyses.
  3. Unsupervised techniques, such as segmentation, which are exploratory.

Fighting fraud with predictive analytics

Predictive analytics builds a predictive model from a specific set of data that contains known outcomes for a particular target. This target could be the likelihood that auditing a case will produce a yield, the likely amount of that yield, the likelihood of a business failing, the likelihood that a claim for benefits or refunds is fraudulent, and so on. Models perform better when the target has been clearly defined. There are numerous techniques for creating predictive models, but it is often hard to beat the well-established warhorses: logistic regression, decision trees and neural networks.
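To make the idea concrete, here is a minimal sketch of the first of those warhorses, logistic regression, fitted by plain batch gradient descent on a handful of past cases with known outcomes. The feature names (prior late filings, refund-to-turnover ratio) and the toy data are hypothetical illustrations, not Revenue's actual variables or model. A production model would be built in a statistical package on far more data.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of the target (e.g. an audit yield); w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit intercept and weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(linear score)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical past cases: [prior late filings, refund-to-turnover ratio]
X = [[0, 0.1], [1, 0.2], [0, 0.3], [3, 0.8], [4, 0.9], [2, 0.7]]
y = [0, 0, 0, 1, 1, 1]  # 1 = the audit produced a yield
w = fit_logistic(X, y)
print(round(predict_proba(w, [3, 0.75]), 3))  # score for a new, unseen case
```

The same fit-then-score pattern applies whichever technique is used; only the model inside changes.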

The real power of predictive models comes from their ability to score new cases against some target of interest, even if these cases or events have never been previously evaluated. Cases can be ranked in descending order of priority and worked according to resources and the severity of the risk. Feedback is critical to evaluating model performance, and improving models is an iterative and cyclical process. Feeding information back into the model will help reduce the number of false positives (false alarms) over time, as well as reducing the number of actual bad cases escaping attention (false negatives).
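The ranking and feedback loop described above can be sketched as follows. The `Case` record, the capacity figure and the outcome labels are hypothetical. In practice, outcomes for cases that were never worked only come to light later (through other channels or later audits), so false negatives are usually counted retrospectively.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class Case:
    case_id: str
    score: float                    # model risk score in [0, 1]
    outcome: Optional[bool] = None  # True = confirmed issue, False = compliant, None = unknown

def select_for_work(cases: List[Case], capacity: int) -> List[Case]:
    """Rank cases in descending order of score and take as many as resources allow."""
    return sorted(cases, key=lambda c: c.score, reverse=True)[:capacity]

def feedback_counts(cases: List[Case], worked_ids: Set[str]) -> Dict[str, int]:
    """Tally outcomes for cases whose outcome is known, split by worked / not worked."""
    counts = {"true_positive": 0, "false_positive": 0,
              "false_negative": 0, "true_negative": 0}
    for c in cases:
        if c.outcome is None:
            continue
        worked = c.case_id in worked_ids
        if worked and c.outcome:
            counts["true_positive"] += 1
        elif worked:
            counts["false_positive"] += 1   # false alarm
        elif c.outcome:
            counts["false_negative"] += 1   # bad case that escaped attention
        else:
            counts["true_negative"] += 1
    return counts

cases = [Case("A", 0.9, True), Case("B", 0.7, False),
         Case("C", 0.4, True), Case("D", 0.1, False)]
worked = select_for_work(cases, capacity=2)
print([c.case_id for c in worked])  # ['A', 'B']
print(feedback_counts(cases, {c.case_id for c in worked}))
```

Tracking these counts over successive model versions is one simple way to check that retraining is actually reducing false alarms and missed cases.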

Many tax agencies now use these predictive techniques, alongside other tools such as business rules and intelligence, to prevent and detect fraud and error. Some, including the Irish Tax and Customs Authority, have even deployed them in real time in their live transactional systems.

Exploring fraud with unsupervised techniques

Unsupervised techniques can be a powerful means of understanding your case base. There is often so much data available that it is difficult to understand the underlying structure of the population without methods such as cluster analysis and segmentation. Cluster analysis is especially useful when no target is available: it identifies groups whose members are alike within the group but different from members of other groups.

Once segmented, cases can be assigned a group membership. This label can be used to determine treatment strategies, identify service channel options and even monitor the effectiveness of a tax authority's efforts to change taxpayer behavior over time.
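Cluster analysis comes in many forms. The sketch below uses plain k-means (Lloyd's algorithm), chosen here only because it is short; it is not necessarily the method any tax authority uses. The features, data and choice of k are hypothetical, and for simplicity the first k cases seed the centroids. The list it returns is the group membership label described above.

```python
import math
from typing import List, Sequence, Tuple

def kmeans(points: List[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each case joins the segment with the nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # memberships stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty segment's centroid where it was
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical, pre-scaled features for six taxpayers
cases = [[1.0, 1.0], [10.0, 10.0], [1.2, 0.8], [9.5, 10.2], [0.9, 1.1], [10.3, 9.7]]
labels, centres = kmeans(cases, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Real segmentation work would scale the features first and try several seeds and values of k, since k-means is sensitive to both.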

Combining methods: semi-supervised techniques

Additional insight can be gained by overlaying the outputs of unsupervised techniques on those of supervised techniques, and vice versa. Some segments might emerge as inherently riskier than others. Semi-supervised techniques are therefore often useful wherever business knowledge can guide the analysis, even when only minimal training data is available.
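One simple way to overlay the two is to take segment labels from clustering and estimate a risk rate per segment from the few cases whose outcomes are known. This is an illustrative overlay, not a formal semi-supervised learning algorithm such as label propagation. The shrinkage toward the overall rate (`prior_weight`) is an assumption added so that a segment with only one or two labelled cases is not over-trusted.

```python
from collections import defaultdict
from typing import Dict, List, Optional

def segment_risk(segments: List[int], outcomes: List[Optional[bool]],
                 prior_weight: float = 1.0) -> Dict[int, float]:
    """Estimate a risk rate per segment from the cases with known outcomes.

    outcomes[i] is True (confirmed issue), False (compliant) or None (never examined).
    Each rate is shrunk toward the overall rate by prior_weight pseudo-cases.
    """
    labelled = [(s, o) for s, o in zip(segments, outcomes) if o is not None]
    if not labelled:
        raise ValueError("need at least one case with a known outcome")
    overall = sum(o for _, o in labelled) / len(labelled)
    hits: Dict[int, int] = defaultdict(int)
    seen: Dict[int, int] = defaultdict(int)
    for s, o in labelled:
        seen[s] += 1
        hits[s] += int(o)
    # Segments with no labelled cases fall back to the overall rate
    return {s: (hits[s] + prior_weight * overall) / (seen[s] + prior_weight)
            for s in sorted(set(segments))}

segments = [0, 0, 0, 1, 1, 1, 2]                              # from a clustering step
outcomes = [True, True, None, False, None, False, None]       # sparse audit results
risk = segment_risk(segments, outcomes)
print({s: round(r, 3) for s, r in risk.items()})
```

Every unexamined case then inherits its segment's rate as a starting estimate, which business knowledge can refine.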

Outlier detection, in which anomalies are identified and investigated, can also be an important weapon in the fight against fraud. Any population will contain anomalous cases, many of them perfectly legitimate. Some, however, may point to fraud, and these can then be singled out for investigation.
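A minimal sketch of one common outlier test is the modified z-score, which uses the median and the median absolute deviation (MAD) so that the outliers themselves do not distort the yardstick. The 0.6745 scaling and 3.5 cut-off follow a widely used convention; the expense ratios are hypothetical. Flagged cases are candidates for review, not findings of fraud.

```python
from statistics import median
from typing import List, Sequence

def robust_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices whose modified z-score, 0.6745 * (x - median) / MAD, exceeds threshold."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # Over half the values are identical; crude fallback: flag anything different
        return [i for i, x in enumerate(values) if x != med]
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]

# Hypothetical expense-to-income ratios (%) for businesses in one sector
ratios = [10, 12, 11, 13, 12, 11, 95]
print(robust_outliers(ratios))  # [6]
```

Comparing each case only with its peers (for example, within one of the segments described earlier) usually makes such flags far more meaningful.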

The network view of the case base is also becoming more important in detecting fraud and error. Group risk, and risk that propagates through a network of connected entities, are becoming easier to detect using techniques such as social network analysis. Unstructured data, including text, voice, image and spatial data, has only just begun to be used for fraud detection on a large scale, and its importance and usefulness will undoubtedly grow. Because tax authorities are in the unusual position of holding population data rather than just sample data, there are few limits to how data mining techniques can be used to improve their performance.
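Risk propagation can be illustrated with a deliberately simple rule, similar in spirit to personalised PageRank: each entity's risk blends its own base score with the average risk of the entities it is linked to, repeated until the values settle. The rule, the blending weight `alpha` and the entities and links are all hypothetical; this is not presented as the social network analysis any authority actually runs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def propagate_risk(base: Dict[str, float], edges: Iterable[Tuple[str, str]],
                   alpha: float = 0.5, iters: int = 50) -> Dict[str, float]:
    """Blend each entity's own score with its neighbours' mean risk, iteratively.

    Every entity named in edges must also appear in base. With 0 <= alpha < 1
    the update is a contraction, so the scores converge.
    """
    nbrs: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:  # links are treated as undirected
        nbrs[a].add(b)
        nbrs[b].add(a)
    risk = dict(base)
    for _ in range(iters):
        new = {}
        for node, own in base.items():
            linked = nbrs.get(node)
            if linked:
                neighbour_mean = sum(risk[m] for m in linked) / len(linked)
                new[node] = (1 - alpha) * own + alpha * neighbour_mean
            else:
                new[node] = own  # isolated entities keep their own score
        risk = new
    return risk

base = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}  # A: confirmed fraud
links = [("A", "B"), ("B", "C")]                 # e.g. a shared director or address
risk = propagate_risk(base, links)
print({k: round(v, 3) for k, v in risk.items()})
# {'A': 0.583, 'B': 0.167, 'C': 0.083, 'D': 0.0}
```

Risk fades with distance from the known bad case, which is the behaviour the text describes: a direct associate (B) scores higher than a second-degree one (C), and an unconnected entity (D) is unaffected.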

Making analytics a core part of your processes

So what is there to prevent the use of data mining techniques in a tax authority? There are many potential obstacles: lack of good quality data, data that has not been integrated or merged, lack of skilled resources, lack of senior-level sponsorship, IT challenges and cultural challenges.

Do not let these – or other issues – stop you from using advanced analytics to detect fraud, or stop you from using data mining to improve an agency's performance. Starting with a small achievable project with a clearly defined goal can often be the first step on the path to success. The results do not need to be spectacular, but if they show how data mining can add value and potentially reduce fraud and error, then the case will be made and analytics can start to become a core part of the agency's business processes.

Ultimately, it is the taxpayers and citizens who will benefit the most if the public sector adopts data mining as part of its day-to-day business. So if analytics can help to reduce fraud, error and waste, then the taxpayers deserve nothing less.

Bio: Duncan Cleary is a Senior Statistician in Revenue for Irish Tax and Customs. He specializes in the use of research and analytics methodologies and their application in the Irish Tax and Customs Authority, including the use of predictive analytics, customer segmentation, risk analyses, large scale surveys, evidence-based decision support, social network analysis and real-time risk.


Like many tax agencies, the Irish Tax and Customs Authority needed an affordable solution to predict and prevent fraud.


SAS® Fraud and Improper Payments


The Irish Tax and Customs Authority used SAS along with traditional fraud detection methods to reduce fraud and, ultimately, costs to the Irish taxpayer.

The results illustrated in this article are specific to the particular situations, business models, data input, and computing environments described herein. Each SAS customer’s experience is unique based on business and technical variables and all statements must be considered non-typical. Actual savings, results, and performance characteristics will vary depending on individual customer configurations and conditions. SAS does not guarantee or represent that every customer will achieve similar results. The only warranties for SAS products and services are those that are set forth in the express warranty statements in the written agreement for such products and services. Nothing herein should be construed as constituting an additional warranty. Customers have shared their successes with SAS as part of an agreed-upon contractual exchange or project success summarization following a successful implementation of SAS software. Brand and product names are trademarks of their respective companies.