As the Basel Accords continue to drum up attention in the global financial markets, many institutions are looking at how they can strike a balance between capital requirements and competitive advantage. One area of focus is consumer credit risk modelling and scoring - as the more accurate and robust the models are, the lower the risk institutions face. While credit modelling has traditionally been based on linear models, it is becoming more apparent that non-linear techniques (e.g. Gradient Boosting, Neural Networks) can make significant improvements to the accuracy of default models and ultimately support an institution’s bottom line.

## Breaking with tradition

Is it time for financial institutions to break free from using traditional linear models simply because “that is the way we have always done it” and accept the capabilities and advantages that more advanced predictive modelling techniques can bring?

Over the last few years (both during my PhD research and working in the financial sector), I have assessed and developed predictive modelling techniques that are applicable to estimating:

- The **probability** of a customer going into **default** (PD).
- The resultant **loss** experienced by the company **given** a customer **defaults** (LGD).
- The **exposure** faced by an organisation **at** the point in time a customer **defaults** (EAD).

These three aspects make up Pillar 1 of the Basel Accord, which prescribes the calculations financial institutions must use for their minimum capital requirements (the minimum amount of capital they are regulated to hold). They are fundamental in determining how much capital institutions must hold and how much they can lend out to customers in the form of personal loans, mortgages and other forms of credit.

Under the advanced internal ratings-based approach (AIRB), banks have the ability to provide their own internal estimations to the regulators for each of these three aspects: PD, LGD and EAD. As a rule of thumb, linear regression models are used in the estimation of LGD and aspects of EAD, whereas logistic regression is used in the derivation of PD. A typical distribution for an LGD portfolio, for example, is bi-modal with two large point densities around 0 and 1, with a shallow distribution between the peaks (see figure). In practice, it is common to apply a beta transformation to the target variable, and then estimate this transformed value with a linear regression model.
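As an illustration, the beta-transformation approach described above could be sketched as follows. This is a minimal sketch on synthetic data, not an institution's actual model; the bi-modal target is simulated purely to mimic the shape of a typical LGD portfolio:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for loan features and an LGD-like target with
# large point densities near 0 and 1 (illustrative only)
n = 1000
X = rng.normal(size=(n, 3))
y = np.clip(rng.beta(0.4, 0.6, size=n), 1e-6, 1 - 1e-6)

# Fit a beta distribution to the raw LGD values
a, b, loc, scale = stats.beta.fit(y, floc=0, fscale=1)

# Beta transformation: push LGD through the fitted beta CDF, then the
# inverse normal CDF, so the transformed target is roughly Gaussian
y_trans = stats.norm.ppf(stats.beta.cdf(y, a, b))

# Estimate the transformed target with ordinary linear regression
model = LinearRegression().fit(X, y_trans)

# Invert the transformation to map predictions back onto the [0, 1] LGD scale
pred = stats.beta.ppf(stats.norm.cdf(model.predict(X)), a, b)
```

The inverse transformation at the end guarantees that predicted losses stay within the economically meaningful [0, 1] range, which a raw linear regression on the untransformed target would not.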

## In with the new

From the research conducted, however, a vast improvement in the estimation of LGD could be made with a two-stage approach in which a neural network model is trained on the residuals of a linear regression model, thereby combining the comprehensibility of a linear regression model with the added predictive power of a non-linear technique. (For a more detailed discussion of the issues related to implementing a two-stage approach for estimating LGD, please see Loterman et al., 2011.)
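The general idea of the two-stage approach can be sketched as below. This is a minimal illustration on synthetic data, not the exact implementation from the research; the features, target and network size are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Hypothetical loan features and a target with a non-linear component
n = 2000
X = rng.normal(size=(n, 4))
y = 0.3 * X[:, 0] + np.tanh(X[:, 1] * X[:, 2]) + 0.1 * rng.normal(size=n)

# Stage 1: an interpretable linear regression captures the main effects
stage1 = LinearRegression().fit(X, y)
residuals = y - stage1.predict(X)

# Stage 2: a neural network is trained on the stage-1 residuals,
# picking up non-linear structure the linear model misses
stage2 = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                      random_state=0).fit(X, residuals)

# Final prediction: linear estimate plus the modelled residual
pred = stage1.predict(X) + stage2.predict(X)
```

Because the linear model is fitted first, its coefficients remain fully interpretable; the neural network only corrects what the linear stage leaves unexplained.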

One of the major fallouts of the 2008 global financial crisis was that regulators clamped down on financial institutions to make sure that both regulators and institutions fully understand their internal risks and can demonstrate beyond doubt that they understand their underlying models. The problem with this requirement is that financial institutions have subsequently become more averse to adopting new ideas and ever more entrenched in the ways of the past. They have also invested a huge amount of time and resources in catching up with, and providing documentation to, the regulators. I am all for the stringent controls of regulatory bodies, but I believe there is still room for these financial institutions to think outside of the ‘white’ box and explore other approaches to model development.

There is merit in using linear regression techniques, given their clarity and ease of use, and, more importantly, advanced analytical techniques need to be fully understood before data is thrown into them. But with the right knowledge and an openness to new ideas, financial institutions could reap the benefits of applying novel analytical techniques (i.e. improved prediction rates and more accurate capital estimations).

The key would be for financial institutions to embrace the potential of using approaches novel to the financial sector that have been proven in a number of other sectors, such as healthcare, fraud detection and marketing (neural networks for credit card fraud detection, for example, have been used successfully in the detection of abrupt changes in established patterns and recognising typical usage patterns of fraud).

Innovating in the modelling of their credit risk portfolios would also help institutions avoid falling behind other sectors in the use of novel analytical techniques, as well as challenge the regulators by showing that advanced analytical techniques can in fact lead to better models and better estimations of risk.

For links to papers I’ve written on applying SAS-based analytical modelling techniques in the financial sector, please see:

- Brown I (2011), Regression Model Development for Credit Card Exposure At Default (EAD) using SAS/STAT® and SAS® Enterprise Miner™ 5.3, *SAS Global Forum 2011*, conference proceedings
- Loterman G, Brown I, Martens D, Mues C & Baesens B (2011), Benchmarking Regression Algorithms for Loss Given Default Modeling, *International Journal of Forecasting*, in press
- Brown I & Mues C (2012), An Experimental Comparison of Classification Algorithms for Imbalanced Credit Scoring Data Sets, *Expert Systems with Applications*, Volume 39, Issue 3, 15 February 2012, Pages 3446-3453

*NOTE: Originally published on SAS Voices.*

## 4 Comments

Iain,

Great post – you’ve made some very good points about the need for innovation. On a cautionary note though, it isn’t always a matter of finding a method that performs better. The modeler must also take regulations into account. Two US examples: in credit granting, neural networks may outperform a scorecard, but fail the test of being statistically sound under Reg. B. Also, a non-linear model may do a better job of estimating loan loss reserves, but from a regulatory view, the model must conform with GAAP, which provides for a straightforward historical averaging method over a set number of months.

In effect, in the wake of the financial crisis, less intuitive, less familiar, more complicated algorithms are less in favor than more common sense, simple and familiar methods. Innovation – including repurposing what works in other industries – may pose significant risks.

Clark

Hi Clark,

Thank you for your comment. You are of course right: the ability to interpret model parameters is key, and this must be taken into consideration when formulating any model intended for implementation, even more so in a highly regulated environment such as the financial sector. However, there is currently some cutting-edge work being carried out in both academia and industry with these concerns in mind. For example, a paper by Van Gestel et al. (2005), written in collaboration with a large European bank, highlighted the use of a non-linear kernel-based modelling technique (Support Vector Machines) applied in a two-stage approach with logistic regression to develop a rating model. Their study showed that good model readability and improved model performance could be achieved through a two-stage implementation. I also understand that this approach was taken into practice by the bank and accepted by the respective regulators.
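To give a flavour of such a combination, here is a minimal sketch of one way a logistic regression scorecard could be paired with an SVM in two stages. This is an illustration on synthetic data under my own assumptions, not the exact method of Van Gestel et al.:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Hypothetical borrower features and a simulated default flag
n = 1500
X = rng.normal(size=(n, 4))
p = 1 / (1 + np.exp(-(X[:, 0] + np.sin(X[:, 1]))))
y = (rng.uniform(size=n) < p).astype(int)

# Stage 1: a readable logistic regression scorecard
stage1 = LogisticRegression().fit(X, y)
score = stage1.decision_function(X)

# Stage 2: an SVM refines the rating, taking the stage-1 score as an
# extra input to add non-linear discrimination on top of the scorecard
X2 = np.column_stack([X, score])
stage2 = SVC(kernel="rbf", probability=True).fit(X2, y)

# Probability-of-default estimates from the combined model
pd_estimates = stage2.predict_proba(X2)[:, 1]
```

The stage-1 scorecard remains available for regulatory explanation, while the SVM layer only refines it where the data supports a non-linear adjustment.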

My hope, therefore, is that this potential can be realised through a combined approach: the financial sector adopting new ideas while working closely with the regulators to bring them to fruition.

– Iain

I’ve been looking for new methods for several years, and I think banks and credit risk teams need them. I’ve been developing internal rating systems for 10 years, and for 9 of those years I’ve used logistic regression (in the first year I still used discriminant analysis!). I’m very interested in new statistical methods; however, in my experience I’ve always seen the same advanced methods (neural networks, genetic algorithms, support vector machines and so on). So I have a question: are the new methods really new? I think these advanced methods have not matured very much either. The software and hardware have changed, so there are no longer computational problems, but I think the real problem is complexity. We don’t need more complexity; we need more performance, to gain a competitive advantage. In my job, another big problem is data mining. We have terabytes of data and we can process them, but we have lost the variable that I call “information”. When we have more meaningful and robust databases, we will be able to handle the complexity of more sophisticated statistical methods, and we will be able to explain the methods and the advantages of using them. Meanwhile, I like investigating the added value we get from using an advanced method instead of a traditional one, such as logistic regression.

Thanks for your post

Dario

Welcome Dario and thank you for your post.

The crux of your comment is complexity, and I agree this is an issue which must be dealt with carefully. Advanced modelling techniques will never add value without a full understanding of their practicalities and implications. However, as I allude to in my post and comment, there is scope for the adoption of more complex analytical techniques in combination with simpler, more industry-standard concepts (e.g. logistic regression). Through this process, higher performance can be achieved without adding adverse levels of complexity. I would be interested to know: what are your thoughts on approaching internal ratings systems with a two-stage modelling method?

You also raise a separate, but pertinent, issue with regard to data mining. There is the potential to conduct data mining visually, e.g. with Enterprise Miner, so that the gems of information can be gleaned from the data. The results of the mining process can then be illustrated to business managers through visualization solutions (VDD, JMP, AMO, etc.). I also recommend a recent, interesting blog post on the issues of big analytics from a data mining perspective: http://blogs.sas.com/content/sascom/2011/12/14/the-promise-of-big-analytics-from-an-analyst%e2%80%99s-perspective/

Thank you again for your comment,

Iain