Measuring error for "maybe" predictions
In a recent blog post, Executive Vice President John Sall asked, “What is the error if the probability of rain is .5 and it rains?”
Suppose that a mortgage aggregator came to you and said that a triple-A assemblage of loans had one chance in a billion of losing money. Then you evaluated the package and found that it really had a one-in-a-million probability of failing. The error in the probability estimate was just .000001, not much. So, with so little error in the claimed probability of failure, you might be tempted to ignore the discrepancy. The problem, of course, is that the expected loss in the latter case is a thousand times greater. Sometimes with small probabilities, little errors become big errors.
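The arithmetic behind that thousand-fold difference is worth seeing in one place. Here is a minimal sketch, using an illustrative billion-dollar exposure (an assumption; the post does not give a dollar amount):

```python
# Hypothetical numbers for the mortgage example above.
exposure = 1_000_000_000   # dollars at risk (illustrative assumption)

p_claimed = 1e-9           # claimed: one chance in a billion
p_actual = 1e-6            # actual: one chance in a million

# The error in the probability itself looks negligible...
error_in_probability = p_actual - p_claimed   # about .000001

# ...but expected loss = probability * exposure, and that multiplies the gap.
expected_loss_claimed = p_claimed * exposure
expected_loss_actual = p_actual * exposure

print(f"probability error:     {error_in_probability:.9f}")
print(f"claimed expected loss: ${expected_loss_claimed:,.2f}")
print(f"actual expected loss:  ${expected_loss_actual:,.2f}")
print(f"ratio: {expected_loss_actual / expected_loss_claimed:,.0f}x")
```

A tiny absolute error in the probability is a factor-of-1,000 error in the expected loss, because what matters near zero is the ratio of the probabilities, not their difference.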
So what if we were told that the chance of something was zero, that the event was impossible? If you believed that with no reservation, you would bet your life on the certainty. Now suppose that the event claimed to have zero probability actually happens. What should the penalty be for being wrong? Should it be a dollar, for getting the probability wrong by one? Should it be infinity dollars, because it was a lie?
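Those two answers correspond to two different accounting systems. As a sketch (the names here are my own, not from the post): squared error charges at most a dollar, while the negative-log penalty, the basis of the entropy error mentioned at the end of this post, charges an unbounded amount as the claimed probability approaches zero.

```python
import math

def squared_error_penalty(p_event):
    # Penalty when the event happens but was given probability p_event:
    # squared distance between the forecast and the outcome (which is 1).
    # Bounded: at worst (1 - 0)^2 = 1, the "one dollar" answer.
    return (1.0 - p_event) ** 2

def log_penalty(p_event):
    # Negative log-likelihood of the category that actually occurred.
    # As p_event -> 0 this grows without bound: the "infinity dollars"
    # answer for calling an event impossible and then watching it happen.
    return float("inf") if p_event == 0.0 else -math.log(p_event)

for p in (0.5, 0.1, 0.001, 0.0):
    print(f"p={p}: squared={squared_error_penalty(p):.4f}, "
          f"log={log_penalty(p):.4f}")
```

The squared penalty tops out at 1 no matter how confident the wrong forecast was; the log penalty keeps growing, which is exactly the behavior you want if claiming impossibility should carry real consequences.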
This post is all about accounting systems for events that happen despite fitted probabilities saying they would not. The responses are categories, and the categories matter (whether someone will buy a product or make a fraudulent transaction, for example). We fit models that assign the probabilities, and then we need to find out how well those models predict.
What is the best way to measure how good a prediction is in this situation?
We have always known pretty well how to measure prediction for continuous responses. We use squared error to measure the fit, estimating parameters to minimize the sum of squared residuals. Least squares is the foundation of most of our fitting arsenal for continuous responses. A standard measure of fit is the R-square, based on the sum of squared errors.
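For concreteness, here is a minimal sketch of that familiar measure, one minus the ratio of the residual sum of squares to the total sum of squares:

```python
def r_square(y, y_hat):
    """R-square: 1 - (sum of squared residuals / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total

print(r_square([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit: 1.0
print(r_square([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # predicting the mean: 0.0
```

A perfect fit gives 1; a model no better than predicting the mean gives 0. The question of this post is what plays the analogous role when the response is categorical.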
With categorical responses, it’s not so obvious how to measure how well a model fits. What is the model supposed to do? We can think of predicting either as classification, i.e., picking the category we predict will occur, or as fitting probabilities, so that the actual response is generally the one given a larger probability. For weather, the first approach would be to predict that it will rain; the second approach would be to assert that the probability of precipitation is 90 percent.
For a statistician, the latter expression in terms of probability is preferred, because it expresses the degree of uncertainty. If you know the probabilities, you can make decisions that maximize your expected gain. If you are planning an open outdoor concert and it is very expensive to have valuable instruments rained on, you will avoid presenting the concert unless the probability of no rain times the revenue of the event is more than the probability of rain times the lost value of the ruined instruments. If the model only asserts that it will or will not rain, you can’t calculate your expected gain and make the best decision about whether to go ahead with the outdoor concert.
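That decision rule can be written in a few lines. The function and its dollar figures below are my own illustrative assumptions, not numbers from the post:

```python
def hold_concert(p_rain, revenue_if_dry, loss_if_rain):
    # Go ahead only if the expected gain is positive:
    # (1 - p_rain) * revenue  >  p_rain * loss.
    expected_gain = (1 - p_rain) * revenue_if_dry - p_rain * loss_if_rain
    return expected_gain > 0

# Same costs, different probabilities of rain -> different decisions:
print(hold_concert(0.10, revenue_if_dry=50_000, loss_if_rain=200_000))  # True
print(hold_concert(0.30, revenue_if_dry=50_000, loss_if_rain=200_000))  # False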
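That decision rule can be written in a few lines. The function name and dollar figures below are my own illustrative assumptions, not numbers from the post:

```python
def hold_concert(p_rain, revenue_if_dry, loss_if_rain):
    # Go ahead only if the expected gain is positive:
    # (1 - p_rain) * revenue  >  p_rain * loss.
    expected_gain = (1 - p_rain) * revenue_if_dry - p_rain * loss_if_rain
    return expected_gain > 0

# Same stakes, different probabilities of rain -> different decisions:
print(hold_concert(0.10, revenue_if_dry=50_000, loss_if_rain=200_000))  # True
print(hold_concert(0.30, revenue_if_dry=50_000, loss_if_rain=200_000))  # False
```

Notice that a bare "it will rain" or "it won't" forecast can't drive this calculation at all; only a probability lets you weigh the revenue against the potential loss.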
Continue reading this post on my executive blog, bLog-Normal Distribution. I discuss four measures of error and tell you why I like entropy error the most. I also explain why it’s a good idea to hold back data if you have lots of it.
John Sall is co-founder and Executive Vice President of SAS. He leads the JMP business division.