Estimation versus Decision Making; Thoughts on Asymmetric Cost Functions; Thoughts on Stability; Generalization
I've been thinking a lot about prediction lately because of some work I've been tooling around with, and I have some thoughts I'd like to bounce off people.
The Dilemma
I've been tooling around with better ways to extract useful relationships between input and output variables and had the following worry (this is pretty basic; sorry everyone). Least squares minimizes the sum of squared residuals. Standard neural networks typically also use MSE as the objective function they try to minimize, only through error backpropagation instead of through a few simple statistics. The thing about MSE, though, is that it implicitly assumes that positive residuals are just as costly as negative residuals. In other words, if my program predicts that a stock will return 10% over a 3 month horizon, it's just as bad if the stock actually returns 12% as it is if it returns 8%. This obviously doesn't jibe with intuition on a couple of levels. A shortfall against the forecast hurts more than an equal overshoot helps, and more generally losses generate more disutility than equal-sized gains generate utility. I honestly don't remember off the top of my head what the actual multiple is, but the effect is pretty well established through controlled experiments. So I was thinking it is not terribly accurate to use algorithms which minimize MSE. Why not tweak the cost function so that the penalty on a residual is conditioned on the sign of the error, making the fit more sensitive to losses? That way, if one can find sub-segments of one's data which, post-regression, still show up with nice expected returns, one can rest more assured that such trades are indeed good trades to put on, and place one's bets accordingly.
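To make the tweak concrete, here is a minimal sketch of a sign-conditioned squared error next to plain MSE. The function name asymmetric_squared_loss and the multiple loss_weight=2.0 are assumptions made up for illustration; the 2.0 is a placeholder, not the experimentally measured figure.

```python
import numpy as np

def asymmetric_squared_loss(actual, predicted, loss_weight=2.0):
    """Mean squared error that penalizes shortfalls (actual < predicted) more.

    loss_weight is a hypothetical multiple chosen for illustration only.
    With loss_weight=1.0 this reduces to ordinary MSE.
    """
    residual = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    weights = np.where(residual < 0, loss_weight, 1.0)
    return float(np.mean(weights * residual ** 2))

# The example from the text: the model predicts 10%, the stock returns 12% or 8%.
mse_up = asymmetric_squared_loss([0.12], [0.10], loss_weight=1.0)
mse_down = asymmetric_squared_loss([0.08], [0.10], loss_weight=1.0)
asym_up = asymmetric_squared_loss([0.12], [0.10])
asym_down = asymmetric_squared_loss([0.08], [0.10])

print(mse_up, mse_down)    # plain MSE treats both misses alike
print(asym_up, asym_down)  # the asymmetric version charges more for the 8% outcome
```

Under plain MSE the two misses cost the same; under the asymmetric version the shortfall costs loss_weight times as much.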
Seemed to make some sense. Stunk in practice.
Issues with Asymmetric Cost Functions
The issue with what I just described is that it's combining two things: estimation and decision making. The regression estimates; the asymmetric cost function adjusts one's decision. It's not right to combine these things, in my opinion.
Standard regression and a traditional neural network are pure estimators of the multivariate relationships within one's data. "Pure" in the sense that given an asymptotically large amount of data, one should theoretically converge to the true stochastic relationship. If one can't, then one's algorithm isn't a good estimator, and it's hard to have faith in the results. Estimation is looking to give its best guess of the current situation, paying no heed to utility or cost.
I tend to think that only once the estimation has been done is it valid to begin the decision making process. Now if the relationships in one's data were completely deterministic, there would be no need for decision making. The estimator would leave you with zero residuals, and one could trade away, theoretically, as long as it's legitimate to assume that the future will behave like the past.
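One way to see why the asymmetric objective stops being a pure estimator: even in the simplest case of fitting a single constant, it no longer converges to the mean. The sketch below uses synthetic returns and a hypothetical helper asymmetric_constant_fit (an iteratively reweighted mean); the names, the 2.0 multiple, and the simulated data are all assumptions for illustration.

```python
import numpy as np

def asymmetric_constant_fit(y, loss_weight=2.0, n_iter=100):
    """Constant c minimizing sum_i w_i * (y_i - c)^2, where w_i = loss_weight
    when y_i falls short of c and 1 otherwise.

    Solved by repeatedly recomputing the weights and taking a weighted mean.
    With loss_weight=1.0 the answer is just the sample mean.
    """
    y = np.asarray(y, dtype=float)
    c = y.mean()
    for _ in range(n_iter):
        w = np.where(y < c, loss_weight, 1.0)
        c = np.sum(w * y) / np.sum(w)
    return float(c)

rng = np.random.default_rng(0)
# Synthetic returns with a true mean of 10% (illustrative data only).
returns = rng.normal(loc=0.10, scale=0.05, size=10_000)

mse_estimate = asymmetric_constant_fit(returns, loss_weight=1.0)
asym_estimate = asymmetric_constant_fit(returns, loss_weight=2.0)

print(mse_estimate)   # the sample mean
print(asym_estimate)  # sits below the sample mean, however much data there is
```

The asymmetric fit is deliberately shaded downward, so what comes out is a mix of estimate and risk preference rather than a best guess of the expected return.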
Of course in the real world there is stochasticity. One cannot eliminate it. So in my opinion, the proper way to go about incorporating asymmetric cost is to take the original data set, run the best pure estimation algorithms you've got, and base your decision making on the resulting residual plot. Of course when I say algorithms I mean not only a neural net and/or regression model, but also the resulting residual analysis, looking for serial autocorrelation and returns-based factors. The whole deal. Net it all out. What I want is a set of residuals which has no trending. I believe that a pure choice can be made on the distributional properties of this set of residuals. If I can safely assume this normalized set of residuals to have some volatility clustering, any other conditional volatility effects, and some other effects which perhaps I can't really explain, then I'm finally getting somewhere with regard to decision making. Maybe I can go back into other datasets and find some possible reason why the process behaved the way it did during its abnormal period, and do my best to generalize that in such a way that I can incorporate some risk of that happening again in some form.
I guess I'm essentially splitting the concept of risk from the concept of return and dealing with each entirely separately. Expected returns are what they are. One has used algorithms which hopefully give unbiased estimates of them. It seems to me that the decision then is simply a function of "risk," with an elementary adjustment for return. That makes sense to me. Much more sense than packing everything into the regression itself. Packing it all in is sloppy and doesn't seem pure to me, but perhaps I'm missing something.
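A rough sketch of the two-stage idea, under heavy simplifying assumptions: stage one is plain least squares on synthetic data, and stage two applies the asymmetry only when deciding, by scoring the forecast against the empirical residual distribution. The helper names, the 2.0 multiple, and the rule "trade if expected utility is positive" are all hypothetical, and the sketch ignores the volatility clustering and serial correlation mentioned above.

```python
import numpy as np

def fit_ols(X, y):
    """Stage 1: pure estimation. Returns coefficients and in-sample residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def expected_utility(expected_return, residuals, loss_weight=2.0):
    """Stage 2: decision. Treat expected_return + residual as the possible
    outcomes and weight negative outcomes loss_weight times as heavily.
    loss_weight is an illustrative assumption."""
    outcomes = expected_return + np.asarray(residuals, dtype=float)
    utility = np.where(outcomes < 0, loss_weight * outcomes, outcomes)
    return float(np.mean(utility))

# Synthetic data: one predictor plus an intercept (illustrative only).
rng = np.random.default_rng(1)
n = 500
signal = rng.normal(size=n)
X = np.column_stack([np.ones(n), signal])
true_beta = np.array([0.02, 0.03])
y = X @ true_beta + rng.normal(scale=0.05, size=n)

beta_hat, residuals = fit_ols(X, y)

# Decide on a new observation using the unbiased forecast and the residual spread.
x_new = np.array([1.0, 1.5])
mu = float(x_new @ beta_hat)
trade = expected_utility(mu, residuals) > 0
print(mu, trade)
```

The forecast mu stays an honest estimate of expected return; the asymmetry lives entirely in expected_utility, so changing one's risk preferences never requires refitting the model.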
Stability
Another fundamental issue with prediction is stability. Am I looking at a statistical fluke in my data set, or is this real? There are of course many techniques which can help give good guidance in this direction, like the bootstrap and cross validation. The general intuition behind the bootstrap is 'well, just how different is this from noise, if I were to make the assumption that I am indeed looking at noise?', and the intuition behind cross validation is 'wait a minute, wouldn't it make more sense to get some feel for how my predictive algorithm does on out of sample data, on average?', which of course naturally leads to PMSE (prediction mean squared error).
Anyway, one perhaps "dumb" way to increase stability is as follows. Feedback heavily encouraged, btw. Take the hypothetical case that I am trying to predict 3 month forward returns. I can of course simply gather the 3 month forward return for all stocks. However, this could be a major strain if I am dealing with a small-ish data set and a lot of predictor variables. The problem I see with this is that 3 months is, in some ways, arbitrarily fixed, and it is just one data point among hypothetically tons. The POINT of the regression is in some ways to be able to identify outperformance. Outperformance over 3 months is awesome, but there is nothing special about that number. Therefore, throw in 1 month, 3 months and 5 months, for example, and minimize the aggregate cost function on all three. If the results at 3 months were a statistical fluke, it would be more likely that the stock would then underperform over the two other time horizons. Conversely, if the stock is truly an outperformer, it would have a heightened probability of outperformance at 1 month and 5 months as well. By throwing in additional outputs which straddle yours, you stabilize your results, I would think. But I could be wrong.
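The cross validation intuition can be sketched in a few lines. This is a minimal k-fold version for a least squares model, taking PMSE to mean the average squared error on held-out observations; the function name cv_pmse, the fold count, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def cv_pmse(X, y, k=5, seed=0):
    """k-fold cross-validated mean squared prediction error for least squares.

    Each fold is held out once; the model is fit on the remaining folds and
    scored only on the observations it did not see.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    squared_errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        squared_errors.append((y[test_idx] - X[test_idx] @ beta) ** 2)
    return float(np.mean(np.concatenate(squared_errors)))

# Synthetic example: the same linear signal with and without noise.
rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
clean_y = X @ np.array([0.01, 0.02])
noisy_y = clean_y + rng.normal(scale=0.05, size=n)

pmse_clean = cv_pmse(X, clean_y)
pmse_noisy = cv_pmse(X, noisy_y)
print(pmse_clean, pmse_noisy)
```

Comparing the out-of-sample figure with the in-sample MSE gives a rough feel for how much of the fit is fluke.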
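Here is one possible reading of "minimize the aggregate cost function on all three" for a linear model. Note that if each horizon gets its own coefficients, the aggregate squared error splits into three independent regressions and the 3 month fit is unchanged, so the sketch assumes a single shared weight vector. It also assumes returns are put on a per-month scale by dividing by the horizon length. Both choices, and the name fit_shared_weights, are illustrative assumptions rather than the only way to do this.

```python
import numpy as np

def fit_shared_weights(X, returns_by_horizon):
    """Fit one weight vector that minimizes the summed squared error across
    all horizons.

    returns_by_horizon maps horizon length in months to an array of total
    forward returns. Returns are divided by the horizon so that every target
    is on a per-month scale (a simplifying assumption).
    """
    targets = [np.asarray(r, dtype=float) / h for h, r in returns_by_horizon.items()]
    X_stacked = np.vstack([X] * len(targets))   # same predictors for each horizon
    y_stacked = np.concatenate(targets)
    w, *_ = np.linalg.lstsq(X_stacked, y_stacked, rcond=None)
    return w

# Synthetic data: a persistent per-month edge plus horizon-specific noise.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
monthly_edge = X @ np.array([0.0, 0.01])
returns_by_horizon = {
    h: h * monthly_edge + rng.normal(scale=0.05, size=n) for h in (1, 3, 5)
}

w_shared = fit_shared_weights(X, returns_by_horizon)
print(w_shared)
```

With shared weights, this aggregate fit is equivalent to regressing on the average of the three per-month targets, which is where the stabilizing effect comes from: noise specific to any one horizon gets averaged down. In a neural network the analogous move would be several output nodes sharing hidden layers.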
Generalization
I really think good decision making comes down to properly delineating risk from return, and to how one can go about gaining confidence in one's estimates of the two. It's so simple to say. If only it were easier to implement in practice!