Exclusive vs Inclusive; Thoughts on Model Building
This is a work in progress. I actually disagree with some of what I say below. I think working on a trading desk, trying to piece things together as a trader would, is what pushes in-house models to be more complex rather than less. Humans are great at capturing some forms of weird idiosyncrasy. That naturally pushes the models they would 'like' to build in a very complicated direction.
That being said, there is a world of difference between models which attempt to reach very specific conclusions and then expand, and models which start by making very sweeping, broad statements and over time become increasingly granular. Perhaps the market in question and the granularity of your data determine to some extent what the "optimal" problem-solving paradigm is.
Thoughts on Quantitative Trading
Being able to identify homogeneity in the financial markets seems to be a driving concept when doing quant trading. Classification and homogeneity are two sides of the same coin-- if all securities in the financial markets were unique, all being driven by uncorrelated processes, it seems that you'd be shit out of luck. There's no way to build a trading system which makes buy and sell recommendations based on CUSIP (well, perhaps... there is actually some homogeneity here too); we're in business when we can find ways to classify securities in some way. A useful classification is able to identify things which tend to trade the same way-- and of course when two things trade the same way, we quants would call a proper long-short of the two a stationary, mean-reverting process (this, by the way, is the essence behind cointegration-optimal hedging and indexing).
So let's assume for a moment that the goal is identifying homogeneity in some way, shape or form in the financial markets. Where the hell do you begin? I believe you begin by deciding whether to adopt an inclusive or an exclusive paradigm.
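To make the long-short idea concrete, here is a minimal sketch, assuming two price series that tend to trade together. It sizes the short leg with an ordinary least-squares slope, which is only one simple choice of hedge ratio and not a full cointegration test. The function names hedge_ratio and long_short_spread are hypothetical, for illustration only.

```python
def hedge_ratio(a, b):
    """OLS slope from regressing series a on series b (with an intercept)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_b) * (y - mean_a) for x, y in zip(b, a))
    var = sum((x - mean_b) ** 2 for x in b)
    return cov / var


def long_short_spread(a, b):
    """Value of being long one unit of a and short beta units of b."""
    beta = hedge_ratio(a, b)
    return [y - beta * x for x, y in zip(b, a)]


# Toy data (made up): a moves exactly twice as much as b, plus a constant.
b = [1.0, 2.0, 3.0, 4.0]
a = [2.0 * x + 1.0 for x in b]
spread = long_short_spread(a, b)
```

In this toy case the spread is constant, the degenerate limit of a mean-reverting process. With real data one would still have to test whether the spread is actually stationary before trading it.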
The inclusive paradigm, which seems to be the most popular (perhaps because it relies on the least granular information?), is to identify very broad trends in the market. For example, there may be tens of thousands of stocks trading right now, but if I were to bucket them into capitalization-based deciles, trends begin to form when looking at one-year-forward expected returns. In other words, broad-based homogeneity begins to surface. At that point, we may attempt to identify what we consider to be "the next best classifier," which would then split the deciles into subdeciles, each of which is then even more homogeneous. I bet a lot of people have made good money adopting this paradigm, and to be honest, it's the paradigm I personally have had the most experience with up until this point.
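The bucketing step can be sketched as follows. This is a minimal illustration, assuming each stock comes with a market capitalization and a one-year-forward return; the function names assign_deciles and mean_by_bucket are hypothetical and not taken from any library.

```python
from statistics import mean


def assign_deciles(values):
    """Return a decile label (0 = smallest, 9 = largest) for each value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * 10 // len(values)
    return labels


def mean_by_bucket(labels, returns):
    """Average the returns of all securities sharing a bucket label."""
    buckets = {}
    for label, r in zip(labels, returns):
        buckets.setdefault(label, []).append(r)
    return {label: mean(rs) for label, rs in sorted(buckets.items())}


# Toy universe (made up): 20 stocks listed from largest cap to smallest.
caps = [float(c) for c in range(20, 0, -1)]
labels = assign_deciles(caps)
```

A "next best classifier" would then be applied within each decile, for example by calling assign_deciles again on a second characteristic restricted to the members of one bucket.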
But inclusive classification has many downsides which aren't entirely obvious. First of all, the sometimes extreme level of broadness makes it all the more difficult to identify which classifier is indeed the 'best'. Second of all, inclusive classifications tend to carry with them longer time horizons, which can't necessarily be traded by desks or funds that need strong enough mean reversion to give them a decent probability of success over shorter time intervals. That being said, there are some serious benefits to a proper long-short-based inclusive classification trading strategy. Most notably, as long as one is dealing with securities that have less dimensionality (less complexity) than others, the value of this paradigm IMHO improves dramatically. The reason is that there is so little one then needs to control for. It makes some sense, then, why this seems to be the sort of paradigm from which most ETFs have been created. They strip away idiosyncratic risk as much as possible, they can carry with them lower transaction costs, and they retain the ability to expose you broadly to the form of risk you’d like to be exposed to.
But the same isn't really true of other forms of securities. Most securities, in fact, are extremely complex when you think about it. Take municipal bonds, for example. While it may be conceivable to construct a broad trading strategy around municipals, a ton of polluting factors makes things more difficult. First of all there is the issue of liquidity (this actually exists with equities as well). Two securities may look the same and be structured in the same fashion, but if one happens to be less liquid than the other, the more liquid security in an efficient market should command some sort of a premium. This would then require quantifying the bid-ask spread. But that is a classification nightmare in and of itself. Next take the fact that bonds can be issued in any number of states and have all sorts of varying call provisions, bond types (e.g. revenue, GO, double barrel, water and sewer), credit ratings, insurance, ... It's a fixed income instrument, but it has quite a few idiosyncratic elements. Broad categorizations inevitably fall into the trap of being too general.
So rather than pursue the inclusive paradigm, the paradigm then becomes that of exclusion. That is, find on some truly granular level those securities which tend to be homogeneous in some fashion. Then (as long as your dataset is granular enough), peel off the layers of idiosyncrasy from your generic set to other sets, quantifying the various credit spreads which should be applied relative to your reference rate (in the case of municipals, the municipal curve).
It's interesting that these paradigms are so vastly different from one another.
It's also interesting to contrast these lines of thought with those of value investing. Value investing seems to thrive on the idiosyncrasy of individual stocks. And yet that is what in some ways kills quant strategies.
Thoughts on Implementation of an Exclusive Trading Model
The question which inevitably pops up is how you actually implement an exclusive model. There may be some theory which is more established, but I think I've come up with a decent work-around. First of all your dataset will of course have to be reasonably large. Even then, the question becomes how one can create a truly homogeneous set of securities when securities have so many differentiating characteristics.
Well, how about this: find the largest group of securities with a reasonable sample size that is as homogeneous as you can possibly make it. I'd call it the path of smallest descent. Let's say you've got a humongous database and you query it through a program (e.g. SQL). Scan through all of your variables and identify the one which, when fixed, leads to the smallest decrease in securities. Then do that again. And again. And so on until you are left with the biggest generic and homogeneous set of securities you can find. If you have exhausted all of your variables and you still have a good sample size from which you can get statistically significant insights, good for you. Typically that's not possible if your dataset is granular enough, in which case things get uglier. You start relaxing some of the fixations. You allow for more than one moving part at a time. But if this is the case, then you have a new objective: relax the fixations which least pollute any inference you want to make. If you want to examine the behavior of 20-year bonds, for example, you might want to consider making that a range from 19 to 21. Or at the very least, if you want to make an inference on how variable A affects yield, and you need to let one other variable float, it would probably be best if that variable didn't have any sort of systematic relationship to variable A. That way, on average, your inference on variable A should still be correct.
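The "path of smallest descent" can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not a definitive implementation: the data sits in memory as a list of dicts rather than in a SQL database, every variable is categorical, and fixing a variable means pinning it to its most common remaining value. The function name smallest_descent, the min_sample parameter and the toy bond records are all hypothetical.

```python
from collections import Counter


def smallest_descent(records, variables, min_sample):
    """Greedily fix the variable whose most common value keeps the most records.

    Stops when every variable is fixed or when the next fixation would leave
    fewer than min_sample records. Returns (fixed, remaining, still_free).
    """
    remaining = list(records)
    fixed = {}
    free = list(variables)
    while free and remaining:
        best = None  # (count kept, variable, value)
        for var in free:
            value, count = Counter(r[var] for r in remaining).most_common(1)[0]
            if best is None or count > best[0]:
                best = (count, var, value)
        count, var, value = best
        if count < min_sample:
            break  # the variables left in `free` are the ones to relax
        fixed[var] = value
        remaining = [r for r in remaining if r[var] == value]
        free.remove(var)
    return fixed, remaining, free


# Toy municipal-bond records (made up for illustration).
bonds = [
    {"state": "NY", "type": "GO", "insured": True},
    {"state": "NY", "type": "GO", "insured": True},
    {"state": "NY", "type": "GO", "insured": False},
    {"state": "NY", "type": "revenue", "insured": False},
    {"state": "NY", "type": "GO", "insured": True},
    {"state": "CA", "type": "GO", "insured": True},
    {"state": "CA", "type": "revenue", "insured": False},
    {"state": "NY", "type": "revenue", "insured": False},
]
fixed, generic_set, still_free = smallest_descent(
    bonds, ["state", "type", "insured"], min_sample=4
)
```

With min_sample=4 the loop fixes the state and the bond type and then stops, leaving the insurance flag free. That leftover list is where the relaxation step starts: those are the variables you let float, ideally ones unrelated to the variable you want to study. A numeric variable such as maturity would need to be binned first (say, 19 to 21 years) before being treated this way.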
That's just a start. The guiding theme is to make sure that you are making clean inferences. Clean inferences come about when all polluting factors are held constant. So once you reach whatever conclusions you wanted to reach with your relatively small generic set, expand that set by allowing a new parameter to vary, then solve for how that new parameter affects your system. And so on. It's an iterative process which takes a long time. It might not be the best way to go about trading, but it is capable of using your entire dataset and it's highly specific.
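One expansion step might look like the sketch below. It assumes the effect of a newly freed parameter is measured as a simple difference in mean yield against the generic set, with every other generic attribute held fixed. That is the crudest possible estimator and only a stand-in for whatever model you actually fit. The function name spread_vs_generic and the toy yields are hypothetical.

```python
from statistics import mean


def spread_vs_generic(records, generic, var, target="yield"):
    """Mean `target` difference for each alternative value of `var`.

    All attributes in `generic` other than `var` are held at their generic
    values, so only `var` is allowed to move.
    """
    others = {k: v for k, v in generic.items() if k != var}
    matched = [r for r in records if all(r[k] == v for k, v in others.items())]
    base = [r[target] for r in matched if r[var] == generic[var]]
    if not base:
        raise ValueError("no records match the generic set")
    base_mean = mean(base)
    groups = {}
    for r in matched:
        if r[var] != generic[var]:
            groups.setdefault(r[var], []).append(r[target])
    return {value: mean(ys) - base_mean for value, ys in groups.items()}


# Toy data (made up): yields in percent.
sample = [
    {"state": "NY", "insured": True, "yield": 3.0},
    {"state": "NY", "insured": True, "yield": 3.2},
    {"state": "NY", "insured": False, "yield": 3.5},
    {"state": "NY", "insured": False, "yield": 3.7},
    {"state": "CA", "insured": False, "yield": 4.0},
]
spreads = spread_vs_generic(sample, {"state": "NY", "insured": True}, "insured")
```

Here the uninsured New York bonds yield roughly half a point more than the generic insured set, while the California bond is ignored because the state is still held fixed. Repeating the call for each parameter in turn is the iterative expansion described above.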
The methodology above is interesting but not always useful, and probably doesn’t jibe well at all with how the typical value investor thinks about investments. The way I see it, we have a sort of mental playbook which we cycle through when analyzing a stock. Is it a stock which is beaten down hard but has had strong profit growth over the past 5 years, historically strong margins and what have you? This is an exclusive way of looking at the market, whether we call it that or not. We are mentally filtering the market down to very specific subsets, excluding all the rest, knowing full well that there are probably a large number of stocks which have as much or more potential than the ones we’re looking at. It might be of value to chew on this a little.