The Art of Streetplay

Wednesday, December 28, 2005

Randomness Kills Simplicity, But Hey, That's Reality

“Things Should Be Made As Simple As Possible, But Not Any Simpler”

As I boarded a bus to New York City today, I was actually feeling quite content about some of the concluding thoughts in the last post on schema theory and the ebb and flow from complexity to simplicity. As usual, of course, I brought along my pseudo-bible, “Fooled By Randomness,” to re-read it... again. As has always been the case, it definitely put me in my place, so I thought I’d temper some of the optimism of the last post with a dose of what Taleb knows best—randomness. Thoughts are again welcome.

Inductive Reasoning
Schema Theory and inductive reasoning have a lot in common. Inductive reasoning involves observing empirical data and searching for patterns of behavior which form the basis for hypotheses about the nature of reality. In other words, it wades through large amounts of data and attempts to make sense of it all through causal links and unifying properties. This is somewhat similar to how a financial analyst gathers a lot of information which at the start seems independent and distinct, but which over time (hopefully) comes together under some line of logic to form a complete understanding of the company and the nature of its business and dynamics.

Taleb’s Issue With Inductive Reasoning
Taleb took more than a few shots at inductive reasoning, and rightfully so. Inductive reasoning’s conclusions are very sensitive to the properties of the process whose observations we analyze. If some process we observe is very well behaved, for example normally distributed, then the information gain we receive with each additional observation is a quantifiable amount which we know a priori. But with no ability to see the future, how can we know that a process will continue to behave in a normal fashion going forward? And when the distribution underlying the process becomes increasingly non-normal, we start to run into serious information gain problems.

Taleb characterized this as playing Russian roulette using a gun with 1,000 chambers. If I were to play this Russian roulette with no knowledge of the number of chambers or the number of bullets in the gun, and it just so happened that after 500 trials I was still standing, I would probably start believing there were no bullets in the gun in the first place—induction might lead to a conclusion like this given my knowledge of guns and the number of trials, but this would obviously be wrong.
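Taleb's thought experiment is easy to make concrete. The sketch below is my own toy version of it (the chamber count, bullet count, and trial counts are illustrative choices, not Taleb's exact numbers): with 1,000 chambers and a single bullet, most players survive 500 pulls, and a naive inductivist among the survivors concludes the gun is empty.

```python
import random

def survive_streak(n_chambers, n_bullets, n_pulls, rng):
    """Spin and pull the trigger n_pulls times; True if every pull lands empty."""
    return all(rng.randrange(n_chambers) >= n_bullets for _ in range(n_pulls))

rng = random.Random(0)
runs = 2000
survivors = sum(survive_streak(1000, 1, 500, rng) for _ in range(runs))

# Analytically, P(survive 500 pulls) = 0.999**500, roughly 0.61: most players
# live, and induction from their experience says "no bullets" -- wrongly.
print(survivors / runs, round(0.999 ** 500, 3))
```

The point survives any choice of parameters: no number of clean observations distinguishes "no bullets" from "one bullet, lucky so far."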

The main point I’d like to drive home, then, is that induction naturally and unavoidably simplifies the world. Drawing positive conclusions from an incomplete data set is to some extent what we have to do if we want to do anything, and yet it naturally leads to error. Knowing that such error is always possible and will probably lead to mis-evaluation requires an acceptance and appreciation of randomness. And randomness is the bane of the simplification process I mentioned earlier. The company no longer occupies one mental slot in my brain. All those facts relating to the company which cannot be logically connected to my paradigm of “the company” must sit uncomfortably in other mental slots. It’s inefficient, but it’s also how things are, so what can you do but accept it?

So when Einstein said “things should be made as simple as possible, but not any simpler,” what I think he’s acknowledging is that there is a natural limit to the amount of simplification which can occur. Because of randomness, many things cannot and should not be connected if one's goal is to obtain a rational view of reality for the purposes of forecasting.

It’s a bit sad to believe that we can only truly know that which is false, and can never really know that which is true (Popper). We can only make our best guesses, over and over again, and hope that, through personal risk management, the randomness which plagues the decisions we make based on those guesses isn't so correlated that we suffer terribly. This was Taleb's conclusion, to the best of my understanding. It's not as if he ceased to make decisions. He used statistical inference for all it was worth to make investment decisions, but then made sure to separate that process from his weighting methodology, tailoring his risk profile to his liking.

Not too happy a blog post, sorry guys.

Sunday, December 25, 2005

Taking Another Look at Arnott (Why Not?)

As long-time readers know, I am interested in indexation. I have a few thoughts on Arnott’s Fundamental Indexation. Before diving into the improvements though, I thought it might be of value to take a closer look at the theoretical underpinnings of his rationale, which I break up into a few parts.

I'd break things down into two claims. One claim is that the S&P is inefficient because of cap weighting; the other is that Fundamental Indexing can do a better job. They seem to be theoretically somewhat orthogonal, so separating them could help flesh things out. In the interim, I throw out some implications and a test I’d be interested to see.

As usual if anyone has any feedback I would be highly interested to hear it. This is one of the more technical posts as a word of warning.

The Inefficiency Claim

The inefficiency claim is pretty clear. As I see it, it comes down to the fact that deviations from intrinsic value, net-net, tend to have zero expected value in terms of returns and tend to mean revert.

Assume that every stock price has some component due to intrinsic value and another due to idiosyncratic noise. Hypothetically, if I knew a priori the future evolution of the changes in intrinsic value of all stocks, and I were to net all stock prices against my perfect estimates of intrinsic value, I would be left with a set of residuals whose returns should have zero mean and a mean-reverting tendency. If deviations are comparable in terms of returns rather than dollar value, then small caps and large caps are equally likely to deviate by, say, 1% from intrinsic. In reality this might not be exactly the case, but I would expect it to be within a reasonable range. However, the dollar value impact of the deviation will be much larger for the large cap than for the small cap. On a period by period basis then, if I were to invest as if I were the S&P, I would systematically emphasize fluctuations of large cap stocks more than small cap stocks-- and rightly so if the variation were due to intrinsic value shifts. But if one were to run the simulation mentioned above, one would see that if all stocks' prices were initially set to intrinsic value, the idiosyncratic variations force the market to over-emphasize the fluctuations of the stocks with positive idiosyncratic residuals relative to a market which fluctuates entirely off of changes in intrinsic value. The mean-reverting property of the idiosyncratic noise is then the killer, as it, probabilistically speaking, puts a drag on the over-emphasized stocks. Thus, the problem.

Is there a flaw in that logic?
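One way to answer that question is to actually run the simulation described above. The sketch below is a toy version of it with parameters that are purely illustrative (intrinsic values frozen at 1.0 so prices are nothing but mean-reverting idiosyncratic noise): the cap-weighted book reliably lags the intrinsic-weighted one.

```python
import random, math

rng = random.Random(7)
N, T = 200, 500
rho, sd = 0.5, 0.20                    # noise reverts halfway each period; ~20% mispricing
eps_sd = sd * math.sqrt(1 - rho ** 2)  # innovation size keeping the AR(1) stationary

# Intrinsic values all fixed at 1.0, so prices are pure idiosyncratic noise: P = exp(e)
e = [rng.gauss(0, sd) for _ in range(N)]
cap_ret = intr_ret = 0.0
for _ in range(T):
    e_next = [rho * x + rng.gauss(0, eps_sd) for x in e]
    rets = [math.exp(b - a) - 1 for a, b in zip(e, e_next)]  # one-period returns
    caps = [math.exp(x) for x in e]
    total = sum(caps)
    w = [c / total for c in caps]                 # cap weights track the noise
    cap_ret += sum(wi * ri for wi, ri in zip(w, rets)) / T
    intr_ret += sum(rets) / N / T                 # intrinsic (here: equal) weights
    e = e_next

# Overweighting the overpriced names turns the noise's mean reversion into a drag.
print(round(cap_ret, 4), round(intr_ret, 4))
```

The size of the drag scales with both the dispersion of the mispricing and the speed of its reversion, which foreshadows the "tempering expectations" point below.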

The Implications of S&P Inefficiency
If the S&P is indeed inefficient, there are quite a few consequences. "The market" is supposed to be mean variance efficient. We use it all the time in our finance courses as the basis behind the market risk premium. We use it to get our hands around the tradeoff between risk and expected return. All of this would basically be wrong. If the S&P is indeed inefficient, we might have to raise the hurdle rate of our projects by a couple hundred basis points.

Of course, it was wrong beforehand too. To be technical, the stock market is a pretty poor proxy for the real market—the whole economy, with a lot of very particular nuances (Zack, I’m sure you explain this 10x better than I can). This just means that even when representing the stock market, the S&P does a poor job.

The Improvement Claim
The second claim is that Fundamental Indexing can do better.

I can't be as confident here, but I guess the rationale from my point of view goes something like this. All stocks in the S&P are supposed to be weighted by their intrinsic values. But if one assumes that stocks deviate from intrinsic, the argument above implies cap weighting, although a great proxy for company size, has problems. Why not try other proxies for company size which might not have the bias that cap weighting has? Income, for example, has a 95% return correlation with the S&P, almost as much capacity as the S&P, also tends to favor very large companies, and doesn't create markedly deviant industry allocations. It doesn't take on much more small-stock risk in the Fama-French sense, and rebalancing schemes can bring turnover down to the level of the S&P itself. It definitely has more F-F "value" to it, but it's not taking on more risk in terms of liquidity, interest rate regime or bull/bear market cycle. It's just trying to proxy for market size without the bias, albeit with lower data resolution.

Tempering Expectations; Possible Improvement
While the above rationale is intuitively appealing, its improvement relative to the S&P is a function of the degree of mean reversion there is to the idiosyncratic noise. If “irrational” price movements take years to correct themselves, then attempts to trade this noise, while expected value positive, could take so long and suffer large enough drawdown that it could very well be unfeasible to trade on.

That being said, Arnott himself showed that historically, a fundamentally indexed portfolio outperforms by approximately 200 basis points—this is a sizable margin considering the large back-testing period he considered.

To take a closer look at the inefficiency, one can make a direct link between a fundamental metric and market cap. Take free cash flow (‘FCF’), for example, as our fundamental metric. Market cap (‘MC’) is simply FCF multiplied by MC/FCF, the FCF multiple. Looked at from this angle, the inefficiency implies mean reversion in the multiple-- MC/FCF, for example. But he never works out those statistics from what I could see in his paper-- he simply turned to other stats which implied mean reversion somewhere. So I'm thinking he could be missing some alpha which could be captured with a little additional complexity. If all companies are reduced to two numbers-- FCF and P/FCF, for example-- then weighting entirely on FCF implies the multiple carries no information about forward returns beyond FCF itself, right? But I would think that a company which does 50M in FCF on a 20 multiple has a different payoff profile than a similar company which does 50M on a 3 multiple. The multiple implies something about the quality of the underlying earnings, and quality isn’t picked up by FCF on a standalone basis. While Arnott's methodology would definitely reallocate towards the lower multiple company relative to the higher multiple one, it might still be giving too little credit to the 20 multiple, because the market seems to be saying there is something about that FCF which is more valuable to investors.

Has anyone seen a test done which buckets the market by FCF, then buckets again by multiple, creating a matrix of subgroupings, then populates that matrix with 1 year forward returns on a year by year basis? Collection of say 50 years of data would create a 3D matrix. With this one could test the claim that FCF and P/FCF are indeed independent of one another and see if there is any additional insight which could be gained.
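I haven't seen one, but the bookkeeping for such a test is straightforward. The sketch below builds the matrix of subgroupings on synthetic data-- every generated number is a placeholder, not real market data-- so it only demonstrates the plumbing; swapping in a real panel of FCF, multiples, and one-year forward returns would make it the actual test.

```python
import random
from collections import defaultdict

rng = random.Random(1)

def bucket(values, n=5):
    """Map each value to its quantile bucket 0..n-1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank * n // len(values)
    return out

years, stocks = 50, 400
cells = defaultdict(list)   # (fcf_bucket, multiple_bucket) -> forward returns
for _ in range(years):
    fcf  = [rng.lognormvariate(3, 1) for _ in range(stocks)]      # placeholder FCF
    mult = [rng.lognormvariate(2.5, 0.5) for _ in range(stocks)]  # placeholder P/FCF
    fwd  = [rng.gauss(0.08, 0.2) for _ in range(stocks)]          # placeholder 1yr returns
    for f, m, r in zip(bucket(fcf), bucket(mult), fwd):
        cells[(f, m)].append(r)

# If FCF and the multiple were truly independent with respect to forward returns,
# average returns should show no systematic pattern across the 5x5 grid.
grid = [[sum(cells[(f, m)]) / len(cells[(f, m)]) for m in range(5)] for f in range(5)]
print([[round(x, 3) for x in row] for row in grid])
```

Keeping each year's returns in a separate slice (rather than pooling, as this toy does) would give the third dimension and let one check whether any grid pattern is stable across time rather than an artifact of a few years.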

Closing Thought (Thanks Mike!)-- Schema Theory
Mike over at TaylorTree posted a kind reference to a couple of my prior posts in one of his latest entries. I agree with him completely when he references the tradeoff between simplicity and complexity. I just thought I'd chip in with a few thoughts which come from the intriguing field of cognitive development... and my favorite theory of how we acquire knowledge, Schema Theory.

Under schema theory, knowledge takes the form of a multitude of 'schema', which, broadly speaking, are mental representations of what all instances of something have in common. As an example, my "house" schema represents what is common to all houses that I've been in. A house has parts, it's made of many things, it can be used for a variety of purposes, ... the list goes on. This is important because when I look at 1,000 houses, they aren't all completely different from each other-- they have broad similarities which I have mental categories for, with which I can compare the houses.

The transition from complex to simple and back to complex might at least partially be explained by how schema theory explains our learning process. Schema decompose complexity through categorization and abstraction. I'm not big on terms so I thought an example might make things a little more clear.

When dealing with new experiences, we have a tendency to treat them as new and different from what we've experienced in the past. For example, if someone were to throw me a ticker and have me look at its business, I would, at the onset, treat all new information I take in regarding the company as new. I would probably begin by gathering general information about the company-- business line, industry, margins, growth, etc. To a large extent, those data points I pick up, at least at the start, don't really have a place. They are just distinct facts. From a cognitive utilization point of view, this is really, really inefficient! I'm being forced to use all of the slots I've got up there in my brain just to digest all these little random tidbits of information!

What happens over time though is that linkages form. The high margins of the company make sense because they've been able to grow sales without any corresponding growth in assets, so much of the sales growth is simply going straight through to the bottom line. Assets aren't growing because their business does a remarkable job of flexing capacity. Their margins are staying up because of cost-related nuances. The magnitude of the sales growth is explainable by the geography the company resides in and the customers it does business with. All the facts-- the qualitative concepts and the hard numbers-- naturally fall into place, and instead of thinking of the company as 10,000 distinct data points all independent of one another (complexity), it is instead "the company" (total simplicity). All the facts are entangled in a fact web which sticks so tightly to itself that they really are all one idea in your head. It goes from using all of our cognitive slots to one of them. And it does so by characterizing the company through the same analytical categories which were used to analyze the hundreds of other companies that have been looked at.

In this context it kind of makes sense that things naturally ebb and flow from simple to complex. We are constantly trying to expand our intellectual borders, learning new tools, new ways of looking at things... but at the same time we are naturally also doing some heavy duty simplification. Making things complicated and simple are the pillars of cognitive development, and something which can be optimized on.

Sunday, December 18, 2005

Responding to a comment; model building thoughts

I'm not quite sure how but a comment by one of my readers somehow evaded me until now. I thought it might be of value to post some thoughts in response.

I would first of all emphasize how extremely basic that article is, and some of the major caveats which might be of value to consider. I'll walk through it a little.

"Step 1. Decide on the time frame and the general strategy of the investment. This step is very important because it will dictate the type of stocks you buy."

While this sounds stupidly simple, it's surprising how often it isn't adhered to, directly or indirectly. As investors, we are subject to a wide range of psychological biases which cloud our ability to make rational investment decisions. Quite a few of them revolve around irrational response to unexpected events... which can have pretty dramatic repercussions on all aspects of our investment making process, including time horizon. I think a lot of this can be dealt with by thinking a little more deeply about the assumptions underlying the investments we make, which I wrote about a while back in Assumptions Management. I can't stress enough how important I think it is to come to grips with the assumptions we are making when we invest in the companies we invest in-- if I fix my time horizon at six months, does that imply I'm willing to stomach any and all price movement in between? Why? Might it be of value to consider risk re-evaluation points so that you can adapt to the changing underlying fundamentals of the companies you've invested in? If so, what is a logical structure for those re-evaluation points-- a function of time? A function of the influx of news? Quarterly, after the release of the latest K or Q? Could one also deal with adaptive conditions by making shorter term forecasts so that, should negative residuals appear, you could go in and figure out why reality deviated from expectation?

More fundamentally, why will my strategy do any better, risk-adjusted, than the market in the long run? If I know that it can't, then why do I believe that it can outperform over the short run, and how do I know when to switch out because my system has stopped working? If one can't answer all these questions with some degree of confidence, one is probably making an uninformed investment decision.

"If you decide to be a short term investor, you would like to adhere to one of the following strategies:..."

This is somewhat silly. First of all "momentum trading" and "contrarian strategy" are two sides of the same coin. The author is referring to autocorrelation trading, or the identification of companies whose price processes tend to be serially autocorrelated with past price movement in some form under a certain set of initial conditions. Yes, autocorrelation can have a positive coefficient (trend following) or a negative one (mean reverting, aka contrarian). Great.

While a lot of short term trading is autocorrelation based, this isn't the case for all short term trading, unless one greatly expands one's definition of "autocorrelation" to include a lot more than past price history. I know very little, but I can assure you that these are just two of many, many forms of short term trading.
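For what it's worth, the autocorrelation framing is easy to operationalize. The sketch below uses toy AR(1) return series with made-up parameters: a positive lag-1 autocorrelation estimate is the trend-following case, a negative one the contrarian case.

```python
import random

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a return series."""
    n = len(series)
    mu = sum(series) / n
    num = sum((series[t] - mu) * (series[t - 1] - mu) for t in range(1, n))
    den = sum((x - mu) ** 2 for x in series)
    return num / den

def ar1(rho, n, rng, sd=0.01):
    """Toy AR(1) return series with serial correlation rho."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0, sd)
        out.append(x)
    return out

rng = random.Random(3)
trend   = ar1( 0.3, 5000, rng)   # momentum-ish returns
reverts = ar1(-0.3, 5000, rng)   # mean-reverting returns

# Positive estimate: trend-follow. Negative estimate: fade the move.
print(round(lag1_autocorr(trend), 2), round(lag1_autocorr(reverts), 2))
```

The catch, of course, is that on real price series the estimate is small, noisy, and regime-dependent, which is exactly where the risk questions above come back in.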

"Step 2. Conduct researches that give you a selection of stocks that is consistent to your investment time frame and strategy. There are numerous stock screeners on the web that can help you find stocks according to your needs."

I am surprised that steps 1 and 2 have made no mention of historical backtesting of some form or another. Again, I think this comes back to two of the pillars of investing IMHO-- risk exposure and investment assumptions. Different investment methodologies expose us to different forms of risk. Do we know exactly what risks we are exposing ourselves to, and is there a reason why we want to be exposed to them? Even if I have run all the statistical tests in the world and all of them seemingly indicate that I am looking at a sustainable chunk of alpha, is there really no state of the world in which that relationship fails to hold in the future?

Let's say I'm looking at Greenblatt's magic formula. It's generated some great returns on a risk adjusted basis over the past couple decades. As an individual investor looking to invest my retirement savings for the next 20 years, what sort of things should be running through my head? One possible concern is that given the increased exposure this strategy will get, a large following of individuals will pile on. ETF's will be created which will do the same. If the investment management business were to universally believe that this will generate alpha relative to straight investment in the S&P, then would the marginal buyer, the guy who gets in after everyone else has bought, expect to outperform as well? One of the sad things about many if not all short term trading strategies is that they are only valuable if no one else knows about them and you are able to trade without creating any footsteps.

But there are more concerns. Let's say Greenblatt's formula became extremely popular. At some point, would it be unheard of for companies to tailor their financials to attain a better ranking, even if this didn't accurately represent underlying financial reality? While this sounds like a silly concern, I can guarantee you that hordes of companies are doing exactly this in some way, shape or form-- window dressing, tailored compensation schemes, ...

"Step 3. Once you have a list of stocks to buy, you would need to diversify them in a way that gives the greatest reward/risk ratio (The Sharpe Ratio). One way to do this is conduct a Markowitz analysis for your portfolio. The analysis is from the Modern Portfolio Theory and will give you the proportions of money you should allocate to each stock. This step is crucial because diversification is one of the free-lunches in the investment world."

This is a whole other topic of its own and is typically the province of quants. Again, we are looking at risk... except now it's portfolio risk we're dealing with. We all deal with portfolio management to varying degrees. The only point I'd make about Markowitz has to do with stability. Markowitz optimality is only as good as the assumptions underlying that optimality. Just because a portfolio historically had a certain risk/reward profile doesn't mean that it will continue to have that profile into the foreseeable future. Thus stability becomes important as a measure of just how reliable the past data is.

One insight about Markowitz portfolios, for example, is that historical risk happens to be a better predictor of future risk than historical return is of future return. Knowing that, I would heavily discount a portfolio whose risk-adjusted performance (by some measure like Sharpe) is driven mostly by the return side. I would also perhaps choose portfolios which, as a pre-condition, jive with my risk tolerance, because I know I can trust historical risk to some degree, and then spend the bulk of my time assessing the expected return of the stocks in my portfolio.
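That stability asymmetry shows up even in a toy experiment. The sketch below uses simulated returns with made-up daily parameters and splits the sample in half: the two half-sample volatility estimates agree closely, while the half-sample mean estimates barely constrain one another.

```python
import random, statistics

rng = random.Random(11)
# Ten years of simulated daily returns: true mean 0.03%/day, true vol 1%/day.
sample = [rng.gauss(0.0003, 0.01) for _ in range(2520)]
first, second = sample[:1260], sample[1260:]

vol_a, vol_b = statistics.stdev(first), statistics.stdev(second)
mu_a, mu_b = statistics.mean(first), statistics.mean(second)

# Half-sample vols agree to within a few percent; half-sample means are so
# noisy relative to their size that they can disagree wildly, even in sign.
print(round(vol_a / vol_b, 3))
print(round(mu_a, 5), round(mu_b, 5))
```

The intuition: the standard error of a mean return is the same order of magnitude as typical mean returns themselves, while the standard error of a volatility estimate is a small fraction of the volatility.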

The most true line in that article, IMHO, is the one below:

"Stock picking is a very complicated process."

Hope this helps.

Response to Gavekal's Indexation Article

Response to “How do we invest in this brave new world? Is indexing the answer?” by Charles and Louis-Vincent Gave and Anatole Kaletsky

Gavekal's article was quite thought provoking, and it revolved around a few central tenets. One tenet is that the existence and rise of indexation will lead to more inefficiency in the market rather than less, for a few reasons. One reason was that the increased importance of the index made the index the reference point for risk. Another was that, because the index is capitalization weighted, buying it leads one's portfolio to systematically overweight the stocks which, probabilistically speaking, are the most overvalued, and vice versa. It’s a relatively complicated article and there’s no way I can do it justice in one paragraph, so I recommend checking it out. It is available to the public for free over here.

I just had two questions.

It seems an underlying assumption made in the paper is that “indexation” is and will be, primarily, investment in something which tracks a broader market segment like the S&P. However, might this be changing, albeit slowly, as more investors begin to see the investment appeal of ‘alternative’ indices? Arnott’s indexation methodology is being implemented at PowerShares and Allianz. Rydex is actively pursuing a number of innovative strategies. Its S&P equal weight has a tinge of Arnott in it and has outperformed materially for quite a while, arguably not only because of its relative overweighting of small caps but also perhaps because of some of the nuances of its rebalancing. WisdomTree is supposedly coming to market with other innovative products. Greenblatt at the Conference last week made a very compelling case for a strategy which could very easily be converted into an ETF product, and I would be highly surprised if it isn’t. All of these products can be invested in by your typical individual investor. I agree that to some degree these are untested concepts, but they are an interesting trend in the ETF world, and one which might have implications many years down the road for those investors who don’t have the time to be effective price choosers in the market mechanism.

Secondly, at that point it could be of value to take a second look at how investment professionals add value to the market. They are compensated for efficiently pricing stocks so that capital is more properly allocated to those who need and deserve it. If alternative indexes like Greenblatt’s become very popular, it’s as if a sliver of alpha has left the system for a mere handful of basis points. It would put mutual funds in an awkward position, because the relative importance of their benchmark is diminishing, and yet they are forced to remain chained to its fluctuations. This could have a crippling effect on them and their performance. And hedge funds would then have to find increasingly innovative ways to generate the alpha their investors are looking for in a seemingly shrunken opportunity set. Under this paradigm, money would begin to flow, probably slowly at the start, out of mutual funds, and the market would evolve into what I think it probably should be—ETFs and professional money managers (hedge funds) focused on absolute returns, offering products up and down the risk spectrum in all shapes and sizes to accommodate the risk preferences and hedging needs of their investors. While mutual funds have a leg up organizationally and operationally because of their firm entrenchment in various retirement programs, I am optimistic that market efficiency will overcome this if an ETF which charges a mere handful of basis points can do all of what the mutual fund does at a far cheaper price. We can somewhat see it coming already, with the rise of ETF’s which are much more friendly to employees at companies—ETF’s which are actively trying to gain ground in the various channels which have traditionally been dominated by mutual funds. It is in their best interests, and rightly so, to push as hard as they can into these channels to steal market share. Given their structure, I believe they have a good shot at succeeding.
Once again great article, I was just wondering what your thoughts are on these questions and thought they might be interesting.

Exclusive vs Inclusive; Thoughts on Model Building

This is a work in progress. I actually disagree with some of what I say below. I think working on a trading desk, trying to piece things together as a trader would, is what pushes in-house models to be more complex rather than less. Humans are great at capturing some forms of weird idiosyncrasy. That naturally pushes the models they would 'like' to build in a very complicated direction.
That being said, there is a world of difference between models which attempt to reach very specific conclusions and then expand, and models which start by making very sweeping, broad statements and over time become increasingly granular. Perhaps the market in question and the granularity of your data determine to some extent what the "optimal" problem solving paradigm is.

Thoughts on Quantitative Trading

Being able to identify homogeneity in the financial markets seems to be a driving concept in quant trading. Classification and homogeneity are two sides of the same coin-- if all securities in the financial markets were unique, each driven by an uncorrelated process, you're shit out of luck. There's no way to build a trading system which makes buy and sell recommendations based on CUSIP (well, perhaps... there is actually some homogeneity there too); we're in business when we can find ways to classify securities in some way. A useful classification is able to identify things which tend to trade the same way-- and of course, when two things trade the same way, we quants would call a proper long-short of the two a stationary, mean-reverting process (this, by the way, is the essence behind cointegration-optimal hedging and indexing).
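A minimal sketch of that stationarity idea, with entirely made-up parameters: two legs share a random-walk driver, and the gap between them is a mean-reverting AR(1). Each leg wanders without bound, but the long-short residual stays pinned.

```python
import random, statistics

rng = random.Random(5)
T = 4000
common = spread = 0.0
leg_a, residual = [], []
for _ in range(T):
    common += rng.gauss(0, 0.01)                 # shared random-walk driver
    spread = 0.9 * spread + rng.gauss(0, 0.005)  # mean-reverting mispricing
    leg_a.append(common + spread)                # leg A; leg B is just `common`
    residual.append(spread)                      # long A / short B isolates this

# Each leg's level drifts arbitrarily far, but the long-short residual is
# stationary: its dispersion stays small no matter how long the sample runs.
print(round(statistics.stdev(residual), 3), round(statistics.stdev(leg_a), 3))
```

Finding the right hedge ratio so that the residual actually comes out stationary is the hard part in practice; here it is 1-for-1 by construction.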

So let's assume for a moment that the goal is identifying homogeneity in some way, shape or form in the financial markets. Where the hell do you begin? I believe you begin by deciding whether to adopt an inclusive or an exclusive paradigm.

The inclusive paradigm, which seems to be the most popular (perhaps because it relies on the least granular information?), is to identify very broad trends in the market. For example, there may be tens of thousands of stocks trading right now, but if I were to bucket them into capitalization-based deciles, trends begin to form when looking at one-year-forward expected returns. In other words, broad-based homogeneity begins to surface. At that point, we may attempt to identify what we consider to be "the next best classifier," which would then split the deciles into subdeciles, each of which is then even more homogeneous. I bet a lot of people have made good money adopting this paradigm, and to be honest, it's the paradigm I personally have had the most experience with up until this point.

But inclusive classification has many downsides which aren't entirely obvious. First of all, the sometimes extreme level of broadness makes it all the more difficult to identify which classifier is indeed the 'best'. Second of all, inclusive classifications tend to carry with them longer time horizons, which can't necessarily be traded on by desks or funds that need strong enough mean reversion to ensure a decent probability of success over shorter time intervals. That being said, there are some serious benefits to a proper long-short-based inclusive classification trading strategy. Most notably, as long as one is dealing with securities that have less dimensionality—less complexity—than others, the value of this paradigm IMHO improves dramatically. The reason is that there is so little one then needs to control for. It makes some sense, then, why this seems to be the sort of paradigm from which most ETFs have been created. They strip away idiosyncratic risk as much as possible, they can carry lower transaction costs, and they retain the ability to expose you broadly to the form of risk you’d like to be exposed to.

But the same isn't really true of other forms of securities. Most securities, in fact, are extremely complex when you think about it. Take municipal bonds, for example. While it may be conceivable to construct a broad trading strategy around municipals, a ton of polluting factors makes things more difficult. First of all there is the issue of liquidity (this actually exists with equities as well). Two securities may look the same and be structured in the same fashion, but if one happens to be less liquid than the other, the more liquid security in an efficient market should command some sort of premium. This would then require quantifying the bid-ask spread, which is a classification nightmare in and of itself. Next take the fact that bonds can be issued in any number of states and carry all sorts of varying call provisions, bond types (i.e. revenue, GO, double barrel, water and sewer), credit ratings, insurance, and so on. A municipal is a fixed income instrument, but it has quite a few idiosyncratic elements. Broad categorizations inevitably fall into the trap of being too general.

So rather than pursue the inclusive paradigm, the paradigm then becomes that of exclusion. That is, find on some truly granular level those securities which tend to be homogeneous in some fashion. Then (as long as your dataset is granular enough), peel off the layers of idiosyncrasy from your generic set to other sets, quantifying the various credit spreads which should be applied relative to your reference rate (in the case of municipals, the municipal curve).
It's interesting that these paradigms are so vastly different from one another.
It's also interesting to contrast these lines of thought with that of value investing. Value investing seems to thrive on the idiosyncrasy of individual stocks. And yet that is exactly what, in some ways, kills quant strategies.

Thoughts on Implementation of an Exclusive Trading Model
The question which inevitably pops up is how you actually implement an exclusive model. There may be some theory which is more established, but I think I've come up with a decent work-around. First of all your dataset will of course have to be reasonably large. Even then, the question becomes how one can create a truly homogeneous set of securities when securities have so many differentiating characteristics.

Well, how about this: find the largest group of securities, with a reasonable sample size, that is as homogeneous as you can possibly make it. I'd call it the path of smallest descent. Let's say you've got a humongous database and you query for data programmatically (i.e. via SQL). Scan through all of your variables and identify the one which, when fixed, leads to the smallest decrease in the number of securities. Then do that again. And again. And so on, until you are left with the biggest possible generic and homogeneous set of securities you can find. If you have exhausted all of your variables and you still have a sample size from which you can draw statistically significant insights, good for you. Typically that's not possible if your dataset is granular enough, in which case things get uglier. You start relaxing some of the fixations. You allow for more than one moving part at a time. But if this is the case, then you have a new objective: relax the fixations which pollute your inference the least. If you want to examine the behavior of 20 year bonds, for example, you might want to consider making that a range from 19 to 21. Or at the very least, if you want to make an inference on how variable A affects yield, and you need to let one other variable float, it would probably be best if that variable didn't have any systematic relationship to variable A. That way, on average, your inference on variable A should still be correct.
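The path of smallest descent can be sketched as a greedy loop. Everything below is illustrative: the field names and toy bond records are mine, and a real implementation would issue the equivalent GROUP BY counts against the database rather than work in memory.

```python
from collections import Counter

def smallest_descent(rows, min_sample):
    """Greedily fix, one field at a time, the (field, value) pair that discards
    the fewest rows, stopping when no further fixation keeps >= min_sample rows.
    Returns the fixed conditions and the surviving homogeneous subset."""
    fixed = {}
    while True:
        best = None
        free = [k for k in rows[0] if k not in fixed] if rows else []
        for field in free:
            # Most common value of this field = the fixation losing the fewest rows.
            value, count = Counter(r[field] for r in rows).most_common(1)[0]
            if count >= min_sample and (best is None or count > best[2]):
                best = (field, value, count)
        if best is None:
            return fixed, rows
        field, value, _ = best
        fixed[field] = value
        rows = [r for r in rows if r[field] == value]

# Hypothetical bond records -- the field names and counts are illustrative only.
bonds = (
      [{"state": "NY", "callable": False, "rating": "AA", "maturity": 20}] * 60
    + [{"state": "NY", "callable": True,  "rating": "AA", "maturity": 20}] * 25
    + [{"state": "CA", "callable": False, "rating": "A",  "maturity": 10}] * 15
)
fixed, generic = smallest_descent(bonds, min_sample=50)
print(fixed, len(generic))
```

On this toy input the loop fixes all four fields and keeps the 60-bond generic set; with real data it would stop as soon as any further fixation dropped the sample below the threshold, which is where the relaxation step above takes over.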

That's just a start. The guiding theme is to make sure that you are making clean inferences. Clean inferences come about when all polluting factors are held constant. So once you reach whatever conclusions you wanted to reach with your relatively small generic set, expand that set by allowing a new parameter to vary, then solve for how that new parameter affects your system. And so on. It's an iterative process which takes a long time. It might not be the best way to go about trading, but it is capable of using your entire dataset and it's highly specific.

The methodology above is interesting but not always useful, and probably doesn’t jive well at all with how the typical value investor thinks about investments. The way I see it, we have a sort of mental playbook which we cycle through when analyzing a stock. Is it a stock which is beaten down hard but has had strong profit growth over the past 5 years, historically strong margins and what have you? This is an exclusive way of looking at the market, whether we call it that or not. We are mentally filtering the market down to very specific subsets, excluding all the rest, knowing well that there are probably a large number of stocks which have as much or more potential than the ones we’re looking at. It might be of value to chew on this a little.