The Art of Streetplay

Thursday, July 14, 2005

Adding distinctions to the ACF

The ACF we use assumes a form of linearity. For example, we can look at the lag-one autocorrelation of a stock, assuming the stock follows an AR(1): the return today equals rho times yesterday’s return plus some epsilon. To calculate that rho we use the method of moments, multiplying the equation through by yesterday’s return. When we take the expected value, the epsilon term drops out, leaving the expected value of yesterday’s return times today’s return equal to rho times the expected value of yesterday’s return squared. In the sample version, to estimate the expected value of the return at time (t-1) times the return at time t, simply take the time-t demeaned return times the time-(t-1) demeaned return, sum over all T data points, and divide by T. The ACF doesn’t make any distinction between large price fluctuations and small ones: every product in the return series gets the same weight of 1/T, no more, no less.
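To make that concrete, here is a quick sketch in Python (numpy only; the simulated AR(1) series and all the names are mine, purely for illustration) of the standard equal-weight estimator just described:

```python
import numpy as np

def lag1_autocorr(returns):
    """Standard lag-1 sample autocorrelation: demean the series, form
    the products r_t * r_{t-1}, and normalize by the sample variance.
    Every product gets the same 1/T weight; no distinction is made
    between large moves and small ones."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()                      # demean the series
    T = len(r)
    autocov = np.sum(r[1:] * r[:-1]) / T  # estimate of E[r_t * r_{t-1}]
    var = np.sum(r * r) / T               # estimate of E[r_t^2]
    return autocov / var                  # rho-hat

# Example with a simulated AR(1): r_t = rho * r_{t-1} + eps_t
rng = np.random.default_rng(0)
rho, T = 0.3, 2000
eps = rng.normal(scale=0.01, size=T)
r = np.zeros(T)
for t in range(1, T):
    r[t] = rho * r[t - 1] + eps[t]
print(lag1_autocorr(r))  # should come out close to 0.3
```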

Perhaps there is significant autocorrelation conditional on there being a large price movement, but when the price movement is small, the autocorrelation subsides, hiding its true value under the typical equal-weighting scheme.

It would be interesting to construct some sort of ANOVA-style test, where one divides the return data into five quintiles by size of move and compares the autocorrelation function in each case. If returns were truly homogeneous, there shouldn’t be much, if any, statistically significant difference between the quintiles; checking whether the fifth matches the first is the obvious place to start. What one can look for is any trend in the autocorrelations as one moves from one quintile to the next for a single stock. That would be interesting, and one could potentially use it to reduce the forecasting error inherent in the prediction. One could also create an EMA weighted by returns, in the same way that the ‘typical’ EMA weights by time.
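Here is a rough sketch of what such a conditional test could look like (Python/pandas; conditioning on quintiles of the lagged absolute return is my assumption about how one might define the buckets):

```python
import pandas as pd

def autocorr_by_quintile(returns):
    """Bucket each (r_{t-1}, r_t) pair by the quintile of the lagged
    absolute return, then estimate the lag-1 autocorrelation within
    each bucket. If returns were homogeneous, the five estimates
    should not differ significantly."""
    r = pd.Series(returns, dtype=float)
    r = r - r.mean()
    df = pd.DataFrame({"lag": r.shift(1), "cur": r}).dropna()
    df["bucket"] = pd.qcut(df["lag"].abs(), 5, labels=False)
    out = {}
    for q, grp in df.groupby("bucket"):
        # within-bucket correlation of today's vs. yesterday's return
        out[q] = grp["lag"].corr(grp["cur"])
    return pd.Series(out)  # index 0 = smallest moves, 4 = largest
```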

One could look at a large number of stocks and determine whether there is any return-size dependency in the ACF common to stocks in general. If that data carries any information, perhaps it too can be used to augment single-stock predictions in a univariate sense. Or it could be used in an ordered univariate sense: look at all stocks in the market, order them by absolute return in descending order, and make bets on the top few.
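A sketch of that ordering step (Python/pandas; the Series layout, the ten-name cutoff, and the `rho_large` estimate are all hypothetical):

```python
import pandas as pd

def top_movers(returns_today, n=10):
    """Order the cross-section by absolute return, descending, and
    keep the n largest movers: the names for which any size-conditional
    autocorrelation should matter most."""
    order = returns_today.abs().sort_values(ascending=False).index
    return returns_today.reindex(order).head(n)

# Hypothetical usage, assuming `todays_returns` is a Series indexed
# by ticker and `rho_large` is the autocorrelation estimated on the
# largest-move bucket:
#   movers = top_movers(todays_returns)
#   bet direction = sign(rho_large) * sign(movers)
```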

I would further distinguish between industries, and determine whether any sector- or industry-specific relationships differ from those of the market as a whole.

The catch? Cutting up your price series into smaller bits means you’re dealing with a smaller sample size, which means your results aren’t nearly as robust.

Generalized Methodology

In general, to gain more insight into the ACF, one should focus on all the various data points which we typically (dumbly) categorize into lags and then average together to form an autocovariance. Instead of averaging them all together blindly, I would take a step back and decide why I am using the damn thing in the first place. In the end, what I am looking for is a methodology which flags situations in which the autocorrelation between tomorrow’s return and one of its prior lags is high. That way, I can make a directional bet with a higher probability of guessing the stock’s future return correctly.

In essence, the ACF should be one of many functions which forecast future returns using lag dependency in past return data as their starting point (certain lags are important while others aren’t). The ACF is a one-dimensional test in that it makes no distinctions beyond lag dependency.

I would modify the ACF. I would begin by calculating a massive pool of data points: returns multiplied by lagged returns, just as the ACF does. Some of those products will be high while others will be low. But rather than simply adding together all the data points which share the same lag relationship, I would attach a vector of characteristics to each data point. For example, one field I would add is volume: for each data point, record the volume that occurred on the lag’s date. I would also add the absolute return on the lag. Perhaps industry makes a difference, so add an entry for industry classification. And I would record what the lag is, which is at the heart of what the ACF already does. And so on.
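A sketch of that bookkeeping (Python/pandas; the column names, the single industry label, and the 20-lag cap are all assumptions of mine):

```python
import pandas as pd

def build_lag_products(returns, volume, industry, max_lag=20):
    """For each (t, lag) pair, record the product r_t * r_{t-lag},
    the raw ingredient of the ACF, alongside a vector of
    characteristics of the lagged observation."""
    r = returns - returns.mean()
    rows = []
    for lag in range(1, max_lag + 1):
        for t in range(lag, len(r)):
            rows.append({
                "lag": lag,                     # what the ACF keys on
                "product": r.iloc[t] * r.iloc[t - lag],
                "lag_volume": volume.iloc[t - lag],
                "lag_abs_return": abs(r.iloc[t - lag]),
                "industry": industry,           # e.g. a sector label
            })
    return pd.DataFrame(rows)
```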

I would then take an econometric standpoint. Finding a “good” autocorrelation value from an ACF involves nothing more than taking all of the above data points, indexing them by lag, throwing all data points with the same lag into distinct piles, and averaging the predictive power of the lagged return on the non-lag return within each pile. Good piles have high average values for predictive power. Along the same lines, perhaps return data has different properties when volume is high rather than low. Or perhaps tech stocks which experience sharp return shocks on high volume tend to have higher predictive power than non-tech stocks moving on low volume.

So I would run tests to find out whether such relationships exist. I would first look at each variable in isolation and determine whether, all else being equal, variations in that variable have an impact on autocorrelation. I would also look at combinations, for example determining whether sharp returns in conjunction with variations in volume have an impact on average predictive power. I would ultimately end up with many more piles. Instead of dividing all of the returns into piles categorized simply by their lag, I would aim to divide them into piles categorized by every variable which has a statistically significant effect on the predictive power of a lagged return on its non-lag counterpart. In that way I would make more distinctions on the data than the typical ACF, without throwing out the ACF methodology entirely.
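And a minimal sketch of the pile-building itself, picking up the table from the previous sketch (terciles and the crude t-statistic are my choices, not a prescription):

```python
import pandas as pd

def conditional_acf(points):
    """Given the table from build_lag_products, cut each candidate
    variable into terciles and estimate the average product within
    every (lag, volume-tercile, size-tercile) pile. Piles with
    reliably high averages are the 'good' piles."""
    df = points.copy()
    df["vol_bin"] = pd.qcut(df["lag_volume"], 3, labels=False)
    df["size_bin"] = pd.qcut(df["lag_abs_return"], 3, labels=False)
    grouped = df.groupby(["lag", "vol_bin", "size_bin"])["product"]
    stats = grouped.agg(["mean", "std", "count"])
    # mean product scaled by its standard error: a rough check on
    # whether a pile's average is distinguishable from zero
    stats["t_stat"] = stats["mean"] / (stats["std"] / stats["count"] ** 0.5)
    return stats.sort_values("t_stat", ascending=False)
```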
