The Web Site of Victor Niederhoffer & Laurel Kenner
Dedicated to the scientific method, free markets, deflating ballyhoo, creating value, and laughter; a forum for us to use our meager abilities to make the world of specinvestments a better place.
Inside Day Contest
A recent study by Mr. Downing found that out of 2,111 days from 1996, only 227 or
11% were inside days for the S&P. An inside day is defined as a day where the
current high is lower than the previous high, and the current low is higher than
the previous low. The number of inside days strikes me as low relative to
randomness. It is interesting to speculate about the best way to test this naive
supposition as there are many things in the philosophy of where the close is
relative to the high and low of a day, and how this relates to the next day that
might not be dreamed of. (An interesting aside is that the number of outside
days during this period was 230.) We will offer a prize of $500 to you
or your favorite charity for the best answer to this question. Your answer
may be via closed form, simulation, reference to tables of extreme values, or
what have you. Note that this is a study for pure increase in knowledge as opposed
to one of practical value: a meal for a lifetime rather than a day.
-- Victor Niederhoffer
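For readers who want to reproduce the counts, the definition above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample prices are made up, and the outside-day rule (higher high and lower low than the previous day) is the usual convention, assumed here because the text does not spell it out.

```python
from typing import Sequence, Tuple


def count_inside_outside(highs: Sequence[float], lows: Sequence[float]) -> Tuple[int, int]:
    """Count inside and outside days, using strict inequalities as in the definition."""
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    inside = outside = 0
    # The first day has no predecessor, so it can only serve as a reference day.
    for t in range(1, len(highs)):
        if highs[t] < highs[t - 1] and lows[t] > lows[t - 1]:
            inside += 1
        elif highs[t] > highs[t - 1] and lows[t] < lows[t - 1]:
            # Assumed convention for an outside day: higher high and lower low.
            outside += 1
    return inside, outside


# Made-up prices for illustration only (not S&P data).
example_highs = [10.0, 9.8, 10.5, 10.4, 10.9]
example_lows = [9.0, 9.2, 8.9, 9.1, 9.0]
print(count_inside_outside(example_highs, example_lows))  # (2, 2)
```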
Victor Niederhoffer: The Four Winners
I was so impressed by the responses to our question on Inside Days that I decided to post the four best. They are by Munawar Cheema, Alston Mabry, Blake McShane, and Charles Pennington.
Statistical Analysis of the Occurrence of Inside Days in the S&P 500 by Munawar Cheema
Disclaimer: All the above conclusions and observations are dependent on a thorough checking of my work and methodology.
Alston Mabry on Inside Days
Here's one take on your Inside Day question:
First, the data I used is the S&P 500, taken from Yahoo, for the dates 2 January, 1996 through 9 July, 2004. In that set I count 2143 total days, of which 251 are "inside" days, where High T1 is lower than High T-zero, and Low T1 is higher than Low T-zero. (Note that because I did not include the last trading day of 1995, 2 January, 1996, is counted only as a T-zero and not as a possible T1.)
I took the approach that an inside day is like putting a basketball through a hoop. The basketball must be smaller than the hoop, and there is some probability with each shot that a small-enough ball will go through a large-enough hoop. (The mean High-Low gap for the total distribution is 1.53%, whereas the mean High-Low gap for the subset of 251 inside days is only 1.05%.)
I calculated for each day Tx its High-Low gap, measured as a % of its Open. Then I calculated for each day Tx how many of the other days in the total distribution had High-Low gaps larger than High-Low Tx, i.e., how many of the other days were large enough "hoops". I then turned that into a percent probability that Tx might follow a day with a large-enough High-Low gap.
For example, 9 July, 2004, had a High-Low gap of 0.57%. Of the total population of 2143 days, 2046 have larger High-Low gaps. So, were I to randomize these days, 9 July, 2004, has a .9543 probability of following a day with a larger gap.
Having calculated the through-the-hoop probability for each day, I summed them for all 2143 days, to get a total of 1070.4958. I then divided this sum by the total number of inside days (251) to get an "inside day constant" of 4.26492351.
Now, for any group of days with a certain range of High-Low gaps, I should be able to predict how many inside days I would get by summing their probabilities and dividing by this inside day constant.
For example, out of the total distribution of days (2143), there are 59 days with High-Low gaps between 0% and .50% inclusive. The sum of the inside day probabilities for those 59 days is 58.14226. This sum, divided by the inside day constant, produces 13.63266. So, we would expect 13-14 inside days from this group. Looking at the actual group of 251 inside days, we find that there are 20 days with High-Low gaps between 0% and .50% inclusive.
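The calculation described above might be sketched as follows. The function names are hypothetical, and the sketch assumes each day's probability is the count of days with a strictly larger gap divided by the total number of days, which is how the 9 July, 2004 example reads. The last lines simply redo the arithmetic with the figures quoted in the text.

```python
import numpy as np


def hoop_probabilities(gaps):
    """For each day, the fraction of all days whose High-Low gap is strictly larger."""
    gaps = np.asarray(gaps, dtype=float)
    sorted_gaps = np.sort(gaps)
    # searchsorted(..., side="right") counts the gaps <= x, so the rest are larger.
    n_larger = len(gaps) - np.searchsorted(sorted_gaps, gaps, side="right")
    return n_larger / len(gaps)


def inside_day_constant(probs, n_inside):
    """Sum of all through-the-hoop probabilities divided by the observed inside days."""
    return float(np.sum(probs)) / n_inside


def predicted_inside_days(probs, mask, constant):
    """Predicted inside days for the subset of days selected by a boolean mask."""
    return float(np.sum(np.asarray(probs)[mask])) / constant


# Arithmetic check using the figures quoted in the text.
constant = 1070.4958 / 251
predicted_small_gap = 58.14226 / constant
print(round(constant, 5), round(predicted_small_gap, 3))
```

With real data, `gaps` would hold each day's High-Low gap as a percentage of the Open, and `mask` would select a segment such as the days with gaps between 0% and .50%.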
Running the predictor equation for the distribution, split into segments, produces these results:
The equation underestimates on the small end of the distribution, but I think that seeing the equation work fairly accurately over different segments of the distribution demonstrates its usefulness.
To test the idea out of sample, I took the S&P 500 daily data for the period 4 January, 1988, through 29 December, 1995, and applied the "inside constant" derived from the 1996-2004 period to predict how many inside days would come from different High-Low gap groups in the 1988-1995 period. Results as follows:
As with the 1996-2004 period, the predictor is less accurate at the small end of the 1988-1995 distribution, where, evidently, the small size of the "basketballs" gives us more hits than the predictor equation predicts. But overall, I would say the predictor equation works well in the out-of-sample test. (Interesting to note the accuracy of the predictor over segments with very different numbers of days, e.g., the .45%-.52% segment versus the .95% segment.)
I would conclude that the distribution of inside days is a function of the probabilities of one High-Low gap fitting inside another, as opposed to some underlying market structure or tradable anomaly. One might do a more detailed analysis of the probabilities for specific combinations of ball size and hoop size, to try to understand how the "inside constant" operates at the micro level, or to analyze the divergence at the small end of the distribution -- but probably just for sport.
Thanks for a stimulating question. I look forward to seeing other responses on the site.
Blake McShane on Inside Days
I tried solving it analytically, assuming a price process of geometric Brownian motion, but unfortunately could not figure out the process for the supremum and infimum of the underlying price process. Thus, I resorted to simulation, discretized with 10,000 "steps" per day. After running 1,000 such simulations of two days and comparing the highs and lows, I observed an inside day percentage of 12.7%. I am in the process of running 10,000, but this should take about five hours. If anyone wants the spreadsheet with the simulation, email me and I will send it.
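A simulation along these lines could be sketched in Python rather than a spreadsheet. This is not the author's code: the function name, the zero drift, the 1% daily volatility, the absence of any overnight gap, and the reduced step count (500 per day instead of 10,000, to keep it quick) are all assumptions made for illustration.

```python
import numpy as np


def simulate_inside_fraction(n_pairs=2000, steps_per_day=500, daily_vol=0.01, seed=0):
    """Fraction of simulated two-day pairs in which day 2 is an inside day."""
    rng = np.random.default_rng(seed)
    step_vol = daily_vol / np.sqrt(steps_per_day)
    # Log-price increments for two consecutive days (zero drift is an assumption).
    shocks = rng.normal(0.0, step_vol, size=(n_pairs, 2 * steps_per_day))
    log_path = np.cumsum(shocks, axis=1)
    day1 = log_path[:, :steps_per_day]
    day2 = log_path[:, steps_per_day:]
    # exp() is monotonic, so comparing log prices is the same as comparing prices.
    inside = (day2.max(axis=1) < day1.max(axis=1)) & (day2.min(axis=1) > day1.min(axis=1))
    return float(inside.mean())


fraction = simulate_inside_fraction()
print(f"simulated inside-day fraction: {fraction:.3f}")
```

Because the comparison only involves the ordering of highs and lows, the volatility level chosen should not matter much; the step count and the number of pairs mainly control the sampling noise.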
Victor Niederhoffer Comments
Mr. McShane's work is very erudite. Perhaps if he modifies his work to take into consideration that the moves of the open relative to the close are quite different from any other tick, and in addition that the serial correlation between consecutive ranges is of the order of 25%, he will win hands down the prize for the best solution to this pearl.
Blake McShane Responds
The serial correlation of the daily S&P range (High minus Low) has been .157 over the last ten years, with the 90-day correlation ranging from -.33 to +.33 and averaging about zero. With this in mind, I retweaked my study so that the volatility term in what before was a discretized geometric Brownian motion now follows a GARCH(1,1) process. After running several simulations, I am seeing the same inside and outside day percentages (i.e., about 9-13%). The range serial correlation is obviously dependent upon the weights chosen for the GARCH model, but the inside/outside day percentages seem roughly constant despite variations in the level of range serial correlation.
Close-to-open prices are determined in a manner similar to intraday prices, except that 2,000 "steps" are taken from close to open. Intraday I use 10,000 "steps", with each step representing one tick, whereas 2,000 "steps" together represent the single close-to-open tick. The figure of 2,000 was chosen arbitrarily and I am open to changing it, but it clearly must be greater than one to reflect the distinctive qualities of this tick.
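One way to picture this revised setup is the sketch below. It is a guess at the structure, not the author's spreadsheet: the GARCH(1,1) weights (`omega`, `alpha`, `beta`) are hypothetical, the variance is assumed to update once per day from the close-to-close return, and the step counts are scaled down to 200 intraday and 40 overnight, keeping the 5-to-1 proportion of 10,000 to 2,000.

```python
import numpy as np


def simulate_garch_days(n_days=1000, intraday_steps=200, overnight_steps=40,
                        omega=1e-6, alpha=0.08, beta=0.90, seed=0):
    """Simulate daily highs and lows with a GARCH(1,1) daily variance (assumed form)."""
    rng = np.random.default_rng(seed)
    steps = overnight_steps + intraday_steps
    var = omega / (1.0 - alpha - beta)  # start at the unconditional daily variance
    level = 0.0                         # log price at the previous close
    highs = np.empty(n_days)
    lows = np.empty(n_days)
    for d in range(n_days):
        shocks = rng.normal(0.0, np.sqrt(var / steps), size=steps)
        path = level + np.cumsum(shocks)
        # path[overnight_steps - 1] is the open; only the session counts for high/low.
        session = path[overnight_steps - 1:]
        highs[d], lows[d] = session.max(), session.min()
        day_return = path[-1] - level
        level = path[-1]
        var = omega + alpha * day_return ** 2 + beta * var
    return highs, lows


def summarize(highs, lows):
    """Inside fraction, outside fraction, and lag-1 correlation of the daily range."""
    inside = (highs[1:] < highs[:-1]) & (lows[1:] > lows[:-1])
    outside = (highs[1:] > highs[:-1]) & (lows[1:] < lows[:-1])
    ranges = highs - lows
    corr = np.corrcoef(ranges[:-1], ranges[1:])[0, 1]
    return float(inside.mean()), float(outside.mean()), float(corr)


inside_frac, outside_frac, range_corr = summarize(*simulate_garch_days())
print(inside_frac, outside_frac, range_corr)
```

Varying `alpha` and `beta` changes the range serial correlation, which makes it easy to check how sensitive the inside and outside percentages are to it.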
Charles Pennington on Inside Days
The statistics of inside days can be modeled on the E***l program, if you have not already uninstalled it from your machine as you were instructed.
I broke the day up into 100 time steps; ultimately the first 25 steps represent the pre-market hours (close-to-open) when the market is closed, and the next 75 steps represent regular market hours. Our daily "high" and "low" will be the max and min, respectively, of these 75 time steps. (The chosen ratio 75-to-25 will be explained shortly.)
The market started at zero; then for each subsequent time step, I added a random number, evenly distributed between -1 and 1. I put in 65,000 or so time steps (the limit of E***l), and now I have a diffusing market. Each day is 100 time steps, so there are ~650 simulated trading days.
The ratio of 75 to 25 was chosen as follows: I took the mean square percentage move from open-to-close and close-to-open, using about 10 years of data for SPY (the exchange traded fund version of the S&P 500). The mean square open-to-close move was 3.1 times the mean square close-to-open move. That's how I chose the 3-to-1 ratio for the number of time steps. (The root-mean-square diffusion distance grows like the square root of the time, but the mean-square distance grows linearly with time.)
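The step allocation can be sketched as a small calculation. The function names are hypothetical, and the sample opens and closes in the test data would have to be replaced with real SPY prices; nothing here reproduces the author's data.

```python
import numpy as np


def mean_square_ratio(opens, closes):
    """Mean-square open-to-close move divided by mean-square close-to-open move."""
    opens = np.asarray(opens, dtype=float)
    closes = np.asarray(closes, dtype=float)
    open_to_close = closes / opens - 1.0
    close_to_open = opens[1:] / closes[:-1] - 1.0
    return float(np.mean(open_to_close ** 2) / np.mean(close_to_open ** 2))


def split_steps(ratio, steps_per_day=100):
    """Split a day's steps in proportion to mean-square moves (variance grows linearly with time)."""
    intraday = int(round(steps_per_day * ratio / (1.0 + ratio)))
    return intraday, steps_per_day - intraday


print(split_steps(3.0))  # (75, 25), the 3-to-1 split used in the text
print(split_steps(3.1))  # (76, 24), what the unrounded 3.1 ratio would give
```

The split is proportional to the mean-square ratio rather than its square root because, for a diffusing walk, the mean-square distance is what scales linearly with the number of steps.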
For the time step sequences #25 to 100, 125 to 200, 225 to 300, etc., I stored the max and min prices, each representing the max and min price for one day.
Then, if today's max (H) was less than yesterday's (H1), and if today's min (L) was greater than yesterday's (L1), then today was an inside day.
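The same procedure can be sketched outside a spreadsheet. This is an illustrative reconstruction under assumptions: the function name and seed are made up, the walk length of 65,400 steps is chosen so that it yields 654 days, and the regular session is taken to be the last 75 of each day's 100 steps.

```python
import numpy as np


def pennington_walk(n_steps=65_400, steps_per_day=100, overnight_steps=25, seed=0):
    """Random walk with uniform(-1, 1) steps; returns (days, inside-day count)."""
    rng = np.random.default_rng(seed)
    walk = np.cumsum(rng.uniform(-1.0, 1.0, size=n_steps))
    n_days = n_steps // steps_per_day
    days = walk[: n_days * steps_per_day].reshape(n_days, steps_per_day)
    # The first 25 steps of each day are close-to-open; only the rest set high and low.
    session = days[:, overnight_steps:]
    highs = session.max(axis=1)
    lows = session.min(axis=1)
    inside = (highs[1:] < highs[:-1]) & (lows[1:] > lows[:-1])
    return n_days, int(inside.sum())


n_days, n_inside = pennington_walk()
print(n_days, n_inside, n_inside / (n_days - 1))
```

Changing `seed` gives a feel for how much the inside-day count moves from one 654-day sample to the next.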
Results: out of 654 simulated trading days, 73, or 11%, were inside days.
The statistical error in the number of inside days is probably about the square root of 73, or roughly 8.5 days, which is about 1.3% of the 654 days; so for an error bar we could use about 1%.
So with just random walk, one expects that 11% plus/minus 1% of all trading days should be inside days. The number that Tom observed was 11%. So all is well here in the best of all possible worlds.
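The error bar can be checked with a few lines of arithmetic. The square-root rule treats the count as roughly Poisson; the binomial formula added alongside it is a standard alternative, not something from the original post. Both use only the counts quoted above, plus the 227 out of 2,111 days from the study cited at the top of the page.

```python
import math

n_days, n_inside = 654, 73
p_sim = n_inside / n_days

# Square-root-of-count rule (Poisson-style approximation).
poisson_se = math.sqrt(n_inside)
# Binomial standard error of the count, for comparison.
binomial_se = math.sqrt(n_days * p_sim * (1.0 - p_sim))

# Observed S&P frequency quoted in the contest question.
p_obs = 227 / 2111

print(f"simulated: {p_sim:.4f}, observed: {p_obs:.4f}")
print(f"error bar: {poisson_se / n_days:.4f} (sqrt rule), {binomial_se / n_days:.4f} (binomial)")
```

Either way the error bar comes out a little above one percentage point, and the observed frequency sits within it.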