Daily Speculations

The Web Site of Victor Niederhoffer & Laurel Kenner

Dedicated to the scientific method, free markets, ballyhoo deflation, value creation, and laughter; a forum for us to use our meager abilities to make the world of specinvestments a better place.




Victor Niederhoffer



The central concept in statistics is that members of a population differ. How much they differ or vary is key to all further inquiry. The usual measure of variability is the standard deviation. We usually observe a sample from a population that contains these varying individuals. The individuals differ because of a mix of persistent factors and transient, random factors.

When we take several samples from a population and group them, we expect the groups to vary. The key question is how much of the difference between groups is due to random or ephemeral factors and how much reflects real, persistent differences.

Related questions are, How consistent are the differences? How big are the differences? How important? How likely are they to have arisen by chance?

In taking five samples from a population, we expect the samples to have different attributes, such as the proportion of rises in each sample. In a population with 53% rises, we might expect that the normal random variability in the proportion of rises in successive samples is the square root of 53% x 47%/n, or approximately 1/2 divided by the square root of n. For a sample of 500, that comes to approximately 0.02.
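The arithmetic above can be checked in a few lines, using the article's illustrative figures of p = 0.53 and n = 500:

```python
import math

# Standard deviation of a sample proportion: sqrt(p * (1 - p) / n).
# Figures from the text: p = 0.53, n = 500.
p, n = 0.53, 500
sd = math.sqrt(p * (1 - p) / n)

# The rule of thumb in the text, 1/2 divided by sqrt(n), gives
# nearly the same answer when p is near 1/2.
approx = 0.5 / math.sqrt(n)

print(round(sd, 4), round(approx, 4))  # both about 0.022
```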

In comparing two such proportions with equal numbers of observations, we must take account of the fact that the variance of differences between means is about twice as great as the variance of an individual mean. Scaling down to the linear standard deviation, we expect a normal variation between proportions of groups of 0.03, when each group numbers 500 and when the true proportions are near 1/2. An exact formula for any two n and any proportions can be found in a statistics text, such as Snedecor, under "Comparison of Proportions in Independent Samples," p. 124 of my 8th edition.
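The standard formula for the standard error of a difference of two independent proportions makes the doubling concrete; with both groups at 500 and both proportions near 1/2, it comes out to sqrt(2) times the single-sample figure of about 0.022:

```python
import math

def se_diff(p1, n1, p2, n2):
    # Standard error of the difference between two independent
    # sample proportions: sqrt(p1(1-p1)/n1 + p2(1-p2)/n2).
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# With equal groups of 500 and proportions near 1/2, the variance
# doubles, so the standard deviation grows by sqrt(2): about 0.03.
print(round(se_diff(0.53, 500, 0.53, 500), 3))  # about 0.032
```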

The difference between proportions from random samples thus will have a standard deviation of 0.03. The difference compared to this standard deviation is a normal deviate. Thus, about 5% of the time, we would expect that the difference between any two proportions from a random population would be 6% or more.

There are 10 such pairwise comparisons we can make among five samples from a random population. And we would expect that it is at least a 50% shot to find a difference of 6% or more from the random average of 53% up, between the largest and smallest proportion, when sampling from random, ever-changing, meaningless, non-predictive and highly variable magnitudinous samples such as stock market moves classified by day of the week.
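A quick Monte Carlo makes the point: draw five independent samples of 500 from a population with a 53% chance of a rise, and count how often the largest and smallest sample proportions differ by 6% or more. (This is a pure simulation of the setup described above; the "at least a 50% shot" figure is the author's rough estimate, and the simulation simply estimates the same quantity.)

```python
import random

random.seed(1)
trials, hits = 2_000, 0
for _ in range(trials):
    # Five independent samples of 500 draws, each a rise with
    # probability 0.53; record each sample's proportion of rises.
    props = [sum(random.random() < 0.53 for _ in range(500)) / 500
             for _ in range(5)]
    # Did the extreme samples differ by 6 percentage points or more?
    if max(props) - min(props) >= 0.06:
        hits += 1

print(hits / trials)  # fraction of trials with a spread of 6%+
```

Even with no real day-of-week effect at all, a spread of 6% between the best and worst "day" shows up in a substantial fraction of trials.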


A Technical Note from Stephen Stigler
Dear Vic,

I am in Glasgow, where Adam Smith was both student and teacher.

You ask how to explain the concept of variability. This is a hard topic to explain to general audiences.

Essentially it is what I was trying to explain at the beginning of the chapter of Statistics on the Table on regression (persistent factors and transient factors etc - I don't have the book here). You do quite well. I do think people like the idea you use: if you take two indep measures X and Y of the same quantity under the same conditions (cycle, season), so they only differ by random/unpredictable factors, then the ave diff between them is sqrt(2) times the standard dev of an X - and this average is actually measured by squaring (X-Y), averaging, then taking the sq root. The analogy with our work on golf scores is that if X1 and X2 are the scores of the same golfer under the same conditions, then the expected value (or "average") of (X1-X2)^2 is twice the variance of the luck factor. And if X and Y are the scores of two different randomly selected golfers on the same day, then the exp val or ave of (X-Y)^2 is twice the variance of the luck+skill factors combined. By subtraction you get twice the variance of the skill factor. Divide by two and take sq roots to get sds.
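Stigler's golf decomposition can be sketched in a short simulation. The skill and luck standard deviations below (3 and 2 strokes) are hypothetical numbers chosen for illustration; the point is that the subtraction scheme in the letter recovers them from paired scores alone:

```python
import random

random.seed(7)
SD_SKILL, SD_LUCK, N = 3.0, 2.0, 100_000

same, diff = 0.0, 0.0
for _ in range(N):
    s = random.gauss(0, SD_SKILL)          # one golfer's skill level
    x1 = s + random.gauss(0, SD_LUCK)      # two rounds by the SAME
    x2 = s + random.gauss(0, SD_LUCK)      # golfer: differ by luck only
    same += (x1 - x2) ** 2
    # Two rounds by DIFFERENT golfers: differ by skill and luck.
    y1 = random.gauss(0, SD_SKILL) + random.gauss(0, SD_LUCK)
    y2 = random.gauss(0, SD_SKILL) + random.gauss(0, SD_LUCK)
    diff += (y1 - y2) ** 2

# E[(X1-X2)^2] = 2 Var(luck); E[(X-Y)^2] = 2 (Var(skill) + Var(luck)).
var_luck = same / N / 2
var_total = diff / N / 2
var_skill = var_total - var_luck

print(round(var_luck ** 0.5, 2), round(var_skill ** 0.5, 2))
# estimates close to the true sds of 2.0 (luck) and 3.0 (skill)
```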

Best wishes,