Daily Speculations The Web Site of Victor Niederhoffer & Laurel Kenner
Dedicated to the scientific method, free markets, ballyhoo deflation,
value creation, and laughter
Victor Niederhoffer
04/26/2004
Variability
The central concept in statistics is that members of a population differ. How much they differ or vary is key to all further inquiry. The usual measure of variability is the standard deviation. We usually observe a sample from a population that contains these varying individuals. The individuals differ because of:
When we take several samples from a population and group them, we expect the groups to vary. The key question is how much of the difference between groups is due to random or ephemeral factors and how much to real differences.
Related questions are, How consistent are the differences? How big are the differences? How important? How likely are they to have arisen by chance?
In taking five samples from a population, we expect the samples to have different attributes, such as the proportion of rises in each sample. In a population with 53% rises, we might expect the normal random variability (the standard deviation) of the proportion of rises in successive samples to be the square root of 53% x 47%/n, or approximately 1/2 divided by the square root of n. For a sample of 500, that comes to approximately 0.02.
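The arithmetic above can be checked in a few lines. This is only a sketch; the function name `proportion_se` is made up for illustration, and it assumes independent observations with a fixed 53% chance of a rise.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Standard deviation of a sample proportion from n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)

# The figure from the text: 53% rises, samples of 500.
exact = proportion_se(0.53, 500)

# The rule of thumb: 1/2 divided by the square root of n.
shortcut = 0.5 / math.sqrt(500)

print(f"exact: {exact:.4f}  shortcut: {shortcut:.4f}")
```

Both numbers round to 0.02, which is why the shortcut is good enough whenever the true proportion is near 1/2.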
In comparing two such proportions with equal numbers of observations, we must take account of the fact that the variance of the difference between two means is about twice as great as the variance of an individual mean. Taking the square root to get back to the standard deviation, we expect a normal variation between proportions of groups of about 0.03 (0.02 times the square root of 2) when each group numbers 500 and the true proportions are near 1/2. An exact formula for any two n and any proportions can be found in a statistics text, such as Snedecor, under "Comparison of Properties in Independent Samples," p. 124 of my 8th edition.
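For unequal group sizes, the usual textbook form adds the two variances before taking the square root. The sketch below uses that standard unpooled formula, which may or may not be laid out the same way in Snedecor; `diff_se` is a hypothetical helper name.

```python
import math

def diff_se(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard deviation of the difference between two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Equal groups of 500, both with a true proportion of 53%.
se_one = math.sqrt(0.53 * 0.47 / 500)
se_diff = diff_se(0.53, 500, 0.53, 500)

# With equal groups the variance doubles, so the sd grows by sqrt(2).
ratio = se_diff / se_one

print(f"one sample: {se_one:.4f}  difference: {se_diff:.4f}  ratio: {ratio:.4f}")
```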
The difference between proportions from random samples thus will have a standard deviation of about 0.03. The difference divided by this standard deviation is a normal deviate. Thus, about 5% of the time, we would expect the difference between any two proportions from a random population to be 6% (two standard deviations) or more.
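The "about 5%" figure is the two-sided tail of the normal curve at two standard deviations. A minimal sketch, assuming the normal approximation holds and using the rounded standard deviation of 0.03 (`two_sided_tail` is an illustrative name):

```python
import math

def two_sided_tail(z: float) -> float:
    """P(|Z| >= z) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

sd_diff = 0.03          # standard deviation of the difference, rounded as in the text
z = 0.06 / sd_diff      # a 6% difference expressed as a normal deviate
p_tail = two_sided_tail(z)

print(f"normal deviate: {z:.2f}  chance of a difference this large: {p_tail:.3f}")
```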
There are 10 such comparisons we can make between five samples from a random population. And we would expect that it is at least a 50% shot to find a difference of 6% or more from the random average of 53% up, between the largest and smallest proportion in sampling from random and ever-changing and meaningless non-predictive and highly variable magnitudinous samples such as stock market moves classified by days of week.
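The 10 pairwise comparisons among five samples are not independent of one another, so adding up ten 5% chances gives only a rough guide. A simulation gives a direct estimate of how often the largest and smallest of five proportions differ by 6% or more. The sketch below assumes independent observations with a fixed 53% chance of a rise; the function name, the seed and the number of trials are arbitrary choices for illustration.

```python
import random

def prob_large_spread(n_samples=5, n=500, p=0.53, threshold=0.06,
                      trials=2000, seed=1):
    """Estimate P(largest proportion - smallest proportion >= threshold)."""
    rng = random.Random(seed)
    # Compare integer counts rather than floats to avoid rounding trouble.
    min_gap = round(threshold * n)
    hits = 0
    for _ in range(trials):
        counts = [sum(rng.random() < p for _ in range(n))
                  for _ in range(n_samples)]
        if max(counts) - min(counts) >= min_gap:
            hits += 1
    return hits / trials

estimate = prob_large_spread()
print(f"estimated chance of a 6% spread among five samples: {estimate:.2f}")
```

Changing `n_samples` to 2 recovers the single-comparison case, which should land near the 5% figure for one pair.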
4/27/2004
A Technical Note from Stephen Stigler
Dear Vic,
I am in Glasgow, where Adam Smith was both student and teacher.
You ask how to explain the concept of variability. This is a hard topic to explain to general audiences.
Essentially it is what I was trying to explain at the beginning of the chapter of Stats on Table on regression (persistent factors and transient factors etc - I don't have the book here). You do quite well. I do think people like the idea you use that if you take two indep measures X and Y on the same quantity under the same conditions (cycle, season) so they only differ by random/unpredictable factors, then the ave diff between them is twice the standard dev of an X - and this average is actually measured by squaring (X-Y), averaging, then taking sq root. The analogy with our work on golf scores is that if X1 and X2 are the scores of the same golfer under the same conditions, then the expected value (or "average") of (X1-X2)^2 is twice the variance of the luck factor. And if X and Y are the scores of two different randomly selected golfers on the same day, then the exp val or ave of (X-Y)^2 is twice the variance of the luck+skill factors combined. By subtraction you get twice the variance of the skill factor. Divide by two and take sq roots to get sds.
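The golf recipe can be tried on made-up data. In the sketch below each score is a skill term plus an independent luck term; the standard deviations of 3.0 and 2.0 strokes are invented for illustration and do not come from any real golf data, and `estimate_sds` is a hypothetical name. Note that the factor of two applies to the average of the squared differences (the variance scale); after taking the square root, the typical difference is the square root of 2 times the standard deviation.

```python
import random
import statistics

def estimate_sds(n_pairs=20000, skill_sd=3.0, luck_sd=2.0, seed=7):
    """Recover the luck and skill sds from squared score differences."""
    rng = random.Random(seed)
    same_golfer, different_golfers = [], []
    for _ in range(n_pairs):
        skill_a = rng.gauss(0, skill_sd)
        skill_b = rng.gauss(0, skill_sd)
        x1 = skill_a + rng.gauss(0, luck_sd)   # golfer A, first round
        x2 = skill_a + rng.gauss(0, luck_sd)   # golfer A, second round
        y = skill_b + rng.gauss(0, luck_sd)    # golfer B, same conditions
        same_golfer.append((x1 - x2) ** 2)
        different_golfers.append((x1 - y) ** 2)
    # Average squared difference is twice the variance, so divide by two.
    luck_var = statistics.fmean(same_golfer) / 2
    total_var = statistics.fmean(different_golfers) / 2
    skill_var = max(total_var - luck_var, 0.0)  # guard against a negative estimate
    return luck_var ** 0.5, skill_var ** 0.5

luck_hat, skill_hat = estimate_sds()
print(f"estimated luck sd: {luck_hat:.2f}  estimated skill sd: {skill_hat:.2f}")
```

The estimates should come out close to the values fed in, which is the point of the subtraction step: skill is never observed directly, only the gap between the two averages.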
Best wishes,
Steve