Daily Speculations The Web Site of Victor Niederhoffer and Laurel Kenner

07-Nov-2006
A Primer on Correlation and Regression, by Victor Niederhoffer

Applying Regression and Correlation, by Jeremy Miles of the RAND Corporation and Mark Shevlin of the University of Ulster, illustrates the proper, if pitfall-laden, path that leads to the many beautiful and illuminating things that correlation and regression can accomplish. The book is written for psychology students without any training in calculus, and it contains simple examples and extensive commentary on the regression output from standard statistical programs such as SPSS. However, the applications for psychology are almost identical to those that would be used in markets, with such variables as industries substituted for classes and companies for individuals.

And what a wonderful array of applications and extensions this book contains. I found myself augmenting my knowledge or learning something new on almost every page, and I have read many dozens of books on this subject. There are great sections on how to code your data so that you can do categorical regression, categorical covariance, and structural equation analysis. There is a very good section that goes through all the steps of logistic regression, with simple examples and calculations to show how the maximum likelihood solution is computed. There is a very fine discussion of why you should never use stepwise regression and why hierarchical regression is much better. There is a complete chapter on the computational methods of measuring the contribution of each independent variable to the prediction and the influence of each variable and each observation on the regression.
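To make the maximum likelihood step concrete, here is a minimal sketch of a one-predictor logistic regression fitted by Newton-Raphson iteration. It is not taken from the book; the data, the function names `fit_logistic`, `score` and `log_likelihood`, and the fixed iteration count are all illustrative assumptions.

```python
import math

def probabilities(b0, b1, x):
    """Predicted probability of y = 1 for each x under the logistic model."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

def log_likelihood(b0, b1, x, y):
    """Log-likelihood of the 0/1 outcomes y given coefficients b0, b1."""
    p = probabilities(b0, b1, x)
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def score(b0, b1, x, y):
    """Gradient of the log-likelihood; it is zero at the maximum."""
    p = probabilities(b0, b1, x)
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
    return g0, g1

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson: repeatedly add (information matrix)^-1 * gradient."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = probabilities(b0, b1, x)
        g0, g1 = score(b0, b1, x, y)
        w = [pi * (1 - pi) for pi in p]          # weights p(1 - p)
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # 2x2 inverse of the information matrix applied to the gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data: x is some predictor, y = 1 if the outcome occurred.
# The outcomes overlap, so a finite maximum likelihood solution exists.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
print("intercept:", b0, "slope:", b1)
print("log-likelihood at fit:", log_likelihood(b0, b1, x, y))
print("log-likelihood at zero:", log_likelihood(0.0, 0.0, x, y))
```

The fitted coefficients are the point where the gradient vanishes, which is what a package reports as the maximum likelihood solution. A production routine would add a convergence check and guard against perfectly separated data, both omitted here for brevity.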

One of the main themes of the book, and the one that holds everything together, is that everything that can be done with the usual analysis of variance techniques can be done with regression, but that regression does so much more. While I had read this before, I had never seen such a clear exposition of how to code the data so that you can actually accomplish the transformation and always come up with the more complete and useful regression solutions to such problems.
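The equivalence can be checked by hand on a toy data set. The sketch below is my own illustration rather than the book's: it dummy-codes three hypothetical groups, fits an ordinary least-squares regression through the normal equations, and compares the regression F statistic with the one from a classic one-way analysis of variance. The group labels, the numbers, and the helper names `solve` and `ols` are assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up observations for three hypothetical groups (say, industries).
groups = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [5.0, 6.0, 7.0]}
labels = list(groups)                 # "A" serves as the reference group

# Dummy coding: an intercept plus one 0/1 column per non-reference group.
X, y = [], []
for g, values in groups.items():
    for v in values:
        X.append([1.0] + [1.0 if g == other else 0.0 for other in labels[1:]])
        y.append(v)

beta = ols(X, y)
fitted = [sum(b * xi for b, xi in zip(beta, row)) for row in X]
grand_mean = sum(y) / len(y)
df_model = len(labels) - 1
df_resid = len(y) - len(labels)

# F statistic from the regression.
ss_model = sum((f - grand_mean) ** 2 for f in fitted)
ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_regression = (ss_model / df_model) / (ss_resid / df_resid)

# F statistic from one-way analysis of variance on the same data.
means = {g: sum(v) / len(v) for g, v in groups.items()}
ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
ss_within = sum((xi - means[g]) ** 2 for g, v in groups.items() for xi in v)
f_anova = (ss_between / df_model) / (ss_within / df_resid)

print("coefficients:", beta)
print("F from regression:", f_regression, "F from ANOVA:", f_anova)
```

The two F statistics agree, and the regression adds something the analysis of variance table does not show directly: the intercept is the mean of the reference group, and each dummy coefficient is the difference between a group's mean and the reference mean.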

Chapters in the book cover simple model building with regression and correlation, multiple regression, categorical regression, assumptions in regression, issues in regression, nonlinear and logistic regression, moderator and mediator analysis, and multilevel modeling and structural equation modeling. It's amazing that after reading this book, one always comes away with a good appreciation of how to apply all the bells and whistles that a researcher might wish to use in these fields. The chapter on multilevel modeling is particularly useful, as it fits in naturally with the simple approach and groundwork of the previous chapters; by the time you come to these not-often-used techniques, you have a feel for the extra information they provide and the practicalities of actually employing them. The discussion of power that the authors give as an aid to determining the proper sample size for one-, two- and multi-variable regressions, with helpful and easily understandable charts, was also crystal clear and highly illuminating.

I found at every stage of the book that I was thinking that regression should be used much more often in market work. The residuals that are routinely examined in regression -- checked for such things as outliers, skew, kurtosis, autocorrelation, changes in variance, degrees of predictability at various stages of the analysis, clustering, influence of individual observations -- should be subjected by anomaly and system researchers to exactly the same scrutiny that psychologists apply in determining the appropriateness of their own samples and conclusions. The influences of path and intervening variables that the psychologist studies to find the true causes should be considered by any market researcher as he strives to get to the roots of any intermarket relation, or studies the influence of economic events on markets. The same necessity that leads social science investigators to use multilevel analysis, i.e. that individuals are clustered into classes, groups, areas or sexes, should lead market researchers to use this analysis for companies, which are clustered into industry groups, P/E groups, and periods when the market went down or up. Indeed, once you read a book like this, which tries to boil down all the advanced techniques of regression into a form designed for practical research, you'll be seeing applications everywhere: hierarchies galore in seasons, years, interest rate environments, sentiment levels; differences in the recent correlation of consecutive observations; indirect effects that should be taken into consideration; methods of reducing the number of variables; ways of measuring the improvement that adding variables to a prediction would provide; problems of multiple comparisons; methods for removing data points that lack independence; and methods for handling the differential mortality of companies that go bankrupt or merge, or that were retrospectively added or deleted.
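As a rough illustration of the kind of residual scrutiny described here, the sketch below computes four standard summaries of a residual series: skewness, excess kurtosis, lag-1 autocorrelation, and the Durbin-Watson statistic. The function name `residual_diagnostics` and the sample series are my own assumptions, and the moment formulas are the simple population versions rather than the small-sample corrections a statistics package might report.

```python
def residual_diagnostics(resid):
    """Return simple shape and serial-dependence summaries of residuals."""
    n = len(resid)
    mean = sum(resid) / n
    centered = [e - mean for e in resid]

    # Central moments (population form, dividing by n).
    m2 = sum(e ** 2 for e in centered) / n
    m3 = sum(e ** 3 for e in centered) / n
    m4 = sum(e ** 4 for e in centered) / n

    skew = m3 / m2 ** 1.5                 # 0 for a symmetric distribution
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution

    # Lag-1 autocorrelation of the centered residuals.
    lag1 = (sum(a * b for a, b in zip(centered[1:], centered[:-1]))
            / sum(e ** 2 for e in centered))

    # Durbin-Watson: near 2 suggests no first-order autocorrelation,
    # toward 0 positive autocorrelation, toward 4 negative autocorrelation.
    durbin_watson = (sum((a - b) ** 2 for a, b in zip(resid[1:], resid[:-1]))
                     / sum(e ** 2 for e in resid))

    return {
        "skew": skew,
        "excess_kurtosis": excess_kurtosis,
        "lag1_autocorrelation": lag1,
        "durbin_watson": durbin_watson,
    }

# Made-up residuals that flip sign every period, an extreme case of
# negative serial correlation such as a reversal system might leave behind.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
diagnostics = residual_diagnostics(residuals)
for name, value in diagnostics.items():
    print(name, value)
```

For this alternating series the lag-1 autocorrelation is strongly negative and the Durbin-Watson statistic sits well above 2, which is the signal that the regression has left predictable structure in its errors.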

It's a bonanza and a cornucopia. The authors' style and persona in this book are those of two rather average scholars who have struggled with, and solved by hard work, many of the problems and opportunities that a student might encounter in dealing with regressions and correlations, and who show how they would guide others of a similar mind. While many good things flow from this framework, it does leave the student a bit up in the air about some of the references and mathematics behind the more sophisticated extensions of regression. Topics like regression bias, validity shrinkage, reliability, alternate correlation measures, range restrictions, distribution theory, and prediction intervals are well covered by the expert statistician Philip Bobko in his book Correlation and Regression, which we'll review shortly.