Daily Speculations
James Sogi
Philosopher, Juris Doctor, surfer, trader, investor, musician, black belt, sailor, semi-centenarian. He lives on the mountain in Kona, Hawaii, with his family.
8/22/2005
Recursive Estimation Procedures for Diagnosing and Selecting Forecasting Models, by Jim Sogi
Diebold, in Elements of Forecasting, suggests a procedure and some tests for model selection and parameter testing in the chapter on recursive estimation procedures for diagnosing and selecting forecasting models for univariate series. He begins with a small data sample, estimates a model, adds an observation, re-estimates the model, and continues until the sample is exhausted. The recursive residuals are computed one step ahead on out-of-sample data at a 95% interval forecast. He uses a series of tests and a CUSUM of the standardized residuals to learn about the inadequacies of various models; this yields out-of-sample forecast errors from models estimated in sample. The model is then tested on out-of-sample data more than one step ahead.
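A minimal sketch of that expanding-window procedure in R, assuming a simple lagged-return regression as a stand-in for whatever model is being diagnosed (the synthetic series, the starting window of 30 observations, and the variable names are purely illustrative):

    ## Diebold-style recursive estimation: fit on an expanding window,
    ## forecast one step ahead, and track the standardized out-of-sample errors.
    set.seed(1)
    r   <- as.numeric(arima.sim(list(ar = 0.2), n = 300, sd = 0.01))  # toy return series
    dat <- data.frame(y = r[-1], ylag = r[-length(r)])                # lagged-return regression

    k0 <- 30                               # initial estimation sample
    n  <- nrow(dat)
    rec_resid <- rep(NA_real_, n)

    for (t in k0:(n - 1)) {
      fit  <- lm(y ~ ylag, data = dat[1:t, ])                  # estimate on data through t
      pred <- predict(fit, newdata = dat[t + 1, , drop = FALSE], se.fit = TRUE)
      s    <- sqrt(pred$se.fit^2 + summary(fit)$sigma^2)       # one-step forecast std. error
      rec_resid[t + 1] <- (dat$y[t + 1] - pred$fit) / s        # standardized recursive residual
    }

    cusum <- cumsum(na.omit(rec_resid))    # CUSUM of the standardized residuals
    plot(cusum, type = "l", main = "CUSUM of recursive residuals")

If the model is adequate, the standardized recursive residuals should look like white noise and the CUSUM should wander near zero; a drifting CUSUM is the sort of inadequacy the procedure is designed to expose.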
Why is this not curve fitting? Good models fit signals, not noise, because by construction noise is unforecastable; if it were forecastable, it would not be noise. The noise is what remains after accounting for the component signals. Data mining expeditions, in contrast, lead to models that fit the historical sample very well but do not predict well, because they tailor the model to the idiosyncrasies of the in-sample noise, which improves fit but not predictive power. Presumably this can point to the right questions to ask and to the appropriateness of the parameters estimated.
R/CRAN has the suggested tests (a short usage sketch follows the list):
F-test, R-squared
Schwarz information criterion
Box.test(): Ljung-Box test of the null hypothesis of independence in a given time series (stats)
dwtest(): performs the Durbin-Watson test for autocorrelation of residuals (lmtest)
Akaike information criterion
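As a quick illustration, here is how those functions might be applied to a single fitted model in R; fit and dat are the same illustrative objects as in the sketch above, and any lm fit would do:

    ## Apply the listed diagnostics to one full-sample fit of the toy model.
    library(lmtest)                  # provides dwtest()

    fit <- lm(y ~ ylag, data = dat)  # full-sample fit

    summary(fit)                     # F-test and R-squared
    AIC(fit)                         # Akaike information criterion
    BIC(fit)                         # Schwarz information criterion
    dwtest(fit)                      # Durbin-Watson test on the residuals
    Box.test(residuals(fit), lag = 10, type = "Ljung-Box")  # Ljung-Box test for serial dependence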
Charles J. Geyer discusses model selection a bit differently than Diebold.
Is this a red herring or a worthwhile avenue of further research?
Phillip J. McDonnell adds:
"Why is this not curve fitting?" is a deep philosophical question underlying all of statistics. In essence everything we do is curve fitting in some form or other. Sometimes people will warn of the dangers of curve fitting. Although the dangers are real they usually come from one simple source - too many degrees of freedom. Too many degrees of freedom can lead to spurious results even if the curve you are fitting is a straight line.
The real issue is better understood if one thinks in terms of fitting too small a data set with too many variables. The key to avoiding this is to have enough data and the most parsimonious choice of parameters possible. The tests to understand whether a given study has overfitted the data are: F-test, Schwarz Information Criterion (SIC), and the Akaike information criterion (AIC). It is notable that each of these three tests explicitly includes a correction for degrees of freedom.
In Diebold's sub-chapter on recursive diagnostics the process starts with a fit of k parameters to k observations. Obviously this fit has zero degrees of freedom and no statistical validity. But that's not the point! The point is to look at what happens as we add one more observation (k+1) to estimate the k parameters, and then k+2, and so on. What we want to know is if and how the fitted parameters and residuals change as we add more data and as we vary the time frame. The purpose is as a diagnostic only, not as a way to fit a model per se. Note that Diebold makes no reference to F-tests, SIC or AIC statistics because they aren't relevant to the diagnostic intent.
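A sketch of that diagnostic in R, again using the illustrative lagged-return data frame dat from above: the model is re-fit as each observation is added, starting from k observations for k parameters, and the coefficient paths show whether the estimates settle down or keep drifting.

    ## Track how the k fitted parameters evolve as observations are added.
    k <- 2                                   # intercept plus one slope
    n <- nrow(dat)
    coef_path <- matrix(NA_real_, nrow = n, ncol = k,
                        dimnames = list(NULL, c("(Intercept)", "ylag")))

    for (t in k:n) {                         # the first fit has zero degrees of freedom
      coef_path[t, ] <- coef(lm(y ~ ylag, data = dat[1:t, ]))
    }

    matplot(coef_path, type = "l", lty = 1, col = 1:2,
            xlab = "observations used", ylab = "estimate",
            main = "Recursive coefficient estimates")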
This sort of technique has applicability in models where the underlying model is known to change with time. In particular, the VIX index changes with time and appears to cycle in a semi-predictable fashion. The time-varying nature could be identified using the recursive diagnostic techniques suggested by Diebold. If one believes in ever-changing cycles, one believes in time-varying parameters.
Jim Sogi, May 2005