Curve Fitting, from Newton Linchen
March 5, 2009
How can we avoid curve fitting when designing a trading strategy? Are there any solid parameters one can use as a guide? It seems very easy to adjust the trading signals to the data. This leads to a perfectly backtested system - and tomorrow's crash. What is the line that separates sound trading strategy optimization from curve fitting? The worry is to arrive at a model that explains everything and predicts nothing. (And a further question: what is the NATURE of the predictive value of a system? What - philosophically speaking - confers on a model its ability to predict future market behavior?)
James Sogi writes:
KISS. Keep parameters simple and robust.
Newton Linchen replies:
You have to agree that it's easier said than done. There is always the desire to "improve" results, to avoid drawdown, to boost profitability…
Is there a "wise speculator's" to-do list covering, for example, how many parameters a system requires (or can handle)?
Nigel Davies offers:
Here's an offbeat view:
Curve fitting isn't the only problem; there's also the issue of whether one takes into account contrary evidence. And there will usually be some kind of contrary evidence, unless and until a feeding frenzy occurs (i.e. a segment of market participants start to lose their heads).
So for me the whole thing boils down to inner mental balance and harmony - when someone is under stress or has certain personality issues, they're going to find a way to fit some curves somehow. On the other hand, those who are relaxed (even when the external situation is very difficult) and have stable characters will tend towards objectivity even in the most trying circumstances.
I think this way of seeing things provides a couple of important insights: a) true non-randomness will tend to occur when most market participants are highly emotional; b) a good way to avoid curve fitting is to work on one's ability to withstand stress - those who want to improve should try green vegetables, good water and maybe some form of yoga, meditation or martial art (tai chi and yiquan are certainly good).
Newton Linchen replies:
The word that I found most important in your e-mail was "objectivity".
I mostly agree with the rest, but I'm referring to curve fitting while developing trading ideas, not while trading them. That's why a scale to measure curve fitting (if such a thing were possible at all) is in order: at what point does curve fitting enter the data-modeling process?
And what would be the chess player's point of view on this issue?
Nigel Davies replies:
Well, what we chess players do is essentially try to destroy our own ideas, because if we don't then our opponents will. In the midst of this process 'hope' is the enemy, and unless you're on top of your game it can appear in all sorts of situations - and this despite our best intentions.
Markets don't function in the same way as chess opponents; they act more as a mirror for our own flaws (mainly hope) than as a malevolent force that's there to do you in. So the requirement to falsify doesn't seem quite so urgent, especially when one is winning games with a particular 'system'.
Out of sample testing can help simulate the process of falsification, but not with the same level of paranoia, and built into it is an assumption that the effect is stable.
This brings me to the other difference between chess and markets: the former offers a stable platform on which to experiment and test one's ideas, the latter only has moments of stability. How long will they last? Who knows. But I suspect that subliminal knowledge about the out of sample data may play a part in system construction, not to mention the fact that other people may be doing the same kind of thing and thus competing for the entrees.
An interesting experiment might be to see how the real time application of a system compares to the out of sample test. I hypothesize that it will be worse, much worse.
Kim Zussman adds:
Markets demonstrate repeating patterns over irregularly spaced intervals. It's one thing to find those patterns in the current regime, but how do you determine whether your precious pattern has failed or you are merely seeing statistical noise?
The answers given here before include money-management and control analysis.
But if you manage your money so carefully as to not go bust when the patterns do fail, can you on the whole make money (beyond, say, buy-and-hold, net of vig, opportunity cost, and the day job)?
If control analysis and similar quantitative methods work, why aren't engineers rich? (OK some are, but more lawyers are and they don't understand this stuff)
The point will be made that systematic approaches fail because all patterns eventually get uncovered, so you need to be alert to this and adapt faster and more boldly than the other agents competing for mating rights. That should result in certain runners at the top of the distribution (of smarts, guts, determination, etc.) far out-distancing the pack.
And it seems there are such, in the infinitesimally small proportion predicted by the curve.
That is curve fitting.
Legacy Daily observes:
"I hypothesize that it will be worse, much worse." If it was so easy, I doubt this discussion would be taking place.
I think human judgment (plus the emotional balance Nigel mentions) is the element that makes multiple-regression statistical analysis work. I am skeptical that the past price history of a security can predict its future price action, but less skeptical that past relationships between multiple correlated markets (variables) can hold true in the future. How many independent variables you use to explain your dependent variable, which variables to choose, how to lag them, and how to interpret the result (why the numbers are saying what they are saying, and the historical version of the same) rest, among other decisions, on so many human judgments that I doubt any system can accurately and perpetually predict anything. Even if it could, the force (impact) of the system itself would skew the results, rendering the original analysis, premises, and decisions invalid. I have heard of "learning" systems, but I haven't had an opportunity to experiment with a model that is able to choose independent variables as the cycles change.
The system has two advantages over us humans: it takes emotion out of the picture, and it can perform many computations quickly. If one gives it any more credit than that, one learns some painful lessons sooner or later. The solution many people implement is "money management" techniques to cut losses short and let the winners take care of themselves (which, again, are based on judgment). I am sure there are studies out there that try to determine the impact of quantitative models on the markets. Perhaps fading those models with a contra model may yield more positive (dare I say predictable) results…
One last comment: check out how a system generates random numbers (if you haven't already looked into this). While the numbers appear random to us, they are anything but random, unless the generator is based on external random phenomena.
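To make that concrete, here is a minimal Python sketch (an illustration only, not code from the thread): two generators given the same seed reproduce exactly the same "random" sequence, so without an external entropy source the numbers are fully deterministic.

    import random

    # Two generators seeded identically produce identical "random" sequences,
    # showing that pseudo-random numbers are a deterministic function of the seed.
    gen_a = random.Random(42)
    gen_b = random.Random(42)

    seq_a = [gen_a.random() for _ in range(5)]
    seq_b = [gen_b.random() for _ in range(5)]

    print(seq_a == seq_b)  # True: nothing random here without external entropy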
Bill Rafter adds:
Research to identify a universal truth to be used going either forward or backward (out of sample or in-sample) is not curve fitting. An example of that might be the implications of higher levels of implied volatility for future asset price levels.
Research of past data to identify a specific value to be used going forward (out of sample) is not curve fitting, but used backward (in-sample) it is curve fitting. If you think of the latter as look-ahead bias it becomes a little clearer. Optimization would clearly count as curve fitting.
Sometimes (usually because of insufficient history) you have no ability to divide your data into two tranches, one for identifying values and the second for testing. In such a case you had best limit your research to identifying universal truths rather than specific values.
Scott Brooks comments:
If the past is not a good measure of today and we only use the present data, then isn't that really just short-term trend following? As has been said on this list many times, trend following works great until it doesn't. Therefore, using only today's data doesn't really work either.
Phil McDonnell comments:
Curve fitting is one of those things market researchers try NOT to do. But as Mr. Linchen suggests, it is difficult to know when we are approaching the slippery slope of curve fitting. What is curve fitting and what is wrong with it?
A simple example of curve fitting may help. Suppose we had two variables that could not possibly have any predictive value. Call them x1 and x2. They are random numbers. Then let's use them to 'predict' two days' worth of market changes m. We have the following table:
  m   x1   x2
 +4    2    1
+20    8    6
Can our random numbers predict the market with a model like this? In fact they can. We know this because we can set up two simultaneous equations in two unknowns and solve them. The basic equation is:
m = a * x1 + b * x2
The solution is a = 1 and b = 2. You can check this by back-substituting: multiply x1 by 1, add two times x2, and each time it gives you the correct answer for m. The reason is that it is almost always possible to solve two equations in two unknowns (the only exception is when the equations are linearly dependent).
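As a sanity check on the arithmetic, here is a small numpy sketch (an illustration using the table above, not code from the thread) that solves the two equations and confirms the "perfect" fit obtained from pure noise:

    import numpy as np

    # Two noise "predictors" and the two market changes they are asked to explain.
    X = np.array([[2.0, 1.0],
                  [8.0, 6.0]])   # rows: days, columns: x1, x2
    m = np.array([4.0, 20.0])    # market changes on the two days

    # With as many parameters as data points, an exact solution almost always exists.
    a, b = np.linalg.solve(X, m)
    print(a, b)                  # 1.0 2.0
    print(X @ np.array([a, b]))  # [ 4. 20.] -- reproduces m exactly, yet predicts nothing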
So this gives us one rule to consider when we are fitting. The rule is: Never fit n data points with n parameters.
The reason is that you will generally get a 'too good to be true' fit, as Larry Williams suggests. This rule generalizes: for example, best practice is to use much more data than the number of parameters you are trying to fit. There is a statistical concept called degrees of freedom involved here.
Degrees of freedom is how much wiggle room there is in your model. Each variable you add is another chance for your model to wiggle to better fit the data. The rule of thumb is to take the number of data points you have and subtract the number of variables. Another way to say this is that the number of data points should be MUCH greater than the number of fitted parameters.
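A minimal sketch of the degrees-of-freedom point, using made-up noise data (an illustration, not anyone's actual research): fit pure-noise predictors by least squares and watch the in-sample R-squared collapse as the number of observations grows relative to the number of parameters.

    import numpy as np

    rng = np.random.default_rng(0)
    n_params = 10

    for n_obs in (10, 20, 50, 500):
        X = rng.normal(size=(n_obs, n_params))   # pure-noise "indicators"
        y = rng.normal(size=n_obs)               # pure-noise "market returns"
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1 - resid.var() / y.var()
        print(f"{n_obs:4d} observations, {n_params} parameters: in-sample R^2 = {r2:.2f}")

    # With 10 observations and 10 parameters the fit is essentially perfect;
    # with 500 observations the same noise variables explain almost nothing.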
It is also worth mentioning that the number of parameters can be tricky to count. Looking at intraday patterns, a parameter could be something like whether today's high was lower than yesterday's high: even though it is a true/false criterion, it is still an independent variable. The choice of the length of a moving average is a parameter. Whether price is above or below it is another parameter. Some people use thresholds in moving average systems; each is a parameter. Adding a second moving average may add four more parameters, and the comparison between the two averages yet another. A system involving a 200-day and a 50-day average that showed 10 buy/sell signals might have as many as 10 parameters and thus be nearly useless.
Steve Ellison mentioned the two-sample technique. Basically you fit your model on one data set and then use the same parameters to test out of sample. What you cannot do is refit the model or system parameters to the new data.
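Here is one way the two-sample procedure might look in Python, under assumptions of my own (a toy moving-average rule on a simulated price path, not anyone's actual system): the parameter is chosen on the first half only and then applied, frozen, to the second half.

    import numpy as np

    rng = np.random.default_rng(1)
    prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000)))  # simulated price path
    returns = np.diff(np.log(prices))

    def rule_pnl(prices, returns, window):
        # P&L of a toy rule: be long the next day when price is above its moving average.
        ma = np.convolve(prices, np.ones(window) / window, mode="valid")
        signal = (prices[window - 1:-1] > ma[:-1]).astype(float)
        return np.sum(signal * returns[window - 1:])

    half = len(prices) // 2
    in_prices, out_prices = prices[:half], prices[half:]
    in_returns, out_returns = returns[:half - 1], returns[half:]

    # Optimize the lookback window on the first half only...
    windows = range(5, 100, 5)
    best = max(windows, key=lambda w: rule_pnl(in_prices, in_returns, w))

    # ...then apply the frozen parameter to the untouched second half. No refitting.
    print("best in-sample window:", best)
    print("in-sample P&L     :", rule_pnl(in_prices, in_returns, best))
    print("out-of-sample P&L :", rule_pnl(out_prices, out_returns, best))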
Another caveat here is the data-mining slippery slope. This means you need to keep track of how many other variables you tried and rejected; it is also called the multiple comparison problem. It can be as insidious as trying to know how many variables someone else tried before coming up with their idea. For example, how many parameters did Welles Wilder try before coming up with his 14-day RSI? There is no way 14 was his first and only guess.
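A minimal sketch of the multiple-comparison trap, again with pure noise (an illustration, not a claim about RSI itself): try enough useless indicators and the best of them will look respectable in-sample, even though by construction none has any predictive value.

    import numpy as np

    rng = np.random.default_rng(2)
    n_days, n_indicators = 250, 200

    returns = rng.normal(size=n_days)                     # pure-noise "market returns"
    indicators = rng.normal(size=(n_indicators, n_days))  # pure-noise candidate "signals"

    # Correlation of each indicator with returns; every true correlation is zero.
    corrs = np.array([np.corrcoef(ind, returns)[0, 1] for ind in indicators])

    print("typical |correlation|:", round(float(np.median(np.abs(corrs))), 3))
    print("best    |correlation|:", round(float(np.max(np.abs(corrs))), 3))

    # Reporting only the winner, without disclosing the 200 candidates that were tried,
    # makes noise look like an edge.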
Another bad practice is when you have a system that has picked, say, 20 profitable trades and you look for rules to weed out the pesky few bad trades to get the perfect system. If you find yourself adding a rule or variable to rule out one or two trades, you are well into data-mining territory.
Bruno's suggestion to use the BIC or AIC is a good one. If one is doing a multiple regression, one should look at the individual t-stats for the coefficients AND at the F-test for the overall quality of the fit. Any variable whose t-stat is not above 2 should be tossed. Also, of any two variables that are highly correlated with each other, the weaker one should be tossed.
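Those diagnostics can be read straight off a fitted regression; a minimal sketch using statsmodels (a library choice of mine, with simulated data) includes one genuinely informative variable and one noise variable:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 250
    x_real = rng.normal(size=n)               # variable with a genuine relationship
    x_noise = rng.normal(size=n)              # variable with none
    y = 0.5 * x_real + rng.normal(size=n)

    X = sm.add_constant(np.column_stack([x_real, x_noise]))
    fit = sm.OLS(y, X).fit()

    print(fit.tvalues)       # t-stats: keep variables whose |t| is comfortably above 2
    print(fit.fvalue)        # F-test for the overall quality of the fit
    print(fit.aic, fit.bic)  # information criteria: lower is better when comparing models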
George Parkanyi reminds us:
Yeah but you guys are forgetting that without curve-fitting we never would have invented the bra.
Say, has anybody got any experience with vertical drop fitting? I just back-tested some oil data and …
Larry Williams writes:
If it looks like it works real well it is curve fitting.
Newton Linchen reiterates:
My point is: what degree of system optimization turns into curve fitting? In other words, how is one able to recognize curve fitting while modeling the data? Perhaps returns too good to believe?
What I mean is to get a general rule that would tell you: "Hey, man, from THIS point on you are curve fitting, so step back!"
Steve Ellison proffers:
I learned from Dr. McDonnell to divide the data into two halves and do the curve fitting on only the first half of the data, then test a strategy that looks good on the second half of the data.
Yishen Kuik writes:
The usual out-of-sample testing says: take the price series data, break it into two, optimize on the first piece, test on the second piece, and see if you still get a good result.
If you get a bad result you know you've curve fitted. If you get a good result, you know you have something that works.
But what if you get a mildly good result? Then what do you "know" ?
Jim Sogi adds:
This reminds me of the three blind men, each touching one part of the elephant and describing what the elephant was like. Quants are often like the blind men: each touching, say, the '90s bull-run tranche, others sampling recent data, others sampling the whole. Each has his own description of the market, and, like the blind men's, they are all wrong.
The most important data tranche is the most recent, as that is the current cycle. You want your trades to work there. Don't try to make reality fit the model.
Also, why not break the data into three pieces and have two out-of-sample pieces to test on?
We can go further. If each discrete trade is of limited length, then why not slice the price series into 100 pieces and reassemble all the odd-numbered time slices chronologically into sample A and the even-numbered ones into sample B?
Then optimize on sample A and test on sample B. This addresses, to some degree, the concern that a simple break of the data might leave the two samples characterized by different regimes.
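Here is a minimal sketch of that interleaved split on a generic price array (an illustration of the slicing idea only, not Jim's actual procedure):

    import numpy as np

    def interleaved_split(prices, n_slices=100):
        # Split the series into n_slices chronological chunks: odd-numbered slices
        # form sample A, even-numbered slices form sample B.
        chunks = np.array_split(prices, n_slices)
        sample_a = np.concatenate(chunks[0::2])   # 1st, 3rd, 5th, ... slices
        sample_b = np.concatenate(chunks[1::2])   # 2nd, 4th, 6th, ... slices
        return sample_a, sample_b

    prices = 100 * np.exp(np.cumsum(np.random.default_rng(4).normal(0, 0.01, 2000)))
    sample_a, sample_b = interleaved_split(prices)

    # Optimize the system's parameters on sample_a, then test them, frozen, on sample_b.
    print(len(sample_a), len(sample_b))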