Using out-of-sample data as a tool

Strategies might be attractive on the surface but fail moving forward. Why? Here we discuss the qualities of effective trading strategies, and the many traps one should avoid.

Postby Overload » Wed Feb 22, 2006 2:51 pm

Trading systems are built using limited data in their backtests, and using the stocks from specific sectors. This is normal, and in fact helpful as it allows you to take full advantage of the characteristics of certain groups of symbols and how they performed during certain periods. But with the benefits of that focus comes the risk that alternate scenarios might be overlooked by your back test.

Testing your trading system on out-of-sample data is a great way to test its robustness across a wide variety of alternate situations. Out-of-sample testing can mean testing your trading system in Evaluation Periods outside the period in which it was created. For example, perhaps your trading system was built primarily on data from 2003 to the present. It might be worthwhile to see how it also performed from 1999 to 2001. Even if you decide that such situations won’t appear again soon, it helps to know how your system would hold up during a large market drop, or how it would fare during an extreme rally.
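As a rough illustration of the idea outside of StrataSearch, the comparison could be sketched in Python as below. The `period_return` helper and the equity values are hypothetical and exist only for this example; the point is that the same system is measured separately over the period it was built on and over a period it has never seen.

```python
from datetime import date

def period_return(equity_by_date, start, end):
    """Total return of an equity curve between two dates (inclusive)."""
    points = [(d, v) for d, v in sorted(equity_by_date.items()) if start <= d <= end]
    if len(points) < 2:
        raise ValueError("need at least two data points in the period")
    first_value, last_value = points[0][1], points[-1][1]
    return last_value / first_value - 1.0

# Hypothetical equity values for one system, for illustration only.
equity = {
    date(1999, 1, 4): 100.0,
    date(2001, 12, 31): 80.0,
    date(2003, 1, 2): 80.0,
    date(2005, 12, 30): 120.0,
}

# In-sample: the period the system was built on.
in_sample = period_return(equity, date(2003, 1, 1), date(2005, 12, 31))
# Out-of-sample: a period the system was never tuned against.
out_of_sample = period_return(equity, date(1999, 1, 1), date(2001, 12, 31))

print(f"in-sample: {in_sample:.1%}, out-of-sample: {out_of_sample:.1%}")
```

A large gap between the two figures does not prove the system is curve-fit, but it is the kind of result that deserves a closer look.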

Likewise, testing your trading system against alternate sectors can give you an indication of how well it would perform in alternate situations. While it’s true that trading systems are almost always built to take advantage of a sector’s unique characteristics, if the trading system fails miserably when run against a sector of similar characteristics, a red flag is indicated.

Testing your trading system in out-of-sample data is a great way to not only see how robust it is, but to also give you an indication of what to expect from your trading system should market conditions change.

Pete
Overload
 
Posts: 2246
Joined: Wed Nov 30, 2005 12:14 pm

Postby rjay » Sat Aug 19, 2006 5:39 pm

I'm not really clear on how I would test a system on out-of-sample data. Right now I'm using 3rd Jan 2001 to 31st Dec 2005. If I wanted to test a system over the first 6 months of this year in SS, the only way I know is to load the relevant signal and then click the 'Last 180 Days' tab to review the trades. Is that what you are referring to in your post? If so, how would I run it against other sectors?
rjay
 
Posts: 116
Joined: Wed Jul 26, 2006 6:51 am

Postby Overload » Sat Aug 19, 2006 10:16 pm

I think you're referring to the Daily Signals Listing that allows you to view your current buy and sell signals, as well as a list of those signals from the prior months. I was referring to a full back test and Detailed Analysis that one typically examines prior to even creating daily signals for a system.

Assuming you're starting with a OneClick Search, you would normally right-click on your result in the Combination Results Listing and select "Create Signal" to create the setup to start running the Daily Signals. Well also from that right-click menu, you can select "Create Multi-System". When you do this, you can create a Multi-System Setup that can be run via the Run > Run Combinations menu.

You can also use the Setups > Trade Settings menu to set the Evaluation Period of your choice. After you've run the combination and it's displayed in the Combination Results Listing, right-click on it and select "Run Detailed Analysis". When that's complete, you can then select "View Detailed Analysis" to view the full back test information for that revised evaluation period.

If you wish to run against alternate sectors, use the Setups > Multi-System Setups menu, and then double-click on your new Multi-System Setup. You can then revise each "system" to use an alternate sector as desired for your testing.

I know the above has quite a bit of information, but the basic idea is that you'd want to test alternate time periods and sectors through a Multi-System and Detailed Analysis rather than through the Daily Signals Listing. A Detailed Analysis not only has a great deal more flexibility, but also contains a large amount of additional information.

Pete
Overload
 
Posts: 2246
Joined: Wed Nov 30, 2005 12:14 pm

Postby rjay » Sun Aug 20, 2006 8:03 am

I must be doing something wrong then, because when I right-click in the Combination Results window I don't get a 'Create Multi-System' in my popup menu ....
rjay
 
Posts: 116
Joined: Wed Jul 26, 2006 6:51 am

Postby Overload » Sun Aug 20, 2006 10:28 am

Okay, sorry about that. That's what I get for not verifying my steps while looking at the program.

A OneClick Search creates a Multi-System Setup automatically. Enter the Setups > Multi-System Setups menu, then look for setups beginning with "OC:". The remainder of the name will match the OneClick Setup that produced it. Find the one that you wish to do additional testing on, highlight it, then click the Copy button. Give the new Multi-System a name, then click OK to save it.

You can then Run it, change the Evaluation Periods, or change the Sectors as I mentioned in the prior message.

Again, sorry about those bad instructions. I'll definitely be more careful in the future.

Pete
Overload
 
Posts: 2246
Joined: Wed Nov 30, 2005 12:14 pm

Postby rjay » Sun Aug 20, 2006 1:30 pm

OK, got it now thanks.
rjay
 
Posts: 116
Joined: Wed Jul 26, 2006 6:51 am

Postby rjay » Wed Aug 30, 2006 4:40 pm

So, I have now had an opportunity to try a few trading systems on out-of-sample data. One single-symbol system that had a decent return, with about 10 trades per year, generated only 2 trades in 2 years when tested on different data for the same symbol, and took a bad loss!

When you run across a situation like that, how do you begin to improve it? The system seems obviously curve-fit, yet there was nothing worrying in the original system's performance tab: skew, Monte Carlo and the others all looked fine. My initial test data was 2002 - 2005, so maybe that was not enough data for a single symbol?
rjay
 
Posts: 116
Joined: Wed Jul 26, 2006 6:51 am

Postby rjay » Thu Aug 31, 2006 9:11 am

Looks like I can partially answer my own question ....

I've just done some more testing and I think my initial strategy was curve-fit because there were about 5 evaluation fields (consecutive losers, number of trades, etc.). I tried it with just two: Annual Return and Unrealized drawdown (important for futures) and then when I forward-tested the results were in line with the backtest. Phew :D

Any other comments welcome though ...
rjay
 
Posts: 116
Joined: Wed Jul 26, 2006 6:51 am

Postby Overload » Thu Aug 31, 2006 9:38 am

Glad to hear you may have solved your own puzzle. In the meantime, I had already written the following, which will hopefully still be of some use....
_________________________

Congratulations, you have just bumped into the ultimate challenge of technical analysis: choosing systems that work into the future. To be honest, it's fortunate you're exploring this now rather than while making live trades.

Unfortunately, there are no magic criteria that can be applied in every case to guarantee that a system will work into the future. Every market, sector, symbol and system is different. But while it may be challenging, it is not hopeless. To say that it cannot be done would be to say that profitable system trading is an impossibility. And I can pretty much tell you that that is not the case.

Regarding your system, there may be reasons why it is working properly, and there may be reasons why it isn't working properly. Here are some ideas:

To get the most out of a system, traders typically focus their back testing on markets that are similar to the ones they'll be trading. Specifically, I see you focused your search on 2002 to 2005. Why not 1999 to 2001 as well? The reason may be that 1999 to 2001 were exceptional years that few believe will happen again anytime soon. So it almost doesn't make sense to put too much emphasis on those years. Instead, to get the most out of your system, it makes sense to focus on historical periods that more closely simulate what we'd expect to see in the next year or two, and that probably doesn't include the years 1999 to 2001. That isn't to say that a phase like 1999 to 2001 couldn't happen again. It's a personal choice for each trader to decide that it probably will not, and to adjust their system trading accordingly.

You mention that there were only 2 trades in the out-of-sample data in which you tested. First, I would say that 2 trades might not be sufficient to really make a judgment. Perhaps, given more time in similar market conditions, the system would have been profitable in the end. Second, that "bad loss" should be a red flag. What conditions caused that, and can such conditions be minimized in the future? Perhaps the implementation of a stop loss? Or... perhaps you can say that the conditions were unique (i.e. year 2001), and have little likelihood of happening again. Finally, this should make you consider the choice of trading a single symbol system. When you put ALL your money into one stock at a time, there is little diversity. And diversity is one of the greatest ways to manage risk.

Without seeing the exact details of your system, I can't really add much about what indications there may have been about curve-fitting. StrataSearch forces us to put curve-fitting identification into mathematical terms, which I believe is better than leaving it to impressions or general feelings. But sometimes knowing what those mathematical terms are, or should be, is a question in itself.

While I understand the desire to not want to post your system up on the forum, you can certainly post pieces of it, or questions about specific areas. Getting input from others is, I think, one of the best methods for examining the weak areas of a system. There is a certain desire to turn a blind eye when evaluating our own systems. We want it to work, we want it to be profitable, we want it to be a good system. And to facilitate this desire, we minimize or justify its weaknesses. Others do not have this bias, and are therefore an excellent resource in helping us look twice at the areas we'd perhaps rather not.

I had no idea this was going to be such a long reply. Obviously it's a huge topic and there is a lot to discuss. Hopefully it's of some help.

Pete
Overload
 
Posts: 2246
Joined: Wed Nov 30, 2005 12:14 pm

Postby rjay » Thu Aug 31, 2006 9:48 am

Thanks Pete, that's very helpful.

It's interesting that you mention stop-losses, because that is one thing I haven't yet figured out how to do in SS. I don't see it as an evaluation field, so how exactly do I set a stop-loss?
rjay
 
Posts: 116
Joined: Wed Jul 26, 2006 6:51 am

Postby Overload » Thu Aug 31, 2006 12:37 pm

A stop loss in StrataSearch is implemented as a trading rule in the Exit String. So it really operates just like any other trading rule. For end-of-day stop loss processing, the formula is stopdown(). Additional information can be found in the Help, or at this link:

http://www.stratasearch.com/trading/formulas_stopdown.html

With the above EOD implementation, tests for whether a stop has been hit are only made at the end of each trading day. StrataSearch also allows you to test intraday stop losses. This is done with the formula GTCStop(). Additional information on this one can also be found in the help, or at this link:

http://www.stratasearch.com/trading/formulas_gtcstop.html

Naturally, when using a GTC Stop in real trading, you'll need to immediately place your good-till-cancelled stop order right after you've entered a position. Since StrataSearch only provides Signals on an EOD basis, the GTC order must be manually handled by you.
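To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not StrataSearch formula language and does not reproduce how stopdown() or GTCStop() are implemented; the function names, the bar format and the fill assumptions (exit at the close for the EOD check, exit at the stop level or the open for the intraday check) are all assumptions made for illustration.

```python
def eod_stop_exit(entry_price, bars, stop_pct):
    """Check the stop only against each day's close; exit at that close."""
    stop_level = entry_price * (100 - stop_pct) / 100
    for day, bar in enumerate(bars):
        if bar["close"] <= stop_level:
            return day, bar["close"]
    return None  # stop never hit

def intraday_stop_exit(entry_price, bars, stop_pct):
    """Check the stop against each day's low; assume a fill at the stop
    level, or at the open if the price gaps below the stop."""
    stop_level = entry_price * (100 - stop_pct) / 100
    for day, bar in enumerate(bars):
        if bar["low"] <= stop_level:
            return day, min(bar["open"], stop_level)
    return None  # stop never hit

# Hypothetical daily bars following an entry at 100.0 with a 5% stop.
bars = [
    {"open": 100.0, "low": 98.0, "close": 99.0},
    {"open": 99.0, "low": 94.0, "close": 97.0},  # dips below the stop intraday, then recovers
    {"open": 96.0, "low": 92.0, "close": 93.0},  # closes below the stop
]

print("EOD stop:", eod_stop_exit(100.0, bars, 5.0))
print("Intraday stop:", intraday_stop_exit(100.0, bars, 5.0))
```

In this made-up series the intraday stop exits a day earlier and at a better price, but it would also have exited if the second day had been the start of a recovery. Which behavior is preferable depends on the system.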

On any of the AutoSearch Setups, you can go to the Supporting Exit tab. There you will see a rule named "Long Stop Loss / Short Stop Gain". By default, this is included only in the "Generic AutoSearch Optimization", so it is currently implemented only during optimization phases of a OneClick Search. It does, however, provide an example of how it can be used.

Pete
Overload
 
Posts: 2246
Joined: Wed Nov 30, 2005 12:14 pm

Postby Barnabyj » Thu Aug 31, 2006 9:18 pm

I've never ever considered any mechanical system that didn't generate a large enough(?) number of trades to give me enough confidence in any testing (in sample or out of sample).

Anyway, this raises in my mind the question - when does a system fail - and other pitfalls going forward with a mechanical system? Any statistically minded folks on the forum might be able to help here.

Say we searched for and found a good system, did all the testing and started to trade. I know we could 'get a gut feel' that the system has failed/is failing if we see the trading performance is declining. But can we quantify when a system has failed (and therefore should discontinue using it)?

If our backtest results showed (say) monthly return/benchmark figures like 1.1, 1.3, 0.9, 1.1, 1.2, ... couldn't we use them to calculate a band of 'statistical significance' going forward? Then if we suddenly got a low outlier (say, 0.7), or a sequence of low outliers, we could say that, statistically, the system has failed. We should then only use that system with caution, if at all.
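One simple way to sketch such a band is the backtest mean minus some number of standard deviations. The Python below assumes the backtest ratios are 1.1, 1.3, 0.9, 1.1 and 1.2, picks a two-standard-deviation threshold arbitrarily, and treats the monthly ratios as independent and roughly normal. All of those are assumptions for illustration, and five months is far too few to be meaningful in practice.

```python
from statistics import mean, stdev

def lower_control_limit(backtest_ratios, num_std=2.0):
    """Backtest mean minus num_std sample standard deviations."""
    return mean(backtest_ratios) - num_std * stdev(backtest_ratios)

def is_low_outlier(live_ratio, backtest_ratios, num_std=2.0):
    """True if a live monthly ratio falls below the lower control limit."""
    return live_ratio < lower_control_limit(backtest_ratios, num_std)

# Monthly return/benchmark ratios from the backtest (illustrative values).
backtest = [1.1, 1.3, 0.9, 1.1, 1.2]
limit = lower_control_limit(backtest)

print(f"lower limit: {limit:.3f}")
print("0.7 is a low outlier:", is_low_outlier(0.7, backtest))
print("0.9 is a low outlier:", is_low_outlier(0.9, backtest))
```

A single reading below the limit can still happen by chance, so a rule based on a sequence of low readings, decided in advance, would be less prone to false alarms.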

Is all this practical?
Barnabyj
 
Posts: 118
Joined: Wed Aug 02, 2006 5:58 am

Postby Overload » Thu Aug 31, 2006 10:06 pm

One method I use to ensure a system is operating within normal parameters is maximum drawdown. For example, if the maximum drawdown in my back tests is 7.5%, my system will be on Red Alert if the drawdown ever exceeds that. If, on the other hand, my system is down only 5%, I rest easy knowing the system is still operating within normal parameters.
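For anyone tracking this outside of the program, the check can be sketched in a few lines of Python. The function names and the equity values are made up for the example; the only idea taken from the post is comparing the live drawdown against the worst drawdown seen in the back test.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def current_drawdown(equity_curve):
    """Decline of the latest value from the highest value seen so far."""
    peak = max(equity_curve)
    return (peak - equity_curve[-1]) / peak

# Hypothetical equity curves, for illustration only.
backtest_equity = [100.0, 104.0, 96.2, 101.0, 110.0, 105.0]
live_equity = [100.0, 102.0, 96.9]

backtest_max = max_drawdown(backtest_equity)  # about 7.5%
live_now = current_drawdown(live_equity)      # about 5%

if live_now > backtest_max:
    print("Red Alert: live drawdown exceeds the back-tested maximum")
else:
    print("Within normal parameters")
```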

Pete
Overload
 
Posts: 2246
Joined: Wed Nov 30, 2005 12:14 pm

Postby Barnabyj » Fri Sep 01, 2006 12:20 am

Yeah, well that's the point - it sounds too much like a 'gut feel' system (I might hang in there, it might just come back...). What are the chances with a more mechanical approach? The light's red, not amber. Get out - now. [just for discussion Pete].
Barnabyj
 
Posts: 118
Joined: Wed Aug 02, 2006 5:58 am

Postby Overload » Fri Sep 01, 2006 8:27 am

Actually, I think that maximum drawdown is still a good mechanical method. I admit I made it sound like a "gut feeling" type of thing because there's still the question of what to do about it when it happens.

If a mechanical system ever begins to operate outside of the parameters in which it was back tested, and that includes exceeding the maximum drawdown, it is operating in uncharted territory. One then has to decide whether the parameters should be loosened or whether the system has entirely failed. I'm not sure if there's an across-the-board answer to that.

An alternative might be to give your system an extra 2% of play before pulling the plug. So, for example, if my system had a maximum drawdown of 7.5% in the back test, I might pull the plug if the drawdown ever exceeded 9.5%.

If one decided upon that 2% allowance prior to actually witnessing it, then that would remove the gut feeling. It would be fully mechanical.

Still just an idea.
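Written down as a rule fixed before trading starts, the idea might look like the Python sketch below. The three status names and the function itself are illustrative assumptions; the 7.5% maximum drawdown and the 2% allowance are the figures used in the example above.

```python
def system_status(live_drawdown_pct, backtest_max_drawdown_pct, allowance_pct=2.0):
    """Classify a live drawdown against limits chosen in advance."""
    if live_drawdown_pct <= backtest_max_drawdown_pct:
        return "normal"  # inside the back-tested range
    if live_drawdown_pct <= backtest_max_drawdown_pct + allowance_pct:
        return "alert"   # uncharted territory, but inside the allowance
    return "stop"        # allowance exceeded: pull the plug

for drawdown in (5.0, 8.0, 9.6):
    print(drawdown, system_status(drawdown, backtest_max_drawdown_pct=7.5))
```

Because both thresholds are set before any live drawdown is observed, there is no decision left to make in the moment.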

Pete
Overload
 
Posts: 2246
Joined: Wed Nov 30, 2005 12:14 pm
