Why Most Backtests Don’t Match Live Trading

Photo by Julian Rösner / Unsplash

Disclaimer: these posts are for educational and/or entertainment purposes only. If you want more warnings and disclaimers, check the footer on every page, or click here; it sends you to the same disclaimer if you're too lazy to hit [end] on your keyboard. These posts are not advice. Any trades you take are your own, and if we hint that you should try something, test it on a demo account or do it at your own risk. We never guarantee profitability.


Photo by Emily Morter / Unsplash

Why Don't Backtests Match Live Trading?

So if you're an automated trader, or just getting started, the backtesting engine is one of your tools.

As a beginner, developing new strategies with tools like StrategyBuilder in NinjaTrader 8 feels like a revelation. The strategies you traded manually are now automated… somewhat. Then you get stuck in the backtest trap: you watch it run and spit out results, and if you don't like what you see, you try new parameters again and again until the numbers improve. Think of it like rolling dice until you get the outcome you want.

From experience, this is dangerous and reinforces bad habits and expectations.
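To see why rerolling parameters is dangerous, here is a minimal Python sketch under deliberately artificial assumptions: bar returns are pure Gaussian noise, so no edge exists at all, and the hypothetical "strategy" goes long or short each bar on a coin flip whose seed is its only "parameter." Keeping whichever seed backtests best on the first half of the data still produces an impressive in-sample number. The second half shows what that number was worth.

```python
import random

def noise_returns(n_bars, seed):
    """Per-bar returns with zero edge: mean 0, standard deviation 1."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_bars)]

def in_and_out_pnl(returns, param, split):
    """Hypothetical strategy: `param` seeds a coin flip that picks long (+1) or
    short (-1) each bar. Returns (in-sample PnL, out-of-sample PnL) around `split`."""
    rng = random.Random(param)
    pnl = [(1 if rng.random() < 0.5 else -1) * r for r in returns]
    return sum(pnl[:split]), sum(pnl[split:])

def optimize(returns, params, split):
    """Keep the parameter with the best in-sample PnL, like rerunning a backtest
    until it looks good. Returns (best param, its in-sample PnL, its out-of-sample PnL)."""
    results = {p: in_and_out_pnl(returns, p, split) for p in params}
    best = max(params, key=lambda p: results[p][0])
    return best, results[best][0], results[best][1]

if __name__ == "__main__":
    returns = noise_returns(200, seed=12345)
    best, pnl_in, pnl_out = optimize(returns, range(200), split=100)
    print(f"best param {best}: in-sample {pnl_in:+.1f}, out-of-sample {pnl_out:+.1f}")
```

Try different return seeds: the in-sample winner is reliably positive, while its out-of-sample result scatters around zero, because there was never anything to find. The optimizer only found the luckiest dice roll.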

Bad habits: these come in the form of believing that building strategies with cut-and-dry builders and cookie-cutter templates makes trading easy. It's easy to lie to yourself here: development was easy, so trading must be easy too. That belief blinds you into thinking the backtest results will carry over unchanged to live markets.

You'll be in for a big surprise.

Bad expectations: on top of the bad habits above, bad expectations are another trap you can fall into. You expect riches, perfect fills and execution, an edge, and live results that match what happened in the backtest.

Think of a backtest engine as a controlled environment built on hindsight. You go back in time, see what price did, and adjust your strategy to fit what the market already did.

The flaw in this thinking is assuming that the market conditions in that backtest sample will persist and keep conforming to your strategy's parameters. Tuning parameters until they fit one sample is curve fitting, aka self-induced delusions of grandeur.

Think of it this way: you're practicing grand slams in batting practice against your own pitching coach, and you go 9 for 12, hitting it out. Do you actually think your results will be the same in a real game?

Now face live major league pitching: the other team calls in a righty to pitch against you, one who throws up-and-in cutters and down-and-away sinkers and is 10-0 against you. That's the live market. Even in a grand slam situation, you're likely not going to hit it out. Luck will have to come into play.

In trading, one doesn't rely on luck.

Foolish Fills

In NinjaTrader, the Strategy Analyzer is the backtesting engine. It has many settings you can tweak to change the results, and one of them is filling limit orders on touch. That looks great when the market is perfect and does exactly what you want. But in real life, limit orders guarantee price, not execution.

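The market can reach your price and still leave your order unfilled: the orders resting ahead of yours in the central limit order book get filled first, and price moves away before your turn comes. Your limit just wasn't in time, or it was off by a few ticks.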

In the Strategy Analyzer, there is no central limit order book. There are other ways to test, such as market replay, which replays past sessions using downloaded tick data, but that's a conversation for another time.

Anyway, fill on touch feeds the same bad habits and expectations, because running a test with that option on ignores live market dynamics. You'll come to believe limit orders always fill when price touches them, and that is not the case.
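Here is a minimal Python sketch of the difference between an optimistic and a conservative fill rule. It is not NinjaTrader's fill engine; it is a generic illustration assuming an ES-style 0.25-point tick and a hypothetical "through" rule that only counts a buy limit as filled when price trades at least one tick below it, a common conservative proxy for the queue ahead of you actually being cleared. Each bar is checked independently.

```python
from dataclasses import dataclass

TICK_SIZE = 0.25  # assumption: ES-style tick size in points

@dataclass
class Bar:
    high: float
    low: float

def buy_limit_fills(limit_price: float, bar: Bar, rule: str) -> bool:
    """Would a resting buy limit fill inside this bar?

    'touch'   -> fills if the bar's low merely reaches the limit (optimistic).
    'through' -> fills only if price trades at least one tick below the limit
                 (conservative stand-in for queue position).
    """
    if rule == "touch":
        return bar.low <= limit_price
    if rule == "through":
        return bar.low <= limit_price - TICK_SIZE
    raise ValueError(f"unknown fill rule: {rule!r}")

def count_fills(limit_price: float, bars: list[Bar], rule: str) -> int:
    """Number of bars in which the buy limit would have filled under `rule`."""
    return sum(buy_limit_fills(limit_price, bar, rule) for bar in bars)

if __name__ == "__main__":
    # Hypothetical bars around a 5000.00 buy limit.
    bars = [Bar(5001.00, 5000.00), Bar(5002.50, 5000.25), Bar(5000.75, 4999.50)]
    for rule in ("touch", "through"):
        print(rule, count_fills(5000.00, bars, rule))
```

With these made-up bars, fill on touch reports two fills, while the trade-through rule reports one. The first bar only touched 5000.00, which is exactly the case where a live order can sit unfilled in the queue.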

As a bonus: you can game it by filling limit orders on touch and setting your profit target at 1 to 4 ticks. That fakes the results, and after testing these absurd settings you might think you'll make a million in 6 months live.

Other platforms may differ, so play around with them, but be wary of any setting that overrides the reality of market conditions.

Next we'll discuss another issue: the disregard of slippage.

Sidestepping Slippage

Your platform should have a slippage option that includes slippage in the test's calculations. For example, assuming 2–4 ticks of slippage keeps your tests fair and grounded.
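What a slippage setting does to a single trade can be sketched in a few lines of Python. The contract specs are assumptions (an ES-style 0.25-point tick worth $12.50, so $50 per point), commissions are ignored, and the same slippage is applied on entry and exit: buys fill higher than requested, sells fill lower.

```python
TICK_SIZE = 0.25     # assumption: ES-style tick size in points
POINT_VALUE = 50.0   # assumption: ES-style dollars per point per contract

def slipped(price: float, side: str, slippage_ticks: int) -> float:
    """Move a fill against the trader: buys fill higher, sells fill lower."""
    offset = slippage_ticks * TICK_SIZE
    if side == "buy":
        return price + offset
    if side == "sell":
        return price - offset
    raise ValueError(f"unknown side: {side!r}")

def long_round_trip_pnl(entry: float, exit_price: float, contracts: int, slippage_ticks: int) -> float:
    """Dollar PnL of a long trade with the same slippage applied on entry and exit."""
    fill_in = slipped(entry, "buy", slippage_ticks)
    fill_out = slipped(exit_price, "sell", slippage_ticks)
    return (fill_out - fill_in) * POINT_VALUE * contracts

if __name__ == "__main__":
    # A hypothetical 2-point winner on 1 contract at 0, 2 and 4 ticks of slippage per side.
    for ticks in (0, 2, 4):
        print(ticks, long_round_trip_pnl(5000.00, 5002.00, 1, ticks))
```

At 0 ticks the trade makes $100, at 2 ticks per side it makes $50, and at 4 ticks per side the same 2-point winner is a scratch.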

Setting it to 0 or leaving it out is a bad move because your backtest results will be skewed toward perfection. The markets are not perfect, and in your case especially, the odds are stacked against you.

Thinking slippage doesn't exist is the same as believing limit orders get filled 100% of the time. No, they don't.

Testing without slippage can show an edge on paper. Introduce slippage and that edge deteriorates, and in live conditions the results are abysmal at best: the strategy that won in simulation actually loses in real market conditions.
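The erosion of a paper edge is simple arithmetic on expectancy. The Python sketch below works in ticks and assumes, for simplicity, that every entry and every exit slips by the same amount, which is pessimistic for limit exits. The win rate and average win/loss figures in the example are hypothetical.

```python
def expectancy_ticks(win_rate, avg_win_ticks, avg_loss_ticks, slippage_ticks_per_side=0.0):
    """Average result per trade in ticks, after charging slippage on entry and exit."""
    gross = win_rate * avg_win_ticks - (1.0 - win_rate) * avg_loss_ticks
    return gross - 2.0 * slippage_ticks_per_side

def breakeven_slippage(win_rate, avg_win_ticks, avg_loss_ticks):
    """Slippage per side (in ticks) at which the paper edge is fully eaten."""
    return expectancy_ticks(win_rate, avg_win_ticks, avg_loss_ticks) / 2.0

if __name__ == "__main__":
    # Hypothetical system: wins 55% of the time, average win and loss both 8 ticks.
    print(f"paper edge:          {expectancy_ticks(0.55, 8, 8):.1f} ticks/trade")
    print(f"break-even slippage: {breakeven_slippage(0.55, 8, 8):.1f} ticks/side")
    print(f"with 1 tick/side:    {expectancy_ticks(0.55, 8, 8, 1.0):.1f} ticks/trade")
```

This prints a paper edge of 0.8 ticks per trade, a break-even slippage of 0.4 ticks per side, and -1.2 ticks per trade once each side slips a single tick. A thin paper edge doesn't survive even modest slippage.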

Photo by Trophim Laptev / Unsplash

Liquidity Illusions Simulated

Another persistent fallacy is assumed liquidity. Stack it on top of the errors above and you get thinking like, “Oh, it worked and made millions on 1 simulated contract, let's just increase to 100 contracts!”

Thinking this way, not only do you multiply your failure, you learn nothing about how the market works. You assume it is forever liquid and that you will get filled at your price no matter what, at any size.

These trains of thought are why simulations so rarely translate well into live trading.

You're not going to get filled on 100 contracts at the price you want to enter.

You're likely going to get slipped on most, if not all, of that order, and it's worse with a market order when you don't know how many resting limit orders you'll need to hit to get filled.

Say you want in at 10,000, so you buy 100 long at market, and the book only has 100 contracts offered across the 3 points above your price… well, that's your slippage. You're going to get filled at an average of 10,002 if you're lucky.

That's a lot of slippage: 2 points on average across 100 contracts, or 200 points in total.
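The 10,002 average comes from walking the book: a market order takes the best offer, then the next level up, and so on until it's filled. The Python sketch below does exactly that against a made-up, static snapshot of the ask side (100 contracts spread over 10,000 to 10,003). A real book would also react to your order, and hidden liquidity isn't modeled.

```python
def market_buy_fill(asks, quantity):
    """Fill a market buy by walking the ask side of the book.

    `asks` is a list of (price, size) levels sorted best (lowest) price first.
    Returns (average fill price, [(price, qty), ...]).
    Raises ValueError if the visible book cannot absorb the whole order.
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining = quantity
    fills = []
    for price, size in asks:
        if remaining == 0:
            break
        take = min(size, remaining)
        fills.append((price, take))
        remaining -= take
    if remaining > 0:
        raise ValueError(f"book too thin: {remaining} contracts left unfilled")
    average = sum(price * qty for price, qty in fills) / quantity
    return average, fills

if __name__ == "__main__":
    # Hypothetical thin book: only 100 contracts offered within 3 points of 10,000.
    asks = [(10000.00, 10), (10001.00, 20), (10002.00, 30), (10003.00, 40)]
    for qty in (1, 10, 100):
        avg, _ = market_buy_fill(asks, qty)
        print(f"{qty:>3} contracts: average fill {avg:,.2f}, slippage {avg - asks[0][0]:.2f} points")
```

One or ten contracts fill at 10,000.00, but 100 contracts average 10,002.00, and a 101st contract can't be filled from this book at all.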

Furthermore, trading unrealistic contract sizes in the simulator? Good luck even getting approved to trade that much with your broker/FCM. They have position limits, and if you don't have the capital, you're stuck below your dream number.

The Fragile Curve Fit

I mentioned curve fitting. It's the process of heavily optimizing on a small data set and fooling yourself with that past data's near-perfect results, so that when you go out of sample, the entire strategy collapses: the parameters only worked on that one stretch of data.

This is a form of fragility.

Don't be like this. Build systems that are anti-fragile and robust in all the conditions you developed them for.
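One common way to keep yourself honest about going out of sample is a walk-forward split: fit on one window, judge on the window right after it, then roll forward. The Python sketch below only generates the index ranges; the window lengths in the example are hypothetical, and how you optimize inside each training window is left out on purpose.

```python
def walk_forward_windows(n_bars, train_len, test_len):
    """Yield (train_start, train_end, test_start, test_end) index ranges, end-exclusive.

    Each test window begins right where its training window ends, so parameters
    chosen on the training slice are always judged on data they never saw.
    The pair then rolls forward by test_len bars.
    """
    if train_len <= 0 or test_len <= 0:
        raise ValueError("window lengths must be positive")
    start = 0
    while start + train_len + test_len <= n_bars:
        train_end = start + train_len
        yield start, train_end, train_end, train_end + test_len
        start += test_len

if __name__ == "__main__":
    for window in walk_forward_windows(10, train_len=4, test_len=2):
        print(window)
```

With 10 bars, 4-bar training windows and 2-bar test windows, this yields (0, 4, 4, 6), (2, 6, 6, 8) and (4, 8, 8, 10): every test slice is unseen data for the parameters chosen just before it.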

Testing on small samples of 1 day or 1 month? Why not try a year? That's a far better sample than the quick results of a single day: a longer test covers more trades and more market conditions, so its results are less likely to be noise. The longer the duration, the higher the durability.
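Why more data helps can be put in numbers: a longer test mostly buys you more trades, and more trades shrink the uncertainty around the average. The Python sketch below computes a simple t-statistic (mean per-trade PnL divided by its standard error) for a hypothetical +$30 / -$20 trade pattern averaging +$5 per trade. It's a rough yardstick that assumes trades are independent, which real trades often aren't.

```python
import math
import statistics

def edge_t_stat(trade_pnls):
    """Mean per-trade PnL divided by its standard error.

    Rule of thumb: |t| well below 2 means the average could easily be noise.
    """
    if len(trade_pnls) < 2:
        raise ValueError("need at least two trades")
    spread = statistics.stdev(trade_pnls)
    if spread == 0:
        raise ValueError("all trades identical; t-stat undefined")
    return statistics.fmean(trade_pnls) / (spread / math.sqrt(len(trade_pnls)))

if __name__ == "__main__":
    # Same hypothetical edge (+$5 per trade on average), different sample sizes.
    for n_pairs in (5, 50, 250):
        trades = [30.0, -20.0] * n_pairs
        print(f"{len(trades):>3} trades: t = {edge_t_stat(trades):.2f}")
```

The edge per trade is identical in all three cases, yet with 10 trades t is only 0.60, with 100 trades it reaches about 1.99, and with 500 trades it's about 4.47. The short sample simply can't tell that edge apart from luck.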

In Closing

The above are the major reasons why backtests don't equate to live results.

Add emotions on top of that: the urge to intervene when your strategy should be running on its own adds another variable that skews your results. Changing market conditions will also test the strategy's anti-fragility, assuming it was robust in the first place.

Few strategies survive repeated market cycles of low and high volatility. In between those cycles, some optimization usually takes place.

When optimizing, it's best not to over-optimize to the point where you end up curve fitting again. Optimization is another conversation for the future.

Well, that does it for my thoughts on why backtests are not equal to live results and the variables that cause such outcomes.

Backtests show the potential cause. Live markets are the final effect.

Till then. Trade well.


Lastly

If you find these journal entries helpful, please consider subscribing here.

~Asymmetric_Vol



Photo by Matt Brown / Unsplash