I use trade data, but sometimes it’s so bad and “false”, due to dark pools and late-reported trades, that I wish I had bid/ask data just to validate it.
You’ll run into hundreds of problems to solve, so what you need will also depend on each problem, your trading parameters and style, trading speed, etc.
That said, there is no such thing as “backtesting more accurately”. Most “accuracy” results in overfitting and false alpha: you spend time tinkering with and adjusting things that aren’t scalable and don’t matter in the grand scheme of things. The market doesn’t provide “accuracy”, and if you can’t make money without it, you won’t make money at all. The exception is HFT, where you are fighting sophisticated bots and algos, which I don’t think you’ll want to do.
If not considering over-optimization, what method would be most similar to the live market?
For market orders I assume the worst price during a specific bar, whether 1 sec, 15 sec, 1 min, or even 5 min, as I analyze strategies at different time intervals/bars.
This not only tunes out false alpha from bots, general price fluctuations, and market inaccuracy, but also resembles how top traders act: they may react to price changes within seconds or minutes and need to trade instantly at the current price, without trying to improve that price further.
For limit orders, I only check whether the price reached my limit within the above bars, and make sure my volume is a % of traded volume.
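A rough sketch of the fill rules described above, not a definitive implementation. The `Bar` class, the function names, and the `max_participation` default of 5% are all hypothetical, and capping the fill at a share of bar volume is only one way to read "a % of traded volume".

```python
from dataclasses import dataclass


@dataclass
class Bar:
    # Hypothetical bar layout; any bar length (1 sec ... 5 min) works the same way.
    open: float
    high: float
    low: float
    close: float
    volume: float  # total traded volume inside the bar


def market_fill_price(bar: Bar, side: str) -> float:
    """Pessimistic market-order fill: the worst price seen during the bar."""
    if side == "buy":
        return bar.high  # worst case for a buyer is the bar high
    if side == "sell":
        return bar.low   # worst case for a seller is the bar low
    raise ValueError(f"unknown side: {side!r}")


def limit_fill_qty(bar: Bar, side: str, limit: float, qty: float,
                   max_participation: float = 0.05) -> float:
    """Quantity filled for a limit order within one bar.

    The order counts as reached only if the bar traded at or through the limit.
    The fill is capped at max_participation * bar.volume (an assumed rule).
    """
    if side == "buy":
        reached = bar.low <= limit
    elif side == "sell":
        reached = bar.high >= limit
    else:
        raise ValueError(f"unknown side: {side!r}")
    if not reached:
        return 0.0
    return min(qty, max_participation * bar.volume)


bar = Bar(open=100.0, high=101.5, low=99.0, close=100.5, volume=10_000)
buy_price = market_fill_price(bar, "buy")           # filled at the bar high
filled = limit_fill_qty(bar, "buy", 99.5, 1_000)    # capped by bar volume
```

Running the same functions over bars of different lengths is one way to compare how sensitive a strategy is to the assumed execution delay.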
For constructing bars would quote data be wrong?
Exactly why you should use a simulator. The quotes are not fake, but yes, they can be withdrawn.
Not sure, as I don't have or use that data (I have it in live trading, but not historically).
Generally I think it could be useful together with trade data, but on its own it may not be as useful without volume and actual trades. We also know that bots post fake bids and can withdraw them at any time, so a quote may not always be valid.
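If you do have historical quotes, one way to use them together with trade data is to flag trades that printed outside the prevailing bid/ask, which is roughly what late-reported or off-exchange prints tend to look like. This is only a sketch: the tuple layouts, the `flag_suspect_trades` name, and the `tolerance` parameter are assumptions, and since quotes can be withdrawn, a flagged trade is a candidate for review rather than proof of bad data.

```python
from bisect import bisect_right
from typing import List, Tuple

Quote = Tuple[float, float, float]  # (timestamp, bid, ask), sorted by timestamp
Trade = Tuple[float, float, float]  # (timestamp, price, size)


def flag_suspect_trades(trades: List[Trade], quotes: List[Quote],
                        tolerance: float = 0.0) -> List[Trade]:
    """Return trades priced outside the most recent bid/ask at trade time."""
    quote_times = [q[0] for q in quotes]
    suspects = []
    for trade in trades:
        ts, price, _size = trade
        # Index of the latest quote at or before the trade timestamp.
        idx = bisect_right(quote_times, ts) - 1
        if idx < 0:
            continue  # no quote seen yet, so the trade cannot be judged
        _, bid, ask = quotes[idx]
        if price < bid - tolerance or price > ask + tolerance:
            suspects.append(trade)
    return suspects


quotes = [(1.0, 99.9, 100.1), (2.0, 100.4, 100.6)]
trades = [(0.5, 100.0, 10), (1.5, 100.0, 100), (2.5, 100.0, 500), (2.6, 100.5, 50)]
suspects = flag_suspect_trades(trades, quotes)
# Only the trade at t=2.5 is flagged: 100.0 is below the 100.4 bid in force then.
```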