I'm working through a book on ML trading and there's a concept I don't understand.
What does optimal training data for an ML trading model look like? Would it consist of just OHLCV values? Wouldn't you have to train the model on data that shows profit, since that is the ultimate goal of traders? How would you express that profit in a training set?
Strange question you pose...
I'm not an expert in this topic in any way. I'm even leery of saying I know enough to be dangerous, because I don't think I do.
Anyway...
Unless you can reverse-engineer the criteria behind a trade from its profits/losses, you have to build your "model" first. Then you can add in profit/loss and come up with things that work. You cannot see the car in front of you in your side and rearview mirrors.
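To make the "add in profit/loss" step a bit more concrete: one common way to express profit in a training set (my assumption here, not something taken from your book) is to label each bar with the return you would have made by entering at that bar and exiting a fixed number of bars later. The names forward_return_labels, horizon and cost are made up for this sketch.

```python
def forward_return_labels(closes, horizon=5, cost=0.0):
    """Label each bar with the net return of buying at its close
    and selling at the close `horizon` bars later.

    `cost` is a hypothetical round-trip cost expressed as a fraction.
    Bars whose exit would fall beyond the end of the data get None.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels = []
    for i, entry in enumerate(closes):
        j = i + horizon
        if j >= len(closes):
            labels.append(None)  # the outcome is not known yet
        else:
            labels.append(closes[j] / entry - 1.0 - cost)
    return labels


closes = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0]  # toy data, not real prices
labels = forward_return_labels(closes, horizon=2)
```

The features for bar i should only use data up to bar i. The label is the one column that is allowed to look forward.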
If the only input you are using is OHLCV (aka market data), then for a start, you can calculate the first derivative of price over time, which in practice is a slope. Hey, it's a start.
Armed with a slope, you can begin to build "something" based on similarity to past slopes, and work out other derived measures like duration, range, length, etc.
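A rough sketch of the slope idea, assuming you fit a least-squares line to the last few closes and then look back for bars whose slope was most similar. The function names, the window length and the "absolute difference" similarity measure are all my own picks for illustration.

```python
def rolling_slope(values, window):
    """Least-squares slope over each trailing window of `values`.

    Returns a list aligned with `values`; entries are None until the
    window has filled. The slope is in price units per bar.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    xs = range(window)
    x_mean = (window - 1) / 2.0
    denom = sum((x - x_mean) ** 2 for x in xs)
    slopes = []
    for end in range(len(values)):
        if end + 1 < window:
            slopes.append(None)
            continue
        chunk = values[end + 1 - window:end + 1]
        y_mean = sum(chunk) / window
        num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, chunk))
        slopes.append(num / denom)
    return slopes


def most_similar_past(slopes, i, k=3):
    """Indices of up to k earlier bars whose slope is closest to slopes[i]."""
    past = [(abs(slopes[j] - slopes[i]), j)
            for j in range(i) if slopes[j] is not None]
    return [j for _, j in sorted(past)[:k]]


closes = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0]  # toy data
slopes = rolling_slope(closes, window=3)
neighbours = most_similar_past(slopes, i=6, k=1)
```

What happened after those similar past bars is where profit/loss would come into the picture.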
In your case, three "concepts" come to mind: one being price only, another volume only, and a third being a combination of volume and price.
If price-only, then you need to formulate some form of commonality that exists in all pricing... slope, support and resistance (S&R), range, length, sentiment, etc.
Volume-only is different. Unlike price, every volume measurement starts at 0! Beyond the numeric value, you are looking for geometry and pace...
Peaks, troughs, shapes, acceleration, deceleration, etc. Again, you have to formulate based on only OHLCV and its derivatives, because those are your only inputs. Then you can add in the profit/loss and come up with things that work.
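For the volume side, here is a minimal sketch of what "pace" and "geometry" could look like as numbers, assuming simple bar-to-bar differences and three-bar peaks and troughs. The definitions and the name volume_shape are my own simplifications, not a standard.

```python
def volume_shape(volumes):
    """Describe the shape of a volume series with simple differences.

    pace   : change in volume from the previous bar (first difference)
    accel  : change in pace (second difference); positive means
             acceleration, negative means deceleration
    peaks / troughs : indices of bars higher / lower than both neighbours
    """
    n = len(volumes)
    pace = [None] * n
    accel = [None] * n
    for i in range(1, n):
        pace[i] = volumes[i] - volumes[i - 1]
    for i in range(2, n):
        accel[i] = volumes[i] - 2 * volumes[i - 1] + volumes[i - 2]
    # Note: a peak or trough at bar i is only confirmed once bar i + 1 exists,
    # so using it as a feature for bar i would leak one bar of the future.
    peaks = [i for i in range(1, n - 1)
             if volumes[i - 1] < volumes[i] > volumes[i + 1]]
    troughs = [i for i in range(1, n - 1)
               if volumes[i - 1] > volumes[i] < volumes[i + 1]]
    return {"pace": pace, "accel": accel, "peaks": peaks, "troughs": troughs}


volumes = [10, 30, 20, 5, 15, 40, 35]  # toy data
shape = volume_shape(volumes)
```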
Speaking of inputs, number of trades and open interest (where applicable) are missing from your "market data" inputs.
And there are many other inputs you can throw into your mix. Here are just a few, definitely not an exhaustive list...
Official financial texts (dates, times, actual data vs. expected data, etc.)
Geopolitical news and events
Specific sector news
Weather
Fundamentals (earnings, call transcripts, etc.)
Cloud data: Google Trends, X, TikTok, etc.
And a few of the obvious ones: day of week, time of day, even/odd-numbered year, political regime, number of ET members/posts, etc.
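The calendar-type inputs are the easiest to produce, since they fall straight out of the bar's timestamp. A small sketch, assuming each bar carries a Python datetime; the feature names are made up.

```python
from datetime import datetime


def calendar_features(ts):
    """Turn a bar timestamp into a few simple calendar inputs."""
    return {
        "day_of_week": ts.weekday(),                # Monday = 0 ... Sunday = 6
        "hour": ts.hour,
        "minute_of_day": ts.hour * 60 + ts.minute,  # time of day as one number
        "even_year": ts.year % 2 == 0,
    }


features = calendar_features(datetime(2024, 3, 15, 14, 30))
```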
Hope that helps just a little bit.
Good luck