1. That's fine. I'm no expert either. I've taken two 'graduate level' machine learning courses at school and that is the extent of my knowledge. No reinforcement learning was discussed.
2. You're correct that we're guiding the algorithm to a given outcome, but won't that be the case in any situation where we argmax/argmin the cost function? I suppose maximizing or minimizing a function is different from learning from successive results. I'm not sure that what I'm suggesting is necessarily supervised learning, because in my experience supervised learning provides the learning algorithm with a dataset of input and output pairs known to be correct. That would not be the case in my example. You're correct that what I've suggested doesn't actually learn from successive outcomes, so in that sense it is indeed not reinforcement learning.
3. I agree that modeling whether you get a fill is the responsibility of your backtesting application. That said, I think that in practice steering my algorithm toward less latency-sensitive opportunities may still be useful.
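To make the "steering" idea concrete, here is a rough sketch of penalizing latency sensitivity inside the objective that gets argmax'd. Everything in it is made up for illustration: the `Candidate` fields, the `latency_sensitivity` score, and the `penalty_weight` value are assumptions, not something taken from a real strategy or library.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    expected_pnl: float         # hypothetical backtest estimate
    latency_sensitivity: float  # hypothetical score in [0, 1]; higher = more latency critical


def penalized_score(c: Candidate, penalty_weight: float) -> float:
    # Objective = expected PnL minus a penalty for relying on fast fills.
    return c.expected_pnl - penalty_weight * c.latency_sensitivity


def pick_best(candidates, penalty_weight: float) -> Candidate:
    # Plain argmax over the penalized objective; nothing is learned
    # from successive outcomes here.
    return max(candidates, key=lambda c: penalized_score(c, penalty_weight))


candidates = [
    Candidate("fast_scalp", expected_pnl=10.0, latency_sensitivity=0.9),
    Candidate("slow_mean_revert", expected_pnl=8.0, latency_sensitivity=0.1),
]

best_unpenalized = pick_best(candidates, penalty_weight=0.0)
best_penalized = pick_best(candidates, penalty_weight=5.0)
print(best_unpenalized.name, best_penalized.name)
```

With no penalty the higher-PnL but latency-critical candidate wins; with the penalty switched on, the choice moves to the slower one. How to score latency sensitivity in the first place is the hard part and is left open here.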
I'm just feeling the concept out at this point, seeing how useful it might be. Do you know of any references that cover trading or time series learning?
I agree that steering the solution towards the higher-probability fills (the less latency-critical ones) isn't a bad idea. I do the same thing in my strategy by being conservative in my back-test fill model. As they say, there is more than one way to skin a cat (who the hell is out there skinning cats???).
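For anyone wondering what "conservative" could mean in a fill model, one common approach is to count a resting limit order as filled only if the market trades through the limit price, not merely touches it. The sketch below is a generic illustration under that assumption and is not necessarily how the strategy mentioned above does it; `is_filled`, `ticks_through`, and the bar fields are hypothetical names.

```python
def is_filled(side: str, limit_price: float, bar_low: float, bar_high: float,
              tick_size: float, ticks_through: int = 1) -> bool:
    """Conservative fill check for a resting limit order over one bar.

    The order counts as filled only if price trades at least
    `ticks_through` ticks beyond the limit, so a bare touch is not a fill.
    """
    margin = ticks_through * tick_size
    if side == "buy":
        return bar_low <= limit_price - margin
    if side == "sell":
        return bar_high >= limit_price + margin
    raise ValueError(f"unknown side: {side!r}")


# A buy limit at 100.00 with a 0.25 tick:
touched_only = is_filled("buy", 100.0, bar_low=100.0, bar_high=101.0, tick_size=0.25)
traded_through = is_filled("buy", 100.0, bar_low=99.75, bar_high=101.0, tick_size=0.25)
print(touched_only, traded_through)
```

Raising `ticks_through` makes the back-test more pessimistic about fills, which indirectly pushes any optimization towards opportunities that do not depend on being first in the queue.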
As for references, I can't really give any... most of my training has been "on the job", and in aerospace applications (optimal control, filtering, trajectory shaping, etc.), not trading. I'm toying with it for trading, but I'm not live yet. I know I saw an article on QuantStart about time series analysis, but I didn't read it.