I've been experimenting with running some intraday models to complement me, and one problem I consistently have is that my tweaking often leaves me with no idea what I changed from one model to the next. This is actually a form of robustness, because I delete everything and start again quite regularly when my results go crazy.
Anyway, I currently have a "series" table which can hold date/observation pairs associated with a series. I'm considering (ab)using this table to store model history as well. That would allow me to revert, compare and test trained models. However, models are only one part of it. The other part is the training dataset (preprocessing, normalization, etc). So perhaps each instrument model would have one or more associated characteristics:
* Instrument id
* Model
* Training data
* Test data
* Metrics
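To make the list above concrete, here is a minimal sketch of what a dedicated model table could look like. It assumes SQLite and pickled model objects purely for illustration; the table name, the column names, and the `save_model`/`load_model` helpers are all hypothetical and not something I have built yet.

```python
import json
import pickle
import sqlite3

# Hypothetical schema: one row per trained model version.
SCHEMA = """
CREATE TABLE IF NOT EXISTS model (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    instrument_id INTEGER NOT NULL,
    created_at    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    model_blob    BLOB    NOT NULL,  -- serialized model
    train_blob    BLOB,              -- (compressed) training data
    test_blob     BLOB,              -- (compressed) test data
    metrics_json  TEXT               -- metrics as a JSON object
)
"""


def save_model(conn, instrument_id, model, metrics,
               train_blob=None, test_blob=None):
    """Insert one model version and return its row id."""
    cur = conn.execute(
        "INSERT INTO model "
        "(instrument_id, model_blob, train_blob, test_blob, metrics_json) "
        "VALUES (?, ?, ?, ?, ?)",
        (instrument_id, pickle.dumps(model), train_blob, test_blob,
         json.dumps(metrics)),
    )
    conn.commit()
    return cur.lastrowid


def load_model(conn, model_id):
    """Return (model, metrics) for a stored version."""
    row = conn.execute(
        "SELECT model_blob, metrics_json FROM model WHERE id = ?",
        (model_id,),
    ).fetchone()
    if row is None:
        raise KeyError(model_id)
    return pickle.loads(row[0]), json.loads(row[1])
```

Reverting would then just be `load_model(conn, some_older_id)`, and comparing two versions means reading their metrics side by side. Pickle is only safe for data I wrote myself, which is the case here.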
Creating tables isn't expensive or annoying for me, so I suppose I should simply have a Model table. I expected the real problem to be storing the training data. But I just checked, and it looks like it's about 5 MB per model according to `pd.DataFrame.memory_usage()`, so I could compress it as I did before with tick data and make it nearly cost-free.
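A rough sketch of the size check and the compression step follows. It assumes pickle plus zlib, which is not necessarily what I used for the tick data, and the `compress_frame`/`decompress_frame` names and the example frame are made up for illustration. Note that `memory_usage()` returns bytes per column, so the total is its sum.

```python
import pickle
import zlib

import numpy as np
import pandas as pd


def compress_frame(df, level=6):
    """Serialize a DataFrame and compress it into a bytes blob."""
    raw = pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw, level)


def decompress_frame(blob):
    """Inverse of compress_frame."""
    return pickle.loads(zlib.decompress(blob))


# Made-up example data standing in for a training set.
rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=10_000, freq="min")
example = pd.DataFrame(
    {
        "price": 100 + rng.normal(0, 0.01, len(index)).cumsum().round(2),
        "volume": rng.integers(1, 500, len(index)),
    },
    index=index,
)

in_memory_bytes = int(example.memory_usage(deep=True).sum())
blob = compress_frame(example)
print(f"in memory: {in_memory_bytes} bytes, compressed: {len(blob)} bytes")
```

The resulting blob can go straight into a BLOB column. How much it shrinks depends heavily on the data, so it is worth measuring on a real training set before relying on it.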
Not ready to commit to this yet, but just collecting my thoughts.