1. Did you frame your earlier optimization problem using training set, validation set and testing set?
No. I don't use a 'validation set' per se.
2. Let's say we have 10 years of data. What would be the best split into training set, validation set and testing set?
I always use expanding out-of-sample windows in annual chunks. So for the first year, we can do no testing. Then we fit a model based on year 1 data, and test it throughout year 2. Then we fit a model based on years 1 and 2 data, and test it on year 3. And so on. That means the split is always an N-year training set (where N is as large as possible without cheating) and a 1-year test set.
We can tweak this with rolling windows, or some kind of exponential weighting where more recent data gets weighted more highly.
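To make the annual chunks concrete, here is a minimal sketch of how the splits could be generated. The function name `expanding_window_splits` and the `rolling_window` parameter are made up for illustration; this is not code from any particular backtesting library.

```python
def expanding_window_splits(years, rolling_window=None):
    """Return a list of (training_years, test_year) pairs in annual chunks.

    years: ordered list of year labels, e.g. [1, 2, ..., 10].
    rolling_window: hypothetical option. None means an expanding window
        (use all past years); an integer caps the number of past years used.
    """
    splits = []
    # The first year has no history to fit on, so testing starts in year 2.
    for i in range(1, len(years)):
        start = 0 if rolling_window is None else max(0, i - rolling_window)
        # Training data is strictly before the test year, so no lookahead.
        splits.append((years[start:i], years[i]))
    return splits


splits = expanding_window_splits(list(range(1, 11)))
for train, test in splits:
    print(f"fit on years {train[0]}-{train[-1]}, test on year {test}")
```

With 10 years of data this gives 9 out-of-sample test years. Passing `rolling_window=3` would instead fit each model on at most the 3 most recent years.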
'3. For my explanation below, let's assume first 8 years of data to training set, 1 year of data to be validation set and last year of data to be testing set
4. Training set: Bootstrap (independent or block) for first 8 years of data (assume training set), with each bootstrap sample 10% of the training set. Find the optimal set of rule weights in each sample to max sharpe?
5. Validation set: Should the average optimal weights (across all samples) derived from training set be considered vis-a-vis the validation set e.g. Sharpe ratio in training set shouldn't differ from validation set by 5%. If it differs, find a set of weights with reasonably high sharpe ratio (not necessarily the highest) but doesn't differ from validation set by 5%'
'e.g. Sharpe ratio in training set shouldn't differ from validation set by 5%.': Hahaha, good luck with that! The inherent randomness of SR means that over 1 year even a perfectly good rule could easily end up with a completely different SR. You'd need to either make the SR bands very wide (e.g. over 1 year the 95% uncertainty range of a 0.5 SR trading strategy is -1.46 to 2.46) or make the validation set decades long (which means you're wasting a huge amount of data that never affects the model, and it's the more recent data, which is the most valuable).
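The range quoted above can be reproduced with a back-of-envelope calculation. This sketch assumes the standard error of an annualised SR estimated over T years is roughly sqrt(1/T), which ignores the small SR-dependent correction term and any non-normality in returns. The function name `sharpe_ratio_range` is hypothetical.

```python
import math


def sharpe_ratio_range(annual_sr, years, z=1.96):
    """Approximate 95% uncertainty range for an annualised Sharpe ratio.

    Assumption: standard error of the SR estimate ~ sqrt(1 / years).
    This is a rough approximation, not an exact sampling distribution.
    """
    standard_error = math.sqrt(1.0 / years)
    return annual_sr - z * standard_error, annual_sr + z * standard_error


low, high = sharpe_ratio_range(0.5, years=1)
print(round(low, 2), round(high, 2))  # -1.46 2.46, the figures quoted above
```

Because the standard error only shrinks with the square root of the number of years, the range stays wide even with a much longer validation set.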
'6. Testing set: Not used in model fitting but used in backtested equity curve?'
Yes, I think so.
'Steps 3 to 6 repeated with expanding window.'
'My concern would be the validation set. How long should it be and should validation be set up like a slightly modified version of K-fold cross validation in machine learning problems? E.g. using K-1 folds of data but excluding data before validation set to prevent lookahead bias.'
I think the whole validation set idea here is flawed, because there is too much noise in the data. You are better off using the entire set of data that was available at each historical point, and running a fitting process on it that is robust, i.e. one that accounts for the number of years of data and the noise.
GAT