GAT,
After a lot of thinking and a little bit of work, I was able to implement this tracking error minimization technique in Excel. Actually, it took less than 2 hours to create and troubleshoot the necessary sheets and write the Visual Basic code.
The only problem is that my so-called greedy algorithm is slow to run, because Visual Basic is notoriously slow at reading from and writing to Excel and there's a lot of that going on (and because I'm working on a 5-year-old laptop). Fortunately, I can run this the prior evening, so it doesn't matter too much.
How long does it take the algorithm to run in Python? I'm sure it's faster, but how much faster? (If I'm reading your code correctly, I believe my greedy algorithm is a slower methodology but one more likely to get closer to the optimal solution, though I'm sure the difference would be small. Next step will be trying to figure out ways to improve the speed of the algo.)
It's a fraction of a second for my current use case: 103 instruments with ~$500K of capital on my production machine (32 GB / i7, 8 threads / 3 GHz). I can run a full backtest with 100 instruments in less than five minutes on my laptop (16 GB / 16 threads / 1.8 GHz); that's approximately 10,000 optimisations (40 years * 250 days), so under about 30 ms each on average.
That solution is relatively sparse; with more capital there are more possible solutions and hence it would take longer.
GAT
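For readers following along, here is a minimal Python sketch of the kind of greedy tracking-error search discussed above. It is not the code from either poster: the function names, the one-contract step rule, and the stopping test are all assumptions made for illustration. Starting from zero positions, it repeatedly tries adding or removing one contract in each instrument and keeps the single move that most reduces the tracking error against the unrounded target weights.

```python
import numpy as np


def tracking_error(weights, target, cov):
    """Standard deviation of the gap between held and target weights."""
    diff = weights - target
    return float(np.sqrt(diff @ cov @ diff))


def greedy_integer_weights(target, cov, weight_per_contract, max_steps=10_000):
    """Hypothetical greedy search: one contract per step, best move wins.

    target              -- unrounded optimal portfolio weights
    cov                 -- covariance matrix of instrument returns
    weight_per_contract -- portfolio weight represented by one contract
    """
    n = len(target)
    weights = np.zeros(n)  # assumption: start from no positions
    best = tracking_error(weights, target, cov)

    for _ in range(max_steps):
        best_candidate = None
        for i in range(n):
            for direction in (1.0, -1.0):
                trial = weights.copy()
                trial[i] += direction * weight_per_contract[i]
                te = tracking_error(trial, target, cov)
                if te < best - 1e-12:  # keep only strict improvements
                    best = te
                    best_candidate = trial
        if best_candidate is None:  # no single-contract move helps: stop
            break
        weights = best_candidate

    return weights, best


# Toy example with two uncorrelated instruments (made-up numbers).
target = np.array([0.27, -0.12])
cov = np.eye(2) * 0.04
per_contract = np.array([0.1, 0.1])
weights, te = greedy_integer_weights(target, cov, per_contract)
```

Each step costs two tracking-error evaluations per instrument, and the number of steps grows with the number of contracts held, which is consistent with the point that more capital means a longer run.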