Quick update on this: I found that the process pool actually slowed this code down (by about an order of magnitude)!
So basically I'm doing:
    loop over dates
        generate possible grid points for a given date
And then this code:
Code:
import itertools
from concurrent.futures import ProcessPoolExecutor

# every combination of the grid points for this date
grid_possibles = list(itertools.product(*grid_points))
if use_process_pool:
    with ProcessPoolExecutor() as pool:
        results = pool.map(
            neg_return_with_risk_penalty_and_costs,
            grid_possibles,
            itertools.repeat(optimisation_parameters)
        )
else:
    results = map(neg_return_with_risk_penalty_and_costs,
                  grid_possibles,
                  itertools.repeat(optimisation_parameters))
Since each evaluation is fairly short and simple, I think the overhead of starting a fresh pool of worker processes for every date, and then pickling each grid point across to them one task at a time, far exceeds the benefit of parallel execution.
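One thing that might cut that per-task overhead is the chunksize argument to map, which sends grid points to the workers in batches rather than one at a time. The following is only a rough sketch: toy_objective and the "risk_aversion" parameter are made-up stand-ins for neg_return_with_risk_penalty_and_costs and optimisation_parameters, and the chunksize of 256 is an arbitrary guess that would need tuning.

```python
import itertools
from concurrent.futures import Executor, ProcessPoolExecutor


def toy_objective(weights, params):
    """Hypothetical stand-in for neg_return_with_risk_penalty_and_costs."""
    return sum(w * w for w in weights) * params["risk_aversion"]


def evaluate_grid(executor: Executor, grid_points, params, chunksize=1):
    """Evaluate every combination of grid_points on the given executor."""
    grid_possibles = list(itertools.product(*grid_points))
    results = executor.map(
        toy_objective,
        grid_possibles,
        itertools.repeat(params),
        chunksize=chunksize,  # batch size per task; only matters for process pools
    )
    return grid_possibles, list(results)


def best_point_with_processes(grid_points, params, chunksize=256):
    """Same search on a process pool, shipping points to workers in batches."""
    with ProcessPoolExecutor() as pool:
        points, values = evaluate_grid(pool, grid_points, params, chunksize)
    best = min(range(len(values)), key=values.__getitem__)
    return points[best], values[best]
```

best_point_with_processes should be called from under an if __name__ == "__main__": guard so that it also works on platforms that spawn rather than fork. Whether batching is enough to beat plain map will depend on how cheap each evaluation really is, so it is worth timing both.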
A process pool would probably be faster if I ran pool.map across the dates instead, but because I'm using a trading cost penalty I need to evaluate the dates in order, knowing what yesterday's positions were. So I can't do:
Code:
with ProcessPoolExecutor() as pool:
    results = pool.map(
        find_optimal_portfolio_for_date,
        list_of_dates,
        itertools.repeat(optimisation_parameters)
    )
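A possible middle ground, sketched below, is to keep the date loop strictly sequential but create the pool once outside it and reuse it for every date, so the start-up cost is paid only once. Everything here is hypothetical: cost_aware_objective, its fixed target weights and cost_per_unit are invented for illustration and are not the real objective.

```python
import itertools
from concurrent.futures import Executor


def cost_aware_objective(weights, previous_weights, cost_per_unit):
    """Hypothetical objective: distance from a target plus a trading cost penalty."""
    target = (0.5, 0.5)  # made-up target weights
    tracking = sum((w - t) ** 2 for w, t in zip(weights, target))
    turnover = sum(abs(w - p) for w, p in zip(weights, previous_weights))
    return tracking + cost_per_unit * turnover


def optimise_in_date_order(executor: Executor, dates, grid_points,
                           start_weights, cost_per_unit, chunksize=64):
    """Walk the dates in order, reusing one executor for every grid search."""
    previous = start_weights
    chosen = {}
    for date in dates:  # sequential: each date needs yesterday's positions
        candidates = list(itertools.product(*grid_points))
        values = list(executor.map(
            cost_aware_objective,
            candidates,
            itertools.repeat(previous),
            itertools.repeat(cost_per_unit),
            chunksize=chunksize,
        ))
        best = min(range(len(values)), key=values.__getitem__)
        previous = candidates[best]  # today's choice is tomorrow's starting point
        chosen[date] = previous
    return chosen
```

The idea would be to open a single with ProcessPoolExecutor() as pool: block around one call to optimise_in_date_order(pool, ...). I haven't timed this against the real objective, so treat it as something to try rather than a guaranteed win.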
What I did find *much* faster was making sure the code that runs the evaluation of each point lives in a file all by itself, rather than clumped together with loads of other stuff. I'm guessing that the process pool ends up copying the entire namespace around neg_return_with_risk_penalty_and_costs into the worker processes, which will obviously be much smaller if that function is in a file by itself. I found this out entirely by accident, having written all my research code in a single massive file and then refactored it into smaller files... which goes to prove that "refactor, then optimise" is the way to go.
GAT