Fully automated futures trading

wopr · Jun 18, 2021

newbunch said:
Now that the commodities, especially the grains, are rallying back, the question is how many people got stopped out or had signal reversals only to see those commodities rally so strongly today? That happened to me in some but not all of my instruments.

I had one contract each in soybeans and corn, and yesterday after the close the system computed optimal positions of 0.791 and 0.984, so I didn't sell any. However, I *strongly* considered stepping in and selling yesterday evening when the grain markets opened. I read somewhere that this was a 7 sigma down day in soybeans, which happens once every many years or whatever, and it got me thinking, but I managed not to touch it.

KevinBB · Jun 18, 2021

Well, that was a nice week!

My small account size makes me (relatively) a little less exposed to commodities, and more exposed to currencies and equities, than others reading this post may be. Monday to Thursday weren't too bad, but when the Friday statement comes in it will show the biggest trend following down day since my system started last October or so.

I can't complain, though. Over the total portfolio (which is made up of mainly Aussie large caps and an index based ETF), unless the short term trend changes, this June looks like it is heading for the first down month since last September.

These volatile weeks are good for me, because they expose all the design faults / potential design faults in the way I have implemented Rob's framework. It's not the framework, but my implementation of it. The biggest lesson for me from this week is that I need to go back and look at how I've implemented buffering. The portfolio experienced quite a few whipsaw events during the latter part of the week, and I've put that down to buffering (or lack thereof) with a smaller number of contracts for each security.
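
For readers unfamiliar with buffering, here is a minimal sketch of a no-trade band around the optimal position. The function name `buffered_target` and the buffer width of 0.3 contracts are assumptions made for illustration; this is not Rob's framework code, nor how the poster has implemented it.

```python
def buffered_target(current: int, optimal: float, buffer: float) -> int:
    """Return the position to hold, given a no-trade band around `optimal`.

    Hypothetical helper: if the current position already sits inside the
    rounded band [optimal - buffer, optimal + buffer], do nothing.
    Otherwise trade only as far as the nearest edge of the band.
    """
    lower = round(optimal - buffer)
    upper = round(optimal + buffer)
    if current < lower:
        return lower
    if current > upper:
        return upper
    return current


# Holding 1 contract with an optimal position of 0.791: inside the band.
print(buffered_target(current=1, optimal=0.791, buffer=0.3))
```

In this sketch, when positions are only one or two contracts the rounded band edges often collapse onto the same whole number, so the buffer damps far fewer trades than it would on a larger position.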

Still working on that.

KH

globalarbtrader · Jun 22, 2021

Elder said:
I use ProcessPoolExecutor in concurrent.futures which is a more user friendly wrapper to multiprocessing.pool for wimps like me.

Quick update on this; I found that process pool actually slowed this code down! (by about an order of magnitude)

So basically I'm doing:

loop over dates
generate possible grid points for a given date

And then this code:

Code:

import itertools
from concurrent.futures import ProcessPoolExecutor

grid_possibles = list(itertools.product(*grid_points))

if use_process_pool:
    with ProcessPoolExecutor() as pool:
        results = pool.map(
            neg_return_with_risk_penalty_and_costs,
            grid_possibles,
            itertools.repeat(optimisation_parameters),
        )
else:
    results = map(
        neg_return_with_risk_penalty_and_costs,
        grid_possibles,
        itertools.repeat(optimisation_parameters),
    )

Since each evaluation is doing something fairly short and simple, I think the overhead of spinning up a new process for each grid point is far exceeding the benefits from parallel execution.
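
One rough way to check whether an evaluation is too cheap to be worth shipping to another process is to compare its run time with the cost of pickling and unpickling its arguments, which every pool task has to pay at a minimum. The sketch below does that; `toy_objective` and its parameters are hypothetical stand-ins, not the real `neg_return_with_risk_penalty_and_costs`.

```python
import pickle
import time


def toy_objective(weights, params):
    """Hypothetical stand-in for a short, cheap grid-point evaluation."""
    return -sum(w * r for w, r in zip(weights, params["expected_returns"]))


def seconds_per_call(func, repeats=10_000):
    """Average wall-clock seconds for one call of a zero-argument callable."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


params = {"expected_returns": [0.05, 0.02, 0.03]}
point = (1, 0, 2)

# Cost of doing the work in-process.
compute_cost = seconds_per_call(lambda: toy_objective(point, params))
# Cost of just serialising and deserialising the arguments for one task.
transfer_cost = seconds_per_call(
    lambda: pickle.loads(pickle.dumps((point, params)))
)

print(f"evaluate: {compute_cost:.2e}s, pickle round trip: {transfer_cost:.2e}s")
```

The pickle round trip is only a lower bound on the dispatch cost, since a real pool also queues the task and moves the bytes between processes. If it is already comparable to the evaluation time, sending points one at a time is unlikely to pay off.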

Process pool would probably be faster if I did the pool.map on each individual date, but because I'm using a trading cost penalty I need to evaluate them in date order, knowing what the positions were yesterday. So I can't do:

Code:

with ProcessPoolExecutor() as pool:
    results = pool.map(
        find_optimal_portfolio_for_date,
        list_of_dates,
        itertools.repeat(optimisation_parameters),
    )
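
The date dependency can be illustrated with a toy version of the loop. Everything here is an assumption for illustration: the objective, the names `cost_penalised_objective` and `optimise_in_date_order`, and the numbers. The point is only that each date's search needs the previous date's answer, so the outer loop has to stay serial.

```python
import itertools


def cost_penalised_objective(weights, previous, expected_returns, cost_per_trade):
    """Hypothetical objective: negative return plus a cost for position changes."""
    gross = sum(w * r for w, r in zip(weights, expected_returns))
    turnover = sum(abs(w - p) for w, p in zip(weights, previous))
    return -gross + cost_per_trade * turnover


def optimise_in_date_order(returns_by_date, grid_points, cost_per_trade):
    """Grid-search each date in order, feeding yesterday's positions forward."""
    previous = tuple(0 for _ in grid_points)  # start flat
    path = {}
    for date in sorted(returns_by_date):
        candidates = itertools.product(*grid_points)
        best = min(
            candidates,
            key=lambda w: cost_penalised_objective(
                w, previous, returns_by_date[date], cost_per_trade
            ),
        )
        path[date] = best
        previous = best  # today's answer is tomorrow's starting point
    return path


returns_by_date = {
    "2021-06-17": [0.10, -0.10],
    "2021-06-18": [0.01, 0.01],
}
path = optimise_in_date_order(returns_by_date, [[0, 1], [0, 1]], cost_per_trade=0.05)
print(path)
```

In this toy data the second date keeps the first date's positions because the small expected gain on the second instrument does not cover the trading cost, which is exactly the information a per-date parallel map would not have.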

What I did find *much* faster was ensuring the code that runs the evaluation of each point was in a single file all by itself, rather than clumped together with loads of other stuff. I'm guessing that map creates a copy of the entire name space around neg_return_with_risk_penalty_and_costs for each call, which obviously will be much smaller if that function is in a file by itself. I found this out entirely by accident, having written all my research code in a single massive file and then refactored it into smaller files... which goes to prove "Refactor then optimise" is the way to go.
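
The namespace explanation above is a guess, and part of it can be checked. Work is sent to pool processes with pickle, and pickle stores a module-level function as a reference (module name plus function name) rather than as a copy of its code or its surrounding namespace. The sketch below uses `os.path.join` purely as a convenient stand-in for a module-level function.

```python
import os.path
import pickle

# Pickle a module-level function and look at what is actually stored.
payload = pickle.dumps(os.path.join)
restored = pickle.loads(payload)

# The payload is a few dozen bytes naming the module and the function;
# unpickling imports that module and looks the function up again.
print(len(payload), restored is os.path.join)
```

Each worker still has to be able to import the defining module to resolve that reference, so the size of that module may matter when workers start, particularly with the `spawn` start method. Whether that accounts for the speed-up reported here is not established in the thread.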

GAT

Elder · Jun 22, 2021

globalarbtrader said:
Quick update on this; I found that process pool actually slowed this code down! (by about an order of magnitude)

Yes, agreed. I have very limited use cases for overlaying multiprocessing onto my code and I use it sparingly. However, it has proven to be quite useful, as you have observed, when each process has to do some serious lifting.

The overhead of multiprocessing is explained very well here:

https://stackoverflow.com/questions...ool-slower-than-just-using-ordinary-functions

djames · Jun 22, 2021

globalarbtrader said:

Code:

grid_possibles = list(itertools.product(*grid_points))

if use_process_pool:
    with ProcessPoolExecutor() as pool:
        results = pool.map(
            neg_return_with_risk_penalty_and_costs,
            grid_possibles,
            itertools.repeat(optimisation_parameters),
        )

GAT

Hey Rob, I think what you are looking for is the chunksize param to "pool.map"

Code:

grid_possibles = list(itertools.product(*grid_points))

if use_process_pool:
    with ProcessPoolExecutor() as pool:
        results = pool.map(
            neg_return_with_risk_penalty_and_costs,
            grid_possibles,
            itertools.repeat(optimisation_parameters),
            # chunksize must be a whole number of at least 1
            chunksize=max(1, len(grid_possibles) // num_processes),
        )

Then, as you say, each process will be fed a large batch of items, rather than spinning a new process for each item. Weirdly the default is chunksize=1, which is surely pants.
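
A self-contained sketch of the chunksize idea follows, assuming one chunk per worker. The objective `toy_objective`, the helper `pick_chunksize` and the grid are hypothetical; only `ProcessPoolExecutor.map` and its `chunksize` argument come from the standard library. The chunk size is rounded up to a whole number so that it is never a float and never zero.

```python
import itertools
import math
import os
from concurrent.futures import ProcessPoolExecutor


def toy_objective(weights, scale):
    """Hypothetical cheap evaluation of a single grid point."""
    return -scale * sum(weights)


def pick_chunksize(n_items, n_workers):
    """One chunk per worker, rounded up, and never below 1."""
    return max(1, math.ceil(n_items / n_workers))


def evaluate_grid(grid_possibles, scale, n_workers):
    """Evaluate every grid point, sending the points to workers in batches."""
    chunksize = pick_chunksize(len(grid_possibles), n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(
            pool.map(
                toy_objective,
                grid_possibles,
                itertools.repeat(scale),
                chunksize=chunksize,
            )
        )


if __name__ == "__main__":
    grid = list(itertools.product(range(3), repeat=4))
    results = evaluate_grid(grid, 1.0, n_workers=os.cpu_count() or 1)
    print(min(results))
```

One chunk per worker minimises dispatch overhead but can leave workers idle if some chunks finish early; a few chunks per worker is a common compromise when evaluation times vary.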

globalarbtrader · Jun 22, 2021

djames said:
Hey Rob, I think what you are looking for is the chunksize param to "pool.map"

Thanks will try this.

Out of practice with this stuff! It's been nearly 8 years since I was playing with AHL's massive research cluster....

GAT

globalarbtrader · Jun 22, 2021

djames said:
Hey Rob, I think what you are looking for is the chunksize param to "pool.map"

Yeah, that works well. I experimented with num_processes and anywhere between 4 and 16 performs pretty similarly; I used 8 since that is the number of cores I've got.

GAT

Kernfusion · Jun 22, 2021

Not an expert on how to do it in Python, but in general, yeah, if it spins up and kills a new process every time, that sounds expensive. Copying large amounts of data to and from workers might also overwhelm the benefits of parallel processing.
So if it's possible to pre-load some static data into every process, pass only the small changing parameters for each computation (perhaps also in bulk), and keep reusing the same running workers, that might help.
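
In Python this pre-loading idea can be sketched with the `initializer` and `initargs` arguments of `ProcessPoolExecutor`, which run a function once in each worker as it starts. The names `load_static`, `evaluate_point` and `_STATIC`, and the toy parameters, are assumptions for illustration only.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor

_STATIC = {}  # per-process storage, filled once by the initializer


def load_static(optimisation_parameters):
    """Runs once in each worker process when it starts."""
    _STATIC["params"] = optimisation_parameters


def evaluate_point(weights):
    """Hypothetical cheap evaluation that reads the pre-loaded parameters."""
    returns = _STATIC["params"]["expected_returns"]
    return -sum(w * r for w, r in zip(weights, returns))


def evaluate_grid(grid_possibles, optimisation_parameters, n_workers=2, chunksize=16):
    """Send only the grid points per task; the parameters travel once per worker."""
    with ProcessPoolExecutor(
        max_workers=n_workers,
        initializer=load_static,
        initargs=(optimisation_parameters,),
    ) as pool:
        return list(pool.map(evaluate_point, grid_possibles, chunksize=chunksize))


if __name__ == "__main__":
    params = {"expected_returns": [0.05, 0.02]}
    grid = list(itertools.product(range(3), repeat=2))
    print(evaluate_grid(grid, params))
```

Creating the executor once, outside the loop over dates, would additionally keep the same workers alive from one date to the next instead of starting a fresh pool each time.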

Elder · Jun 23, 2021

globalarbtrader said:
Yeah that works well; I experimented with num_processes and anywhere between 4 and 16 does pretty similar; I used 8 since that is the number of cores I've got

GAT

If it's working well you probably don't need to change anything, but fwiw you can also pass the max_workers param at instantiation to avoid the overhead of spinning up too many workers, as in:
with ProcessPoolExecutor(max_workers=cpu_count) as pool:

The optimal setting is a matter of trial and error though. If there is a lot of waiting to read/write data, your optimal value may be slightly higher than cpu_count.

globalarbtrader · Jun 25, 2021

Back to the drawing board

https://qoppac.blogspot.com/2021/06/optimising-portfolios-for-small.html

"This was a cool idea! And I enjoyed writing the code, and learning a few things about doing more efficient grid searches in Python.

But it doesn't seem to add any value compared to the much simpler approach of just trading everything and rounding the positions. And for such a hugely complex additional process, it needed to add significant value to make it worth doing.

In the next post I'll try another approach: using a formal 'static' optimisation to select the best group of instruments to trade for a given amount of capital."

GAT