Parameter optimization with GPU

BlackPhoenix · Dec 16, 2023

metalztrader said:
What loss function are you minimizing? Most likely it is wrong.
You probably just don't understand the parameters of your "system". Last year I thought I needed AWS and its compute to figure out something completely subjective within my model. I would have spent so much minimizing bullshit.
You can't just RL 100 free parameters without finding your system is bullshit.

Lol, I'm not optimizing 100 free parameters

It's simply a matter of fact that even 20k parameter permutations take a long time to optimize.

BlackPhoenix · Dec 16, 2023

ph1l said:
I have found that moving calculations from CPU (multithreaded C++) to GPU (OpenCL 1.2) can decrease execution times by a factor of 30 to 60.

But porting code to make good use of a GPU is not always that easy. For example, instead of three nested loops with sizes M, N, and O, one might be able to have one OpenCL kernel (function) with M * N * O work items (threads), where each work item produces the data from the innermost and surrounding loops. This can quickly use a lot of memory. And it's not as simple as writing the OpenCL kernel, because you have to get the necessary data into the GPU's memory (e.g., precalculate all possible indicator values) and map data in and out of the GPU as needed.

Another case where it might be tricky to use a GPU is when your optimization needs randomness (e.g., genetic optimization): you might want to port a good pseudorandom number generator so it is available inside OpenCL kernels.

And when things don't work as expected inside an OpenCL kernel, it might be a lot harder to debug (e.g., no debugger available, and I haven't found printf statements to be reliable inside OpenCL kernels).

Yes, you're right: porting CPU code to an efficient GPU implementation isn't necessarily trivial, and running code on a GPU doesn't automatically make it faster. However, I have 20+ years of professional experience writing code for GPUs, so I know a thing or two about it.

I think the main thing is to understand the SIMT execution model and how it influences performance.
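The loop flattening ph1l describes can be sketched roughly as follows. This is plain Python standing in for an OpenCL kernel, not real GPU code: `unflatten` and `kernel` are hypothetical names, and the arithmetic in the kernel body is just a placeholder for whatever the three nested loops would compute. The point is only that every work item runs the same function and derives its loop indices from its flat id.

```python
from itertools import product


def unflatten(gid, n, o):
    """Map a flat work-item id back to the (i, j, k) indices of three nested loops."""
    i, rem = divmod(gid, n * o)
    j, k = divmod(rem, o)
    return i, j, k


def kernel(gid, n, o, out):
    """Stand-in for a kernel body: called once per work item."""
    i, j, k = unflatten(gid, n, o)
    out[gid] = 100 * i + 10 * j + k  # placeholder for the real per-combination work


M, N, O = 2, 3, 4
out = [0] * (M * N * O)  # one output slot per work item; this is where memory use grows

# On a GPU these iterations would run in parallel, one per work item.
for gid in range(M * N * O):
    kernel(gid, N, O, out)

# Reference result from the original three nested loops.
nested = [100 * i + 10 * j + k for i, j, k in product(range(M), range(N), range(O))]
```

With M, N and O in the hundreds the output buffer alone reaches millions of slots, which is the memory pressure mentioned in the quote.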

I have used George Marsaglia's "multiply-with-carry" PRNG on GPUs in the past (e.g. for Monte Carlo integration). It's a very efficient PRNG with a decent random distribution, so it's probably my first go-to PRNG if I need one.
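For readers who haven't seen it, here is a minimal sketch of one well-known multiply-with-carry variant (two 16-bit lag-1 generators combined into a 32-bit output, with the multipliers 36969 and 18000 that Marsaglia published). The post doesn't say which variant was actually used, so treat the class name `MWC`, the seeds, and the fallback values as assumptions. On a GPU each work item would keep its own `(z, w)` state; here it is plain Python, used for a small Monte Carlo estimate of pi.

```python
MASK32 = 0xFFFFFFFF


class MWC:
    """Marsaglia-style multiply-with-carry generator with two 32-bit state words."""

    def __init__(self, seed_z, seed_w):
        # Fall back to arbitrary non-zero values, since an all-zero state stays zero.
        self.z = (seed_z & MASK32) or 362436069
        self.w = (seed_w & MASK32) or 521288629

    def next_u32(self):
        # Low 16 bits are the current value, high 16 bits are the carry.
        self.z = (36969 * (self.z & 0xFFFF) + (self.z >> 16)) & MASK32
        self.w = (18000 * (self.w & 0xFFFF) + (self.w >> 16)) & MASK32
        return ((self.z << 16) + self.w) & MASK32

    def next_float(self):
        """Uniform float in [0, 1)."""
        return self.next_u32() / 2**32


def estimate_pi(rng, samples):
    """Monte Carlo integration: fraction of random points inside the unit quarter circle."""
    inside = 0
    for _ in range(samples):
        x, y = rng.next_float(), rng.next_float()
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / samples


pi_estimate = estimate_pi(MWC(12345, 67890), 20000)
```

The state update is a multiply, a mask, a shift and an add, which is why it maps well onto GPU threads. How to seed many parallel streams without correlation is a separate question this sketch does not address.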

Quanto · Dec 16, 2023

BlackPhoenix said:
Even for the simplest MT5 strategy it takes about 100 minutes for a single forex pair to run a 10-year walk-forward analysis (4-year in-sample window, 1-year out-of-sample WF phase) on an 18-core PC.

Are you doing this maybe in a scripting language?... :-)

Such things need to be done in fast low-level languages like C, C++, or even Assembler, IMO.

BlackPhoenix · Dec 16, 2023

Quanto said:
Are you doing this maybe in a scripting language?...
Such things need to be done in fast low-level languages like C, C++, or even Assembler, IMO.

This was with MQL5 for MT5, and yes, a C++ implementation is a couple of orders of magnitude faster. However, when you increase the number of instruments and the optimization interval, even a C++ implementation takes a while.

Quanto · Dec 16, 2023

@ph1l, you are a smart guy with very good programming skills, but unfortunately on the wrong track, if I may say so.

My advice: you should rather concentrate on creating and testing options strategies (thoroughly studying and really understanding a few well-known options strategies is even better), not the classical stock-only strategies.

Options are the way to go in programmatic trading because you can pre-define the max risk to take in the trade yourself (i.e. study the PnL diagram of options spreads and similar structures)...

And specialize in as few as possible, so that you can fully concentrate on those few without getting distracted by the rest. The options field is broad and sometimes complicated, but it is very logical and mathematical.

And: there is no need to do such complicated & time consuming "industry standard" backtests like you do and did. Things are in fact much simpler...

So, don't waste your time with unreliable things...

BlackPhoenix · Jan 2, 2024

Looks like optimizing over my stock database (1000+ stocks) takes roughly 24h even for a simple algo, which is infeasible for iterative algo development. I could iterate on a single stock to cut the time down to a couple of minutes, but that would massively undersample the domain, so I'm looking into adding OpenCL support to improve performance by ~100x on a single GPU.

BlackPhoenix · Jan 28, 2024

OpenCL got me a nice 60x performance boost with my weak 3 TFLOPS laptop GPU (vs multi-threaded CPU optimization on 8 cores). Combining this with dynamic programming for walk-forward optimization, I got a 340x speed-up (1y optimization window over 2y with monthly optimizations). This is quite nice for algorithm iteration, since I don't have to wait that long to test on a larger dataset.
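The post doesn't describe how the dynamic programming is done, so the following is only one plausible reading, under a simplifying assumption: the score of a 12-month window is the sum of independent monthly results. In that case each (parameter, month) pair is evaluated once, and every overlapping window is scored from prefix sums. `monthly_pnl` is a hypothetical stand-in for a real one-month backtest, and integer results are used so both approaches agree exactly.

```python
import random


def monthly_pnl(param, month):
    """Hypothetical stand-in for backtesting one parameter set over one month."""
    return random.Random(param * 1000 + month).randint(-100, 100)


N_PARAMS, WINDOW, N_STEPS = 50, 12, 24
N_MONTHS = WINDOW + N_STEPS  # 12 months of history before the first of 24 monthly re-optimizations

# Evaluate every (param, month) pair exactly once.
pnl = [[monthly_pnl(p, m) for m in range(N_MONTHS)] for p in range(N_PARAMS)]

# prefix[p][m] holds the sum of the first m monthly results for parameter set p.
prefix = [[0] * (N_MONTHS + 1) for _ in range(N_PARAMS)]
for p in range(N_PARAMS):
    for m in range(N_MONTHS):
        prefix[p][m + 1] = prefix[p][m] + pnl[p][m]


def best_param_reuse(step):
    """Score each window in O(1) per parameter set from the prefix sums."""
    scores = [prefix[p][step + WINDOW] - prefix[p][step] for p in range(N_PARAMS)]
    return max(range(N_PARAMS), key=scores.__getitem__)


def best_param_naive(step):
    """Re-sum the whole 12-month window at every step."""
    scores = [sum(pnl[p][step:step + WINDOW]) for p in range(N_PARAMS)]
    return max(range(N_PARAMS), key=scores.__getitem__)


reuse_choices = [best_param_reuse(s) for s in range(N_STEPS)]
naive_choices = [best_param_naive(s) for s in range(N_STEPS)]

# Month-evaluations needed per parameter set: 24 * 12 when recomputed, 36 when reused.
work_ratio = (N_STEPS * WINDOW) / N_MONTHS
```

Under these assumptions the reuse saves a factor of 8 in month-evaluations. Real strategies that carry positions or indicator state across month boundaries are not strictly additive, so an actual implementation would need to handle that.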

Quanto · Jan 28, 2024

@BlackPhoenix, what is being computed on the GPU via OpenCL?
I would like to test it too.

BlackPhoenix · Jan 28, 2024

I run algo backtesting on the GPU for N parameter permutations and calculate a trading score for each, from which I pick the parameters with the best result. I can now quite easily write algos that run on both CPU & GPU on my platform, using the same code for both without needing separate implementations. For example, in this screenshot I run 100000 backtests per optimization day for AAPL (1-year optimization window) for a total of 24 days (once a month for 2 years), so it effectively runs 2.4M backtests in 1.36 seconds.
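As a rough CPU-side illustration of that workflow (not the actual platform code), the sketch below scores every parameter permutation of a toy momentum rule on a synthetic price series and picks the best one. The strategy, the `backtest` function, the parameter grid, and the price series are all made up for the example; on a GPU each grid entry would map to one work item.

```python
import math
from itertools import product


def backtest(prices, lookback, threshold):
    """Toy momentum rule: hold for one bar whenever the lookback return exceeds threshold."""
    pnl = 0.0
    for t in range(lookback, len(prices) - 1):
        momentum = prices[t] / prices[t - lookback] - 1.0
        if momentum > threshold:
            pnl += prices[t + 1] / prices[t] - 1.0
    return pnl


# Synthetic, deterministic price series (a slow trend plus a cycle), not market data.
prices = [100.0 + 0.05 * t + 3.0 * math.sin(t / 5.0) for t in range(250)]

# Every permutation of the two hypothetical parameters.
grid = list(product(range(2, 12), [0.0, 0.01, 0.02]))

# One independent backtest per permutation; this loop is what gets parallelized.
scores = [backtest(prices, lookback, threshold) for lookback, threshold in grid]

best_idx = max(range(len(grid)), key=scores.__getitem__)
best_params = grid[best_idx]

# Throughput implied by the figures in the post: 100000 backtests x 24 optimization days.
total_backtests = 100_000 * 24
backtests_per_second = total_backtests / 1.36
```

The figures quoted in the post work out to roughly 1.76 million backtests per second.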

Quanto · Jan 28, 2024

Just curious: why AAPL and similar titans? IMO there is no money to be made with these giants, as they have become colossi that can't make any big moves anymore due to their size.
I prefer small caps with high volatility.

Anybody doing such calcs on the options chain tables to find good trades?