I have found that moving calculations from the CPU (multithreaded C++) to the GPU (OpenCL 1.2) can decrease execution times by a factor of 30 to 60.
But porting code to make good use of a GPU is not always that easy. For example, instead of three nested loops with sizes M, N, and O, you might be able to have one OpenCL kernel (function) with M * N * O work items (threads), where each work item does the work of one innermost-loop iteration for one combination of the outer loop indices. This can quickly use a lot of memory. And it's not as simple as just writing the OpenCL kernel, because you also have to get the necessary data into the GPU's memory (e.g., precalculate all possible indicator values) and map data in and out of the GPU as needed.
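To make the loop flattening concrete, here is a minimal sketch in plain C++ that emulates it on the host. The names `Index3`, `unflatten`, `work`, and `run_flat` are made up for illustration, and `work` is only a placeholder for whatever the real loop body computes. In an actual OpenCL kernel the flat id would come from `get_global_id(0)` rather than a host-side loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the three loop indices recovered from one flat id.
struct Index3 {
    std::size_t m, n, o;
};

// Maps a flat work-item id in [0, M*N*O) back to (m, n, o), with o varying fastest.
inline Index3 unflatten(std::size_t gid, std::size_t M, std::size_t N, std::size_t O) {
    assert(gid < M * N * O);
    (void)M;  // M is only needed for the bounds check above
    return Index3{gid / (N * O), (gid / O) % N, gid % O};
}

// Placeholder for the real body of the innermost loop (an assumption, not real logic).
inline float work(std::size_t m, std::size_t n, std::size_t o) {
    return static_cast<float>(100 * m + 10 * n + o);
}

// Host-side emulation of launching M*N*O work items, one result per work item.
std::vector<float> run_flat(std::size_t M, std::size_t N, std::size_t O) {
    std::vector<float> out(M * N * O);  // one output slot per work item
    for (std::size_t gid = 0; gid < out.size(); ++gid) {
        const Index3 ix = unflatten(gid, M, N, O);
        out[gid] = work(ix.m, ix.n, ix.o);
    }
    return out;
}
```

Note that the output buffer alone holds M * N * O values, which is one reason memory use grows so quickly with this approach.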
Another case where using a GPU can get tricky is when your optimization needs randomness (e.g., genetic optimization): you might want to port a good pseudorandom number generator so that it is available inside OpenCL kernels.
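As a rough idea of what such a port can look like, here is a sketch of Marsaglia's xorshift32 generator in C++. It is not necessarily the generator I would call "good" for serious work (its statistical quality is modest), but it uses only 32-bit shifts and XORs, so the same logic should carry over to OpenCL C with `uint` state. The per-work-item seeding in `seed_for_item` and the helper `next_uniform` are my own illustrative assumptions.

```cpp
#include <cstdint>

// One xorshift32 step: updates the state in place and returns the new value.
// The state must never be zero, otherwise the generator stays at zero forever.
inline std::uint32_t xorshift32(std::uint32_t& state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Hypothetical seeding scheme: mix a base seed with the work-item id so that
// each work item starts from a different state. Falls back to 1 to avoid zero.
inline std::uint32_t seed_for_item(std::uint32_t base_seed, std::uint32_t gid) {
    const std::uint32_t s = base_seed ^ (gid * 2654435761u);
    return s != 0u ? s : 1u;
}

// Uniform float in [0, 1), built from the top 24 bits of the next output.
inline float next_uniform(std::uint32_t& state) {
    return static_cast<float>(xorshift32(state) >> 8) * (1.0f / 16777216.0f);
}
```

Each work item would keep its own state (seeded once from its id) and call `next_uniform` whenever it needs a random number. Whether nearby seeds give sufficiently independent streams is something you would have to check for your own use.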
And when things don't work as expected inside an OpenCL kernel, it can be a lot harder to debug (e.g., no debugger available, and I haven't found printf statements to be reliable inside OpenCL kernels).