Quote from gip3:
You aren't a friendly guy, are you?
What does this have to do with parallel factorization algorithms and their relative efficiency compared to CPU-only implementations? It does appear you like to throw jargon around, however irrelevant it may be.
Okay, now we are getting to what I actually asked. And there are indeed plenty of papers on this subject. I take it you haven't read them? In that case, your contribution of "you can go google that" is noted and appreciated.
I'm going to go read those papers now.
Good luck. I am completely uninterested in matrices, factorization, etc. It is not an area I care about, other than knowing how I would approach it if needed. And for that I need technical details of the hardware, which translate directly into how you write your software.
Do not be scared by technicalities. I was under the impression that it could not be stated more simply on an ET board than this. Regardless of the algorithm, GPU programming means programming to the hardware, which is different from CPU jargon. I was just making the point that it can be done, since you did not state the size of your large matrix.