DFT/FFT on graphics hardware

Quote from lilboy716:

nitro..

SSE3, I believe, has an FFT instruction.

It's also available on AMD64 3500+ Venice and San Diego chips.
lilboy716,

Reeeeeally? Can you provide me with a link? I tried googling it, but nothing came up for me.

nitro
 
OK, I see. These are not FFT instructions, but instructions used inside complex FFT routines that make those routines faster.

http://www.ffte.jp/

Some packages are available to take advantage of it. However, I have no experience with this at all, so I can't help you any further.
Yeah, I am aware of these packages.

Why do you need FFT?
The very article (pdf) you gave me above gives the reason!

nitro
 
Quote from nitro:
The very article (pdf) you gave me above gives the reason!

nitro
The code sequence above shows how to implement a double-precision complex multiplication using SSE2 only or with the new SSE3 instructions, where mem_X contains one complex operand and mem_Y the other; mem_Z is used to store the complex result and xmm7 is a constant used to change the sign of one data element.

Since the main speed limiter of this code is the number of execution uops (7 for SSE2, 4 for SSE3), the new instructions can improve complex multiplications by up to 75%.

On SPEC CPU2000, the compiler is able to use SSE3 to improve 168.wupwise by 10-15%.
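
The article's listing is not reproduced in this thread, so the following is only a rough sketch of the idea, not the article's code. It emulates in plain Python what the SSE3-style sequence computes on a two-element register (low lane first). The helper names are made up for illustration and simply mirror the behaviour of the MOVDDUP, SHUFPD, MULPD and ADDSUBPD instructions. The point is that ADDSUBPD subtracts in one lane and adds in the other, so the separate sign-flip constant (xmm7 in the SSE2 version) is no longer needed.

```python
# Sketch only: lane-wise emulation of an SSE3-style double-precision
# complex multiply. Registers are modelled as (low, high) tuples.
# All function names here are illustrative, not a real API.

def movddup(x):
    """Duplicate one double into both lanes (like MOVDDUP)."""
    return (x, x)

def swap_lanes(v):
    """Swap low and high lanes (like SHUFPD with immediate 1)."""
    return (v[1], v[0])

def mulpd(a, b):
    """Lane-wise multiply (like MULPD)."""
    return (a[0] * b[0], a[1] * b[1])

def addsubpd(a, b):
    """Subtract in the low lane, add in the high lane (like ADDSUBPD)."""
    return (a[0] - b[0], a[1] + b[1])

def complex_mul(x, y):
    """x and y are (real, imag) pairs; returns x * y as a (real, imag) pair."""
    t0 = mulpd(movddup(x[0]), y)              # (xr*yr, xr*yi)
    t1 = mulpd(movddup(x[1]), swap_lanes(y))  # (xi*yi, xi*yr)
    return addsubpd(t0, t1)                   # (xr*yr - xi*yi, xr*yi + xi*yr)

# (1 + 2i) * (3 + 4i) = -5 + 10i
print(complex_mul((1.0, 2.0), (3.0, 4.0)))
```

Real code would use the compiler's SSE3 intrinsics or let the compiler vectorize; this only shows the data flow.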

Nitro...
Is this an 'across the board' improvement in CPU usage for ALL custom indicators that require a lot of math, or am I out in left field about the reasons to implement this?

cj...

:confused:

__________________
HAVE STOP - WILL TRADE

If You Have The Vision We Have The Code
 
I doubt it. 95+% of standard TA indicators would not benefit from any of these instructions.

nitro
Quote from EdgeHunter:

Nitro...
Is this an 'across the board' improvement in CPU usage for ALL custom indicators that require a lot of math, or am I out in left field about the reasons to implement this?

cj...

 
- Most traders don't have an SSE3-enabled CPU.

- Most trading software won't be compiled with SSE3 instructions enabled, unless you're writing your own, as in nitro's case.
 
You can do a really fast FFT by programming a PLD to do it in hardware. With a power-of-two length the divide is just a shift anyhow, and the "Fast" in Fast Fourier comes from cutting the work from roughly N^2 to N log N operations. I don't see how any processor could be faster than that, but I never made a comparison.
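
For anyone wondering what the power-of-two structure buys, here is a minimal radix-2 sketch in plain Python (software only, nothing PLD-specific; the function names are just for illustration). It checks the recursive FFT against a direct O(N^2) DFT and shows that, for integer data with N = 2^m, dividing by N is a right shift by m bits.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + w            # butterfly, upper half
        out[k + n // 2] = even[k] - w   # butterfly, lower half
    return out

def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 8
samples = [complex(v) for v in (1, 2, 3, 4, 0, 0, 0, 0)]
fast = fft(samples)
slow = naive_dft(samples)
print(all(abs(a - b) < 1e-9 for a, b in zip(fast, slow)))

# For integer (fixed-point) data and N = 2**m, dividing by N is a shift by m.
m = n.bit_length() - 1
print(1024 // n == 1024 >> m)
```

For N = 8 the recursion does (N/2) * log2(N) = 12 butterflies, against 64 complex multiplies in the direct sum.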
 
Sorry, a little OT here, but relevant...

I wonder how anyone who uses Fourier deals with non-stationarity, as well as with impulse events that come out messy (imperfect) because of the constraints of Fourier.

I found wavelet transforms much more applicable to the type of time series traders tend to quantify.

Any comments as to why you choose Fourier over wavelet would be greatly appreciated, since I'm always trying to improve comprehension generally.

kt
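
A toy illustration of the impulse point, under some assumptions: the Haar wavelet is picked here only because it is the simplest one to write down (kt does not say which wavelet was used), the signal is a single spike in 16 samples, and the function names are made up. The DFT spreads the spike across every frequency bin, while the Haar transform keeps it in a handful of coefficients.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def haar(x):
    """Full orthonormal Haar transform; len(x) must be a power of two."""
    out = list(x)
    n = len(out)
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:n] = avg + diff   # averages first, details after; recurse on averages
        n = half
    return out

def significant(coeffs, tol=1e-9):
    """Count coefficients whose magnitude exceeds tol."""
    return sum(1 for c in coeffs if abs(c) > tol)

signal = [0.0] * 16
signal[5] = 1.0  # a single impulse

print(significant(dft(signal)), significant(haar(signal)))  # prints: 16 5
```

This says nothing about which transform trades better; it only shows why a localized event is compact in a wavelet basis and smeared in a Fourier basis.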
 