Tesla Personal Supercomputer by NVIDIA

Quote from jprad:

Charging you extra for what amounts to an amateurish hack shouldn't be tolerated in something that's advertised for trading professionals.

I agree. But it still does what I want even with the hack so I don't care how it's done. I don't know that much about parallel computing anyway.

Quote from jprad:

Actually, there's a special term for this sort of problem and it's treated fairly well here:

http://en.wikipedia.org/wiki/Embarrassingly_parallel

Maybe my knowledge is too limited on this subject, but I don't see how the example I gave is embarrassingly parallel when the evaluation of one variable depends on the value of another variable.

Can you explain how to make something like the example I gave you run in multiple threads?

As far as APS goes, I understand that searching for patterns in 2 data files can run in parallel, but the question is whether a single search can run in parallel processing mode. In fact, all I do is a single search at a time.
 
Quote from intradaybill:

Maybe my knowledge is too limited on this subject, but I don't see how the example I gave is embarrassingly parallel when the evaluation of one variable depends on the value of another variable.

No, your example is pretty straightforward, and it can be parallelized. But, it's easier to think of all this in terms of atomic functions. Let's start with:

a = f(b)
c = f(d)

Since the dependent variables, 'a' and 'c', are independent of each other, the two function calls can be run in parallel.

On the other hand, the sequence:

a = f(b)
c = f(a)

cannot be parallelized, since 'a' has to be computed first because 'c' is now dependent on 'a'. (And no, 'y' in your example isn't dependent on 'x' in the same way as here, since the value of 'x' is constant during the entire iteration of the inner loop.)
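To make the two cases concrete, here is a minimal Python sketch. The function f and the inputs b and d are made-up placeholders (the thread never says what f computes); the only point is which calls can be submitted together and which have to wait.

```python
from concurrent.futures import ThreadPoolExecutor

def f(value):
    # Hypothetical stand-in for any pure function with no shared state.
    return value * value + 1

b, d = 3, 4

with ThreadPoolExecutor(max_workers=2) as pool:
    # Independent: neither call needs the other's result, so both can be
    # submitted at once and run concurrently.
    future_a = pool.submit(f, b)
    future_c = pool.submit(f, d)
    a, c = future_a.result(), future_c.result()

    # Dependent: the second call needs 'a2', so it cannot even be
    # submitted until the first call has finished.
    a2 = pool.submit(f, b).result()
    c2 = pool.submit(f, a2).result()

print(a, c, a2, c2)
```

In CPython the threads here mostly illustrate the dependency structure; for CPU-bound work a process pool or a GPU would be needed to see a real speed-up.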

As far as APS goes, I understand that searching for patterns in 2 data files can run in parallel, but the question is whether a single search can run in parallel processing mode. In fact, all I do is a single search at a time.

From a functional perspective why would the input to a function that searches for a cup w/handle pattern be dependent on the output from a function that searches for a head & shoulders pattern?

The only possibility is poor program design, with the use of global variables almost always at the top of that list.
 
Quote from jprad:

No, your example is pretty straightforward, and it can be parallelized. But, it's easier to think of all this in terms of atomic functions. Let's start with:

a = f(b)
c = f(d)

Since the dependent variables, 'a' and 'c', are independent of each other, the two function calls can be run in parallel.

On the other hand, the sequence:

a = f(b)
c = f(a)

cannot be parallelized, since 'a' has to be computed first because 'c' is now dependent on 'a'. (And no, 'y' in your example isn't dependent on 'x' in the same way as here, since the value of 'x' is constant during the entire iteration of the inner loop.)

WTF you bozo retard, wiki freak.

Look at his example carefully:

x = 0.
y = 0.
for i = 0 to 100
  x = x+i
  for j = 1 to 1000
    y = x+2j
  end
end

This translates to:

a = f(b) // b = i
c = g(a,d) // d = j


Calculation of c is dependent on a. This cannot be parallelized (easily).

Bozo...
 
Quote from vikana:

The biggest issue/problem with the Tesla architecture is that you have to re-design your software around their APIs. For some that's easy, but for many, it's probably not a good fit.

If your software is already highly distributed and parallel without lots of locking, CUDA might fit. Otherwise, it's a big project to support it.


Haven't done any real work with CUDA yet but I have read the documentation as well as visited their forum.

To gain a large performance boost you do need to parallelize your algorithm, but unfortunately that is not enough. Equally important are the memory access patterns of your algorithm. The GPU reads data in blocks, and if your problem does not map to that access pattern, it will need to synchronize, which will slow things down a lot. For complex algorithms it seems this could become harder than making the algorithm parallel.
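A rough way to picture the "reads data in blocks" point is to count how many memory blocks a group of threads touches for a given access stride. The sketch below is a simplified model in Python, not the actual rule set of any Tesla card: the group size of 32 threads, 4-byte words and 128-byte segments are all assumed numbers picked for illustration.

```python
def segments_touched(stride, threads=32, word_bytes=4, segment_bytes=128):
    """Count distinct memory segments read when thread t loads element t*stride.

    All parameters are illustrative assumptions, not hardware constants.
    """
    addresses = (t * stride * word_bytes for t in range(threads))
    return len({addr // segment_bytes for addr in addresses})

# Neighbouring threads reading neighbouring elements share one segment;
# a large stride spreads the same 32 loads over 32 separate segments.
for stride in (1, 2, 32):
    print(stride, segments_touched(stride))
```

Under this model, the same amount of useful data costs 32 times as many block reads at stride 32 as at stride 1, which is the kind of mismatch described above.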

/Hugin
 
Quote from jimbojim:

This cannot be parallelized (easily).

Bozo...

Execute 101,000 threads in parallel with the following kernel:

y = (i*(i+1)/2) + 2j

Where:

i is an input from 0 to 100 and
j is an input from 1 to 1000

Analogous to pixel rendering for each x, y co-ordinate on a screen that is 101 by 1000 pixels, i.e. perfectly suited to parallelism.
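As a sanity check on the closed form, here is a small Python sketch that runs the original nested loops, records every value y takes, and compares it with the per-(i, j) kernel. The function names sequential and kernel are just labels for this sketch; on a GPU each kernel call would be one thread.

```python
def sequential():
    # The original nested loops, recording every value y takes.
    results = {}
    x = 0
    for i in range(0, 101):
        x = x + i
        for j in range(1, 1001):
            results[(i, j)] = x + 2 * j
    return results

def kernel(i, j):
    # Closed form: after outer step i, x = 0 + 1 + ... + i = i*(i+1)/2,
    # so each (i, j) pair can be evaluated without any other pair.
    return i * (i + 1) // 2 + 2 * j

expected = sequential()
pairs = [(i, j) for i in range(101) for j in range(1, 1001)]
computed = {(i, j): kernel(i, j) for (i, j) in pairs}

print(len(pairs), computed == expected)
```

The dependency on x disappears only because the running sum has a closed form; for a recurrence without one, the outer loop would still have to be computed in order (or with a parallel prefix sum).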
 
Quote from jimbojim:

WTF you bozo retard, wiki freak.

Look at his example carefully:

I did, but since you insist...
Code:
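main()
{
  x = 0
  y = array[101]            // one slot per outer iteration, i = 0..100

  for i = 0 to 100
    x = x+i                 // running sum, as in the original example
    fork_thread(i, proc_x(y, i, x))   // x is passed by value
  end

  wait_thread(100)          // wait for the workers before reading y
  print(x, y[100])
}

proc_x(array y, int i, int x)
{
  for j = 1 to 1000
    y[i] = x+2j
  end

  return
}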
Both fork_thread() and wait_thread() are OS dependent. A decent treatment can be found on wiki, but you don't seem open to that. So, here's one of the books that I've got, about 10 years old by now:

http://www.amazon.com/Win32-Multith...=sr_1_8?ie=UTF8&s=books&qid=1242728857&sr=8-8
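For anyone who wants to run the idea without OS-specific calls, here is a rough Python equivalent of the pseudocode. It is only a sketch: ThreadPoolExecutor stands in for fork_thread() and wait_thread(), the running sum follows the x = x + i of the original example, and run() is a name made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def proc_x(i, x):
    # Inner loop for one outer iteration; returns the last value y takes.
    y = 0
    for j in range(1, 1001):
        y = x + 2 * j
    return i, y

def run():
    y = [0] * 101                      # one slot per outer iteration
    x = 0
    with ThreadPoolExecutor() as pool:
        futures = []
        for i in range(0, 101):
            x = x + i                  # sequential part: the running sum
            # x is passed by value, so each worker sees a fixed x.
            futures.append(pool.submit(proc_x, i, x))
        for future in futures:         # plays the role of wait_thread()
            i, value = future.result()
            y[i] = value
    return x, y

x, y = run()
print(x, y[100])
```

Only the cheap outer loop stays sequential; each inner loop gets its own copy of x and writes to its own slot of y, so the workers never touch shared state.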


Bozo...

Dipstick...
 