Great analysis, but the class of problems known as "embarrassingly parallel" is the one that requires little or no communication between processing units.
An easy example of an embarrassingly parallel problem is 3D rendering. You can send the underlying data for the movie (with M frames) to every node in the (N-node) network, along with instructions to process M/N frames each. Since the renderer only cares about the scene description file and not any of the other frames, there will be no further communication until the work is finished and the data is sent back to the control node.
As you can see, the embarrassingly parallel problem requires a minimal amount of inter-node communication.
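To make the M/N split concrete, here is a rough sketch of how a control node might carve up the frames. The function name and the choice to hand leftover frames to the first few nodes are my own assumptions, not how any particular render farm does it.

```python
def split_frames(num_frames, num_nodes):
    """Return one half-open (start, stop) frame range per node.

    Hypothetical helper: when M is not divisible by N, the first
    (M mod N) nodes each take one extra frame.
    """
    base, extra = divmod(num_frames, num_nodes)
    ranges = []
    start = 0
    for node in range(num_nodes):
        count = base + (1 if node < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    # 10 frames over 3 nodes: each node renders its range on its own.
    for node, (start, stop) in enumerate(split_frames(10, 3)):
        print(f"node {node}: frames {start}..{stop - 1}")
```

Once each node has its range and the scene data, it never has to talk to another node until it ships its frames back.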
The next class of cluster problems is those that require local communication. Once you get your film back on the control node, you may want to encode it into a compressed MPEG file. Since MPEG compression depends on one or more frames behind and in front of the current frame, some local communication is required. However, when processing frame 1024, the compression algorithm is unlikely to need to reference frame 65535, so the communication required is greater than in embarrassingly parallel problems but stays confined to neighboring nodes.
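As a simplified illustration of "local" communication, suppose each node encodes a contiguous range of frames and the encoder may look up to `window` frames to either side. Both the fixed window and the function name are assumptions for the sake of the example; real MPEG reference structures are more involved. The only frames a node has to fetch from elsewhere are the few just outside its own range:

```python
def halo_frames(start, stop, window, num_frames):
    """Frames outside [start, stop) that a node must fetch from neighbors.

    Hypothetical model: the encoder may reference up to `window` frames
    before and after any frame it processes.
    """
    before = list(range(max(0, start - window), start))
    after = list(range(stop, min(num_frames, stop + window)))
    return before, after


if __name__ == "__main__":
    # A node holding frames 4..6 of a 10-frame film, with a window of 2.
    print(halo_frames(4, 7, 2, 10))
```

The amount of data exchanged depends on the window size, not on the length of the film or the number of nodes.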
The worst case is when the parallel algorithm operating on node N needs to frequently communicate with ALL of the nodes in the network. These are the problems for which communication becomes critical. For some of these problems, a NUMA (Non-Uniform Memory Access) architecture will work. In the worst cases, a great deal of custom code is required to shoehorn a problem into a NUMA system. But given the limitations on CPU <-> Storage communications in massive systems, there is little else to choose from.
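A back-of-the-envelope way to see why the all-to-all case hurts is to count messages per step. The sketch below assumes one message per ordered pair of communicating nodes and, for the local case, nodes arranged in a simple chain; both are simplifications of mine, not a model of any specific machine.

```python
def neighbor_messages(num_nodes):
    """Messages per step when each node talks only to adjacent nodes in a chain."""
    return 2 * (num_nodes - 1)


def all_to_all_messages(num_nodes):
    """Messages per step when every node talks to every other node."""
    return num_nodes * (num_nodes - 1)


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, neighbor_messages(n), all_to_all_messages(n))
```

The local case grows linearly with the node count, while the all-to-all case grows with its square.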
So, depending on the problem class, interconnects may or may not be the limiting factor. With dual-port GigE cards selling for under $500 and 12-port GigE switches for $999, it is easy to build fast hub-and-spoke networks and even faster hypercube networks with commodity off-the-shelf hardware.
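For anyone unfamiliar with hypercube wiring, here is a small sketch of the standard scheme: number the nodes in binary and link each node to the ones whose number differs in exactly one bit. The function names are mine. Note that each node needs one link per dimension.

```python
def hypercube_neighbors(node, dim):
    """Directly wired neighbors of `node` in a dim-dimensional hypercube."""
    return [node ^ (1 << k) for k in range(dim)]


def hop_count(a, b):
    """Minimum number of hops between nodes a and b (bits that differ)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # 3-dimensional hypercube: 8 nodes, 3 links per node.
    print(hypercube_neighbors(5, 3))
    print(hop_count(0, 7))
```

In a hypercube of 2^d nodes, no two nodes are more than d hops apart.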
Regarding the Mac, I agree that few SGI customers will consider an iMac as an alternative. However, the G5 is a very compelling machine, particularly since SGIs are now using the same ATI chips (http://news.com.com/2100-1010-1025324.html) that are available for Macintosh and PC systems. Some 3D animators are switching from their old multi-million dollar SGI systems to Apple G5s for design, and clusters of G5s or PCs for final rendering.
I'd love to see SGI take off again; I just don't see it happening. Apple is a highly erratic company, so I wouldn't place a big bet on them taking over the workstation market that is currently being offered to them on a silver platter. I am hearing the same kind of negativity from the traditional Unix world about Apple and OS X that the same group offered towards Linux in the mid-90s. Notice that IBM's current Chairman was an internal Linux advocate...
Quote from nitro:
Yes,
All that is bad news for SGI. However, I take exception to point 5 below.
The SGI superclusters are _way_ more powerful than "standard" (beowulf or otherwise) clusters. It is well known that the limiting process in clusters is the interconnect (assuming "embarrassingly parallel" applications.) In the past, people have used things like Myrinet or the like, which is essentially a fibre interconnect offering about 2Gbit/sec of throughput.
The SGI are using NUMA technology as the "interconnect." The bandwidth of this interconnect is 2 orders of magnitude (100 times) that of the fastest "old technology," including everything you cited below in point 5.
Although NUMA is hardly new, SGI are the first to bring the technology together in a package made for massive parallelism and "out-of-the-box-don't-need-to-know-anything-to-use-it-and-start-getting-benefits-immediately."
These machines are being ordered by places that have the kinds of clusters that you are talking about - there is no comparison.
Point 6 has nothing to do with SGI. No one who is in the market for an Altix is wondering whether they should get an iMac instead.
nitro