My next motherboard

I posted this in the software forum, but as it is directly relevant to the hardware topics touched upon in this thread, I'll also post the references here. It is interesting that Linpack seems to be used for benchmarking.

In fact, this evolution might very quickly progress into something totally unthinkable a short while ago. Remember the 66 MHz of 1994? Nitro, keep on going, the wind may blow in your direction.

http://www.internetnews.com/ent-news/article.php/3414721

Blue Gene/L runs Linux (the /L stands for Linux).
Read more about Blue Gene/L at:
http://news.com.com/IBM+details+Blue+Gene+supercomputer/2100-1008_3-1000421.html?tag=st.rc.targ_mb

Be good,

nononsense
 
64-bit Windows running on Quad Opteron:

======
Report file for timing the various timers.

*** Key number is the avg time.
The smaller this number, the faster the timer.


QueryPerformanceFrequency() freq = 0 1804000000


method 0:
QueryPerfCntr..() 100 times
tot: 0 37602
avg: 376.020000
avg time: 2.08437e-007
method 0:
QueryPerfCntr..() 500 times
tot: 0 184766
avg: 369.532000
avg time: 2.0484e-007
method 0:
QueryPerfCntr..() 1000 times
tot: 0 368402
avg: 368.402000
avg time: 2.04214e-007
method 0:
QueryPerfCntr..() 10000 times
tot: 0 3680579
avg: 368.057900
avg time: 2.04023e-007



method 1:
GetTickCount() 100 times
tot: 0 2276
avg: 22.760000
avg time: 1.26164e-008
method 1:
GetTickCount() 500 times
tot: 0 7209
avg: 14.418000
avg time: 7.99224e-009
method 1:
GetTickCount() 1000 times
tot: 0 13423
avg: 13.423000
avg time: 7.44069e-009
method 1:
GetTickCount() 10000 times
tot: 0 130595
avg: 13.059500
avg time: 7.23919e-009



method 2:
TimeGetTime() 100 times
tot: 0 13170
avg: 131.700000
avg time: 7.30044e-008
method 2:
TimeGetTime() 500 times
tot: 0 53613
avg: 107.226000
avg time: 5.94379e-008
method 2:
TimeGetTime() 1000 times
tot: 0 106387
avg: 106.387000
avg time: 5.89728e-008
method 2:
TimeGetTime() 10000 times
tot: 0 1060592
avg: 106.059200
avg time: 5.87911e-008



method 3:
Pentium internal high-freq cntr() 100 times
tot: 0 1223
avg: 12.230000
avg time: 6.77938e-009
method 3:
Pentium internal high-freq cntr() 500 times
tot: 0 4580
avg: 9.160000
avg time: 5.07761e-009
method 3:
Pentium internal high-freq cntr() 1000 times
tot: 0 8026
avg: 8.026000
avg time: 4.449e-009
method 3:
Pentium internal high-freq cntr() 10000 times
tot: 0 78559
avg: 7.855900
avg time: 4.35471e-009


Quote from prophet:

Here is code that measures the overhead of various timer functions in windows:

http://developer.nvidia.com/object/timer_function_performance.html

On my dual Opteron I am seeing overheads of:

QueryPerformanceCounter: 238 ns
GetTickCount: 8 to 16 ns
TimeGetTime: 80 ns
Pentium internal timer: 7 ns

The Pentium internal timer has the least overhead because it is implemented in 4 assembly instructions without any system call. The system calls have higher, though not unacceptable overheads.

There is long-term drift with these timers. This is not a problem if one requires time differences between closely spaced events such as market ticks. Calibration techniques can be used to obtain millisecond or microsecond precision over minutes or hours. However, it is questionable why anyone would need even millisecond precision for elapsed times of more than a minute.
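
The measurement idea behind these reports (call a timer N times, divide the elapsed time by N) can be sketched roughly as follows. This is only an illustration in Python using Python's own clocks, not the Win32 calls or the NVIDIA code linked above; the function names and the choice of timers are assumptions, and the loop overhead is included in each figure.

```python
import time


def timer_overhead_ns(timer, calls):
    """Average cost in nanoseconds of one call to `timer`.

    The figure includes the overhead of the Python loop itself,
    so it is an upper bound rather than an exact cost.
    """
    start = time.perf_counter_ns()
    for _ in range(calls):
        timer()
    elapsed = time.perf_counter_ns() - start
    return elapsed / calls


# Assumed stand-ins for the timers in the report; these are Python
# clocks, not QueryPerformanceCounter/GetTickCount/timeGetTime.
TIMERS = {
    "time.perf_counter": time.perf_counter,
    "time.monotonic": time.monotonic,
    "time.time": time.time,
}


def report(call_counts=(100, 500, 1000, 10000)):
    """Measure every timer at every call count, like the report layout."""
    results = {}
    for name, timer in TIMERS.items():
        for calls in call_counts:
            results[(name, calls)] = timer_overhead_ns(timer, calls)
    return results


if __name__ == "__main__":
    for (name, calls), ns in report().items():
        print(f"{name}() {calls} times: avg {ns:.1f} ns per call")
```

As in the reports, the small call counts tend to be noisier, so the larger runs are probably the ones to compare.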
 
I’ve attached my timer report. Your quad system probably gets lower overheads due to its higher clock speed. My CPUs run at 1.6 GHz. What CPUs are you using? 850s?

Yesterday I added 1 GB (2x512MB) to my existing 4x256MB (two per CPU). First I moved 2x256MB from CPU1 to CPU0, then installed the 2x512MB on CPU1. Immediately I got a blue screen whenever running MATLAB with CPU0 affinity for more than a minute or two. Running MATLAB with CPU1 affinity caused no problem. What's strange is that the new DIMMs were installed on CPU1.

Now I had been getting “machine check” warnings since I got the machine, but never any blue screens or crashes. This is the error:

Event Type: Warning
Event Source: WMIxWDM
Event Category: None
Event ID: 106

I should have investigated these. I suspected some of the DIMMs were bad and removed the 2x256MB on CPU0 banks 2 and 3. Blue screens and machine check errors were eliminated. One of the DIMMs had some oxidation spots on the contacts and heat spreader. I believe some thermal paste dripped from CPU1 onto this DIMM. I did not assemble this board. I’ll get the vendor to replace the DIMMs. It's interesting that this DIMM worked for two months on CPU1, under very heavy use, without ever crashing the system until I moved it to CPU0.
 


The very first number

QueryPerformanceFrequency() freq = 0 1804000000

tells you the frequency of the CPU, or 1.8 GHz. These are then 844s. The 8xx-series Opterons support 4-way and larger systems and have an extra HyperTransport link compared with regular Opterons. I am not sure any of that makes any difference in this case. It is probably one case where CPU clock speed is all that matters.
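
To make the arithmetic concrete, here is a minimal sketch of how the "avg" tick counts in the report turn into the "avg time" lines, assuming the ticks are counted at the reported 1,804,000,000 Hz. The function and dictionary names are made up for illustration; the numbers are copied from the 10000-call runs above.

```python
# Frequency printed by QueryPerformanceFrequency() in the report.
QPF_HZ = 1_804_000_000


def ticks_to_ns(ticks, freq_hz=QPF_HZ):
    """Convert a tick count at `freq_hz` into nanoseconds."""
    return ticks / freq_hz * 1e9


# "avg" values from the 10000-call runs in the report.
AVG_TICKS_10000 = {
    "QueryPerformanceCounter": 368.0579,
    "GetTickCount": 13.0595,
    "TimeGetTime": 106.0592,
    "Pentium internal counter": 7.8559,
}

for name, ticks in AVG_TICKS_10000.items():
    print(f"{name}: {ticks_to_ns(ticks):.2f} ns per call")
```

The results should line up with the "avg time" lines in the report, for example about 204 ns for QueryPerformanceCounter and a little over 4 ns for the internal counter.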

My goal was to get the most bang for the buck now, and spend on dual cores when they came out to get in effect an 8-way system.

nitro
 
Quote from nitro:

The very first number
QueryPerformanceFrequency() freq = 0 1804000000
tells you the frequency of the CPU, or 1.8 GHz. These are then 844s.
Doh! I missed that.

My goal was to get the most bang for the buck now, and spend on dual cores when they came out to get in effect an 8-way system.
Smart choice.
 
I unleash CPUs operating at 1 BILLION operations a second

My $60 Celeron does 2.4 BILLION operations a second (actually more, because it can do some operations in parallel)
 
Ok,

Just finished colocating the Tyan. It was an exercise in patience.

I had not opened the package that had my rack rails in it. While at the colocation center, I soon discovered that Tyan had put in a 1U rack kit instead of a 2U kit. In theory this does not matter, but apparently it does. The included screws did not fit my case, so I had to go running all over the place looking for screws that fit. I finally found some at a local Ace Hardware.

I had to be careful because I could not find screws shorter than 3/8 inch with the required width, and I needed to make sure that I was not going to run a screw into anything inside the case. All went well except for one of the rear holes on the rack. For some unknown reason, we could not get this screw in. Probably the hole was simply too small, and it kept stripping the screws. Fortunately, we had attached enough of the rail screws to the case that leaving this one unfastened did not matter much.

Although I am very satisfied with the Tyan, they can be sloppy with their Q/A and with the case in general.

nitro
 
Speaking of Q/A and attention to detail, all motherboard manufacturers, Tyan included, need to document BIOS settings much better. The manual that came with my Tyan K8W MB was a whole 63 pages long. Only two pages were devoted to node/bank interleaving and ECC settings, neither of which is explained in any depth. They need to devote a whole page to each BIOS setting, especially the memory and HyperTransport stuff, describing performance, stability and compatibility issues for each setting. Otherwise we are stuck wasting our time benchmarking these settings. Testing for stability differences can take a long time.

Nitro, do you remember your MB’s memory configuration: bank interleave and node interleave settings? The choices for both are disabled or auto. I have bank interleave set to auto, node interleave set to disabled. Node interleave will interleave memory addresses between CPUs, defeating NUMA. I get better performance with this disabled (NUMA enabled).

Tyan’s FAQ http://www.tyan.com/support/html/f_s2885.html alludes to these settings, but never actually documents them:
The second way is by the interleaving by page of all processor memory. For example, a page from processor 0's memory, followed by a page from processor 1's memory and etc. The interleaved mechanism (option #2) has better overall uniformity of latency but the concatenation mechanism (option #1) is more usable for an operating system capable of implementing memory affinity management.
Apart from this I can’t find any better explanation of these settings. Have you seen any?
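
For what it's worth, the difference between the two mechanisms in the FAQ can be sketched like this. It is a toy model only: the page size, the per-node memory size, the node count and all function names are assumptions for illustration, not anything taken from Tyan's or AMD's documentation.

```python
PAGE_SIZE = 4096      # assumed 4 KiB pages
NODE_MEM = 1 << 30    # assumed 1 GiB of memory behind each CPU
NUM_NODES = 2         # assumed dual-CPU board


def node_concatenated(addr):
    """Option #1: each node owns one contiguous block of addresses."""
    return addr // NODE_MEM


def node_page_interleaved(addr):
    """Option #2: consecutive pages alternate between the nodes."""
    return (addr // PAGE_SIZE) % NUM_NODES


def fraction_local(node_of, start, length, home_node=0):
    """Share of a buffer's pages that live on `home_node`."""
    first = start // PAGE_SIZE
    last = (start + length - 1) // PAGE_SIZE
    pages = range(first, last + 1)
    local = sum(1 for p in pages if node_of(p * PAGE_SIZE) == home_node)
    return local / len(pages)


if __name__ == "__main__":
    buf = 64 << 20  # a 64 MiB buffer starting at address 0
    print("concatenated:", fraction_local(node_concatenated, 0, buf))
    print("interleaved: ", fraction_local(node_page_interleaved, 0, buf))
```

In this model a process pinned to CPU0 finds all of a low-address buffer in local memory under concatenation, but only half of it under page interleaving, which is one way to read the FAQ's remark about uniform latency versus memory affinity.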
 
All of my settings are the default settings. Using 64-bit operating systems, I get tremendous throughput compared with other machines in its class. But if I run 32-bit Windows, this machine is average by today's standards.

As I mentioned earlier, you not only have to have NUMA turned on in the BIOS along with the correct interleaving of memory, but if you are running Windows you also have to turn on PAE (Physical Address Extension).

nitro
 
I am now installing the Gentoo Linux distro onto an old machine in order to gain experience installing and optimizing it before I install it on the Tyan. I can tell you that installing Gentoo is definitely not for beginners. Relatively easy for me, but man, this had better be WAAAAY superior to my old friend FreeBSD considering all this effort.

The reason for Gentoo is to see if I can get every ounce of performance out of the machine, and Linux in general because I cannot wait around for a retail 64-bit version of Windows.

I am going to experiment with several Linux distros to see which one gives me the best SMP performance. Really, what I need is the best I/O (networking) performance and the best kernel.

I have been away from *inux systems for so long that I feel a little bit out of touch.

nitro
 