My next motherboard

prophet · Sep 25, 2004

Quote from damir00:
right. a system call, which will get executed based on a system time slice boundary. you see the problem, right? you're now dealing with 2 separate discretizations of time. this gets hairy in a hurry, and the problems are grossly magnified when running multiple processors.

If the Windows kernel was waiting for time-slice boundaries (typically 10 to 15 ms), then calls to QueryPerformanceCounter would have a 15 ms overhead. It is ridiculous to think that QueryPerformanceCounter could have less resolution than GetLocalTime!

This source http://www-106.ibm.com/developerworks/linux/library/l-rt1/ measured the overhead of QueryPerformanceCounter at 2 microseconds. Linux gives less than 1 us with gettimeofday(). So yes, Linux is better here, but not dramatically so. The previous link I posted measured 1.5 and 5.7 us resolution for 933 and 333 MHz CPUs respectively, so overhead shrinks as CPU speed goes up. Clearly the Windows scheduler is switching immediately to the kernel for calls to QueryPerformanceCounter. It is not waiting for regular time-slice boundaries. If it did, it would be useless!
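For anyone who wants to reproduce this kind of overhead number, here is a rough sketch in Python. It is not the code from either link. It assumes CPython, where time.perf_counter is (as far as I know) built on QueryPerformanceCounter on Windows, and the helper name per_call_overhead_ns is made up for illustration. The figure includes interpreter loop overhead, so treat it as an upper bound on the cost of the timer call itself.

```python
import time

def per_call_overhead_ns(timer, calls=200_000):
    """Average cost of one timer() call in nanoseconds (includes loop overhead)."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        timer()
    elapsed = time.perf_counter_ns() - start
    return elapsed / calls

if __name__ == "__main__":
    # Compare a few standard-library clocks on the current machine.
    for name, fn in [("perf_counter_ns", time.perf_counter_ns),
                     ("monotonic_ns", time.monotonic_ns),
                     ("time_ns", time.time_ns)]:
        print(f"{name}: {per_call_overhead_ns(fn):.0f} ns per call")
```

If the calls really waited for a time-slice boundary, the average would come out near 10 to 15 ms per call rather than somewhere in the nanosecond to microsecond range.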

prophet · Sep 25, 2004

Quote from nitro:
My earlier answer to this was at best incomplete. Take a look at

http://msdn.microsoft.com/library/d...nosticsprocessclassprocessoraffinitytopic.asp

This is the .NET way. There are plain SDK ways to do it too.

Thanks for the link. Seems like Microsoft is adding support for ccNUMA, automatically optimizing CPU <-> memory locality:

http://www.microsoft.com/whdc/driver/kernel/XP_kernel.mspx#EGAA

http://msdn.microsoft.com/library/d..._0dd92ce5-8ca1-4956-b7ad-8d8272239a93.xml.asp

damir00 · Sep 25, 2004

Quote from prophet:

...you are too quick to dismiss this as a phantom or some artifact of network latencies.

but that's not what i said. what i said was patterns of this type will appear due to comm issues whether or not there is a "trading" basis for them. in some times/places clustering clearly happens in and of itself (all one has to do is adjust timescale and see start and end of day are high "clusters" of activity).

These patterns are consistent over time and across different symbols, including ES, NQ, 6E, ER2 and YM. They involve the timing and ordering of bid/ask depth/price changes, and trades reported.

right, and that is exactly what one should expect based solely on the communications issues. the "over time" bit should be especially worrisome. the trick is to filter the effect out and see what clustering is left over. some of your symbols are eCBOT - i assume you are aware data from globex and ecbot are presented quite differently, with the latter not actually being true tick data.

I am finding useful patterns with 1/100 second timestamp resolution and still I am only scratching the surface with this form of analysis.

my point is that is exactly the timeframe you should expect to see clustering from current-tech communications latencies. again, i am not saying there isn't *also* trading-based clustering happening as well, i am saying you can't tell whether or not it is happening unless you account for the communications stuff.

one easy but costly way to do this is to trade against the pattern you believe you see. a few tens of thousands of such ultra high speed trades will tell you very quickly how real the clustering is.

damir00 · Sep 25, 2004

Quote from prophet:

If the windows kernel was waiting for time-slice boundaries...

i am out of this part of the conversation. there is some serious trivialization going on of what is a very difficult problem, but i've said my piece, good luck with the endeavour.

nitro · Sep 25, 2004

Quote from prophet:

Thanks for the link. Seems like Microsoft is adding support for ccNUMA, automatically optimizing CPU <-> memory locality:

http://www.microsoft.com/whdc/driver/kernel/XP_kernel.mspx#EGAA

http://msdn.microsoft.com/library/d..._0dd92ce5-8ca1-4956-b7ad-8d8272239a93.xml.asp

Ja,

It's been in release for a while.

I am experimenting with setting thread affinity. It is really hard to measure performance gains...
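A minimal sketch of that kind of experiment, in Python rather than .NET. It assumes a platform where the standard library exposes os.sched_setaffinity (Linux does; Windows does not, and there one would go through Process.ProcessorAffinity or the Win32 SetThreadAffinityMask call instead). The helper name pin_to_single_cpu is hypothetical.

```python
import os

def pin_to_single_cpu():
    """Pin the calling process to one of its allowed CPUs.

    Returns the chosen CPU number, or None when the platform does not
    expose affinity control through the os module (e.g. Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs we are currently allowed to run on
    cpu = min(allowed)                  # arbitrary but deterministic choice
    os.sched_setaffinity(0, {cpu})      # pid 0 means "the calling process"
    return cpu

if __name__ == "__main__":
    print("pinned to CPU:", pin_to_single_cpu())
```

To look for a gain, time the same workload with time.perf_counter before and after pinning, and repeat many times, since run-to-run noise can easily be larger than the difference.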

nitro

prophet · Sep 25, 2004

Quote from damir00:
but that's not what i said. what i said was patterns of this type will appear due to comm issues whether or not there is a "trading" basis for them. in some times/places clustering clearly happens in and of itself (all one has to do is adjust timescale and see start and end of day are high "clusters" of activity).

The structure I see is much more defined than what one would expect from a population of automatons with different latencies. Maybe I shouldn't have called them clusters. A better term is structure. Part of this is pairs consisting of a trade and a change in the last/bid/ask price/size. The ordering and timing seem to reveal how each trade acts on the order book, and how changes in the order book generate trades, sometimes nearly instantly, sometimes a little later. The particular structural assumption used by my model/filters has macroscopic implications for longer-term predictions. Again, I am not sure if this is a Globex/ecbot or IB phenomenon. I am not the exchange, so I cannot confirm if the ordering I see is what really happened. This is what I know, and am willing to reveal here.

right, and that is exactly what one should expect based solely on the communications issues. the "over time" bit should be especially worrisome. the trick is to filter the effect out and see what clustering is left over. some of your symbols are eCBOT - i assume you are aware data from globex and ecbot are presented quite differently, with the latter not actually being true tick data.

It is consistent over time, suggesting it is an exchange or IB generated phenomenon... not something due to particular latencies of market participants. Regarding YM, I've only checked the macroscopic models on YM, not examined the clustering directly for YM. So ecbot could be different here.

my point is that is exactly the timeframe you should expect to see clustering from current-tech communications latencies. again, i am not saying there isn't *also* trading-based clustering happening as well, i am saying you can't tell whether or not it is happening unless you account for the communications stuff.

Yeah, I can't be sure of anything until I can obtain exchange-generated timestamps or latencies. Until then, this is just a consistent microscopic structure with macroscopic predictive ability.

one easy but costly way to do this is to trade against the pattern you believe you see. a few tens of thousands of such ultra high speed trades will tell you very quickly how real the clustering is.

Yes this would be quite informative.

prophet · Sep 25, 2004

Quote from damir00:
i am out of this part of the conversation. there is some serious trivialization going on of what is a very difficult problem, but i've said my piece, good luck with the endeavour.

I'm not trying to trivialize anything. Sorry you see it that way. I merely pointed out a paradox with what you are saying. If system calls are waiting for time-slice boundaries then the empirical accuracy tests would be showing much worse resolution. Any process that issued a lot of system calls per time slice would get blocked to hell. None of this is observed. Sorry if the simplicity of this argument offends you. I only want to get to the bottom of this accuracy issue.

This all assumes a single process not having to compete too much with other processes, or the kernel. Otherwise accuracy depends greatly on the timing algorithm and kernel scheduler.
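The empirical test I have in mind can be sketched roughly like this in Python (an illustration, not the code behind the measurements linked earlier; the helper name timer_steps_ns is made up). It spins on a timer and records the gap each time the reading changes. The smallest gaps approximate the usable resolution under the single-process assumption above.

```python
import time

def timer_steps_ns(timer, samples=1000):
    """Gaps (in ns) between consecutive distinct readings of timer()."""
    steps = []
    last = timer()
    while len(steps) < samples:
        now = timer()
        if now != last:          # the reading advanced: record the step size
            steps.append(now - last)
            last = now
    return steps

if __name__ == "__main__":
    steps = sorted(timer_steps_ns(time.perf_counter_ns))
    print(f"min step {steps[0]} ns, median step {steps[len(steps) // 2]} ns")
```

If every call had to wait for a scheduler time slice, no step could be smaller than roughly 10 ms. Occasional large steps are expected whenever the process gets preempted, which is why the minimum and median are more informative than the maximum.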

Can someone please supply a link or official documentation on this? Otherwise we have to rely on the empirical tests for accuracy and overhead. Thanks.

nitro · Sep 26, 2004

http://msdn.microsoft.com/msdnmag/issues/04/03/HighResolutionTimer/default.aspx

http://www.devx.com/SummitDays/Article/16293/1411/pdo/29FD2B1375EBAC824CE7A420FAC318F4:3835

nitro

prophet · Sep 26, 2004

Here is code that measures the overhead of various timer functions in Windows:

http://developer.nvidia.com/object/timer_function_performance.html

On my dual Opteron I am seeing overheads of:

QueryPerformanceCounter: 238 ns
GetTickCount: 8 to 16 ns
timeGetTime: 80 ns
Pentium internal timer: 7 ns

The Pentium internal timer has the least overhead because it is implemented in 4 assembly instructions without any system call. The system calls have higher, though not unacceptable, overheads.

There is long-term drift with these timers. This is not a problem if one requires time differences between closely spaced events such as market ticks. Calibration techniques can be used to obtain millisecond or microsecond precision over minutes or hours. However, it is questionable why anyone would need even millisecond precision for elapsed times of more than a minute.
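One simple calibration technique, sketched in Python under the assumption that the drift is linear between two calibration points: read the fast counter and a trusted reference clock together at the start and end of an interval, then map later counter readings onto the reference by linear scaling. The function name make_calibration and the numbers below are hypothetical.

```python
def make_calibration(counter_start, ref_start, counter_end, ref_end):
    """Return a function mapping raw counter readings onto the reference clock.

    Assumes the counter drifts linearly against the reference between the
    two calibration points.
    """
    if counter_end == counter_start:
        raise ValueError("calibration points must differ")
    # Reference seconds per counter tick over the calibration interval.
    rate = (ref_end - ref_start) / (counter_end - counter_start)

    def to_reference(counter_value):
        return ref_start + (counter_value - counter_start) * rate

    return to_reference

if __name__ == "__main__":
    # Made-up example: a counter nominally in microseconds that runs 50 ppm
    # fast, so it shows 60,003,000 ticks after 60 s of reference time.
    convert = make_calibration(0, 0.0, 60_003_000, 60.0)
    print(convert(30_001_500))  # a reading halfway through the interval
```

For tick-to-tick differences a few milliseconds apart, the correction is far smaller than the timer overhead, which is why drift only matters over minutes or hours.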

nitro · Sep 27, 2004

I will post my results later

nitro
