L1 and L2 data stream question

ScoobyStoo · Sep 25, 2009

I have a question for those people that understand the nuts and bolts of market data streams. For reference, I'm using a Zen Fire feed and the main instruments I trade are the CME FX futures.

When the price or volume at the top of the book changes then this is obviously reflected in both the L1 and L2 data streams. However, I am sometimes seeing situations where these get wildly out of synch. For instance, a data message indicating a change of volume on the ask might sometimes arrive more than a second later on the L2 data stream than the L1 data stream.

Now, I would expect that in fast markets if there's any lag then it should occur equally on both data streams but apparently not. There's only one event occurring at the exchange to generate both these messages (the act of the posted volume being changed), so you'd assume the exchange's data feed technology would generate both the L1 and L2 messages at exactly the same time. Has anyone else noticed this, and if so, do you have any ideas as to why it might be occurring? I'm sure it's something obvious that I've missed.

Many thanks in advance for your thoughts...

P.S. I've posted this in the 'automated trading' forum as I'm assuming that it's only going to be algo traders such as ourselves who are going to be concerned by such things.

MGB · Sep 25, 2009

Quote from ScoobyStoo:
There's only one event occurring at the exchange to generate both these messages (the act of the posted volume being changed), so you'd assume the exchange's data feed technology would generate both the L1 and L2 messages at exactly the same time. Has anyone else noticed this, and if so, do you have any ideas as to why it might be occurring? I'm sure it's something obvious that I've missed.

Hopefully, that Rithmic guy can answer this question.

PocketChange · Sep 25, 2009

This is a frequent occurrence. Level II data carries the depth of market for 10 levels (10 quotes), while Level I broadcasts only bid/ask/last. CME updates the feed every 14 ms, but many data providers filter and throttle the data you receive.

Keep in mind that in fast-moving markets the data is already history by the time you receive and process the updates.

Our solution was to build an "executable" price feed based on the Level II data.

We calculate the average bid/ask price needed to fill 54 contracts (our max order size) from the Level II data.

This filters out much of the noise, especially when the Level I price reflects only 1-5 available contracts. There are fewer surprises, and 99% of the time our fills are what we expected or better.
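
A minimal sketch of the "executable" price idea described above, assuming the book side is available as a best-first list of (price, size) pairs. The function name, the data layout, and the sample prices are illustrative assumptions, not the poster's actual implementation.

```python
from typing import List, Optional, Tuple

Level = Tuple[float, int]  # (price, contracts available at that price)

def executable_price(levels: List[Level], quantity: int) -> Optional[float]:
    """Average price paid to fill `quantity` by walking `levels` best-first.

    Returns None when the visible depth cannot absorb the whole order.
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining = quantity
    cost = 0.0
    for price, size in levels:
        take = min(size, remaining)  # contracts taken from this level
        cost += take * price
        remaining -= take
        if remaining == 0:
            return cost / quantity
    return None

# Hypothetical ask side, best price first.
asks = [(1.4700, 3), (1.4701, 20), (1.4702, 40)]
print(executable_price(asks, 54))
```

With only 3 contracts at the best ask, the price for a 54-lot order is dominated by the deeper levels, which is why it moves less than the Level I quote.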


ScoobyStoo · Sep 25, 2009

Quote from MGB:

Hopefully, that Rithmic guy can answer this question.

Agreed. Jonathan (I think his handle is 'jjw') is a very helpful guy and certainly knows his stuff.

ScoobyStoo · Sep 25, 2009

Quote from PocketChange:

This is a frequent occurrence. Level II data carries the depth of market for 10 levels (10 quotes), while Level I broadcasts only bid/ask/last. CME updates the feed every 14 ms, but many data providers filter and throttle the data you receive.

Keep in mind that in fast-moving markets the data is already history by the time you receive and process the updates.

Our solution was to build an "executable" price feed based on the Level II data.

We calculate the average bid/ask price needed to fill 54 contracts (our max order size) from the Level II data.

This filters out much of the noise, especially when the Level I price reflects only 1-5 available contracts. There are fewer surprises, and 99% of the time our fills are what we expected or better.

Well, given that the Zen Fire/Rithmic platform sells itself on providing unfiltered and unthrottled data I should be receiving everything that the exchange pumps out.

Even though there are more data messages being generated on the L2 data stream due to the additional depth of market, I still don't understand why messages relating to the top of the book should be out of synch. Both the L1 and L2 data messages should be generated synchronously by the exchange because they both relate to a single event within the exchange's matching engine. The only way L2 could lag so badly is if there were a cached backlog of L2 messages on the exchange waiting to be sent... and I just can't believe that the CME is using a platform that allows a backlog to build up that's big enough to take a full second to clear. I mean, 1 second is an eternity in the HFT arena.

PocketChange · Sep 25, 2009

Unless you are receiving 70 updates per second per quote, you are receiving partial data. As far as I know, CME only timestamps the last trade.

For 6E you would be receiving 770 updates per second. If you are doing HFT in milliseconds, you should colocate at Equinix and get a fiber connection to the exchange.

What CME processes on their computers and makes available to their licensed data providers is different from what you ultimately receive, depending on the transport mechanism (e.g. RTD/DDE/COM). The most efficient feed I've tested for Level II was actually Bloomberg DDE because they transmitted the data feed in an array.

Try monitoring the ES feed and count how many updates you receive in a one-minute period. Record the starting volume, the ending volume, and the size of each last trade you receive. Add up the trade sizes, and the difference from the change in volume will give you an idea of how many updates you're missing.
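
A rough sketch of the check suggested above, assuming you have logged the cumulative session volume at the start and end of the window plus the size of every trade print received in between. The function names and sample numbers are hypothetical. The first helper just converts the 14 ms update interval quoted earlier in the thread into a per-second rate, which comes out a little over 70.

```python
from typing import Iterable

FEED_INTERVAL_MS = 14  # update interval quoted in the thread

def updates_per_second(interval_ms: float = FEED_INTERVAL_MS) -> float:
    """Upper bound on updates per second for one quote at a fixed interval."""
    return 1000.0 / interval_ms

def missing_volume(start_volume: int, end_volume: int,
                   trade_sizes: Iterable[int]) -> int:
    """Contracts traded in the window that never arrived as trade prints.

    A result of 0 means every trade was seen; a positive result means
    some trade updates were dropped or conflated along the way.
    """
    return (end_volume - start_volume) - sum(trade_sizes)

# Hypothetical one-minute sample.
print(updates_per_second())
print(missing_volume(1_000, 1_500, [100, 200, 150]))
```

Note that this measures missing contracts rather than missing messages, so it only gives an indication of how much is being filtered.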


jjw · Sep 25, 2009

Quote from ScoobyStoo:

Well, given that the Zen Fire/Rithmic platform sells itself on providing unfiltered and unthrottled data I should be receiving everything that the exchange pumps out.

jjw: correct. we send everything, what you get will depend upon what you have registered with us programmatically. if you only subscribed to best bid/ask then that's all you will get. if you subscribed for market depth updates then you will get all market depth updates. you can subscribe to both. however, if you are using a third party provider's software it might be the case that that software filters data. ninja, to my knowledge, does not filter data and gives you everything we give it. as a control, run R | Trader in addition to what you normally use.

Even though there are more data messages being generated on the L2 data stream due to the additional depth of market, I still don't understand why messages relating to the top of the book should be out of synch. Both the L1 and L2 data messages should be generated synchronously by the exchange because they both relate to a single event within the exchange's matching engine. The only way L2 could lag so badly is if there were a cached backlog of L2 messages on the exchange waiting to be sent... and I just can't believe that the CME is using a platform that allows a backlog to build up that's big enough to take a full second to clear. I mean, 1 second is an eternity in the HFT arena.

jjw: your logic is correct but your assumptions are not. it turns out that when we began developing a feed handler for globex data (think cme), the frequency of best bid/ask data messages from the exchange did not match the frequency of the changes in the best bid/ask prices as seen in the order book. so we decided to ignore the best bid/ask messages published by the exchange and published our own best bid/ask messages based upon the changes to the order book. after a while, globex changed its data distribution platform and, i can confirm this later, abandoned publishing best bid/ask altogether. they only publish market depth updates so any best bid/ask prices must be derived from the order book.
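
To illustrate the idea of deriving best bid/ask from depth updates, here is a minimal sketch. It assumes a simplified, price-keyed update of the form (side, price, size), where size 0 removes the level. That message shape and the class name are hypothetical and are not the Globex or Rithmic format.

```python
from typing import Dict, Optional, Tuple

Top = Tuple[Optional[float], Optional[int], Optional[float], Optional[int]]

class OrderBook:
    """Keeps price -> size per side and reports top-of-book changes."""

    def __init__(self) -> None:
        self.bids: Dict[float, int] = {}
        self.asks: Dict[float, int] = {}

    def best(self) -> Top:
        """Current (bid, bid_size, ask, ask_size); None where a side is empty."""
        bid = max(self.bids) if self.bids else None
        ask = min(self.asks) if self.asks else None
        return (bid, self.bids.get(bid), ask, self.asks.get(ask))

    def apply(self, side: str, price: float, size: int) -> Optional[Top]:
        """Apply one depth update; return the new top only if it changed."""
        before = self.best()
        book = self.bids if side == "bid" else self.asks
        if size == 0:
            book.pop(price, None)  # level removed
        else:
            book[price] = size     # level added or resized
        after = self.best()
        return after if after != before else None

book = OrderBook()
print(book.apply("bid", 99.0, 5))   # top changes: an L1 message would be published
print(book.apply("bid", 98.0, 10))  # deeper level only: no L1 message
```

Because the L1 message is produced from the same update that changes the book, the two can only diverge downstream of this point.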
now the question is, how can you get a lag between l1 and l2 best prices, which seems to be what you are reporting to have observed, given that the source of the l1 and l2 data is identical and published consecutively (by us) ?
my immediate response is that, except for processing issues on your machine, this could not be happening. perhaps your l2 caching theory is happening not at the exchange, but on your machine. perhaps the following is happening:
1. you have several order books open at the same time.
2. at the time you observe the lag, there are many more l2 messages sent to your screen than l1 messages.
3. the updates for l1 and l2 messages are in different threads so the l1 data, even received after some l2 messages, can get to the screen more quickly than older l2 data (which are in the (larger) l2 queue).
4. this would imply that your machine cannot keep up with the l2 data traffic.
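
The scenario in steps 1-4 can be sketched as a small deterministic simulation. It assumes one exchange event per tick, one L1 message and several L2 messages per event, and two independent consumers that each process a fixed number of messages per tick. All names and numbers are illustrative assumptions, not a model of any real client.

```python
from collections import deque
from typing import Deque, Dict, List

def l2_lag_per_event(events: int, l2_per_event: int,
                     per_tick_capacity: int) -> List[int]:
    """For each event, ticks by which its L2 display trails its L1 display."""
    l1_queue: Deque[int] = deque()
    l2_queue: Deque[int] = deque()
    l1_shown: Dict[int, int] = {}
    l2_shown: Dict[int, int] = {}
    tick = 0
    while tick < events or l1_queue or l2_queue:
        if tick < events:
            # One book change arrives: one L1 message, several L2 messages.
            l1_queue.append(tick)
            l2_queue.extend([tick] * l2_per_event)
        # Each consumer drains at most `per_tick_capacity` messages per tick.
        for queue, shown in ((l1_queue, l1_shown), (l2_queue, l2_shown)):
            for _ in range(min(per_tick_capacity, len(queue))):
                # Overwrites, so the tick of an event's last message is kept.
                shown[queue.popleft()] = tick
        tick += 1
    return [l2_shown[e] - l1_shown[e] for e in range(events)]

print(l2_lag_per_event(events=5, l2_per_event=4, per_tick_capacity=2))
print(l2_lag_per_event(events=5, l2_per_event=4, per_tick_capacity=4))
```

When the L2 consumer cannot keep up, its queue grows and the lag behind L1 widens with every event, even though both streams arrived in order. When capacity matches the L2 message rate, the lag stays at zero.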

please let me know if this seems reasonable. if this is happening to you then it is likely to be happening to others. next week i will discuss this with one of our developers to see if thread handling could cause what you report you see.

again, try running R | Trader and see if the price feed led turns from green to yellow when this happens. if so then definitely data is getting to your machine faster than it can process it.

with respect to market data rates we often get more than 30,000 messages per second from globex sustained for a minute. i run R | Trader watching 1 order book (ES) and a quote board containing about 20 popular instruments. i can see that many times during a trading day my screen gets more than 1,000 messages per second.

NetTecture · Sep 26, 2009

Quote from jjw:
my immediate response is that, except for processing issues on your machine, this could not be happening

This is where I would point. I still use Ninja for some stuff, and I have chart windows out of sync with each other, sometimes by several seconds.

What front end do you use? Is it possible... that this simply is not doing a decent job?

ScoobyStoo · Sep 26, 2009

Yes, I'm using the Ninja API. I know that Ninja has serious UI performance issues due to their threading architecture...perhaps they also have performance issues with the threading for the L1 and L2 data streams. JJW's hypothesis definitely sounds reasonable.

I think what I'll do is this:

1. Subscribe to both the L1 and L2 data streams for a single instrument (let's go for the ES to get max data volume) using both the Ninja and R APIs.

2. Dump all the L1 and L2 events as they are raised into separate log files (one file for each API).

3. Run a file comparison utility over both the log files to pick up on any discrepancies.

4. Examine whether these discrepancies point to Ninja suffering lag on the thread dealing with raising the L2 events.
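
A possible shape for the comparison in steps 3 and 4, assuming each log has already been parsed into records of (arrival time in seconds, side, price, size) and that the L2 log has been filtered down to top-of-book rows. The record layout and function name are assumptions for illustration. Neither the Ninja nor the R API is used here.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

Record = Tuple[float, str, float, int]  # (arrival time s, side, price, size)

def l2_lags(l1_log: Iterable[Record], l2_log: Iterable[Record]) -> List[float]:
    """Seconds by which each L1 update's matching L2 update arrived later.

    Updates are matched in order on (side, price, size). A negative value
    means L2 arrived first; L1 updates with no L2 match are skipped.
    """
    pending: Dict[Tuple[str, float, int], Deque[float]] = defaultdict(deque)
    for arrived, side, price, size in l2_log:
        pending[(side, price, size)].append(arrived)
    lags: List[float] = []
    for arrived, side, price, size in l1_log:
        queue = pending[(side, price, size)]
        if queue:
            lags.append(queue.popleft() - arrived)
    return lags

# Hypothetical parsed logs from one API.
l1 = [(0.0, "ask", 100.25, 5), (0.5, "ask", 100.25, 7)]
l2 = [(0.25, "ask", 100.25, 5), (1.75, "ask", 100.25, 7)]
print(l2_lags(l1, l2))
```

Running the same function over the logs from each API and comparing the two lag distributions would show whether the delay appears only on the Ninja side.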

P.S. My machine isn't struggling with the load. With Ninja the CPU rarely gets above 40%, and with R | Trader it hardly registers anything. I suspect, as you say, the issue is probably due to Ninja's multithreaded handling of the two data streams.

NetTecture · Sep 26, 2009

Sounds more than reasonable. Whenever Ninja is involved, I am reluctant to point fingers at anyone else in the first place.

They are FAR too likely to have some brain-dead design decisions or simply programming errors somewhere along the path to start with.

With your approach you can actually compare the data. I would not be surprised if something gets lost in Ninja's internal processing.
