Bogus data?

Here is some data collected today as the ES went to 1505.25 for the last time, then began going up. It was captured from IB's data feed, and it is a sequence of price and size events. This is the order in which they occurred.

Does anyone know why:

1. "last size" occurs twice

2. Sometimes one "last size" added to the previous "volume" will equal the next "volume," but not always.

Oh! And one very important question: What price did the "last size" occur at?

last price: 1505.500000, 0
last size: 198
last size: 198
volume: 1425299
bid price: 1505.250000, 1
bid size: 824
ask price: 1505.500000, 1
ask size: 52
bid size: 824
ask size: 52
last size: 103
volume: 1425402
bid size: 748
ask size: 20
last size: 1
volume: 1425416
bid size: 715
ask size: 513
last price: 1505.250000, 0
last size: 101
last size: 101
volume: 1425517
bid size: 606
ask size: 487
bid size: 573
last price: 1505.500000, 0
last size: 347
last size: 347
volume: 1425904
bid size: 561
ask size: 101
 
From what I understand, the IB data feed does not provide every single tick, which can result in some strange trade sequences.

I know this to be true of the API toolkit, so I assume it is also true of their Trader Workstation.
 
Quote from sanjeevb66:

From what I understand, the IB data feed does not provide every single tick, which can result in some strange trade sequences.

I know this to be true of the API toolkit, so I assume it is also true of their Trader Workstation.
I got the data from the API.
 
Figured out what to do: I'll watch the IB matrix when the market is slow, take notes, then compare with the data captured from the API.
I can tell what is happening on the matrix when it is slow, so I should be able to figure out what the sequence is in the API data by comparing with my matrix notes. I'll report back on the results. May try Sunday night.
 
The duplicate size problem is discussed here (IB login required):

http://www.interactivebrokers.com/cgi-bin/discus/board-auth.pl?file=/2/40099.html

Here is an excerpt from Richard King:

The IB datafeed is optimised to ensure that it keeps up with the market no matter how busy the market is.

To accomplish this, it effectively sends a price snapshot for each instrument at regular intervals. This interval seems to be about 300 milliseconds. For each of bid, ask, and last, it compares the current price and size with the values at the last snapshot. If the price is different, it sends both price and size. If the price is the same but the size is different, it sends only the size. If both price and size are the same, it doesn't send either. If there have been any trades since the last snapshot, it sends the (accumulated) volume (so where the price and size haven't changed but there have been one or more trades, this can be detected from the increased volume).
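
The comparison rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the post, not IB's actual implementation; the names `Quote`, `snapshot_events` and `volume_event` are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Quote:
    """Price and size of one field (bid, ask or last) at a snapshot."""
    price: float
    size: int


def snapshot_events(prev: Optional[Quote], curr: Quote, field: str) -> List[Tuple[str, float]]:
    """Events sent for one field at one snapshot, per the rule in the post."""
    if prev is None or curr.price != prev.price:
        # Price changed: send both price and size.
        return [(f"{field} price", curr.price), (f"{field} size", curr.size)]
    if curr.size != prev.size:
        # Same price, different size: send only the size.
        return [(f"{field} size", curr.size)]
    # Nothing changed: send nothing.
    return []


def volume_event(prev_volume: int, curr_volume: int) -> List[Tuple[str, int]]:
    """Accumulated volume is sent only if trades occurred since the last snapshot."""
    return [("volume", curr_volume)] if curr_volume > prev_volume else []
```

For example, a bid that stays at 1505.25 while its size goes from 824 to 748 would produce only a "bid size" event, which matches the shape of the log at the top of the thread.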

A word of caution though: this is not an exact science. It would be nice if what I said in my post was an exact description of how it works, but you'll find odd things happening occasionally, such as a volume update without a prior size message where the increase in volume is not an exact multiple of the most recent size message, or multiple last price/size messages sent at the same time, or volume messages with a smaller volume than the previous one! But most of the time my description is accurate.

By the way, one gotcha is that when both price and size messages are sent (in a single TICK_PRICE socket message), TWS also sends the size again in a separate TICK_SIZE message, but the volume is correctly updated only once. I think the reason for this duplication is that before the version 2 TICK_PRICE message was introduced, it didn't contain a size field, so prices and sizes were always sent separately: if TWS didn't send the duplicate size, then programs that relied on the separate TICK_SIZE message would no longer work properly unless they were amended and recompiled.
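
If the duplicate really is just the size being repeated after a price message, a client could filter it out along these lines. This is a rough sketch working on (event name, value) pairs like the ones logged in this thread; `drop_duplicate_sizes` is a hypothetical helper, not part of the IB API. It assumes the first size after a price is the one that came with the price, and that the next size for the same field with the same value is the repeat.

```python
from typing import Dict, Iterable, List, Set, Tuple

Event = Tuple[str, float]


def drop_duplicate_sizes(events: Iterable[Event]) -> List[Event]:
    """Drop the repeated size event that follows a price+size pair."""
    out: List[Event] = []
    awaiting_first: Set[str] = set()       # fields that just had a price event
    expected_dup: Dict[str, float] = {}    # field -> size whose repeat is expected
    for name, value in events:
        field, _, kind = name.rpartition(" ")
        if kind == "price":
            awaiting_first.add(field)
            expected_dup.pop(field, None)
        elif kind == "size":
            if field in awaiting_first:
                # First size after a price: keep it, expect one repeat.
                awaiting_first.discard(field)
                expected_dup[field] = value
            elif expected_dup.get(field) == value:
                # Same field, same value: treat as the repeat and drop it.
                del expected_dup[field]
                continue
            else:
                expected_dup.pop(field, None)
        out.append((name, value))
    return out
```

One limitation of this assumption: a genuine new size that happens to equal the previous one would also be dropped, so the volume event remains the safer source for trade counts.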

This mechanism enables IB to know the maximum bandwidth required for each ticker, and hence for each customer (since the number of tickers is limited), and so it can size its servers to be able to cope with that load. If a market becomes very busy, it makes no difference because it will still only send an update three times a second or thereabouts, even if there have been 100 trades during that second. This avoids the problem that every other data feed seems to have, where the data will sometimes lag way behind the market at busy times (with every other vendor I've used, I've had occasions where the data could be anything up to two or three minutes behind the market).

There is an irritating side effect of this technique, which is that price movements between snapshots may not be reported at all: for example if the last price at snapshot 1 is 100, and then price moves up to 102 and then back to 101 by snapshot 2, the price reported at snapshot 2 will be 101, and the 102 price will not be reported at all. This leads to occasional incorrect highs and lows of bars, but rarely by more than one tick: whether that is significant depends very much on the trading strategy used.
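
The 100, 102, 101 example can be reproduced with a toy model: group the real trades by snapshot interval and report only the last price of each interval. This is just an illustration of the side effect described above; `reported_prices` and the sample numbers are made up for the example.

```python
from typing import List


def reported_prices(trades_per_interval: List[List[float]]) -> List[float]:
    """Last price visible at each snapshot, sent only when it changed."""
    reported: List[float] = []
    for trades in trades_per_interval:
        if not trades:
            continue  # no trades in this interval, nothing to compare
        price = trades[-1]  # only the final price in the interval is seen
        if not reported or price != reported[-1]:
            reported.append(price)
    return reported


# Snapshot 1 sees 100; between snapshots price trades 102, then 101.
intervals = [[100.0], [102.0, 101.0]]
true_high = max(p for trades in intervals for p in trades)
seen_high = max(reported_prices(intervals))
print(true_high, seen_high)
```

The bar built from the reported prices has a high of 101 while the true high was 102.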

The above isn't a complete description, but it covers the basic mechanism.
 
Here is some data from this evening's session, right after connecting via the API. It appears there may be two "problems":

1. A volume increase without a "last size" event.

2. A volume increase with two "last size" events, one of which should be ignored, if the "volume" event is accurate.


The order of the fields is:

Timestamp Event Ticker Value AutoExecute (if applicable)
Code:
18:45:32:213    ask size        8       70
18:45:32:215    last price      8       1516.500000     0
18:45:32:216    last size       8       5
18:45:32:218    bid size        8       71
18:45:32:219    ask size        8       70
18:45:32:222    last size       8       5
18:45:32:226    volume          8       3548
18:45:32:228    high price      8       1516.750000     0
18:45:32:229    low price       8       1514.750000     0
18:45:32:231    close price     8       1515.500000     0
18:45:46:330    ask size        8       75
18:45:56:856    last size       8       1                   <- one last size
18:45:56:858    volume          8       3549                
18:45:56:859    bid size        8       70
18:46:07:855    ask size        8       95
18:46:19:606    bid size        8       71
18:46:40:330    bid size        8       70
18:47:01:116    volume          8       3550                <- no last size
18:47:01:118    bid size        8       69
18:49:08:351    last size       8       6                   <- one last size
18:49:08:387    volume          8       3556
18:49:08:428    bid size        8       63
18:49:56:142    last price      8       1516.750000     0
18:49:56:144    last size       8       2                   <- 1st last size
18:49:56:145    last size       8       2                   <- 2nd last size
18:49:56:146    volume          8       3558
18:49:56:148    ask size        8       93
18:50:19:893    ask size        8       98
18:50:26:893    bid size        8       68
18:50:33:395    last size       8       5
18:50:33:397    volume          8       3563
18:50:33:398    ask size        8       93
18:50:33:895    last size       8       10
18:50:33:896    volume          8       3573
18:50:33:898    ask size        8       73
18:50:38:398    bid size        8       69
18:50:39:647    ask size        8       84
18:50:40:147    bid size        8       74
18:50:46:370    bid size        8       75
18:50:58:401    ask size        8       85
18:50:58:650    last size       8       1                  <- one last size
 
If you just ignore the LAST_SIZE events and work from the VOLUME_EVENT, things will be fine. The cumulative volume is accurate.
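
A minimal sketch of that approach, assuming events arrive as (event name, value) pairs like the log above: take the difference between consecutive volume events and ignore everything else. `volume_deltas` is a hypothetical helper name. Skipping a volume that is lower than the previous one is my own assumption for handling the occasional backwards volume message mentioned in the excerpt.

```python
from typing import Iterable, List, Optional, Tuple


def volume_deltas(events: Iterable[Tuple[str, float]]) -> List[float]:
    """Traded quantity between consecutive volume events."""
    deltas: List[float] = []
    prev: Optional[float] = None
    for name, value in events:
        if name != "volume":
            continue  # last size, bid/ask events are ignored entirely
        if prev is None:
            prev = value  # first volume seen is only a baseline
        elif value > prev:
            deltas.append(value - prev)
            prev = value
        # A volume lower than the previous one is skipped (assumption).
    return deltas


log = [("last size", 5), ("volume", 3548), ("last size", 1), ("volume", 3549),
       ("volume", 3550), ("last size", 6), ("volume", 3556),
       ("last size", 2), ("last size", 2), ("volume", 3558)]
print(volume_deltas(log))  # [1, 1, 6, 2]
```

Note that each delta is the total traded since the previous volume event, which may cover several trades, so it does not say what price each trade occurred at.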
 
Quote from dcraig:

If you just ignore the LAST_SIZE events and work from the VOLUME_EVENT, things will be fine. The cumulative volume is accurate.
Yeah, I think that is the only way.
 