I recently started using Polygon.io for my historical data, specifically their 5sec aggregates. I hadn't paid much attention to the data quality until now, when I noticed crazy high/low spikes that appear all over the data, which is obviously a problem for backtesting.
Below is an example of TSLA (Feb 5th, 2024) for Polygon:
The left-most spike is at 11:44:55. When comparing to IBKR 5sec bars in TWS I don't see such spikes:
Apart from the spikes, the price data matches pretty closely between the two. I first thought it might be an issue in my code that parses the Polygon data, but the spike is already there in the raw data from Polygon. In the raw bars around the left-most spike, the high for timestamp 1707151495000 is 184.14, over 6 points above the surrounding highs, so it's definitely an issue in the Polygon data stream.
Code:
{"v":32874,"vw":177.9352,"o":177.8991,"c":177.925,"h":177.97,"l":177.8801,"t":1707151485000,"n":308},
{"v":23869,"vw":177.9224,"o":177.91,"c":177.94,"h":177.94,"l":177.9,"t":1707151490000,"n":155},
{"v":17460,"vw":178.0978,"o":177.9301,"c":177.9315,"h":184.14,"l":177.9194,"t":1707151495000,"n":174},
{"v":16172,"vw":177.9604,"o":177.934,"c":177.9564,"h":177.98,"l":177.934,"t":1707151500000,"n":225},
{"v":19644,"vw":177.9671,"o":177.9575,"c":177.96,"h":177.99,"l":177.95,"t":1707151505000,"n":222},
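As a sanity check that the spike bar really is the 11:44:55 one, the timestamp can be converted by hand. This is only a minimal sketch: it assumes `t` is epoch milliseconds marking the start of the bar, and that US Eastern time was at UTC-5 on that date (February, so no daylight saving). The helper name `bar_time` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern was on standard time (UTC-5) on Feb 5th, 2024.
EST = timezone(timedelta(hours=-5), "EST")


def bar_time(ts_ms):
    """Convert an epoch-millisecond bar timestamp to Eastern standard time."""
    utc_time = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return utc_time.astimezone(EST)


# Timestamp of the bar with the 184.14 high.
print(bar_time(1707151495000).strftime("%Y-%m-%d %H:%M:%S"))
```

For the spike bar this prints 2024-02-05 11:44:55, which lines up with the left-most spike on the chart.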
This is not an isolated incident: I see the same kind of spikes in pretty much every stock, on all the different days I have checked. They are all over the place.
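To get a rough count of how widespread this is, something like the following could be run over the raw bars. It is a rough sketch, not a proper data-cleaning method: it flags any bar whose high or low sticks out from that bar's own open/close range by more than a fraction of the VWAP. The function name `find_spikes` and the 1% threshold are arbitrary choices for illustration, and the check would miss a bad print that happens to be the bar's open or close.

```python
import json

# The five raw bars quoted in this post, wrapped in [] so they parse as JSON.
RAW = """[
{"v":32874,"vw":177.9352,"o":177.8991,"c":177.925,"h":177.97,"l":177.8801,"t":1707151485000,"n":308},
{"v":23869,"vw":177.9224,"o":177.91,"c":177.94,"h":177.94,"l":177.9,"t":1707151490000,"n":155},
{"v":17460,"vw":178.0978,"o":177.9301,"c":177.9315,"h":184.14,"l":177.9194,"t":1707151495000,"n":174},
{"v":16172,"vw":177.9604,"o":177.934,"c":177.9564,"h":177.98,"l":177.934,"t":1707151500000,"n":225},
{"v":19644,"vw":177.9671,"o":177.9575,"c":177.96,"h":177.99,"l":177.95,"t":1707151505000,"n":222}
]"""


def find_spikes(bars, max_wick=0.01):
    """Return (timestamp, field, value) for highs/lows far outside the bar body.

    The wick above max(open, close) and below min(open, close) is measured
    as a fraction of the bar's VWAP. max_wick is an untuned starting point.
    """
    spikes = []
    for bar in bars:
        body_top = max(bar["o"], bar["c"])
        body_bottom = min(bar["o"], bar["c"])
        if (bar["h"] - body_top) / bar["vw"] > max_wick:
            spikes.append((bar["t"], "h", bar["h"]))
        if (body_bottom - bar["l"]) / bar["vw"] > max_wick:
            spikes.append((bar["t"], "l", bar["l"]))
    return spikes


bars = json.loads(RAW)
for t, field, value in find_spikes(bars):
    print(t, field, value)
```

On the five bars above this flags only the high of the 1707151495000 bar. For a 5sec bar a 1% wick is already huge, so the threshold could probably be tightened, but that would need checking against known-good data.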
So, are these real spikes in the price data that IBKR is somehow filtering out, or is the Polygon.io data corrupted?