I am trying to backtest using Options EOD data from ivolatility.com and there are some interesting data issues.
Take the particular case where the Bid / Ask spread is really wide:
Bid = $0.67, Ask = $2.63. Mean Price calculated = ($0.67 + $2.63) / 2 = $1.65
In such a case, the calculated Mean Price is wrong (or, better put, far from the realistic execution price one would get), and so is the IV backed out from it and, hence, the greeks.
But I (as a human trader) know just by looking at the Bid and Ask that they are too wide apart. I can compare the prices and spreads of the adjacent Strikes and I can guesstimate that this Mean Price is wrong!
So how is this tackled in industry / academia?
What I want to do is use a machine learning model that considers the entire curve / surface and imputes the most probable execution price for such options, rather than just using the Mean Price. However, the rabbit hole seems deep: do I have to model slippage / liquidity, too?!
Raw data (omitting Date, ExpiryDate, etc., for brevity):
Strike = 20, Bid = 1.07, Ask = 1.24, Mean Price = 1.155, IV = 0.8221
Strike = 20.5, Bid = 0.65, Ask = 3.95, Mean Price = 2.30, IV = 0.5432 <-- wrong!
Strike = 21, Bid = 0.60, Ask = 0.76, Mean Price = 0.68, IV = 0.8156
Notice how the mean price for the two surrounding strikes is actually OK because the Bid/Ask spread is not that wide.
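To make the "compare with adjacent strikes" idea concrete, here is a minimal sketch of what I do by eye. It is not an industry method, just an illustration: the function names `flag_wide` and `interpolate_iv`, the spread threshold of 3x the chain median, and the choice of linear interpolation of IV in strike are all my own assumptions.

```python
from statistics import median

# Quotes from the raw data above: (strike, bid, ask, iv), sorted by strike.
quotes = [
    (20.0, 1.07, 1.24, 0.8221),
    (20.5, 0.65, 3.95, 0.5432),
    (21.0, 0.60, 0.76, 0.8156),
]

def flag_wide(quotes, ratio=3.0):
    """Return indices of quotes whose bid/ask spread exceeds
    `ratio` times the median spread of the chain (assumed threshold)."""
    spreads = [ask - bid for _, bid, ask, _ in quotes]
    typical = median(spreads)
    return [i for i, s in enumerate(spreads) if s > ratio * typical]

def interpolate_iv(quotes, bad):
    """For each flagged quote, linearly interpolate IV in strike from the
    nearest unflagged strikes on either side. Flagged quotes without a
    clean neighbour on both sides are skipped (no extrapolation)."""
    bad = set(bad)
    good = [(k, iv) for i, (k, _, _, iv) in enumerate(quotes) if i not in bad]
    repaired = {}
    for i in sorted(bad):
        k = quotes[i][0]
        left = [(gk, giv) for gk, giv in good if gk < k]
        right = [(gk, giv) for gk, giv in good if gk > k]
        if not left or not right:
            continue
        (k0, iv0), (k1, iv1) = left[-1], right[0]
        w = (k - k0) / (k1 - k0)
        repaired[k] = iv0 + w * (iv1 - iv0)
    return repaired

bad = flag_wide(quotes)
print(bad)                          # indices of suspicious quotes
print(interpolate_iv(quotes, bad))  # strike -> interpolated IV
```

On the three strikes above this flags only the 20.5 strike and replaces its IV with a value of roughly 0.819, in line with its neighbours. Turning that IV back into a price would still need a pricing model plus the spot, rate, and expiry that I omitted, so this only repairs the smile, not the execution price.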
