Quick pandas question

Lets say I have a data frame with options data. I would like to be able to quickly find an option on a given date with a given type, expiration that has delta closest to x. What’s the best way to do that?
 
Probably no help to you, but in the spreadsheet world, I have to
-- set up a table of all spreads (by expiry, by side, by strike)
-- then search on the deltas until I have one [abs[delta]] over my filter,
-- then back up on row.

The Pythonista process *might* be similar, with regard to forming a sub-table object for handy manipulation? Just guessing.
 
this will get the type and exp
optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]

closest delta you might need to use interp or take a look at http://xarray.pydata.org/en/stable/
it has nearest functions and N-dimensional functionality that would probably work well with options
 
this will get the type and exp
optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]
If I want it to work fast, would I want to add these fields to the index? I tried but it behaves quirky and, oddly enough, we don't have a python expert in house (at least none that I can talk to).
 
Last edited:
Expanding Rosy's code

targetDelta = 25
D = [ abs(x - targetDelta) for x in optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]]
D.index(min(D)

I have not actually tested this, but it is so simple it should work.
 
targetDelta = 25
D = [ abs(x - targetDelta) for x in optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]]
D.index(min(D)
Pandas has wonderful features for that - in fact, I can even do something like
Code:
cond = (opt['exp']=='20170907') & (opt['cp']=='C')
x = opt[cond][abs(opt['delta'] - targetDelta) == abs(opt['delta'] - targetDelta).min()]
However, my main issue is that when I am searching through a lot of options, it works pretty slow. So instead I did something like
Code:
opt = pd.read_csv('spy.csv')
opt.set_index(['date', 'exp', 'cp', 'strike'], inplace=True)
after that I can search for complete combinations very quickly:
subx = opt.loc[('20170903', '20170907', 'C', 15)]
but ranges of dates misbehave badly - it seem to do an outside join on everything instead on a single field, e.g. if i do:
Code:
subx = opt.loc[('20170903', '20170907', 'C', 15):('20170703', '20170907', 'C', 15)]
it will return every option that has a strike of 15 etc
 
Pandas has wonderful features for that - in fact, I can even do something like
Code:
cond = (opt['exp']=='20170907') & (opt['cp']=='C')
x = opt[cond][abs(opt['delta'] - targetDelta) == abs(opt['delta'] - targetDelta).min()]
However, my main issue is that when I am searching through a lot of options, it works pretty slow. So instead I did something like
Code:
opt = pd.read_csv('spy.csv')
opt.set_index(['date', 'exp', 'cp', 'strike'], inplace=True)
after that I can search for complete combinations very quickly:
subx = opt.loc[('20170903', '20170907', 'C', 15)]
but ranges of dates misbehave badly - it seem to do an outside join on everything instead on a single field, e.g. if i do:
Code:
subx = opt.loc[('20170903', '20170907', 'C', 15):('20170703', '20170907', 'C', 15)]
it will return every option that has a strike of 15 etc

The code I posted doesn't actually work, and it is too late to edit. This is why I should always test before posting.

Something like this should work, but maybe not that quickly:

targetDelta = 25
D = optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]
D2 = D.Delta
D3 = [ abs(x - targetDelta) for x in D2]
D[ D3.index(min(D3)) ]


One way to get around python slowness/flakiness when working with very large dataframes/tables is just to keep your options in a relational database and access
via SQL. I use cx_Oracle for python queries against an EOD options table (multi-source) of > 4mmm rows. Speed is quite good and complex self-joins, where-clause filters, etc. behave flawlessly.
 
Don't be scared by databases. You don't need to set up a MySQL/MariaDB server. I use SQLite all the time and it's great for organizing data. Instead of being server based, SQLite is file based. You have access to all the normal SQL syntax, table structures, primary keys, etc.

Best yet Python has very good bindings with SQLite.
 
I doubt a relational db would be an improvement speed-wise - pretty sure my data frame fits into memory and pandas is pretty efficient. I just don’t know how do implement it properly for specific types of searches
 
Code:
import xarray as xr
import pandas as pd

df=pd.read_csv('http://www.deltaneutral.com/files/Sample_SPX_20151001_to_20151030.csv')
sdf=df[ (df.type=='call') & (df.expiration=='10/16/2015') & (df.quotedate == '10/01/2015') &(df.underlying=='SPX')]
sxdf=xr.Dataset.from_dataframe(sdf)
sxdf.set_index(strike='strike', inplace=True)
sxdf.sel(strike=[621],method='nearest').to_dataframe()
 
Back
Top