Database organization

you dont need a table, fuckwit. This is NOT RELATIONAL DATA. IT IS COLUMNAR DATA. Please can someone save me from this moron?

A properly structured table and a well-structured database will actually help you answer all those questions. I have some idea of what you are trying to do, and I can see the mistakes and assumptions you are making.

Even though MonetDB is a good system, it is not going to improve your database structure. Garbage in, garbage out, as they say. A column store makes sense when you have thousands of columns in a single table. Is that going to be the case here? It seems not. Another option is a key-value database. It is a very simple approach to a database engine: you build tables with only a key column and a value column, and all data is accessed through that key. MonetDB does that in some ways, but only for referencing the thousands of columns you might have. In its simplest form a key-value database has just two columns; it is normalization to the max, and it can be very fast. Another option is an object database, where you store data cubes, that is, complex data structures, in the rows and columns of a table.
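To make the layouts mentioned above concrete, here is a minimal sketch that holds the same three trade records row-wise, column-wise, and as key-value pairs. The records, field names, and helper functions are all hypothetical and are not tied to MonetDB or any other engine.

```python
# Three hypothetical trade records in a plain row-oriented layout.
rows = [
    {"symbol": "SPY", "time": 1, "price": 100.0, "size": 10},
    {"symbol": "SPY", "time": 2, "price": 100.5, "size": 20},
    {"symbol": "QQQ", "time": 1, "price": 400.0, "size": 5},
]


def to_columns(rows):
    """Columnar layout: one list per field, aligned by position."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def to_key_value(rows, key_fields=("symbol", "time")):
    """Key-value layout: a composite key maps to the remaining fields."""
    store = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        store[key] = {f: v for f, v in row.items() if f not in key_fields}
    return store


columns = to_columns(rows)
kv = to_key_value(rows)

# A column scan reads only the one field it needs.
average_price = sum(columns["price"]) / len(columns["price"])

# A key lookup fetches a single record directly.
record = kv[("SPY", 2)]
```

The column layout favors scans over one field across many records, while the key-value layout favors fetching one record by its key.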

Hence the question remains. You need to build your conceptual data model first before you can proceed; this is the first thing you learn in database design.
 
I have not found time yet. Give me till the end of the upcoming weekend. Sorry for the delay.

No, I haven't. Is there a link you could direct me to? So far I've only been looking at CME options, but that would still be a good read. What do you think about this possibility of throwing everything into one table vs. a table for each? I'm starting to like that idea myself.

@Butterfly, thanks again for your help. I'll post the data sample after school tonight.

@volpunter, you didn't find anything interesting in that research paper discussion starter I sent you?
 
It's on each options exchange's site; a simple Google search should suffice. Yes, there are clear rules as a function of the absolute dollar amount of the traded underlying and the number of strikes, among a couple of other metrics.

I remember reading the rule when I was preparing for my series 4 test... I'll see if I can dig it up.

All in one table? As in adding it as long as you get a quote for it? Could work...
 
you dont need a table, fuckwit. This is NOT RELATIONAL DATA. IT IS COLUMNAR DATA. Please can someone save me from this moron?

retard, tables are not ONLY found in RDBMS, you clueless fuckwit

Column data as a group are the TABLES you retard,

god, you are not only a liar, a psycho, and a fairy tale artist but also an angry troll living in his mom basement
 
dude, you continue to talk crap, from start to end. Nobody is angry here. I am simply confronting you with facts of how you are wrong on most every part you talked about:
Facts ? don't confuse your insults and your mental projections with facts, fraud. You have already been exposed as a fraud in that other thread, claiming to know programming and then later acknowledging you never wrote a single line of code, professionally or otherwise.

volpunter said:
* I never claimed what you question me with. I said most popular data bases target various APIs in many languages so limiting yourself to a specific language or even thinking about languages at the beginning is totally meaningless.
My little clueless friend, when it comes to manipulating large datasets, APIs and languages do matter. Some are better equipped than others, some are easier to learn than others, and some are faster to program than others. But you wouldn't know that since you never wrote a single line of code in your whole life. Idiot.

volpunter said:
You said: "Java and C++ will make things more difficult for data manipulation on a database." -> you do this all the time: you throw some totally unsubstantiated and unproven arguments into the air and pretend as if they are fact. They are not.
That's because you have no programming and database experience, so of course it doesn't mean anything to you. Using C++ or Java, OOP for processing large dataset is a poor application of OOP. Not going to explain to you why again since you have no understanding of those concepts, so it would be a pointless exercise.

volpunter said:
* This thread has nothing to do with Python, so keep Python out of this thread (aside from the fact that you made yourself look like a total idiot by claiming Python is used in large banks and hedge funds to run OMS (order management systems)).
It has everything to do with the Python thread. Every time you read something that doesn't fit your projected static imaginary world, you go into one of these rages. It's not the first time, and you have done it to plenty of people here. LOL.

volpunter said:
* Stop accusing me personally, accuse or criticize my points and arguments as I do.
who are you again ? some kind of divine authority now ? your points deserve every criticism, and yet you can throw criticism to others but we can't on yours. Are you real, psycho ?

volpunter said:
* "You have no programming and database experience". -> again? How do you know? I used Kx's KDB+, and I wrote my own binary data store that can query data even faster than kdb+ can (albeit with a lot less bells and whistles). I certainly have way sufficient programming and design experience given I created a full blown trading system architecture. Stay with facts here
hahaha LOL typical amateur response of a clueless tard.

volpunter said:
* By the way, how is VB related to Access? I see zero connection or overlap. Why would you mention them together? You probably meant VBA and Access? Even then, VBA is completely unnecessary for writing relational database schemas. But the point is that we are dealing here with columnar data, not relational data. Hence you are off track anyway.
We are talking database design you dummy, not database engine store. Choosing a database engine store is not database design, you fraud. But how would you know that since you never designed professionally anything, let alone a database system or manipulation of large datasets in a programming language.

volpunter said:
Basement and Xbox360? Neither I have nor own. Can we stick to the topic?
of course you have one, you are an angry basement boy, they all have one.
 
As mentioned, I recommend you stick with the free 32-bit version of kdb+. I have no clue about its limitations, because I have only used the paid commercial version where I worked before, so you may need to check whether it suits your needs. Otherwise, consider one of the NoSQL databases. If none of those looks promising, consider writing your own binary datastore.

The important thing at this stage is to understand what you actually want to query, because that drives the choice of database type. I have not stored options data, so I cannot add much wisdom here. If you could describe what you intend to query and how, there is a better chance that I or others can help. But definitely stay away from any SQL solution; they are way too slow and inflexible to handle queries of this type. Queries I could imagine are "give me all the put option contracts over the past month with strikes closest to the money (ATM strikes), one for each day per underlying symbol" or "retrieve the average bid/offer spread for all ATM calls over the past week".

And remember, it is often much better to separate storage/retrieval from the custom query engine. What I mean is that you design and implement a targeted storage and retrieval solution, and when you run a query that solution loads a subset into memory and your separate query engine produces the desired results. That is essentially how kdb+ does it, along with a number of high-profile columnar database solutions. They split the workload into different tasks that are each highly optimized. The single biggest reason SQL solutions are so slow is that the data structure/schema is not optimized for time series queries (and yours is essentially a time series problem), and also that queries are run server side. That makes for lazy implementations, but they almost always cannot be optimized, or at least not brought up to the desired speed and latency.
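As a rough illustration of the two example queries, here is a minimal sketch of the query-engine half, run against a subset that a separate storage layer is assumed to have already loaded into memory. The tuple layout, the sample values, and the function names are all made up for the example; this is not how kdb+ or any particular product implements it.

```python
from statistics import mean

# Hypothetical option quotes already loaded into memory.
# Layout: (date, underlying, right, strike, underlying_price, bid, ask)
quotes = [
    ("2024-01-02", "SPY", "P", 470.0, 472.6, 3.10, 3.20),
    ("2024-01-02", "SPY", "P", 475.0, 472.6, 5.40, 5.60),
    ("2024-01-02", "SPY", "C", 470.0, 472.6, 5.90, 6.00),
    ("2024-01-02", "SPY", "C", 475.0, 472.6, 2.80, 2.95),
    ("2024-01-03", "SPY", "P", 465.0, 468.8, 2.50, 2.65),
    ("2024-01-03", "SPY", "P", 470.0, 468.8, 4.70, 4.80),
    ("2024-01-03", "SPY", "C", 470.0, 468.8, 3.30, 3.40),
]


def atm_contracts(quotes, right):
    """Return one quote per (date, underlying): the strike closest to spot."""
    best = {}
    for quote in quotes:
        date, underlying, quote_right, strike, spot, _bid, _ask = quote
        if quote_right != right:
            continue
        key = (date, underlying)
        current = best.get(key)
        # Keep the first quote seen when two strikes are equally close.
        if current is None or abs(strike - spot) < abs(current[3] - current[4]):
            best[key] = quote
    return [best[key] for key in sorted(best)]


def average_spread(quotes):
    """Average of (ask - bid) over the given quotes."""
    return mean(ask - bid for *_, bid, ask in quotes)


atm_puts = atm_contracts(quotes, "P")
atm_call_spread = average_spread(atm_contracts(quotes, "C"))
```

Restricting the input to "the past month" or "the past week" would be the storage layer's job in this split: it decides which date range to load, and the functions above only see that subset.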

I am out of here, I really have no patience or inclination to argue with some of the retards in this thread. PM me if you like to discuss further. I have worked with many sql solutions in the past (mostly static data), columnar databases (commercial and open source) and also have written a customized binary data storage and query solution. Good luck!

I'm looking for some advice on what the best practices are for storing options and futures data. How do you folks store these instruments? I've been using data that I (quite literally with a USB flash drive) fetch from school, but that's not optimal, so I've decided to build a database. Here's my idea for the design:

1. Use the IQFeed symbol lookup from their C++ API to get all options for a specific security, or all contracts for a specific future.
   e.g. search for "index/equity options" on "SPY"
2. For each result in the returned list, check whether a table for it exists in my database. If not, create the table.
3. Loop over the list and store the data in the right table.
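The three steps can be sketched as follows. This is only an outline under loud assumptions: `lookup_symbols` is a hypothetical stand-in that returns made-up symbol names (it does not call IQFeed or follow its symbol format), and `TradeStore` is an in-memory placeholder for whatever database ends up being used.

```python
def lookup_symbols(underlying):
    """Hypothetical stand-in for a vendor symbol lookup (step 1)."""
    return [f"{underlying}_C_470", f"{underlying}_P_470"]


class TradeStore:
    """In-memory placeholder for a database: one columnar table per symbol."""

    def __init__(self):
        self.tables = {}

    def ensure_table(self, symbol):
        # Step 2: create the table only if it does not exist yet.
        if symbol not in self.tables:
            self.tables[symbol] = {"time": [], "price": [], "size": []}
        return self.tables[symbol]

    def append(self, symbol, time, price, size):
        table = self.ensure_table(symbol)
        table["time"].append(time)
        table["price"].append(price)
        table["size"].append(size)


def ingest(store, underlying, trades):
    """Route (symbol, time, price, size) trades into per-symbol tables.

    Returns the number of trades skipped because their symbol was not
    in the lookup result.
    """
    symbols = lookup_symbols(underlying)
    for symbol in symbols:
        store.ensure_table(symbol)

    known = set(symbols)
    skipped = 0
    for symbol, time, price, size in trades:  # Step 3
        if symbol in known:
            store.append(symbol, time, price, size)
        else:
            skipped += 1
    return skipped


store = TradeStore()
skipped = ingest(
    store,
    "SPY",
    [("SPY_C_470", 1, 5.9, 10), ("UNKNOWN", 2, 1.0, 1)],
)
```

Counting the skipped trades makes it visible when the lookup missed a contract, which is the first concern listed below.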

Concerns:
1. Relying on IQFeed's symbol lookup to catch every option or future. There's obviously some algorithm that determines when new futures or options are created and what their strikes/expirations/deliveries are. Where can I find these algorithms so that I can code them up and know what is created when instead of relying on the IQFeed symbol lookup?
2. Is it best to create a table for each option/future? Throw them all together in one table? Throw all of it in one big table? I imagine what I've designed will probably make the queries fastest of those three options, but I'm here to learn so tell me if I'm wrong.
3. The size of my database. This thing is obviously going to become HUGE. I figure I can't really know how huge until I try or someone tells me. I've been reading about different compression options from my database of choice and believe I'm using the right tools here, but maybe I will still need to trim back the amount of data I'm looking at. To be clear, I'm looking to database TRADES (not quotes) on S&P 500 equities and options, a few futures, and a few ETF/P/Ns. I really have no idea how big this is gonna get, so I'm just gonna try it and see. I'm prepared to invest in 4 or 5 TB hard drives every once in a while, but I don't have the means or the expertise to run a legitimate data server (I don't think I do at least, don't even know what the cost would be...if anyone has any estimates of what it will cost to scale this approach I'm all ears).
4. Backups. Simply put, I want to make sure all this data I'm working on capturing is safe. I'm prepared to pay a good amount for backup space, but again I worry it could become unmanageable as I scale.
5. Choice of databases. Currently kdb+ 32bit edition, I love it. I've heard anecdotally that you can address more RAM if you use multiple kdb+ processes? If anyone knows anything about that I'd love to hear. I've used other (SQL and noSQL) databases in the past but this is the fastest I've played with. I know the general consensus on EliteTrader is to store binary files, but I don't really know how to go about querying a bunch of flat files in the most efficient way, so either I learn to do that or stick with kdb+ 32 bit. Again, I'm open to advice there.
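On the question of querying flat binary files, one common approach is fixed-width records sorted by timestamp, so that a time range can be found by binary search instead of a full scan. The sketch below assumes a hypothetical record layout (64-bit timestamp, 64-bit float price, 32-bit size) and one file per symbol per day; the file name and all helper names are illustrative only.

```python
import os
import struct
import tempfile

# Assumed record layout, little-endian with no padding:
# int64 timestamp, float64 price, int32 size (20 bytes per trade).
RECORD = struct.Struct("<qdi")


def write_trades(path, trades):
    """Write (timestamp, price, size) tuples to path, sorted by timestamp."""
    with open(path, "wb") as f:
        for trade in sorted(trades):
            f.write(RECORD.pack(*trade))


class TradeFile:
    """Read-only, index-addressable view over a file of fixed-width records."""

    def __init__(self, path):
        # Reads the whole file; a memory map could replace this for big files.
        with open(path, "rb") as f:
            self._data = f.read()

    def __len__(self):
        return len(self._data) // RECORD.size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return RECORD.unpack_from(self._data, index * RECORD.size)


def first_index_at_or_after(trades, timestamp):
    """Binary search for the first record whose timestamp is >= timestamp."""
    lo, hi = 0, len(trades)
    while lo < hi:
        mid = (lo + hi) // 2
        if trades[mid][0] < timestamp:
            lo = mid + 1
        else:
            hi = mid
    return lo


def query_range(trades, start, end):
    """Return all records with start <= timestamp < end."""
    first = first_index_at_or_after(trades, start)
    last = first_index_at_or_after(trades, end)
    return [trades[i] for i in range(first, last)]


path = os.path.join(tempfile.mkdtemp(), "SPY_2024-01-02.bin")
write_trades(path, [(3, 100.5, 200), (1, 100.0, 100), (2, 100.25, 50)])
trades = TradeFile(path)
window = query_range(trades, 2, 4)
```

Because every record has the same width, the size on disk is simply the record size times the number of trades, which also gives a way to estimate storage needs once daily trade counts are known.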

Thanks for your ideas and feedback.
 
well spoken, the python monster. Time for you to go back to text-parse and glue stuff together with Python, your language of choice, lol

I actually bet this is you, the same guy. Or you blatantly stole all his ideas and thoughts. Same exact verbiage regarding OMS through Python (I am still smirking when I think of it; which banks again ran Python OMSs? I forgot, please remind us).

LOL, you are starting to see monsters under your bed, volpunter, you are losing it, you frigging psycho !!!
 