Hello All
As I get deeper into technical trading, I find that the retail tools for trading do not cover everything a user needs. Back-testing and optimizing multiple scenarios in multiple ways is either not possible in these systems or just too slow.
One major problem is that when one runs multiple simulations on the same data set, or on permutations of it, the software tends to query the data again and again from the data provider's server.
It was quite shocking to me that one cannot buy the historical data once, update it daily from a real-time data provider, and keep a local data store that can be queried at wire speed rather than at the speed of one's internet connection, especially when using more than one machine for simulation, back-testing and optimization.
Maybe this sort of thing is possible with non-retail software.
What I envisage is something like this:
- A Data Storage Server which takes offline data for any/all symbols (one time), then queries a live data provider for tick data on each symbol and stores it on a daily basis.
- Preferably open source. It may be free or paid, but preferably free or a GPL/community effort.
- Preferably an appliance/user-friendly machine or user server running Linux and an RDBMS like PostgreSQL. A user-specified OS/RDBMS may be possible if the software is built exclusively with RDBMS triggers and multi-platform languages like Python/Perl/Java.
- Capability to load historical tick data from a wide variety of file formats, downloaded online or from CD/DVDs.
- Historical data of an entire exchange (all scrips), or of all symbols to be traded, to be loaded in tick format.
- Capability to clone/split the incoming data stream in real time into two threads: one to store the data, and a second to provide it to any local user who wants it raw/unprocessed on a tick-by-tick basis.
- Data Storage Server to have data transformation procedures, run as triggers, that build the downstream data required by various client applications, e.g. 1/3/5/15/30 minute bars or 20/60/1500 tick bars. For speed, this data should be available live and stored independently of the base tick data from which it is derived, so that no transformation is needed at query time.
I am assuming that speed would be vital and hard disk space merely a commodity. One has to be aware, however, that years of historical data, kept in multiple permutations in populated views, along with the required indexes and other RDBMS system-generated data, would make for one hell of a storage management requirement.
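To make the bar-building requirement concrete, here is a minimal sketch in Python of deriving time bars and tick-count bars from stored ticks. The names `Tick`, `Bar`, `time_bars` and `tick_bars` are my own placeholders, not taken from any existing product, and a real server would presumably do this inside the database (triggers or stored procedures) rather than in client code.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical record layouts, for illustration only.
Tick = namedtuple("Tick", "ts price size")   # ts = integer epoch seconds
Bar = namedtuple("Bar", "start open high low close volume")


def _make_bar(start, ticks):
    """Collapse a non-empty list of ticks into one OHLCV bar."""
    prices = [t.price for t in ticks]
    volume = sum(t.size for t in ticks)
    return Bar(start, prices[0], max(prices), min(prices), prices[-1], volume)


def time_bars(ticks, seconds):
    """Build fixed-width time bars from ticks sorted by timestamp."""
    bars = []
    # Ticks falling in the same window share the same floored timestamp.
    for start, group in groupby(ticks, key=lambda t: t.ts - t.ts % seconds):
        bars.append(_make_bar(start, list(group)))
    return bars


def tick_bars(ticks, n):
    """Build bars of n ticks each; the final bar may hold fewer than n."""
    return [_make_bar(ticks[i].ts, ticks[i:i + n])
            for i in range(0, len(ticks), n)]
```

Calling `time_bars(ticks, 60)` would give 1 minute bars and `tick_bars(ticks, 20)` would give 20 tick bars; storing the results in their own tables is what makes them available without any transformation at query time.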
- Server to have Views, preferably populated views (to be rebuilt each weekend), so that recursive queries are avoided for speed.
- Data transformation procedures should have a method for dropping a rarely used data type (say, 20000 tick bars) on admin/mgmt confirmation, and for creating a new data/bar type on demand from user software, again after confirmation from mgmt. This is to deal with disk space management issues.
- Data Server to have pre-defined data cleansing procedures for clearing out bad ticks.
- Data Server should store all incoming real-time data feeds as compressed CSV files, to be uploaded to a geographically remote server, so that the data need not be bought again if database recovery is required.
- Data Server should serve local clients after authentication and keep a record of the type, symbol and data range queried by each user/strategy, so that mgmt. can analyse it against simulations and actual trades. I don't really know what use this will eventually have, except to tailor server rollout to user requirements, but one never knows.
- Each Data Query & Delivery thread should be separately forked and multi-processor/multi-thread friendly.
- Database server should have a good connection pooling system.
- If possible, data should be streamed to the client rather than returned as a recordset. FIX? CSV strings over a TCP/IP socket connection? Any specific protocol?
- It should be possible to transform the data from the native storage type to the required delivery type, e.g. Metastock, Tradestation formats etc. I'd like to put a note here that I don't know anything about any software's data format, so maybe I don't have the right idea.
- Server capacity should be measurable in connections and in the number of symbols/data bars delivered (or deliverable) per second on given hardware.
- Server should have a native delivery method, in addition to normal JDBC/ODBC pooled connections and streaming delivery, for those who want to write custom connectors for more speed.
- Server should be capable of connecting the data periods of successive futures contracts and smoothing the joins, to give an extended history for futures.
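As an illustration of the last point, here is a rough sketch in Python of one common way to join futures contracts: difference back-adjustment, where older prices are shifted by the price gap seen on each roll date. I am assuming that each contract segment ends on its roll date and that the next segment starts on that same date; the function name `back_adjust` and the data layout are placeholders of mine, and other smoothing methods (ratio adjustment, for example) are equally possible.

```python
def back_adjust(segments):
    """Join per-contract price series into one continuous series.

    segments: list of lists of (date, close) tuples, oldest contract first.
    Assumption: each segment's last date is the roll date and is also the
    first date of the following segment.
    """
    result = []
    offset = 0.0
    # Walk from the newest contract back to the oldest, so that the newest
    # prices stay unadjusted and the offset accumulates going back in time.
    for i in range(len(segments) - 1, -1, -1):
        seg = segments[i]
        if i < len(segments) - 1:
            nxt = segments[i + 1]
            if seg[-1][0] != nxt[0][0]:
                raise ValueError("segments must overlap on the roll date")
            # Gap between new and old contract on the roll date.
            offset += nxt[0][1] - seg[-1][1]
            # The roll-date price itself is taken from the newer contract.
            seg = seg[:-1]
        result = [(d, p + offset) for d, p in seg] + result
    return result


segments = [
    [("2024-01-02", 100.0), ("2024-01-03", 101.0)],   # older contract
    [("2024-01-03", 103.0), ("2024-01-04", 104.0)],   # newer contract
]
continuous = back_adjust(segments)
```

With the sample data, the older contract's prices are raised by the 2.0 point gap on the roll date, so the joined series has no artificial jump there.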
If someone knows of an open-source project, or even a commercial one, that can provide this sort of functionality, please advise.
I would also like to hear from people who have input on the feature set required, or comments on anything I have got wrong.
I personally feel that a Data Server as outlined above, a GRID-based back-testing/simulation server that can be augmented as needed, an OMS, an Automated Trading Server, an Accounting Backoffice and clients would together form a basic professional trading framework.
I would really like your input on this first issue that I am tackling. Please do contribute to this thread.
With best regards,
Sanjay.
