Recording live feeds

cjbuckley4 · Nov 20, 2014

Hi folks, I was about to ask this question on a server site, but I thought there might be people here with experience with this particular problem. Maybe others struggling with this in the future will find this helpful as well, because I'm sure I'm not the first hobbyist to attempt this. This pertains to my ongoing project: building a historical tick database.

Question:

Am I approaching this problem the right way? (I'm brand new to servers.)

The problem:
- I have a datafeed API that I want to record. It sends streaming data that I want to turn into persistent data, and I want to build a database out of it on my home machine. The feed runs about 24 hours a day, 5+ days a week, and I'd like to capture it as accurately as possible with minimal missed data points. How do I best go about taking this streaming data and making it persistent?

My idea for a solution:
Set up an EC2 server to run the feed on (to maximize uptime). Store the feed in a database on the EC2 server. Dump the EC2 database onto my home machine every so often. If any periods are missed, fill them in using the API's historical data feature. (Why am I not just using the historical data feature? It has no quotes, only trades, and no bid or ask sizes.) In the future, I will also use the historical data to compare against my recording to flag bad ticks. I have no idea how I'll do that yet... it's a separate future project.
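The "fill in missed periods" step could start with something like the following. This is a minimal Python sketch, assuming the recorder stores one timestamp per tick; `find_gaps` and the 60-second threshold are hypothetical choices, not part of any vendor API, and the actual historical-data request for each window is left out because it depends on the feed's API.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_gaps(timestamps: List[datetime],
              max_gap: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs where consecutive recorded ticks
    are further apart than max_gap.

    Each pair marks a window that is a candidate for a historical-data
    backfill request. The request itself is vendor-specific and not shown.
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare each tick with the one that follows it.
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > max_gap:
            gaps.append((prev, curr))
    return gaps


if __name__ == "__main__":
    ticks = [
        datetime(2014, 11, 20, 9, 30, 0),
        datetime(2014, 11, 20, 9, 30, 1),
        datetime(2014, 11, 20, 9, 35, 0),  # five minutes of silence before this
        datetime(2014, 11, 20, 9, 35, 2),
    ]
    for start, end in find_gaps(ticks, timedelta(seconds=60)):
        print(f"backfill {start} -> {end}")
```

A fixed threshold is a simplification: a thinly traded symbol can legitimately go minutes without a trade, and overnight or weekend closures will also show up as gaps, so in practice the threshold would likely need to be per symbol and aware of session hours.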

Objectives:
- Minimize missed data.
- Minimize human intervention.
- Maximize scalability if I want to watch more symbols.
- Minimize cost (lastly).

How have other traders solved this problem? What are the possible oversights in my approach? What recommendations do you have?

IAS_LLC · Nov 20, 2014

NxCore does this automatically with its *.nxc tape files. Problem solved.

I'm not sure how IQFeed works, but I did the same thing for a while with the IB API data feed (not a true tick feed, I know... but same concept). Basically, I wrote quotes to an in-memory buffer, and every time that buffer filled up I wrote them to a binary file. I didn't see the usefulness of using SQL or something similar. If you're really concerned about performance, you could multithread the process.
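The buffer-then-write approach might look like this in Python. This is a minimal sketch under stated assumptions: the 20-byte record layout (timestamp, price, size) and the names `TickBuffer` and `read_ticks` are hypothetical, not the IB API's or IQFeed's format.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

# Hypothetical record layout (not IB's or IQFeed's format):
# int64 timestamp in ns since epoch, float64 price, uint32 size.
# "<" means little-endian with no padding, so each record is RECORD.size (20) bytes.
RECORD = struct.Struct("<qdI")


class TickBuffer:
    """Collect packed ticks in memory and append them to a binary file in batches."""

    def __init__(self, out: BinaryIO, capacity: int = 4096) -> None:
        self.out = out
        self.capacity = capacity  # number of ticks held before a write
        self.pending: List[bytes] = []

    def append(self, ts_ns: int, price: float, size: int) -> None:
        self.pending.append(RECORD.pack(ts_ns, price, size))
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        # One write per batch instead of one per tick.
        if self.pending:
            self.out.write(b"".join(self.pending))
            self.out.flush()
            self.pending.clear()


def read_ticks(data: bytes) -> List[Tuple[int, float, int]]:
    """Decode bytes written by TickBuffer back into (ts_ns, price, size) tuples."""
    return list(RECORD.iter_unpack(data))


if __name__ == "__main__":
    sink = io.BytesIO()  # stands in for open("ticks.bin", "ab")
    buf = TickBuffer(sink, capacity=2)
    buf.append(1_000, 101.25, 100)
    buf.append(2_000, 101.50, 200)  # buffer full: both ticks written now
    buf.append(3_000, 101.25, 300)  # held in memory until the next flush
    buf.flush()
    print(read_ticks(sink.getvalue()))
```

The capacity is the trade-off: a larger batch means fewer writes, but any ticks still sitting in memory are lost if the process crashes before `flush()` runs.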

cjbuckley4 · Nov 20, 2014

Good answer. I've considered switching to NxCore for some time, but since I'm not presently anywhere near fully utilizing IQFeed, I haven't been able to justify upgrading yet... I only follow a few symbols. I was unaware of the .nxc feature; that seems pretty worthwhile.

I'm really less concerned about the process of writing the data, though, and more concerned with making sure I miss as few ticks as possible due to server downtime or an error in my program. When you say NxCore does this automatically, do you mean that you can request these files from their servers (via TCP, etc.), or is it a feature you "switch on" that records the files locally? If it's local, then it doesn't really solve my problem, because I still need to pay for EC2 and write the code to send the .nxc files back to my home computer periodically. Then the only real justification for upgrading I can see is if I wanted to record 500+ symbols, and for that I'm guessing my EC2 bill would go up substantially. If you can request the files from their server, that will pretty much justify the added cost of NxCore, because I can avoid about 200 dollars a month in EC2 fees. I've seen very different quotes on what they cost, but maybe it's worth inquiring. They definitely push the top end of what I can afford to pay as a student, though.

cjbuckley4 · Nov 20, 2014

It may even be best to just stick with the trades only data from IQFeed for the moment, because I'm having trouble deciding if the addition of quotes is really going to make a lot of difference.

IAS_LLC · Nov 20, 2014

The .nxc files are automatically created LOCALLY. That's just how NxCore works, not a special feature. They do it to ensure you receive every single tick from the exchange. Historical files are not included in the NxCore fee, but can be requested for an additional fee. The historical files are EXACTLY the same as what you would receive in real time. No data has been removed (full depth and trades are there). They are wonderful.

So you are worried about the data transfer from your EC2 to wherever you are archiving getting garbled?

I may be misunderstanding what you are trying to do... but assuming you simply want to reliably transfer data from EC2 to your home computer in an automated fashion, I would use commercial backup software that runs the backup at whatever frequency you like. I'm pretty sure most software of this type performs some sort of CRC check, so you should be safe in terms of "losing ticks". I wouldn't be surprised if there is free software out there for this. If you insist on writing your own tool for the job, it shouldn't be too difficult to use a TCP/IP socket with a CRC (or stronger) check.
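A do-it-yourself version of the socket-plus-CRC idea could frame each chunk with its length and a CRC-32 so the receiver can detect corruption. This is a minimal Python sketch; the 8-byte header is a hypothetical layout of our own design, and `encode_frame`, `send_frame`, and `recv_frame` are illustrative names, not an existing protocol.

```python
import socket
import struct
import zlib

# Hypothetical frame layout: 4-byte payload length, 4-byte CRC-32, then payload.
# "!" selects network (big-endian) byte order.
HEADER = struct.Struct("!II")


def encode_frame(payload: bytes) -> bytes:
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(len(payload), crc) + payload


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """recv() may return fewer bytes than requested, so loop until n arrive."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def send_frame(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(encode_frame(payload))


def recv_frame(sock: socket.socket) -> bytes:
    length, crc = HEADER.unpack(recv_exact(sock, HEADER.size))
    payload = recv_exact(sock, length)
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        # The receiver should discard this chunk and ask for it again.
        raise ValueError("CRC mismatch: payload corrupted in transit")
    return payload


if __name__ == "__main__":
    a, b = socket.socketpair()  # two connected sockets in one process
    send_frame(a, b"ticks for 2014-11-20")
    print(recv_frame(b))
    a.close()
    b.close()
```

TCP already carries its own checksum, but it is only 16 bits, so an application-level check adds protection. For whole files, computing a SHA-256 with `hashlib` on both machines and comparing the digests is a simpler alternative.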

cjbuckley4 · Nov 20, 2014

I see. Thank you for your reply. The NxCore historical data would probably be closer to what I want... I'm actually trying to achieve basically the same thing: an unaltered tape which I can replay as necessary.

I think your suggestion pretty much answers my original question. Thank you. Not garbled per se; I just know it's inevitable that at some point in my IQFeed recording something will go wrong, so I'm trying to find the best way to record the feed with minimal downtime. Using a commercial backup service will probably go a long way toward ensuring I transfer the files smoothly. I should also obviously try to minimize the number of transfers. I think it's safe to assume the biggest vulnerability in this system will be my own code.
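If crashes in the recording code are the expected weak point, one common pattern is to restart the recorder automatically with exponential backoff instead of waiting for a human. This is a minimal Python sketch; `record_session` is a hypothetical callable that connects to the feed and records until it disconnects, and the delays (1 s doubling up to 60 s) are arbitrary assumptions.

```python
import time
from typing import Callable, List


def backoff_delays(base: float, cap: float, attempts: int) -> List[float]:
    """Exponential backoff: base, 2*base, 4*base, ..., never more than cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def run_with_restarts(record_session: Callable[[], None],
                      max_restarts: int,
                      base: float = 1.0,
                      cap: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Run record_session(); if it raises, wait and start it again.

    Returns how many restarts were needed once a session ends normally.
    After max_restarts failed attempts, one final attempt is made and any
    exception it raises propagates to the caller.
    """
    for restart, delay in enumerate(backoff_delays(base, cap, max_restarts)):
        try:
            record_session()
            return restart
        except Exception as exc:  # deliberately broad: any failure triggers a restart
            print(f"recorder failed ({exc!r}); restarting in {delay:g}s")
            sleep(delay)
    record_session()
    return max_restarts


if __name__ == "__main__":
    failures = iter([ConnectionError("feed dropped"), TimeoutError("no heartbeat")])

    def flaky_session() -> None:
        # Hypothetical stand-in for "connect to the feed and record until disconnect".
        err = next(failures, None)
        if err is not None:
            raise err

    print(run_with_restarts(flaky_session, max_restarts=5, base=0.01))
```

On Linux, a process supervisor such as systemd (`Restart=on-failure`) can handle restarts at the process level; a loop like this covers failures inside a running process. This sketch also never resets the backoff after a long healthy session, which a real recorder would probably want to do.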

volpunter · Nov 22, 2014

What data do you EXACTLY need? Maybe IQFeed's historical data request will be sufficient. I cannot stress enough how important it is to keep fixed costs to a minimum when you start out. Too many became impatient or burned out simply because they had to shoulder a lot of costs for servers, special broker services, data vendors, and what have you. Define exactly what data you need and why. Just because it's "nice to have" does not justify paying another 1000 USD per month (EC2 24/7 + NxCore + ...) rather than just retrieving the consolidated historical data set from IQFeed.

cjbuckley4 · Nov 22, 2014

That's 100% the right question, @volpunter, and it's really what I ask myself. I would say I'm pretty much a hobbyist at this point. I have enough liquid cash that I can 'afford to lose' to clear PDT minimums (who can really 'afford to lose' 30 thousand dollars? Probably < 1% of this forum is in a position where that wouldn't be painful), but honestly not a whole lot more. So I constantly find myself asking whether I should strive for a recorded, millisecond-precision feed with market depth, like NxCore's historical data, to fulfill my weird little hobby/compulsion. But that's obviously quixotic: I can't even really afford to trade on millisecond-frequency data, yet for some decidedly hobby-driven reason I continue to want it really badly, kind of like you want a certain toy for Christmas as a kid. I realize the far more realistic path would be to settle for copying the trade data IQFeed stores and will readily provide.

I feel like I'm on some really strange and ever more expensive quest for the best data, which I admittedly haven't ever used for live trading. First it was EOD data, then IQFeed, then it will inevitably be NxCore, and (contingent on me getting a real paying job) it'll probably eventually progress to me listening directly to exchange feeds. I always find myself balancing an HFT fascination with wanting to create reasonably actionable medium-frequency strategies. I guess that's probably okay while I'm in college, but at some point (read: some years down the road) I should probably take all this data and hours of obsessive reading and try to start a little trading business. It's really a very hard thing for a beginner to balance, as you alluded to. I don't think I'm the type of person who would 'give up' something I've had such a compulsive interest in, but I also doubt the others you've seen put down money for data and startup costs thought they would throw in the towel either.
Again, as I said a few posts ago, I'm not sure blowing a whole summer's internship pay on NxCore data is really justifiable for a 'hobby'. I'm lucky to have pretty considerable resources for a college-aged trader, and maybe I should conserve them for actionable medium-frequency strategies (i.e. sticking with IQFeed) instead of going to NxCore, which, although really cool, would be pretty much a discretionary purchase at this point.

The nice thing about my current NxCore vs. IQFeed predicament is that if I pull my head out and decide to just use IQFeed for now, NxCore will sell me their historical data in the future should I decide to upgrade. If I were at the stage of deciding between listening to exchange feeds or sticking with NxCore, I probably wouldn't have that luxury.

IAS_LLC · Nov 22, 2014

If you haven't defined what your strategy NEEDS yet, you need to step back. I have argued with volpunter in the recent past, but I think the guy knows his shit (he's just abrasive... but knowledgeable). I advocate NxCore historical data for strategy definition because it contains everything... it is easy to take that data and emulate lesser feeds. Historical data from NxCore costs about $100 per exchange per month (maybe $150), but if your trading concept only involves trade data, save your money and just get the trade data from IQFeed. Not sure what you're studying, but I'm guessing it's technical. Think scientific method... first form a hypothesis. If your hypothesis doesn't involve the book, don't pay for it.
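The point that full-depth data can emulate lesser feeds can be illustrated with a small Python sketch. The `Event` type below is a hypothetical normalized event, not NxCore's message format; the sketch derives a trades-only stream and a level-1 (best bid/ask) stream from the same tape.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Level = Tuple[float, int]  # (price, size)


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized full-feed event; real NxCore messages differ."""
    ts: int                       # e.g. milliseconds since midnight
    kind: str                     # "trade" or "book"
    price: float = 0.0            # trade price (trades only)
    size: int = 0                 # trade size (trades only)
    bids: Tuple[Level, ...] = ()  # book levels, best first
    asks: Tuple[Level, ...] = ()


def trades_only(events: Iterable[Event]) -> List[Tuple[int, float, int]]:
    """Emulate a trades-only feed by dropping every book update."""
    return [(e.ts, e.price, e.size) for e in events if e.kind == "trade"]


def top_of_book(events: Iterable[Event]) -> List[Tuple[int, float, int, float, int]]:
    """Emulate a level-1 quote feed: emit (ts, bid, bid_size, ask, ask_size)
    only when the best bid or best ask actually changes."""
    out = []
    last = None
    for e in events:
        if e.kind != "book" or not e.bids or not e.asks:
            continue
        quote = (e.bids[0][0], e.bids[0][1], e.asks[0][0], e.asks[0][1])
        if quote != last:  # deeper-level changes are invisible at level 1
            out.append((e.ts, *quote))
            last = quote
    return out


if __name__ == "__main__":
    tape = [
        Event(1, "book", bids=((10.00, 100), (9.99, 200)), asks=((10.01, 300),)),
        Event(2, "trade", price=10.01, size=100),
        Event(3, "book", bids=((10.00, 100), (9.99, 500)), asks=((10.01, 300),)),
        Event(4, "book", bids=((10.00, 100), (9.99, 500)), asks=((10.01, 200),)),
    ]
    print(trades_only(tape))
    print(top_of_book(tape))
```

The reverse does not work: once the book updates are dropped, a trades-only recording cannot be turned back into quotes or depth.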

volpunter · Nov 22, 2014

Maybe think of it this way: if you can afford the best data feed and do not worry about your financial resources, then by all means, go for it. But that does not push you one inch forward. You are still standing where you stood before; you still need to think really hard about what kind of holding periods and trading frequency suit your style, how you formulate strategies that match such a style, whether you tend towards momentum-based strategies or mean-reversion ones, and whether you like to focus on equities, currencies, or any other asset class. Just make sure you do not use data as an excuse to not move forward. Sometimes a seemingly mid-to-low priority item can become the most important issue in our life if we make it one, and often we do that to use it as an excuse or hiding spot. I am not saying this applies to you, but I suggest you make up your mind as IAS_LLC suggested (I do highly respect his thoughts though we may sometimes disagree, and yes, I can be an asshole; I don't have a long attention span or patience for bullshit). Make the data feed a function of your actual need, not the other way around. If you cannot do so, then just decide on a data feed and move on. It is time to ponder strategies and their design. Just my 2 cents.
