Hi folks, I was about to ask this question on a server site, but I thought there might be people here with experience with this particular problem. Maybe others struggling with this in the future will find this helpful as well, because I'm sure I'm not the first hobbyist to attempt this. This pertains to my ongoing project: building a historical tick database.
Question:
Am I approaching this problem the right way? (I'm brand new to servers.)
The problem:
- I have a datafeed API that I want to record. It sends streaming data, which I want to persist in a database on my home machine. The feed runs roughly 24 hours a day, 5+ days a week, and I'd like to capture it as accurately as possible with minimal missed data points. How do I best go about taking this streaming data and making it persistent?
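To make "persistent" concrete, here is a rough sketch of the kind of recorder I have in mind. It is only an illustration under my own assumptions: SQLite as the store, ticks arriving as dicts, and made-up field names (`recv_ts`, `bid_size`, and so on). None of this comes from the actual feed API, and the real feed callback would have to be adapted to produce these dicts.

```python
import sqlite3
import time

# Hypothetical tick fields; the real API will name these differently.
FIELDS = ("recv_ts", "symbol", "bid", "bid_size",
          "ask", "ask_size", "last", "last_size")

SCHEMA = """
CREATE TABLE IF NOT EXISTS ticks (
    recv_ts   REAL NOT NULL,  -- local receive time, seconds since epoch
    symbol    TEXT NOT NULL,
    bid       REAL,
    bid_size  REAL,
    ask       REAL,
    ask_size  REAL,
    last      REAL,
    last_size REAL
)
"""

INSERT = ("INSERT INTO ticks VALUES (:recv_ts, :symbol, :bid, :bid_size, "
          ":ask, :ask_size, :last, :last_size)")


def open_db(path):
    """Open (or create) the tick database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, ticks, batch_size=500):
    """Write ticks from any iterable of dicts, committing once per batch.

    Returns the number of rows written. Missing fields are stored as NULL,
    and a missing recv_ts is filled in with the local clock.
    """
    batch = []
    written = 0
    for tick in ticks:
        row = {name: tick.get(name) for name in FIELDS}
        if row["recv_ts"] is None:
            row["recv_ts"] = time.time()
        batch.append(row)
        if len(batch) >= batch_size:
            with conn:  # commits on success, rolls back on error
                conn.executemany(INSERT, batch)
            written += len(batch)
            batch.clear()
    if batch:  # flush whatever is left when the stream ends
        with conn:
            conn.executemany(INSERT, batch)
        written += len(batch)
    return written
```

Committing in batches rather than per tick keeps disk writes manageable on a busy feed, at the price of losing up to one uncommitted batch if the process dies.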
My idea for a solution:
Set up an EC2 server to run the feed on (to maximize uptime). Store the feed in a database on the EC2 server. Dump the EC2 database onto my home machine every so often. If any periods are missed, fill them in using the API's historical data feature. (Why am I not just using the historical data feature? It has trades only: no quotes, and no bid or ask sizes.) In the future, I will also compare my recording against the historical data to flag bad ticks. I have no idea how I'll do that yet... it's a separate future project.
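For the "if any periods are missed" step, one simple way I could imagine spotting candidates for backfill is to look for unusually long silences between recorded ticks. This is just a sketch: `find_gaps` and the 60-second threshold are my own placeholders, and silences that are expected (weekend closes, illiquid symbols) would still need to be filtered out separately.

```python
def find_gaps(timestamps, max_silence=60.0):
    """Return (start, end) pairs where consecutive ticks are more than
    max_silence seconds apart. Timestamps are seconds since epoch."""
    gaps = []
    prev = None
    for ts in sorted(timestamps):
        if prev is not None and ts - prev > max_silence:
            gaps.append((prev, ts))
        prev = ts
    return gaps


# Example: ticks every 10 s, with two long silences.
recorded = [0, 10, 20, 200, 210, 500]
print(find_gaps(recorded))  # [(20, 200), (210, 500)]
```

Each returned pair would then be a time range to request from the historical data feature.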
Objectives:
Minimize missed data.
Minimize human intervention.
Maximize scalability if I want to watch more symbols.
Minimize cost (lastly).
How have other traders solved this problem? What are the possible oversights in my approach? What recommendations do you have?