I mean, massive infrastructure is not scary; in fact, it shows that the HFT firms are slow. A massive infrastructure often signals a lack of programming knowledge; a powerful infrastructure is a small one. Perhaps only one computer with an FPGA NIC, or DPDK or similar to pull packet frames and process them immediately. This approach gives you the required trading latency (under 1,000 nanoseconds to receive, process and reply to a packet). If you have any kind of infrastructure, by the time the packet propagates through it you're already at something like 10,000+ nanoseconds.
Now an FPGA NIC is $1,000+; DPDK is free but requires a $500+ NIC.
My team's programming time is effectively free, since it's just a few moments here and there spent on something that has potential.
A spot in a datacenter near the exchange (colo) would be, I imagine, at least $300 a month; the spot that is apparently closest to the exchange goes for around $10,000 a month, I think.
A subscription for data directly from the exchange: I don't know the price on this one.
A server would be $2,000 tops, and you can easily fit one in a $1,000 budget. Going all out (~32 physical cores, ~32 vcores, 160 GB+ RAM, RAID 0 PCIe SSDs) would run $10,000-20,000.
And that is basically your cost breakdown. It is NOT that much. It is not the kind of investment that warrants multimillion-dollar returns. Hell no. Go buy a store chain: spend a few million and make a few hundred thousand.
Here it's more like spend under a hundred thousand and make a few million.
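To put the figures quoted above side by side, here is a rough first-year tally. It is only arithmetic on the numbers in this post: the function name is made up, the exchange data subscription price is unknown so it is left as a parameter defaulting to zero, and payroll is not included.

```python
# Rough first-year cost tally using the figures quoted in this post.
# The exchange data subscription price is unknown, so it is a parameter
# (hypothetical default of 0). Payroll is deliberately left out.

def first_year_cost(nic, server, colo_per_month, data_per_month=0.0):
    """One-off hardware plus twelve months of recurring fees, in dollars."""
    return nic + server + 12 * (colo_per_month + data_per_month)


# Lean setup: DPDK-capable NIC, budget server, basic colo spot.
lean = first_year_cost(nic=500, server=1_000, colo_per_month=300)

# All-out setup: FPGA NIC, high-end server, closest colo spot.
all_out = first_year_cost(nic=1_000, server=20_000, colo_per_month=10_000)

print(f"lean:    ${lean:,.0f}")
print(f"all-out: ${all_out:,.0f}")
```

With these inputs the lean setup comes to $5,100 and the all-out one to $141,000. Most of the spread is the colo fee: 12 x $10,000 is $120,000 a year on its own.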
The real issue w/r/t speed and cost is not the cost of physical infrastructure, as you've alluded to above; it's the cost of services and payroll. If you're actually going to be competitive in the speed game, it's very difficult to do so as a one-man shop. Systems people are quite expensive, especially those with experience in this space. There are ways to keep things cheap and get better speed, but you need to use the proper infrastructure and service providers on the right exchanges.
W/r/t position in the Q (queue): on some exchanges you can know it explicitly with the correct data feed; on others you must estimate it. Estimating Q position is a complex problem that is easiest to solve if you have fill data, or are clever enough to work around not having it, which can be done well on some exchanges and products. Start by understanding the mechanics of the exchange and the data feed. Think through the scenarios where you can (or cannot) explicitly know whether an order came from in front of (or behind) you in the Q. Once you've solved the fill data issue, you can start to develop statistical models for the conditional expectation of your Q position and go from there.
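One simple starting point on the estimation side is to keep a point estimate of the quantity ahead of you and update it on every feed event. The sketch below is hypothetical (QueueEstimate and its methods are made-up names) and assumes FIFO matching at the level, that new orders join the back of the queue, and that a cancel you can't attribute is spread pro rata over the other resting quantity. on_cancel_known covers the scenarios where you can tell explicitly which side of you a cancel came from.

```python
from dataclasses import dataclass


@dataclass
class QueueEstimate:
    """Hypothetical point estimate of where one resting order sits in a FIFO level.

    ahead:  estimated quantity resting in front of our order
    behind: estimated quantity resting behind our order (our own size excluded)
    """
    ahead: float
    behind: float = 0.0

    def on_trade(self, qty: float) -> None:
        # FIFO: trades at our price consume the front of the queue first.
        self.ahead = max(0.0, self.ahead - qty)

    def on_add(self, qty: float) -> None:
        # Assumption: new orders at the level join the back of the queue.
        self.behind += qty

    def on_cancel(self, qty: float) -> None:
        # Assumption: a cancel we can't attribute is equally likely to come
        # from any other resting unit, so it is split pro rata.
        others = self.ahead + self.behind
        if others <= 0:
            return
        frac_ahead = self.ahead / others
        self.ahead = max(0.0, self.ahead - qty * frac_ahead)
        self.behind = max(0.0, self.behind - qty * (1.0 - frac_ahead))

    def on_cancel_known(self, qty: float, from_ahead: bool) -> None:
        # Use when the feed lets you tell explicitly which side of you it was.
        if from_ahead:
            self.ahead = max(0.0, self.ahead - qty)
        else:
            self.behind = max(0.0, self.behind - qty)
```

For example, joining behind 60 lots, seeing 20 lots join behind you, and then an unattributed 40-lot cancel leaves an estimate of 30 ahead and 10 behind. Fill data is what lets you check whether the pro rata assumption holds for a given product.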
The big problems you'll face are as follows:
1. If you don't have your own fill data, you'll either need to figure out how to use the information in the feed or find some in order to test the efficacy of your model.
2. You'll most likely need to deal with deaggregating a price-level-aggregated feed. Any grouping or aggregation of orders makes an exact reconstruction of the queue impossible.
3. Certain products, CL for example, trade a lot of one-lots. Any non-uniquely sized quote breaks a lot of assumptions about Q position in an aggregated book, and too many one-lot offers getting hit by other small orders makes it difficult to match trades against the sequence of updates you see at the price level.
4. If you're competing for Q position, you need to know that you're up against fast guys. Without proper timestamps, which no commercial historical data set I'm aware of provides, you'll need to make assumptions about whose order arrived first.
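To see why problems 2 and 3 bite, consider what one update to an aggregated level actually tells you. Given two consecutive snapshots of a price level and the volume that traded at that price in between, only the net of additions and cancellations is observable. The helper below is a hypothetical sketch (level_flow is a made-up name) that returns the minimum adds and cancels consistent with an update, assuming the trade volume you attribute to the level is correct.

```python
def level_flow(prev_size: int, new_size: int, traded: int) -> tuple[int, int]:
    """Minimum adds and cancels consistent with one aggregated level update.

    prev_size, new_size: displayed size at the price before/after the interval.
    traded: volume printed against this level during the interval.
    Only the net is observable:
        new_size = prev_size + added - cancelled - traded
    """
    net = new_size - prev_size + traded  # equals added - cancelled
    return max(net, 0), max(-net, 0)
```

A level that goes from 10 to 10 with 3 traded implies at least 3 added, but 5 added and 2 cancelled fits equally well, and nothing in the update says whether those cancels were ahead of you or behind. A level that goes from 10 to 7 with 3 traded looks like no activity at all, even if many orders came and went.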
In practice, unless you're already committed to writing a lot of code and building a pretty robust system, it's not even worth worrying about anything beyond the pessimistic upper bound of SizeWhenIJoined - SizeTradedAgainstLevelSince.
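That pessimistic bound is a one-liner. A minimal sketch follows; the function and parameter names are made up, and the optional clamp against the currently displayed size is an extra assumption not in the text (it presumes the displayed level size includes your own resting order).

```python
from typing import Optional


def pessimistic_queue_ahead(size_when_joined: int,
                            size_traded_against_level_since: int,
                            level_size_now: Optional[int] = None,
                            my_remaining: int = 0) -> int:
    """Upper bound on the quantity resting ahead of our order.

    Everything at the level when we joined is treated as still ahead of us
    (i.e. every cancel is assumed to come from behind); only trades at the
    level move us forward.
    """
    bound = max(0, size_when_joined - size_traded_against_level_since)
    if level_size_now is not None:
        # Extra clamp (assumption): the quantity ahead can't exceed what is
        # displayed now minus our own order, if the feed includes our order.
        bound = min(bound, max(0, level_size_now - my_remaining))
    return bound
```

Joining behind 100 lots with 30 traded since gives a bound of 70; if the level now shows only 50 lots, 10 of them yours, the clamp tightens it to 40.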