std::atomic is probably what you want to look at. You will have to ensure atomicity.
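As a rough illustration of what std::atomic buys you, here is a minimal sketch of a shared order-id counter. The names OrderIdGenerator and issue_ids_concurrently are made up for this example and are not from your code. It only shows that fetch_add hands out unique values under contention; it is not a full ledger.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical order-id generator: one shared counter, many writer threads.
class OrderIdGenerator {
public:
    // fetch_add is a single atomic read-modify-write, so two callers can
    // never receive the same id. Relaxed ordering is enough for uniqueness;
    // it does NOT order any other memory accesses around the call.
    std::uint64_t next() noexcept {
        return counter_.fetch_add(1, std::memory_order_relaxed) + 1;
    }

    // Number of ids handed out so far.
    std::uint64_t issued() const noexcept {
        return counter_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint64_t> counter_{0};
};

// Spawns `threads` workers that each request `per_thread` ids and
// returns the total number of ids issued once all workers have joined.
std::uint64_t issue_ids_concurrently(unsigned threads, unsigned per_thread) {
    assert(threads > 0);
    OrderIdGenerator gen;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&gen, per_thread] {
            for (unsigned i = 0; i < per_thread; ++i) {
                gen.next();
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join() synchronizes, so the final load sees every increment
    }
    return gen.issued();
}
```

With a plain std::uint64_t instead of the atomic, the same loop would be a data race and could lose increments.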
In terms of architecture, you will at least need some way to persist orders so that your history can be rebuilt in the event of a crash. I would suggest first looking at the simplest possible solution: a local cache of orders in the current "window" that gets offloaded to a database at some interval. This database could be something as simple as Redis, which can then itself replicate to a cold storage system later. A three-layer strategy like this (local cache, Redis, cold storage) improves reliability, although anything that has not yet been flushed from the local cache is still lost in a crash.
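To make the "window" idea concrete, here is a minimal single-threaded sketch. Everything in it is an assumption for illustration: the Order fields, the OrderWindow class, and the Sink callback that stands in for whatever actually writes to your database (no real Redis client is used). It flushes when the window reaches a size limit rather than on a timer, which is the simplest variant to test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical order record; the real fields depend on your system.
struct Order {
    std::uint64_t id;
    std::int64_t price_ticks;
    std::uint32_t quantity;
};

// Local cache of the current window. Not thread-safe: a real ledger cache
// would need atomics or a lock around record() and flush().
class OrderWindow {
public:
    // Sink stands in for the persistence layer (for example a database write).
    using Sink = std::function<void(const std::vector<Order>&)>;

    OrderWindow(std::size_t capacity, Sink sink)
        : capacity_(capacity), sink_(std::move(sink)) {
        assert(capacity_ > 0);
        pending_.reserve(capacity_);
    }

    // Adds an order and flushes automatically once the window is full.
    void record(const Order& order) {
        pending_.push_back(order);
        if (pending_.size() >= capacity_) {
            flush();
        }
    }

    // Hands the whole window to the sink, then starts a new window.
    void flush() {
        if (pending_.empty()) {
            return;
        }
        sink_(pending_);
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t capacity_;
    Sink sink_;
    std::vector<Order> pending_;
};
```

Orders still sitting in pending_ at the moment of a crash are gone, so the window size (or flush interval) is effectively how much history you are willing to lose.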
Establishing atomic writes is not terribly difficult using a modern database and good engineering practices. I would suggest you first implement the simplest solution (above), benchmark it, and then make modifications to bring the execution time down to where you need it. Atomicity on the local system's ledger cache will need to be carefully engineered and tested. Consider using the Ravenscar profile of Ada/GNAT as a good starting point. You need a baseline first; after that, reaching sub-microsecond execution times comes down to a mixture of:
1. How you programmed it (C++ is likely still too high level and you'll be looking into ASICs eventually)
2. How you architected it (do you have fiber pipe going to databases?)
3. The hardware it is running on (do your structures fit in the CPU's L2 cache? Which level of hardware cache do they need to fit in?)
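For the cache question in point 3, one cheap habit is to have the compiler check structure sizes for you. The sketch below assumes 64-byte cache lines, which is common on x86-64 but something you should verify for your target. LedgerEntry and its fields are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Assumption: 64-byte cache lines. Check your actual hardware.
constexpr std::size_t kCacheLine = 64;

// Hypothetical ledger record, aligned so that one entry occupies exactly
// one cache line and never straddles two.
struct alignas(kCacheLine) LedgerEntry {
    std::uint64_t order_id;
    std::int64_t price_ticks;
    std::uint32_t quantity;
    std::uint32_t flags;
    std::uint64_t timestamp_ns;
};

// Compile-time guards: the build breaks if someone grows the struct
// past one cache line or makes it non-trivially copyable.
static_assert(sizeof(LedgerEntry) == kCacheLine,
              "LedgerEntry must occupy exactly one cache line");
static_assert(alignof(LedgerEntry) == kCacheLine,
              "LedgerEntry must be cache-line aligned");
static_assert(std::is_trivially_copyable<LedgerEntry>::value,
              "LedgerEntry must be trivially copyable");

// How many entries fit in a cache of the given size, ignoring everything
// else competing for that cache.
inline std::size_t entries_that_fit(std::size_t cache_bytes) {
    assert(cache_bytes >= sizeof(LedgerEntry));
    return cache_bytes / sizeof(LedgerEntry);
}
```

Under that assumption a 256 KiB L2 holds at most 4096 such entries, and in practice fewer, since code and other data share the cache.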
This problem is not simple. You will want to tailor it to specific hardware and optimize your in-flight structures to fit inside the fastest available CPU caches to ensure top-tier performance. You will also want to look into the latency caused by branch misprediction, among several other things, and write your makefile/compiler flags to optimize for this. Good luck.
I'm pretty sure I just learned something new (and important) here, but I have no idea what it was. Just imagine me sitting at my desk with an old man OMG face...