C++ concurrent data structure for fast retrieval

Russell Shuffle · Nov 30, 2020

gaussian said:
std::atomic is probably what you want to look at. You will have to insure atomicity.

In terms of architecture you will at least need some way to persist orders so that your history can be rebuilt in the event of a crash. I would suggest first looking at the simplest possible solution: a local cache of orders in the current "window" that gets offloaded to a database at some interval. This database could be something as simple as redis, which then itself can replicate over to cold storage system later. A 3 layer caching strategy would guarantee reliability.

Establishing atomic writes is not terribly difficult using a modern database and good engineering practices. I would suggest you first implement the simplest solution (above), benchmark it, and then make modifications to bring the execution time down to where you need it. Atomicity on the local system's ledger cache will need to be carefully engineered and tested. Consider using the Ravenscar profile of ADA/GNAT as a good baseline. You need a baseline first and reaching sub-microsecond execution times gets into a mixture of:

1. How you programmed it (C++ is likely still too high level and you'll be looking into ASICs eventually)
2. How you architected it (do you have fiber pipe going to databases?)
3. The hardware it is running on (are your structures storeable in the L2 cache of a CPU? What hardware level caching do you need to fit your structures in?)

This problem is not simple. You will want to tailor it to specific hardware and optimize your structures in-flight to fit inside of the fastest available CPU caches to insure top-tier performance. You will also want to look into the lag caused by branch prediction and several other things and write your make file/compiler flags to optimize for this. Good luck.

I'm pretty sure I just learned some new (important) here but I have no idea what it was. Just imagine me sitting at my desk with an old man OMG face...

931 · Nov 30, 2020

kmiklas said:
I have a requirement to create and maintain a ledger of orders. Thousands and of them, with many updates per second. It will operate in a high-speed environment: currently in single-digit mcs range; I'd like to see it in high nanos.

- Must be thread-safe
- Multiple threads and processes will be reading and writing concurrently
- Said ledger will be updated with algos (no GUI order input)

Noodling on how to best implement this... what data structure makes sense here?

1. A vector of pointers or references to orders?
2. A hash table?
3. Caching? I fear race conditions.
4. Database of some kind? I fear perf penalties.

I was thinking a separate thread or process, dedicated to holding the structure in memory, with accessor and mutator methods? Make every effort to stick with atomic operations, to perf penalties associated with locking?

Feedback and/or advice welcome. Thanks.

Based on what info will the orders be accessed later?

igr · Mar 7, 2021

kmiklas said:
I have a requirement to create and maintain a ledger of orders. Thousands and of them, with many updates per second. It will operate in a high-speed environment: currently in single-digit mcs range; I'd like to see it in high nanos.

- Must be thread-safe
- Multiple threads and processes will be reading and writing concurrently
- Said ledger will be updated with algos (no GUI order input)

Noodling on how to best implement this... what data structure makes sense here?

1. A vector of pointers or references to orders?
2. A hash table?
3. Caching? I fear race conditions.
4. Database of some kind? I fear perf penalties.

I was thinking a separate thread or process, dedicated to holding the structure in memory, with accessor and mutator methods? Make every effort to stick with atomic operations, to perf penalties associated with locking?

Feedback and/or advice welcome. Thanks.

- Do reads or writes overwhelm the other in quantity? We can consider specialized ideas for rarely-written/read often-read/written cases. Is the speed requirement the same for reads and writes?
- Are the writes modifications or mostly appending? Are the reads scan for all orders or perhaps returning the ones closer to the mark price?
- Are there any patterns to the updates? Perhaps the reads/writes are clustered in a certain region of the ledger?
- I'm guessing no, but if inconsistency can be tolerated that opens some design choices too. Also, whether this is going to go beyond the context of one machine can matter too.

While this is an interesting question to ponder, in reality you'd probably implement the simplest thing that works and then evolve by exploiting these product specific patterns.

cruisecontrol · Mar 10, 2021

kmiklas said:
I have a requirement to create and maintain a ledger of orders. Thousands and of them, with many updates per second. It will operate in a high-speed environment: currently in single-digit mcs range; I'd like to see it in high nanos.

- Must be thread-safe
- Multiple threads and processes will be reading and writing concurrently
- Said ledger will be updated with algos (no GUI order input)

Noodling on how to best implement this... what data structure makes sense here?

1. A vector of pointers or references to orders?
2. A hash table?
3. Caching? I fear race conditions.
4. Database of some kind? I fear perf penalties.

I was thinking a separate thread or process, dedicated to holding the structure in memory, with accessor and mutator methods? Make every effort to stick with atomic operations, to perf penalties associated with locking?

Feedback and/or advice welcome. Thanks.

Here's the real answer: You don't actually have any such requirement. Surprised no one figured it out after 4 pages.

C++ concurrent data structure for fast retrieval

Russell Shuffle

931

igr

cruisecontrol