interesting that you guys are having issues with storage... but without knowing which vendors you are using, etc., it's hard to offer an opinion...
any trading activity/business generates tons and tons of data... but I don't think that should be a huge issue. my RDO teams manage 5PB+, and the storage is tiered depending on the source, compliance requirements, etc...
anyhow, I don't want to clutter this thread with a storage discussion...
in any event, our grids are built with HP SL6x000. we go for density: 256 cores or 512 threads per 4U block... each rack has a NetApp filer... the schedulers are usually DL580s... recently we deployed a full rack of DL560 Gen8 with K20s as a test-case grid for one group...
in other places, the grids were built using DL380 servers, since you don't want too much compute on a single node or rack: if it goes down, you are out a certain % of capacity...
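to put rough numbers on that density vs. failure trade-off, here is a quick sketch. the 2,048-core grid size and the 16 cores per DL380 are made-up figures purely for illustration, and `capacity_lost` is just a hypothetical helper name; only the 256 cores per 4U block comes from the setup described above.

```python
def capacity_lost(total_cores: int, cores_per_unit: int, failed_units: int = 1) -> float:
    """Fraction of total grid capacity lost when `failed_units` units go down."""
    if total_cores <= 0 or cores_per_unit <= 0 or failed_units < 0:
        raise ValueError("core counts must be positive and failed_units non-negative")
    # You cannot lose more cores than the grid has.
    lost_cores = min(failed_units * cores_per_unit, total_cores)
    return lost_cores / total_cores


TOTAL_CORES = 2048           # assumption: hypothetical grid size
DENSE_BLOCK_CORES = 256      # from the post: 256 cores per 4U block
DL380_CORES = 16             # assumption: hypothetical cores per DL380

dense = capacity_lost(TOTAL_CORES, DENSE_BLOCK_CORES)
sparse = capacity_lost(TOTAL_CORES, DL380_CORES)

print(f"one dense 4U block down: {dense:.1%} of capacity lost")
print(f"one DL380 down:          {sparse:.1%} of capacity lost")
```

with those assumed numbers, losing one dense block costs 12.5% of the grid, while losing a single smaller server costs under 1%. plug in your own core counts to see where you land.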
anyhow, the grids come in really handy for Monte Carlo simulation (MCS) and for real-time (RT) analysis/pricing and risk management... I realize you are using yours more for backtesting, which ours are also used for, but for us that's more of a secondary use...