Wow! I wasn't expecting responses like this - this is great! Thanks so much... Hopefully this turns into a great thread for all involved.
The reason I ask is that I run some Intel Atom (Supermicro 2U Twin^3) servers, and they are in pretty high demand. They are fine for running the OS on bare metal but not great for redundancy. I'm looking to get virtualization fast enough that I can tell someone there is minimal difference between running the OS on bare metal and running it on a hypervisor.
Quote from NetTecture:Can you describe your hardware?
Production hardware is all Supermicro. I have two "MicroCloud" machines (3U, 8 nodes) (http://www.supermicro.com/products/nfo/MicroCloud.cfm) and a few 2U Twin systems (2U, 4 nodes, 2x PSU) that run Xeon 5400, 5500, or 5600 CPUs.
I also have two 2U Twin^3 servers (http://www.supermicro.com/products/nfo/2UTwin3.cfm) with Atom CPUs (2U, 8 nodes). These do not run VMware and are used exclusively for bare-metal installs of a Linux/Unix server OS that runs an ATS only (execution only, no development or testing). The CPUs are Atom D525s (1.8 GHz, overclocked to ~2.1 GHz) with 8 GB RAM. They are dual-core with Hyper-Threading, so 4 threads (like an i3). In my opinion that is a perfect platform for an execution-only system that needs ultra-low latency. They all run Ubuntu Server LTS, though one guy runs some other flavor (Debian, I think).
The NICs are all Intel PRO/1000 server-grade (PT model) cards, and they all have plenty of onboard buffering.
Quote from NetTecture:I am more of a Hyper-V type, but here we go; to my knowledge the same applies to VMware.
By Hyper-V do you mean Xen or Citrix XenServer vs. VMware's ESXi? At this point we all run some flavor of Xen hypervisor; Citrix and VMware have just made it idiot-proof, whereas Xen.org has kept it CLI-only and raw.
Quote from NetTecture:It may depend on the network card. There are special cards/chips from Intel that have hardware queues. Basically, incoming Ethernet packets are not handled by the Hyper-V switch but are pushed into a queue based on their target MAC address... and the driver in the VM reads them out. This removes essentially all packet processing from the hypervisor (except the driver configuring the card) and significantly improves performance under network load.
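To make the queueing idea concrete, here is a toy Python model of what NetTecture describes (I believe Intel markets this as VMDq, but treat that naming as my assumption). The "NIC" sorts each incoming frame into a per-VM queue by destination MAC, and each VM drains only its own queue, so no software vSwitch touches the traffic. The class, method names, and MAC addresses are all made up for illustration; real cards do the sorting in silicon.

```python
from collections import deque


class ToyNic:
    """Hypothetical model of a NIC with per-VM receive queues keyed by destination MAC."""

    def __init__(self):
        self.queues = {}        # destination MAC -> deque of frames for one VM
        self.default = deque()  # unmatched frames, which would fall back to the software vSwitch

    def register_vm(self, mac):
        # One-time setup by the hypervisor's driver: give this MAC its own queue.
        self.queues[mac.lower()] = deque()

    def receive(self, dst_mac, frame):
        # The "hardware" sorts each frame by destination MAC; no software switch runs here.
        self.queues.get(dst_mac.lower(), self.default).append(frame)

    def drain(self, mac):
        # The guest's driver reads only its own queue, then empties it.
        q = self.queues[mac.lower()]
        frames = list(q)
        q.clear()
        return frames


nic = ToyNic()
nic.register_vm("02:00:00:00:00:01")  # locally administered, made-up MACs
nic.register_vm("02:00:00:00:00:02")
nic.receive("02:00:00:00:00:01", b"quote A")
nic.receive("02:00:00:00:00:02", b"quote B")
nic.receive("02:00:00:00:00:99", b"unknown")
print(nic.drain("02:00:00:00:00:01"))  # [b'quote A']
```

The point of the sketch is where the work happens: after `register_vm`, the hypervisor is out of the per-packet path entirely.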
For the most part I'm running Intel PRO/1000 PT quad-port NICs, plus a couple of Intel 10G dual-port NICs. Nothing crazy like Fibre Channel, but I'd put myself in the "no expense spared" category rather than "money is no object".
Also, regarding NICs: these are "running raw", meaning no firewall and only the vSwitches. I've actually noticed it's slower to give each VM dedicated hardware (or a dedicated vSwitch per port on a quad NIC) than to just let the hypervisor have the whole NIC and put all the VMs on a single vSwitch or on separate ones.
Quote from NetTecture:THAT SAID: if you really need to be extremely fast, then basically skip the hypervisor. You always introduce latency of some sort, and the behavior is not deterministic. Modern hypervisors outside the mainframe so far do not allow locking cores to VMs, so you always run the risk of having to wait for a time slice, even with higher priority than the other processes. If all cores are busy, there is a small delay.
I'm not overbooking (allocating more resources than you actually have, such as cores, RAM, GPU, etc.). I rent out cheap VMs for guys to use as a 24/7/365 internet machine (we joke and call it a porn VM), and I overbook those, but trading machines are never overbooked.
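For anyone who wants to check their own hosts against that definition, here is a minimal Python sketch that compares what is allocated to what physically exists. The host and VM sizes are hypothetical, and counting Hyper-Threading threads rather than physical cores is my assumption; counting cores only is the stricter choice.

```python
from dataclasses import dataclass


@dataclass
class Host:
    threads: int   # physical hardware threads (cores x HT); use cores only to be stricter
    ram_gb: int


@dataclass
class Vm:
    name: str
    vcpus: int
    ram_gb: int


def booking_ratios(host, vms):
    """Allocated / physical for CPU and RAM; anything above 1.0 is overbooked."""
    cpu = sum(vm.vcpus for vm in vms) / host.threads
    ram = sum(vm.ram_gb for vm in vms) / host.ram_gb
    return cpu, ram


def is_overbooked(host, vms):
    cpu, ram = booking_ratios(host, vms)
    return cpu > 1.0 or ram > 1.0


# Hypothetical numbers: a dual six-core host with HT (24 threads) and 96 GB RAM.
host = Host(threads=24, ram_gb=96)
trading = [Vm("ats-1", 4, 8), Vm("ats-2", 4, 8)]
web = [Vm(f"web-{i}", 4, 4) for i in range(10)]

print(is_overbooked(host, trading))  # False
print(is_overbooked(host, web))      # True (40 vCPUs on 24 threads)
```

The cheap internet VMs trip the CPU check while the trading VMs stay well under 1.0 on both resources, which is the split described above.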
I mentioned the Atom servers above, so I get it and agree 100% that running on bare metal is faster; I just want to improve my hypervisor speeds.
Quote from NetTecture:Plus you have additional processing, and I am quite sure super-fast trading is not on the radar of companies like VMware or Microsoft when they optimize their hypervisors.
But your best first step there would be a hardware review.
All of the CPUs are 3.0 GHz or higher (e.g. an X5680 or a W5580, vs. something in the 2.2 or 2.5 GHz range). Nothing is overclocked (except the Atom boxes), and just about everything has the maximum RAM the BIOS allows.
My test boxes are the same setup (Dell T5500 workstations, because they are quiet enough for the home and office): Xeon 5500 or 5600 CPUs, same NICs, etc., just not racked. I've never had a VM or configuration that worked on the test boxes and then had issues on the production machines.
I don't think it's a hardware issue - but I could be wrong. All of the hardware is very new.
PocketChange, are you overbooking? All of my test boxes are tied up with this issue right now (and I'm building 15 workstations for a hedge fund this week), but send me a PM if you want; I'm happy to set up a test machine based on Xeon X5680 or X5690 CPUs if you want to test on something else.
I don't think you should overbook given your resources. There were a lot of changes in Excel between 2003, 2007 and 2010. This isn't by the book, but it's how I explain it to people: Excel 2003 was limited. It worked, but it had hard-coded limitations. Excel 2007 was allowed to use additional memory and resources (it had access to them but didn't just grab them), whereas Excel 2010 feels like it tries to grab any and all available resources (up to its limits) even when it doesn't need them. With overbooked resources, Excel 2010 is probably slowing itself down trying to multi-thread across virtual threads that have no real, free core behind them. As NetTecture said, you are probably adding lag/latency just by queuing up thread processing.
Do your ms times reflect latency to your broker, latency to pull data from your broker, or is this 100% internal to this machine/network? In other words, is your latency purely on the machine, or is networking involved too?
At times you seem to be running into memory issues as well. Are you overbooking memory too?
I have a dual Xeon 5400-series box (2x 5482 CPUs) with 32 GB RAM running ESXi that you could try sooner, if that helps. Those CPUs don't have Hyper-Threading, and I'm happy to shut down the other machines and give you a few XP or Windows 7 machines. I'm trying to sell it, so it may only be available for a short window.
Quote from NetTecture:> If VMware is allocating cores from different physical machines
Except that VMware cannot run multi-machine VMs. Very few hypervisors can (I know of two), and they are very specialized, so no.
Even in an HA (high availability) cluster you can't share resources across multiple hosts. Today the limit is heterogeneous high-availability clusters, as opposed to pulling CPU resources from another machine into yours.
PocketChange, what are you running? A lot of people sell semi-knock-off "cloud computing" or even Linux "DIY supercomputer" software, and that could be slowing you down as well.
(I didn't know there was an old bug; I guess that shows how new I am to this space. I've only ever run ESXi 5.0.)
Quote from NetTecture:Basically, this one needed 8 cores available at once for an 8-core VM, which means the more cores a VM had, the longer it waited for a time slice. Hyper-V and newer VMware versions allocate every core separately.
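A quick way to see why "needs all 8 cores at once" hurts big VMs is a toy probability model. It assumes each physical core is independently busy with other work with probability `p_busy` in any time slice. That is a simplification of my own, not how the ESX or Hyper-V schedulers actually work. Strict co-scheduling runs the VM only when at least as many cores as it has vCPUs are idle at the same moment; per-vCPU scheduling lets each vCPU take whatever core is free.

```python
from math import comb


def idle_pmf(n_cores, p_busy):
    """P(exactly i of n cores are idle in a time slice), cores independent (toy assumption)."""
    q = 1.0 - p_busy
    return [comb(n_cores, i) * q**i * p_busy ** (n_cores - i) for i in range(n_cores + 1)]


def strict_run_prob(n_cores, p_busy, vcpus):
    """Strict co-scheduling: the VM runs only if all its vCPUs get a core at once."""
    pmf = idle_pmf(n_cores, p_busy)
    return sum(pmf[i] for i in range(vcpus, n_cores + 1))


def relaxed_progress(n_cores, p_busy, vcpus):
    """Per-vCPU scheduling: expected fraction of the VM's vCPUs that get a core this slice."""
    pmf = idle_pmf(n_cores, p_busy)
    return sum(pmf[i] * min(i, vcpus) for i in range(n_cores + 1)) / vcpus


# 8-core host, each core busy 30% of the time; compare VM sizes.
for k in (1, 2, 4, 8):
    print(k, round(strict_run_prob(8, 0.3, k), 3), round(relaxed_progress(8, 0.3, k), 3))
```

For a 1-vCPU VM the two models agree exactly. For an 8-vCPU VM on that 8-core host, the strict model only gets to run in 0.7^8, or about 5.8%, of slices, while per-vCPU scheduling keeps 70% of its vCPUs working on average. That gap is the "more cores, longer wait" effect in the quote.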
5 to 10 times slower is a NO: this should not happen unless the physical platform is overloaded (CPU maxed out, memory bandwidth maxed out, so the VM runs into switching problems). The overhead is normally below 5%. That said, the machines look quite pathetic in my eyes; I am using 6-8 core machines at the moment with 16-64 GB of memory. I do NOT like my physical layer to run into problems.
I have to agree. On the production machines I book to about 75-80% of the physical layer. I wasn't running VMs back during the flash crash, but I know a few guys who were monitoring machine loads. If we had another flash crash, 75-80% wouldn't be enough, but it would still be much better than being overbooked going in.
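If it helps anyone do the same arithmetic, here is a small Python sketch that turns a target booking level into a concrete vCPU and RAM budget for a host, plus how much room is left given what is already allocated. The host size is hypothetical, and treating hardware threads as the CPU unit is an assumption; swap in physical cores if that is how you count.

```python
def booking_budget(physical_threads, physical_ram_gb, percent=80):
    """Max vCPUs and GB of RAM you can book while staying at or under `percent` of the host.

    Integer math on purpose, to avoid float rounding surprises in the budget.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return physical_threads * percent // 100, physical_ram_gb * percent // 100


def headroom(physical_threads, physical_ram_gb, booked_vcpus, booked_ram_gb, percent=80):
    """What is left to allocate before crossing the target; negative means already over."""
    max_cpu, max_ram = booking_budget(physical_threads, physical_ram_gb, percent)
    return max_cpu - booked_vcpus, max_ram - booked_ram_gb


# Hypothetical host: 24 hardware threads, 96 GB RAM.
print(booking_budget(24, 96, 75))  # (18, 72)
print(booking_budget(24, 96, 80))  # (19, 76)
print(headroom(24, 96, 16, 64))    # (3, 12)
```

Rounding down is deliberate: a budget that rounds up would quietly put you over the target you set.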