Quote from KINGOFSHORTS:
First, you will not outperform a vectorizing compiler like Intel's Fortran and C++ compilers.
Oh, but I'm pretty sure you will. That's how many ASM coders get their jobs. Compilers "try" to do it right, and there are many good ones, but the best parallelization is still done by a human. There are just too many variations on any given task for a compiler to take all of them into account. Try disassembling some compiled code you think should be faster than hand-written ASM and looking at it in OllyDbg. It just won't be: in the end it's all machine language, and a human does a better job of "compiling" (by writing the machine code directly) because of the nature and sheer range of opcodes available. A compiler isn't creative and won't get ideas along the way; a human will, or at least should.
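To make that a bit more concrete, here is a rough sketch in plain C rather than ASM (my own illustration, not a benchmark; the function names are made up). A compiler held to strict IEEE floating-point rules, for example GCC or Clang without -ffast-math, generally has to keep the additions in the first loop in source order, because float addition is not associative. A human who knows the rounding difference doesn't matter for their data can split the sum across independent accumulators by hand, which is the kind of reordering a vectorized version relies on. Whether it is actually faster depends on the compiler, flags, and CPU, so measure before believing either side.

```c
#include <stddef.h>

/* Straightforward reduction: one accumulator, strictly left-to-right adds.
   Under strict FP semantics the compiler must preserve this order. */
float sum_scalar(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-reassociated reduction (hypothetical example): four independent
   accumulators break the dependency chain. The result can differ from
   sum_scalar in the last bits, which is why the programmer, not the
   compiler, has to decide that this is acceptable. */
float sum_unrolled(const float *a, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Combine the partial sums, then handle the leftover elements. */
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

Compile both with your usual optimization flags and compare the generated code in a disassembler; that is exactly the sort of comparison I mean.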