While doing some work in SynthMaker, I was optimizing a bunch of DSP paths. This basically means trying to get a small subset of seemingly randomly chosen SSE and SSE2 instructions to calculate what I want. The developers of SynthMaker didn’t bother to implement most of the SSE/SSE2 instruction set (god knows why; reverse engineering the code reveals its parser could easily handle most of the SSE instructions dealing with packed singles, which happen to be exactly what SynthMaker uses), so most of the common hacks simply won’t work. Heck, we don’t even have XORPS!
So it all boils down to common subexpression elimination (easy, since SynthMaker creates so many) and good ol’ cycle counting. I haven’t counted cycles since the 6510! Most ASM programmers have the full set of SSE* instructions at their disposal, which typically includes a specialized op for most of what you want to do, so it’s hard to find a decent cycle chart; most people don’t need to bother.
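For anyone who hasn’t done common subexpression elimination by hand, here’s a minimal sketch of the idea. It’s plain scalar Python standing in for a packed-single path, not SynthMaker code, and the expression and function names are made up for illustration.

```python
# Illustrative only: scalar Python standing in for a packed-single DSP path.
# The expression and the function names are hypothetical.

def naive(a, b, c):
    # a * b is evaluated twice: 3 multiplies, 1 add, 1 subtract.
    return (a * b + c) * (a * b - c)


def with_cse(a, b, c):
    # a * b is evaluated once and reused: 2 multiplies, 1 add, 1 subtract.
    t = a * b
    return (t + c) * (t - c)


if __name__ == "__main__":
    # Both versions perform the same operations in the same order on the
    # shared term, so the results match.
    print(naive(1.5, 2.0, 0.5), with_cse(1.5, 2.0, 0.5))
```

One multiply saved doesn’t look like much, but in a path that runs for every sample the shared terms add up quickly.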
Luckily, there are still some resources around. And since I hate to forget, here’s the best one I found: http://www.intel80386.com/simd/mmx2-doc.html
Yup! It’s supposedly for the 386, which didn’t even have MMX, but there you go! It’s not complete but good enough.
If somebody finds a better chart, let me know!