If performance differs between CPUs with and without hyperthreading, you might be running into the 64K aliasing issue. It's not specific to hyperthreading, but the penalty increases tenfold on hyperthreading machines. I was finally tipped off by a VTune performance trace from a user (I did not have a hyperthreading box at the time).
What's 64K aliasing? A quote from the VTune reference:
A 64K-aliasing conflict occurs when a virtual address memory references a cache line that is modulo 64K bytes apart from another cache line that already resides in the first level cache. Only one cache line with a virtual address modulo 64K bytes can reside in the first level cache at the same time.
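To make the quoted rule concrete, here is a minimal sketch in C of a check for whether two addresses would conflict under it. The helper name aliases_64k and the 64-byte cache line size are my assumptions for illustration; they are not taken from the VTune reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALIAS_SPAN 0x10000u /* 64 KiB, from the quote above */
#define LINE_SIZE  64u      /* assumed cache line size, not from the quote */

/* The span must be a whole number of cache lines for the check to make sense. */
static_assert(ALIAS_SPAN % LINE_SIZE == 0, "span must be a multiple of the line size");

/* Hypothetical helper: true when a and b fall in different cache lines
   whose start addresses are a multiple of 64 KiB apart. */
static bool aliases_64k(uintptr_t a, uintptr_t b)
{
    uintptr_t line_a = a / LINE_SIZE * LINE_SIZE; /* round down to line start */
    uintptr_t line_b = b / LINE_SIZE * LINE_SIZE;

    if (line_a == line_b)
        return false; /* same line, so nothing gets evicted */

    uintptr_t diff = line_a > line_b ? line_a - line_b : line_b - line_a;
    return diff % ALIAS_SPAN == 0;
}
```

For example, 0x10000 and 0x20000 conflict under this rule, while 0x10000 and 0x20040 do not, because the second address sits one cache line further along.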
The worst case would be multiple threads whose stack space all starts at a multiple of 64K. That's what I had (in my memory manager). Every time a thread is context switched into action, the data from another thread is pushed out of the cache ;-) A comparable worst case would be to subdivide a large picture into 64K blocks and work on each block in a different thread.
The fix in my case was to offset each thread's stack space by a different amount of memory, along the lines of this pretty technical article: http://software.intel.com/en-us/articles/adjusting-thread-stack-address-to-improve-performance-on-intel-xeonr-processors/
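A rough sketch in C of what such an offset could look like. This is not my actual memory manager code; the names stack_stagger and staggered_stack_start and the 1 KiB step are hypothetical, chosen only to show the idea of giving each thread a different offset within the 64K span.

```c
#include <assert.h>
#include <stddef.h>

#define ALIAS_SPAN 0x10000u /* 64 KiB */
#define STACK_STEP 0x400u   /* hypothetical stagger: 1 KiB per thread */

/* Offset in bytes for thread number `index`. It wraps around before
   reaching 64 KiB, so a wrapped offset never lands back on a 64K multiple
   by accident other than at offset 0. */
static size_t stack_stagger(unsigned index)
{
    return ((size_t)index * STACK_STEP) % ALIAS_SPAN;
}

/* Start address of a thread's stack area inside an oversized block.
   Assumes the caller allocated `block` with ALIAS_SPAN spare bytes
   on top of the stack size it actually needs. */
static void *staggered_stack_start(void *block, unsigned index)
{
    assert(block != NULL);
    return (char *)block + stack_stagger(index);
}
```

With these numbers, thread 1 starts 1024 bytes further along than thread 0, and thread 64 lands on the same offset as thread 0 again, so the step would need tuning if many more threads than that are hot at once.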
Igor Ostrovsky wrote an easier-to-read article on performance issues related to current CPU cache architectures, including the 64K issue, though without solutions: http://igoro.com/archive/gallery-of-processor-cache-effects/
Another good related read can be found on Dr. Dobb's: http://www.drdobbs.com/184405848 (or search for 64k and pixomatic if the link is down).