#
290014 | 26-Oct-2015 | vangyzen |
Disable SSE in libthr
Clang emits SSE instructions on amd64 in the common path of pthread_mutex_unlock. If the thread does not otherwise use SSE, this usage incurs a context-switch of the FPU/SSE state, which reduces the performance of multiple real-world applications by a non-trivial amount (3-5% in one application).
Instead of this change, I experimented with eagerly switching the FPU state at context-switch time. This did not help. Most of the cost seems to be in the read/write of memory--as kib@ stated--and not in the #NM handling. I tested on machines with and without XSAVEOPT.
One counter-argument to this change is that most applications already use SIMD, and the number of applications and amount of SIMD usage are only increasing. This is absolutely true. I agree that--in general and in principle--this change is in the wrong direction. However, there are applications that do not use enough SSE to offset the extra context-switch cost. SSE does not provide a clear benefit in the current libthr code with the current compiler, but it does provide a clear loss in some cases. Therefore, disabling SSE in libthr is a non-loss for most, and a gain for some.
I refrained from disabling SSE in libc--as was suggested--because I can't make the above argument for libc. It provides a wide variety of code; each case should be analyzed separately.
https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055193.html
Suggestions from: dim, jmg, rpaulo
Sponsored by: Dell Inc.
|
#
256281 | 10-Oct-2013 | gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation |
#
217026 | 05-Jan-2011 | dim |
Sort -mno-(mmx|3dnow|sse|sse2|sse3) options consistently throughout the tree.
Submitted by: arundel
|
#
216977 | 04-Jan-2011 | dim |
On amd64 and i386, tell the compiler to refrain from generating SSE, 3DNow, MMX and floating point instructions in rtld-elf.
Otherwise, _rtld_bind() (and whatever it calls) could possibly clobber function arguments that are passed in SSE/3DNow/MMX/FP registers, usually floating point values. This can happen, for example, when clang generates SSE code for memset() or memcpy() calls.
One symptom of this is sshd dying early on amd64 with "PRNG not seeded", which is ultimately caused by libcrypto.so.6 calling RAND_add() with a double parameter. That parameter is passed via %xmm0, which gets wiped out by an SSE memset() in _rtld_bind().
Reviewed by: kib, kan
|
#
216975 | 04-Jan-2011 | dim |
Remove '-elf' from build flags for libexec/rtld-elf for amd64 and i386. ELF has been the default format for almost 12 years now.
|
#
211725 | 23-Aug-2010 | imp |
MFtbemd:
Prefer MACHINE_CPUARCH to MACHINE_ARCH in most contexts where you want to test whether all the CPUs of a given family conform.
|
#
45501 | 08-Apr-1999 | jdp |
Eliminate all machine-dependent code from the main source body and the Makefile, and move it down into the architecture-specific subdirectories.
Eliminate an asm() statement for the i386.
Make the dynamic linker work if it is built as an executable instead of as a shared library. See i386/Makefile.inc to find out how to do it. Note, this change is not enabled and it might never be enabled. But it might be useful in the future. Building the dynamic linker as an executable should make it start up faster, because it won't have any relocations. But in practice I suspect the difference is negligible.
|