History log of /freebsd-10.0-release/lib/msun/src/s_cbrt.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 219571 12-Mar-2011 kargl

Implement the long double version for the cube root function, cbrtl.
The algorithm uses Newton's iterations with a crude estimate of the
cube root to converge to a result.

Reviewed by: bde
Approved by: das


# 218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


# 176451 22-Feb-2008 das

s/rcsid/__FBSDID/


# 153548 19-Dec-2005 bde

Extract the high and low words together. With gcc-3.4 on uniformly
distributed non-large args, this saves about 14 of 134 cycles for
Athlon64s and about 5 of 199 cycles for AthlonXPs.

Moved the check for x == 0 inside the check for subnormals. With
gcc-3.4 on uniformly distributed non-large args, this saves another
5 cycles on Athlon64s and loses 1 cycle on AthlonXPs.

Use INSERT_WORDS() and not SET_HIGH_WORD() when converting the first
approximation from bits to double. With gcc-3.4 on uniformly distributed
non-large args, this saves another 4 cycles on both Athlon64s and and
AthlonXPs.

Accessing doubles as 2 words may be an optimization on old CPUs, but on
current CPUs it tends to cause extra operations and pipeline stalls,
especially for writes, even when only 1 of the words needs to be accessed.

Removed an unused variable.


# 153520 18-Dec-2005 bde

Use a minimax polynomial approximation instead of a Pade rational
function approximation for the second step. The polynomial has degree
2 for cbrtf() and 4 for cbrt(). These degrees are minimal for the final
accuracy to be essentially the same as before (slightly smaller).
Adjust the rounding between steps 2 and 3 to match. Unfortunately,
for cbrt(), this breaks the claimed accuracy slightly although incorrect
rounding doesn't. Claim less accuracy since its not worth pessimizing
the polynomial or relying on exhaustive testing to get insignificantly
more accuracy.

This saves about 30 cycles on Athlons (mainly by avoiding 2 divisions)
so it gives an overall optimization in the 10-25% range (a larger
percentage for float precision, especially in 32-bit mode, since other
overheads are more dominant for double precision, surprisingly more
in 32-bit mode).


# 153517 18-Dec-2005 bde

Fixed code to match comments and the algorithm:
- in preparing for the third approximation, actually make t larger in
magnitude than cbrt(x). After chopping, t must be incremented by 2
ulps to make it larger, not 1 ulp since chopping can reduce it by
almost 1 ulp and it might already be up to half a different-sized-ulp
smaller than cbrt(x). I have not found any cases where this is
essential, but the think-time error bound depends on it. The relative
smallness of the different-sized-ulp limited the bug. If there are
cases where this is essential, then the final error bound would be
5/6+epsilon instead of of 4/6+epsilon ulps (still < 1).
- in preparing for the third approximation, round more carefully (but
still sloppily to avoid branches) so that the claimed error bound of
0.667 ulps is satisfied in all cases tested for cbrt() and remains
satisfied in all cases for cbrtf(). There isn't enough spare precision
for very sloppy rounding to work:
- in cbrt(), even with the inadequate increment, the actual error was
0.6685 in some cases, and correcting the increment increased this
a little. The fix uses sloppy rounding to 25 bits instead of very
sloppy rounding to 21 bits, and starts using uint64_t instead of 2
words for bit manipulation so that rounding more bits is not much
costly.
- in cbrtf(), the 0.667 bound was already satisfied even with the
inadequate increment, but change the code to almost match cbrt()
anyway. There is not enough spare precision in the Newton
approximation to double the inadequate increment without exceeding
the 0.667 bound, and no spare precision to avoid this problem as
in cbrt(). The fix is to round using an increment of 2 smaller-ulps
before chopping so that an increment of 1 ulp is enough. In cbrt(),
we essentially do the same, but move the chop point so that the
increment of 1 is not needed.

Fixed comments to match code:
- in cbrt(), the second approximation is good to 25 bits, not quite 26 bits.
- in cbrt(), don't claim that the second approximation may be implemented
in single precision. Single precision cannot handle the full exponent
range without minor but pessimal changes to renormalize, and although
single precision is enough, 25 bit precision is now claimed and used.

Added comments about some of the magic for the error bound 4/6+epsilon.
I still don't understand why it is 4/6+ and not 6/6+ ulps.

Indent comments at the right of code more consistently.


# 153447 15-Dec-2005 bde

Added comments about the apparently-magic rational function used in
the second step of approximating cbrt(x). It turns out to be neither
very magic not nor very good. It is just the (2,2) Pade approximation
to 1/cbrt(r) at r = 1, arranged in a strange way to use fewer operations
at a cost of replacing 4 multiplications by 1 division, which is an
especially bad tradeoff on machines where some of the multiplications
can be done in parallel. A Remez rational approximation would give
at least 2 more bits of accuracy, but the (2,2) Pade approximation
already gives 6 more bits than needed. (Changed the comment which
essentially says that it gives 3 more bits.)

Lower order Pade approximations are not quite accurate enough for
double precision but are plenty for float precision. A lower order
Remez rational approximation might be enough for double precision too.
However, rational approximations inherently require an extra division,
and polynomial approximations work well for 1/cbrt(r) at r = 1, so I
plan to switch to using the latter. There are some technical
complications that tend to cost a division in another way.


# 153386 13-Dec-2005 bde

Optimize by not doing excessive conversions for handling the sign bit.
This gives an optimization of between 9 and 22% on Athlons (largest
for cbrt() on amd64 -- from 205 to 159 cycles).

We extracted the sign bit and worked with |x|, and restored the sign
bit as the last step. We avoided branches to a fault by using accesses
to FP values as bits to clear and restore the sign bit. Avoiding
branches is usually good, but the bit access macros are not so good
(especially for setting FP values), and here they always caused pipeline
stalls on Athlons. Even using branches would be faster except on args
that give perfect branch misprediction, since only mispredicted branches
cause stalls, but it possible to avoid touching the sign bit in FP
values at all (except to preserve it in conversions from bits to FP
not related to the sign bit). Do this. The results are identical
except in 2 of the 3 unsupported rounding modes, since all the
approximations use odd rational functions so they work right on strictly
negative values, and the special case of -0 doesn't use an approximation.


# 153382 13-Dec-2005 bde

Fixed some especially horrible style bugs (indentation that is neither
KNF nor fdlibmNF combined with multiple statements per line).


# 153306 11-Dec-2005 bde

Added comments about the magic behind
<cbrt(x) in bits> ~= <x in bits>/3 + BIAS.
Keep the large comments only in the double version as usual.

Fixed some style bugs (mainly grammar and spelling errors in comments).


# 97413 28-May-2002 alfred

Fix formatting, this is hard to explain, so I'll show one example.

- float ynf(int n, float x) /* wrapper ynf */
+float
+ynf(int n, float x) /* wrapper ynf */

This is because the __STDC__ stuff was indented.

Reviewed by: md5


# 97409 28-May-2002 alfred

Assume __STDC__, remove non-__STDC__ code.

Reviewed by: md5


# 50476 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 22993 22-Feb-1997 peter

Revert $FreeBSD$ to $Id$


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 8870 30-May-1995 rgrimes

Remove trailing whitespace.


# 2117 19-Aug-1994 jkh

This commit was generated by cvs2svn to compensate for changes in r2116,
which included commits to RCS files with non-trunk default branches.


# 2116 19-Aug-1994 jkh

J.T. Conklin's latest version of the Sun math library.

-- Begin comments from J.T. Conklin:
The most significant improvement is the addition of "float" versions
of the math functions that take float arguments, return floats, and do
all operations in floating point. This doesn't help (performance)
much on the i386, but they are still nice to have.

The float versions were orginally done by Cygnus' Ian Taylor when
fdlibm was integrated into the libm we support for embedded systems.
I gave Ian a copy of my libm as a starting point since I had already
fixed a lot of bugs & problems in Sun's original code. After he was
done, I cleaned it up a bit and integrated the changes back into my
libm.
-- End comments

Reviewed by: jkh
Submitted by: jtc