softfloat.c - OpenGrok cross reference for /kernel/linux/linux-5.10/arch/arm/nwfpe/softfloat.c

Lines Matching full:-
4 This C source file is part of the SoftFloat IEC/IEEE Floating-point
10 National Science Foundation under grant MIP-9311980.  The original version
11 of this code was written as part of a project to build a fixed-point vector
15 http://www.jhauser.us/arithmetic/SoftFloat-2b/SoftFloat-source.txt
38 -------------------------------------------------------------------------------
39 Primitive arithmetic functions, including multi-word arithmetic, and
42 -------------------------------------------------------------------------------
44 #include "softfloat-macros"
47 -------------------------------------------------------------------------------
52 are propagated from function inputs to output.  These details are target-
54 -------------------------------------------------------------------------------
56 #include "softfloat-specialize"
59 -------------------------------------------------------------------------------
60 Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
61 and 7, and returns the properly rounded 32-bit integer corresponding to the
63 to an integer.  Bit 63 of `absZ' must be zero.  Ordinarily, the fixed-point
65 the input cannot be represented exactly as an integer.  If the fixed-point
68 -------------------------------------------------------------------------------
77     roundingMode = roundData->mode;  in roundAndPackInt32()
98     if ( zSign ) z = - z;  in roundAndPackInt32()
100         roundData->exception |= float_flag_invalid;  in roundAndPackInt32()
103     if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackInt32()
109 -------------------------------------------------------------------------------
110 Returns the fraction bits of the single-precision floating-point value `a'.
111 -------------------------------------------------------------------------------
121 -------------------------------------------------------------------------------
122 Returns the exponent bits of the single-precision floating-point value `a'.
123 -------------------------------------------------------------------------------
133 -------------------------------------------------------------------------------
134 Returns the sign bit of the single-precision floating-point value `a'.
135 -------------------------------------------------------------------------------
147 -------------------------------------------------------------------------------
148 Normalizes the subnormal single-precision floating-point value represented
152 -------------------------------------------------------------------------------
159     shiftCount = countLeadingZeros32( aSig ) - 8;  in normalizeFloat32Subnormal()
161     *zExpPtr = 1 - shiftCount;  in normalizeFloat32Subnormal()
166 -------------------------------------------------------------------------------
168 single-precision floating-point value, returning the result.  After being
175 -------------------------------------------------------------------------------
195 -------------------------------------------------------------------------------
196 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
197 and significand `zSig', and returns the proper single-precision floating-
199 value is simply rounded and packed into the single-precision format, with
205 the abstract input cannot be represented exactly as a subnormal single-
206 precision floating-point number.
212 normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
214 Binary Floating-point Arithmetic.
215 -------------------------------------------------------------------------------
224     roundingMode = roundData->mode;  in roundAndPackFloat32()
247             roundData->exception |= float_flag_overflow | float_flag_inexact;  in roundAndPackFloat32()
248             return packFloat32( zSign, 0xFF, 0 ) - ( roundIncrement == 0 );  in roundAndPackFloat32()
253                 || ( zExp < -1 )  in roundAndPackFloat32()
255             shift32RightJamming( zSig, - zExp, &zSig );  in roundAndPackFloat32()
258             if ( isTiny && roundBits ) roundData->exception |= float_flag_underflow;  in roundAndPackFloat32()
261     if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackFloat32()
270 -------------------------------------------------------------------------------
271 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
272 and significand `zSig', and returns the proper single-precision floating-
275 any way.  In all cases, `zExp' must be 1 less than the ``true'' floating-
277 -------------------------------------------------------------------------------
284     shiftCount = countLeadingZeros32( zSig ) - 1;  in normalizeRoundAndPackFloat32()
285     return roundAndPackFloat32( roundData, zSign, zExp - shiftCount, zSig<<shiftCount );  in normalizeRoundAndPackFloat32()
290 -------------------------------------------------------------------------------
291 Returns the fraction bits of the double-precision floating-point value `a'.
292 -------------------------------------------------------------------------------
302 -------------------------------------------------------------------------------
303 Returns the exponent bits of the double-precision floating-point value `a'.
304 -------------------------------------------------------------------------------
314 -------------------------------------------------------------------------------
315 Returns the sign bit of the double-precision floating-point value `a'.
316 -------------------------------------------------------------------------------
328 -------------------------------------------------------------------------------
329 Normalizes the subnormal double-precision floating-point value represented
333 -------------------------------------------------------------------------------
340     shiftCount = countLeadingZeros64( aSig ) - 11;  in normalizeFloat64Subnormal()
342     *zExpPtr = 1 - shiftCount;  in normalizeFloat64Subnormal()
347 -------------------------------------------------------------------------------
349 double-precision floating-point value, returning the result.  After being
356 -------------------------------------------------------------------------------
366 -------------------------------------------------------------------------------
367 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
368 and significand `zSig', and returns the proper double-precision floating-
370 value is simply rounded and packed into the double-precision format, with
376 the abstract input cannot be represented exactly as a subnormal double-
377 precision floating-point number.
383 normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
385 Binary Floating-point Arithmetic.
386 -------------------------------------------------------------------------------
395     roundingMode = roundData->mode;  in roundAndPackFloat64()
420             roundData->exception |= float_flag_overflow | float_flag_inexact;  in roundAndPackFloat64()
421             return packFloat64( zSign, 0x7FF, 0 ) - ( roundIncrement == 0 );  in roundAndPackFloat64()
426                 || ( zExp < -1 )  in roundAndPackFloat64()
428             shift64RightJamming( zSig, - zExp, &zSig );  in roundAndPackFloat64()
431             if ( isTiny && roundBits ) roundData->exception |= float_flag_underflow;  in roundAndPackFloat64()
434     if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackFloat64()
443 -------------------------------------------------------------------------------
444 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
445 and significand `zSig', and returns the proper double-precision floating-
448 any way.  In all cases, `zExp' must be 1 less than the ``true'' floating-
450 -------------------------------------------------------------------------------
457     shiftCount = countLeadingZeros64( zSig ) - 1;  in normalizeRoundAndPackFloat64()
458     return roundAndPackFloat64( roundData, zSign, zExp - shiftCount, zSig<<shiftCount );  in normalizeRoundAndPackFloat64()
465 -------------------------------------------------------------------------------
466 Returns the fraction bits of the extended double-precision floating-point
468 -------------------------------------------------------------------------------
478 -------------------------------------------------------------------------------
479 Returns the exponent bits of the extended double-precision floating-point
481 -------------------------------------------------------------------------------
491 -------------------------------------------------------------------------------
492 Returns the sign bit of the extended double-precision floating-point value
494 -------------------------------------------------------------------------------
504 -------------------------------------------------------------------------------
505 Normalizes the subnormal extended double-precision floating-point value
509 -------------------------------------------------------------------------------
518     *zExpPtr = 1 - shiftCount;  in normalizeFloatx80Subnormal()
523 -------------------------------------------------------------------------------
525 extended double-precision floating-point value, returning the result.
526 -------------------------------------------------------------------------------
540 -------------------------------------------------------------------------------
541 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
543 and returns the proper extended double-precision floating-point value
545 rounded and packed into the extended double-precision format, with the
552 double-precision floating-point number.
555 result is rounded to the full precision of the extended double-precision
561 Floating-point Arithmetic.
562 -------------------------------------------------------------------------------
573     roundingMode = roundData->mode;  in roundAndPackFloatx80()
574     roundingPrecision = roundData->precision;  in roundAndPackFloatx80()
604     if ( 0x7FFD <= (bits32) ( zExp - 1 ) ) {  in roundAndPackFloatx80()
615             shift64RightJamming( zSig0, 1 - zExp, &zSig0 );  in roundAndPackFloatx80()
618             if ( isTiny && roundBits ) roundData->exception |= float_flag_underflow;  in roundAndPackFloatx80()
619             if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackFloatx80()
630     if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackFloatx80()
658     if ( 0x7FFD <= (bits32) ( zExp - 1 ) ) {  in roundAndPackFloatx80()
667             roundData->exception |= float_flag_overflow | float_flag_inexact;  in roundAndPackFloatx80()
682             shift64ExtraRightJamming( zSig0, zSig1, 1 - zExp, &zSig0, &zSig1 );  in roundAndPackFloatx80()
684             if ( isTiny && zSig1 ) roundData->exception |= float_flag_underflow;  in roundAndPackFloatx80()
685             if ( zSig1 ) roundData->exception |= float_flag_inexact;  in roundAndPackFloatx80()
705     if ( zSig1 ) roundData->exception |= float_flag_inexact;  in roundAndPackFloatx80()
724 -------------------------------------------------------------------------------
725 Takes an abstract floating-point value having sign `zSign', exponent
727 and returns the proper extended double-precision floating-point value
731 -------------------------------------------------------------------------------
743         zExp -= 64;  in normalizeRoundAndPackFloatx80()
747     zExp -= shiftCount;  in normalizeRoundAndPackFloatx80()
756 -------------------------------------------------------------------------------
757 Returns the result of converting the 32-bit two's complement integer `a' to
758 the single-precision floating-point format.  The conversion is performed
759 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
760 -------------------------------------------------------------------------------
769     return normalizeRoundAndPackFloat32( roundData, zSign, 0x9C, zSign ? - a : a );  in int32_to_float32()
774 -------------------------------------------------------------------------------
775 Returns the result of converting the 32-bit two's complement integer `a' to
776 the double-precision floating-point format.  The conversion is performed
777 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
778 -------------------------------------------------------------------------------
789     absA = aSign ? - a : a;  in int32_to_float64()
792     return packFloat64( aSign, 0x432 - shiftCount, zSig<<shiftCount );  in int32_to_float64()
799 -------------------------------------------------------------------------------
800 Returns the result of converting the 32-bit two's complement integer `a'
801 to the extended double-precision floating-point format.  The conversion
802 is performed according to the IEC/IEEE Standard for Binary Floating-point
804 -------------------------------------------------------------------------------
815     absA = zSign ? - a : a;  in int32_to_floatx80()
818     return packFloatx80( zSign, 0x403E - shiftCount, zSig<<shiftCount );  in int32_to_floatx80()
825 -------------------------------------------------------------------------------
826 Returns the result of converting the single-precision floating-point value
827 `a' to the 32-bit two's complement integer format.  The conversion is
828 performed according to the IEC/IEEE Standard for Binary Floating-point
829 Arithmetic---which means in particular that the conversion is rounded
833 -------------------------------------------------------------------------------
847     shiftCount = 0xAF - aExp;  in float32_to_int32()
856 -------------------------------------------------------------------------------
857 Returns the result of converting the single-precision floating-point value
858 `a' to the 32-bit two's complement integer format.  The conversion is
859 performed according to the IEC/IEEE Standard for Binary Floating-point
864 -------------------------------------------------------------------------------
876     shiftCount = aExp - 0x9E;  in float32_to_int32_round_to_zero()
888     z = aSig>>( - shiftCount );  in float32_to_int32_round_to_zero()
892     return aSign ? - z : z;  in float32_to_int32_round_to_zero()
897 -------------------------------------------------------------------------------
898 Returns the result of converting the single-precision floating-point value
899 `a' to the double-precision floating-point format.  The conversion is
900 performed according to the IEC/IEEE Standard for Binary Floating-point
902 -------------------------------------------------------------------------------
920         --aExp;  in float32_to_float64()
929 -------------------------------------------------------------------------------
930 Returns the result of converting the single-precision floating-point value
931 `a' to the extended double-precision floating-point format.  The conversion
932 is performed according to the IEC/IEEE Standard for Binary Floating-point
934 -------------------------------------------------------------------------------
961 -------------------------------------------------------------------------------
962 Rounds the single-precision floating-point value `a' to an integer, and
963 returns the result as a single-precision floating-point value.  The
965 Floating-point Arithmetic.
966 -------------------------------------------------------------------------------
983     roundingMode = roundData->mode;  in float32_round_to_int()
986         roundData->exception |= float_flag_inexact;  in float32_round_to_int()
1002     lastBitMask <<= 0x96 - aExp;  in float32_round_to_int()
1003     roundBitsMask = lastBitMask - 1;  in float32_round_to_int()
1015     if ( z != a ) roundData->exception |= float_flag_inexact;  in float32_round_to_int()
1021 -------------------------------------------------------------------------------
1022 Returns the result of adding the absolute values of the single-precision
1023 floating-point values `a' and `b'.  If `zSign' is true, the sum is negated
1026 Floating-point Arithmetic.
1027 -------------------------------------------------------------------------------
1039     expDiff = aExp - bExp;  in addFloat32Sigs()
1048             --expDiff;  in addFloat32Sigs()
1067         shift32RightJamming( aSig, - expDiff, &aSig );  in addFloat32Sigs()
1082     --zExp;  in addFloat32Sigs()
1093 -------------------------------------------------------------------------------
1094 Returns the result of subtracting the absolute values of the single-
1095 precision floating-point values `a' and `b'.  If `zSign' is true, the
1098 Standard for Binary Floating-point Arithmetic.
1099 -------------------------------------------------------------------------------
1111     expDiff = aExp - bExp;  in subFloat32Sigs()
1118         roundData->exception |= float_flag_invalid;  in subFloat32Sigs()
1127     return packFloat32( roundData->mode == float_round_down, 0, 0 );  in subFloat32Sigs()
1139     shift32RightJamming( aSig, - expDiff, &aSig );  in subFloat32Sigs()
1142     zSig = bSig - aSig;  in subFloat32Sigs()
1152         --expDiff;  in subFloat32Sigs()
1160     zSig = aSig - bSig;  in subFloat32Sigs()
1163     --zExp;  in subFloat32Sigs()
1169 -------------------------------------------------------------------------------
1170 Returns the result of adding the single-precision floating-point values `a'
1172 Binary Floating-point Arithmetic.
1173 -------------------------------------------------------------------------------
1191 -------------------------------------------------------------------------------
1192 Returns the result of subtracting the single-precision floating-point values
1194 for Binary Floating-point Arithmetic.
1195 -------------------------------------------------------------------------------
1213 -------------------------------------------------------------------------------
1214 Returns the result of multiplying the single-precision floating-point values
1216 for Binary Floating-point Arithmetic.
1217 -------------------------------------------------------------------------------
1239             roundData->exception |= float_flag_invalid;  in float32_mul()
1247             roundData->exception |= float_flag_invalid;  in float32_mul()
1260     zExp = aExp + bExp - 0x7F;  in float32_mul()
1267         --zExp;  in float32_mul()
1274 -------------------------------------------------------------------------------
1275 Returns the result of dividing the single-precision floating-point value `a'
1277 IEC/IEEE Standard for Binary Floating-point Arithmetic.
1278 -------------------------------------------------------------------------------
1297             roundData->exception |= float_flag_invalid;  in float32_div()
1309                 roundData->exception |= float_flag_invalid;  in float32_div()
1312             roundData->exception |= float_flag_divbyzero;  in float32_div()
1321     zExp = aExp - bExp + 0x7D;  in float32_div()
1341 -------------------------------------------------------------------------------
1342 Returns the remainder of the single-precision floating-point value `a'
1344 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
1345 -------------------------------------------------------------------------------
1367         roundData->exception |= float_flag_invalid;  in float32_rem()
1376             roundData->exception |= float_flag_invalid;  in float32_rem()
1385     expDiff = aExp - bExp;  in float32_rem()
1392             if ( expDiff < -1 ) return a;  in float32_rem()
1396         if ( q ) aSig -= bSig;  in float32_rem()
1401             q >>= 32 - expDiff;  in float32_rem()
1403             aSig = ( ( aSig>>1 )<<( expDiff - 1 ) ) - bSig * q;  in float32_rem()
1411         if ( bSig <= aSig ) aSig -= bSig;  in float32_rem()
1414         expDiff -= 64;  in float32_rem()
1417             q64 = ( 2 < q64 ) ? q64 - 2 : 0;  in float32_rem()
1418             aSig64 = - ( ( bSig * q64 )<<38 );  in float32_rem()
1419             expDiff -= 62;  in float32_rem()
1423         q64 = ( 2 < q64 ) ? q64 - 2 : 0;  in float32_rem()
1424         q = q64>>( 64 - expDiff );  in float32_rem()
1426         aSig = ( ( aSig64>>33 )<<( expDiff - 1 ) ) - bSig * q;  in float32_rem()
1431         aSig -= bSig;  in float32_rem()
1438     if ( zSign ) aSig = - aSig;  in float32_rem()
1444 -------------------------------------------------------------------------------
1445 Returns the square root of the single-precision floating-point value `a'.
1447 Floating-point Arithmetic.
1448 -------------------------------------------------------------------------------
1463         roundData->exception |= float_flag_invalid;  in float32_sqrt()
1468         roundData->exception |= float_flag_invalid;  in float32_sqrt()
1475     zExp = ( ( aExp - 0x7F )>>1 ) + 0x7E;  in float32_sqrt()
1485             rem = ( ( (bits64) aSig )<<32 ) - term;  in float32_sqrt()
1487                 --zSig;  in float32_sqrt()
1499 -------------------------------------------------------------------------------
1500 Returns 1 if the single-precision floating-point value `a' is equal to the
1502 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
1503 -------------------------------------------------------------------------------
1521 -------------------------------------------------------------------------------
1522 Returns 1 if the single-precision floating-point value `a' is less than or
1524 performed according to the IEC/IEEE Standard for Binary Floating-point
1526 -------------------------------------------------------------------------------
1546 -------------------------------------------------------------------------------
1547 Returns 1 if the single-precision floating-point value `a' is less than
1549 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
1550 -------------------------------------------------------------------------------
1570 -------------------------------------------------------------------------------
1571 Returns 1 if the single-precision floating-point value `a' is equal to the
1574 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
1575 -------------------------------------------------------------------------------
1591 -------------------------------------------------------------------------------
1592 Returns 1 if the single-precision floating-point value `a' is less than or
1595 IEC/IEEE Standard for Binary Floating-point Arithmetic.
1596 -------------------------------------------------------------------------------
1617 -------------------------------------------------------------------------------
1618 Returns 1 if the single-precision floating-point value `a' is less than
1621 Standard for Binary Floating-point Arithmetic.
1622 -------------------------------------------------------------------------------
1642 -------------------------------------------------------------------------------
1643 Returns the result of converting the double-precision floating-point value
1644 `a' to the 32-bit two's complement integer format.  The conversion is
1645 performed according to the IEC/IEEE Standard for Binary Floating-point
1646 Arithmetic---which means in particular that the conversion is rounded
1650 -------------------------------------------------------------------------------
1663     shiftCount = 0x42C - aExp;  in float64_to_int32()
1670 -------------------------------------------------------------------------------
1671 Returns the result of converting the double-precision floating-point value
1672 `a' to the 32-bit two's complement integer format.  The conversion is
1673 performed according to the IEC/IEEE Standard for Binary Floating-point
1678 -------------------------------------------------------------------------------
1690     shiftCount = 0x433 - aExp;  in float64_to_int32_round_to_zero()
1703     if ( aSign ) z = - z;  in float64_to_int32_round_to_zero()
1717 -------------------------------------------------------------------------------
1718 Returns the result of converting the double-precision floating-point value
1719 `a' to the 32-bit two's complement unsigned integer format.  The conversion
1720 is performed according to the IEC/IEEE Standard for Binary Floating-point
1721 Arithmetic---which means in particular that the conversion is rounded
1725 -------------------------------------------------------------------------------
1738     shiftCount = 0x42C - aExp;  in float64_to_uint32()
1744 -------------------------------------------------------------------------------
1745 Returns the result of converting the double-precision floating-point value
1746 `a' to the 32-bit two's complement integer format.  The conversion is
1747 performed according to the IEC/IEEE Standard for Binary Floating-point
1751 -------------------------------------------------------------------------------
1763     shiftCount = 0x433 - aExp;  in float64_to_uint32_round_to_zero()
1776     if ( aSign ) z = - z;  in float64_to_uint32_round_to_zero()
1789 -------------------------------------------------------------------------------
1790 Returns the result of converting the double-precision floating-point value
1791 `a' to the single-precision floating-point format.  The conversion is
1792 performed according to the IEC/IEEE Standard for Binary Floating-point
1794 -------------------------------------------------------------------------------
1814         aExp -= 0x381;  in float64_to_float32()
1823 -------------------------------------------------------------------------------
1824 Returns the result of converting the double-precision floating-point value
1825 `a' to the extended double-precision floating-point format.  The conversion
1826 is performed according to the IEC/IEEE Standard for Binary Floating-point
1828 -------------------------------------------------------------------------------
1856 -------------------------------------------------------------------------------
1857 Rounds the double-precision floating-point value `a' to an integer, and
1858 returns the result as a double-precision floating-point value.  The
1860 Floating-point Arithmetic.
1861 -------------------------------------------------------------------------------
1880         roundData->exception |= float_flag_inexact;  in float64_round_to_int()
1882         switch ( roundData->mode ) {  in float64_round_to_int()
1897     lastBitMask <<= 0x433 - aExp;  in float64_round_to_int()
1898     roundBitsMask = lastBitMask - 1;  in float64_round_to_int()
1900     roundingMode = roundData->mode;  in float64_round_to_int()
1911     if ( z != a ) roundData->exception |= float_flag_inexact;  in float64_round_to_int()
1917 -------------------------------------------------------------------------------
1918 Returns the result of adding the absolute values of the double-precision
1919 floating-point values `a' and `b'.  If `zSign' is true, the sum is negated
1922 Floating-point Arithmetic.
1923 -------------------------------------------------------------------------------
1935     expDiff = aExp - bExp;  in addFloat64Sigs()
1944             --expDiff;  in addFloat64Sigs()
1963         shift64RightJamming( aSig, - expDiff, &aSig );  in addFloat64Sigs()
1978     --zExp;  in addFloat64Sigs()
1989 -------------------------------------------------------------------------------
1990 Returns the result of subtracting the absolute values of the double-
1991 precision floating-point values `a' and `b'.  If `zSign' is true, the
1994 Standard for Binary Floating-point Arithmetic.
1995 -------------------------------------------------------------------------------
2007     expDiff = aExp - bExp;  in subFloat64Sigs()
2014         roundData->exception |= float_flag_invalid;  in subFloat64Sigs()
2023     return packFloat64( roundData->mode == float_round_down, 0, 0 );  in subFloat64Sigs()
2035     shift64RightJamming( aSig, - expDiff, &aSig );  in subFloat64Sigs()
2038     zSig = bSig - aSig;  in subFloat64Sigs()
2048         --expDiff;  in subFloat64Sigs()
2056     zSig = aSig - bSig;  in subFloat64Sigs()
2059     --zExp;  in subFloat64Sigs()
2065 -------------------------------------------------------------------------------
2066 Returns the result of adding the double-precision floating-point values `a'
2068 Binary Floating-point Arithmetic.
2069 -------------------------------------------------------------------------------
2087 -------------------------------------------------------------------------------
2088 Returns the result of subtracting the double-precision floating-point values
2090 for Binary Floating-point Arithmetic.
2091 -------------------------------------------------------------------------------
2109 -------------------------------------------------------------------------------
2110 Returns the result of multiplying the double-precision floating-point values
2112 for Binary Floating-point Arithmetic.
2113 -------------------------------------------------------------------------------
2133             roundData->exception |= float_flag_invalid;  in float64_mul()
2141             roundData->exception |= float_flag_invalid;  in float64_mul()
2154     zExp = aExp + bExp - 0x3FF;  in float64_mul()
2161         --zExp;  in float64_mul()
2168 -------------------------------------------------------------------------------
2169 Returns the result of dividing the double-precision floating-point value `a'
2171 the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2172 -------------------------------------------------------------------------------
2193             roundData->exception |= float_flag_invalid;  in float64_div()
2205                 roundData->exception |= float_flag_invalid;  in float64_div()
2208             roundData->exception |= float_flag_divbyzero;  in float64_div()
2217     zExp = aExp - bExp + 0x3FD;  in float64_div()
2229             --zSig;  in float64_div()
2239 -------------------------------------------------------------------------------
2240 Returns the remainder of the double-precision floating-point value `a'
2242 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2243 -------------------------------------------------------------------------------
2263         roundData->exception |= float_flag_invalid;  in float64_rem()
2272             roundData->exception |= float_flag_invalid;  in float64_rem()
2281     expDiff = aExp - bExp;  in float64_rem()
2285         if ( expDiff < -1 ) return a;  in float64_rem()
2289     if ( q ) aSig -= bSig;  in float64_rem()
2290     expDiff -= 64;  in float64_rem()
2293         q = ( 2 < q ) ? q - 2 : 0;  in float64_rem()
2294         aSig = - ( ( bSig>>2 ) * q );  in float64_rem()
2295         expDiff -= 62;  in float64_rem()
2300         q = ( 2 < q ) ? q - 2 : 0;  in float64_rem()
2301         q >>= 64 - expDiff;  in float64_rem()
2303         aSig = ( ( aSig>>1 )<<( expDiff - 1 ) ) - bSig * q;  in float64_rem()
2312         aSig -= bSig;  in float64_rem()
2319     if ( zSign ) aSig = - aSig;  in float64_rem()
2325 -------------------------------------------------------------------------------
2326 Returns the square root of the double-precision floating-point value `a'.
2328 Floating-point Arithmetic.
2329 -------------------------------------------------------------------------------
2345         roundData->exception |= float_flag_invalid;  in float64_sqrt()
2350         roundData->exception |= float_flag_invalid;  in float64_sqrt()
2357     zExp = ( ( aExp - 0x3FF )>>1 ) + 0x3FE;  in float64_sqrt()
2361     aSig <<= 9 - ( aExp & 1 );  in float64_sqrt()
2372                 --zSig;  in float64_sqrt()
2386 -------------------------------------------------------------------------------
2387 Returns 1 if the double-precision floating-point value `a' is equal to the
2389 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2390 -------------------------------------------------------------------------------
2408 -------------------------------------------------------------------------------
2409 Returns 1 if the double-precision floating-point value `a' is less than or
2411 performed according to the IEC/IEEE Standard for Binary Floating-point
2413 -------------------------------------------------------------------------------
2433 -------------------------------------------------------------------------------
2434 Returns 1 if the double-precision floating-point value `a' is less than
2436 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2437 -------------------------------------------------------------------------------
2457 -------------------------------------------------------------------------------
2458 Returns 1 if the double-precision floating-point value `a' is equal to the
2461 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2462 -------------------------------------------------------------------------------
2478 -------------------------------------------------------------------------------
2479 Returns 1 if the double-precision floating-point value `a' is less than or
2482 IEC/IEEE Standard for Binary Floating-point Arithmetic.
2483 -------------------------------------------------------------------------------
2504 -------------------------------------------------------------------------------
2505 Returns 1 if the double-precision floating-point value `a' is less than
2508 Standard for Binary Floating-point Arithmetic.
2509 -------------------------------------------------------------------------------
2531 -------------------------------------------------------------------------------
2532 Returns the result of converting the extended double-precision floating-
2533 point value `a' to the 32-bit two's complement integer format.  The
2535 Floating-point Arithmetic---which means in particular that the conversion
2539 -------------------------------------------------------------------------------
2551     shiftCount = 0x4037 - aExp;  in floatx80_to_int32()
2559 -------------------------------------------------------------------------------
2560 Returns the result of converting the extended double-precision floating-
2561 point value `a' to the 32-bit two's complement integer format.  The
2563 Floating-point Arithmetic, except that the conversion is always rounded
2567 -------------------------------------------------------------------------------
2579     shiftCount = 0x403E - aExp;  in floatx80_to_int32_round_to_zero()
2591     if ( aSign ) z = - z;  in floatx80_to_int32_round_to_zero()
2605 -------------------------------------------------------------------------------
2606 Returns the result of converting the extended double-precision floating-
2607 point value `a' to the single-precision floating-point format.  The
2609 Floating-point Arithmetic.
2610 -------------------------------------------------------------------------------
2628     if ( aExp || aSig ) aExp -= 0x3F81;  in floatx80_to_float32()
2634 -------------------------------------------------------------------------------
2635 Returns the result of converting the extended double-precision floating-
2636 point value `a' to the double-precision floating-point format.  The
2638 Floating-point Arithmetic.
2639 -------------------------------------------------------------------------------
2657     if ( aExp || aSig ) aExp -= 0x3C01;  in floatx80_to_float64()
2663 -------------------------------------------------------------------------------
2664 Rounds the extended double-precision floating-point value `a' to an integer,
2665 and returns the result as an extended quadruple-precision floating-point
2667 Binary Floating-point Arithmetic.
2668 -------------------------------------------------------------------------------
2690         roundData->exception |= float_flag_inexact;  in floatx80_round_to_int()
2692         switch ( roundData->mode ) {  in floatx80_round_to_int()
2713     lastBitMask <<= 0x403E - aExp;  in floatx80_round_to_int()
2714     roundBitsMask = lastBitMask - 1;  in floatx80_round_to_int()
2716     roundingMode = roundData->mode;  in floatx80_round_to_int()
2731     if ( z.low != a.low ) roundData->exception |= float_flag_inexact;  in floatx80_round_to_int()
2737 -------------------------------------------------------------------------------
2738 Returns the result of adding the absolute values of the extended double-
2739 precision floating-point values `a' and `b'.  If `zSign' is true, the sum is
2742 Floating-point Arithmetic.
2743 -------------------------------------------------------------------------------
2755     expDiff = aExp - bExp;  in addFloatx80Sigs()
2761         if ( bExp == 0 ) --expDiff;  in addFloatx80Sigs()
2771         shift64ExtraRightJamming( aSig, 0, - expDiff, &aSig, &zSig1 );  in addFloatx80Sigs()
2806 -------------------------------------------------------------------------------
2808 double-precision floating-point values `a' and `b'.  If `zSign' is true,
2811 Standard for Binary Floating-point Arithmetic.
2812 -------------------------------------------------------------------------------
2825     expDiff = aExp - bExp;  in subFloatx80Sigs()
2832         roundData->exception |= float_flag_invalid;  in subFloatx80Sigs()
2845     return packFloatx80( roundData->mode == float_round_down, 0, 0 );  in subFloatx80Sigs()
2852     shift128RightJamming( aSig, 0, - expDiff, &aSig, &zSig1 );  in subFloatx80Sigs()
2863     if ( bExp == 0 ) --expDiff;  in subFloatx80Sigs()
2876 -------------------------------------------------------------------------------
2877 Returns the result of adding the extended double-precision floating-point
2879 Standard for Binary Floating-point Arithmetic.
2880 -------------------------------------------------------------------------------
2898 -------------------------------------------------------------------------------
2899 Returns the result of subtracting the extended double-precision floating-
2901 IEC/IEEE Standard for Binary Floating-point Arithmetic.
2902 -------------------------------------------------------------------------------
2920 -------------------------------------------------------------------------------
2921 Returns the result of multiplying the extended double-precision floating-
2923 IEC/IEEE Standard for Binary Floating-point Arithmetic.
2924 -------------------------------------------------------------------------------
2952             roundData->exception |= float_flag_invalid;  in floatx80_mul()
2968     zExp = aExp + bExp - 0x3FFE;  in floatx80_mul()
2972         --zExp;  in floatx80_mul()
2981 -------------------------------------------------------------------------------
2982 Returns the result of dividing the extended double-precision floating-point
2984 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2985 -------------------------------------------------------------------------------
3018                 roundData->exception |= float_flag_invalid;  in floatx80_div()
3024             roundData->exception |= float_flag_divbyzero;  in floatx80_div()
3033     zExp = aExp - bExp + 0x3FFE;  in floatx80_div()
3043         --zSig0;  in floatx80_div()
3051             --zSig1;  in floatx80_div()
3063 -------------------------------------------------------------------------------
3064 Returns the remainder of the extended double-precision floating-point value
3066 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
3067 -------------------------------------------------------------------------------
3097             roundData->exception |= float_flag_invalid;  in floatx80_rem()
3111     expDiff = aExp - bExp;  in floatx80_rem()
3114         if ( expDiff < -1 ) return a;  in floatx80_rem()
3119     if ( q ) aSig0 -= bSig;  in floatx80_rem()
3120     expDiff -= 64;  in floatx80_rem()
3123         q = ( 2 < q ) ? q - 2 : 0;  in floatx80_rem()
3127         expDiff -= 62;  in floatx80_rem()
3132         q = ( 2 < q ) ? q - 2 : 0;  in floatx80_rem()
3133         q >>= 64 - expDiff;  in floatx80_rem()
3134         mul64To128( bSig, q<<( 64 - expDiff ), &term0, &term1 );  in floatx80_rem()
3136         shortShift128Left( 0, bSig, 64 - expDiff, &term0, &term1 );  in floatx80_rem()
3163 -------------------------------------------------------------------------------
3164 Returns the square root of the extended double-precision floating-point
3166 for Binary Floating-point Arithmetic.
3167 -------------------------------------------------------------------------------
3189         roundData->exception |= float_flag_invalid;  in floatx80_sqrt()
3199     zExp = ( ( aExp - 0x3FFF )>>1 ) + 0x3FFF;  in floatx80_sqrt()
3210         --zSig0;  in floatx80_sqrt()
3225             --zSig1;  in floatx80_sqrt()
3240 -------------------------------------------------------------------------------
3241 Returns 1 if the extended double-precision floating-point value `a' is
3243 performed according to the IEC/IEEE Standard for Binary Floating-point
3245 -------------------------------------------------------------------------------
3271 -------------------------------------------------------------------------------
3272 Returns 1 if the extended double-precision floating-point value `a' is
3275 Floating-point Arithmetic.
3276 -------------------------------------------------------------------------------
3305 -------------------------------------------------------------------------------
3306 Returns 1 if the extended double-precision floating-point value `a' is
3308 is performed according to the IEC/IEEE Standard for Binary Floating-point
3310 -------------------------------------------------------------------------------
3339 -------------------------------------------------------------------------------
3340 Returns 1 if the extended double-precision floating-point value `a' is equal
3343 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
3344 -------------------------------------------------------------------------------
3367 -------------------------------------------------------------------------------
3368 Returns 1 if the extended double-precision floating-point value `a' is less
3371 to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
3372 -------------------------------------------------------------------------------
3401 -------------------------------------------------------------------------------
3402 Returns 1 if the extended double-precision floating-point value `a' is less
3405 IEC/IEEE Standard for Binary Floating-point Arithmetic.
3406 -------------------------------------------------------------------------------