1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 2"http://www.w3.org/TR/html4/loose.dtd"> 3 4<html> 5<head> 6 <meta http-equiv="Content-Language" content="en-us"> 7 <meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> 8 <link rel="stylesheet" type="text/css" href="../../../../boost.css"> 9 10 <title>Rounding Policies</title> 11</head> 12 13<body lang="en"> 14 <h1>Rounding Policies</h1> 15 16 <p>In order to be as general as possible, the library uses a class to 17 compute all the necessary functions rounded upward or downward. This class 18 is the first parameter of <code>policies</code>, it is also the type named 19 <code>rounding</code> in the policy definition of 20 <code>interval</code>.</p> 21 22 <p>By default, it is <code>interval_lib::rounded_math<T></code>. The 23 class <code>interval_lib::rounded_math</code> is already specialized for 24 the standard floating types (<code>float</code> , <code>double</code> and 25 <code>long double</code>). So if the base type of your intervals is not one 26 of these, a good solution would probably be to provide a specialization of 27 this class. But if the default specialization of 28 <code>rounded_math<T></code> for <code>float</code>, 29 <code>double</code>, or <code>long double</code> is not what you seek, or 30 you do not want to specialize 31 <code>interval_lib::rounded_math<T></code> (say because you prefer to 32 work in your own namespace) you can also define your own rounding policy 33 and pass it directly to <code>interval_lib::policies</code>.</p> 34 35 <h2>Requirements</h2> 36 37 <p>Here comes what the class is supposed to provide. The domains are 38 written next to their respective functions (as you can see, the functions 39 do not have to worry about invalid values, but they have to handle infinite 40 arguments).</p> 41 <pre> 42/* Rounding requirements */ 43struct rounding { 44 // default constructor, destructor 45 rounding(); 46 ~rounding(); 47 // mathematical operations 48 T add_down(T, T); // [-∞;+∞][-∞;+∞] 49 T add_up (T, T); // [-∞;+∞][-∞;+∞] 50 T sub_down(T, T); // [-∞;+∞][-∞;+∞] 51 T sub_up (T, T); // [-∞;+∞][-∞;+∞] 52 T mul_down(T, T); // [-∞;+∞][-∞;+∞] 53 T mul_up (T, T); // [-∞;+∞][-∞;+∞] 54 T div_down(T, T); // [-∞;+∞]([-∞;+∞]-{0}) 55 T div_up (T, T); // [-∞;+∞]([-∞;+∞]-{0}) 56 T sqrt_down(T); // ]0;+∞] 57 T sqrt_up (T); // ]0;+∞] 58 T exp_down(T); // [-∞;+∞] 59 T exp_up (T); // [-∞;+∞] 60 T log_down(T); // ]0;+∞] 61 T log_up (T); // ]0;+∞] 62 T cos_down(T); // [0;2π] 63 T cos_up (T); // [0;2π] 64 T tan_down(T); // ]-π/2;π/2[ 65 T tan_up (T); // ]-π/2;π/2[ 66 T asin_down(T); // [-1;1] 67 T asin_up (T); // [-1;1] 68 T acos_down(T); // [-1;1] 69 T acos_up (T); // [-1;1] 70 T atan_down(T); // [-∞;+∞] 71 T atan_up (T); // [-∞;+∞] 72 T sinh_down(T); // [-∞;+∞] 73 T sinh_up (T); // [-∞;+∞] 74 T cosh_down(T); // [-∞;+∞] 75 T cosh_up (T); // [-∞;+∞] 76 T tanh_down(T); // [-∞;+∞] 77 T tanh_up (T); // [-∞;+∞] 78 T asinh_down(T); // [-∞;+∞] 79 T asinh_up (T); // [-∞;+∞] 80 T acosh_down(T); // [1;+∞] 81 T acosh_up (T); // [1;+∞] 82 T atanh_down(T); // [-1;1] 83 T atanh_up (T); // [-1;1] 84 T median(T, T); // [-∞;+∞][-∞;+∞] 85 T int_down(T); // [-∞;+∞] 86 T int_up (T); // [-∞;+∞] 87 // conversion functions 88 T conv_down(U); 89 T conv_up (U); 90 // unprotected rounding class 91 typedef ... unprotected_rounding; 92}; 93</pre> 94 95 <p>The constructor and destructor of the rounding class have a very 96 important semantic requirement: they are responsible for setting and 97 resetting the rounding modes of the computation on T. For instance, if T is 98 a standard floating point type and floating point computation is performed 99 according to the Standard IEEE 754, the constructor can save the current 100 rounding state, each <code>_up</code> (resp. <code>_down</code>) function 101 will round up (resp. down), and the destructor will restore the saved 102 rounding state. Indeed this is the behavior of the default rounding 103 policy.</p> 104 105 <p>The meaning of all the mathematical functions up until 106 <code>atanh_up</code> is clear: each function returns number representable 107 in the type <code>T</code> which is a lower bound (for <code>_down</code>) 108 or upper bound (for <code>_up</code>) on the true mathematical result of 109 the corresponding function. The function <code>median</code> computes the 110 average of its two arguments rounded to its nearest representable number. 111 The functions <code>int_down</code> and <code>int_up</code> compute the 112 nearest integer smaller or bigger than their argument. Finally, 113 <code>conv_down</code> and <code>conv_up</code> are responsible of the 114 conversions of values of other types to the base number type: the first one 115 must round down the value and the second one must round it up.</p> 116 117 <p>The type <code>unprotected_rounding</code> allows to remove all 118 controls. For reasons why one might to do this, see the <a href= 119 "#Protection">protection</a> paragraph below.</p> 120 121 <h2>Overview of the provided classes</h2> 122 123 <p>A lot of classes are provided. The classes are organized by level. At 124 the bottom is the class <code>rounding_control</code>. At the next level 125 come <code>rounded_arith_exact</code>, <code>rounded_arith_std</code> and 126 <code>rounded_arith_opp</code>. Then there are 127 <code>rounded_transc_dummy</code>, <code>rounded_transc_exact</code>, 128 <code>rounded_transc_std</code> and <code>rounded_transc_opp</code>. And 129 finally are <code>save_state</code> and <code>save_state_nothing</code>. 130 Each of these classes provide a set of members that are required by the 131 classes of the next level. For example, a <code>rounded_transc_...</code> 132 class needs the members of a <code>rounded_arith_...</code> class.</p> 133 134 <p>When they exist in two versions <code>_std</code> and <code>_opp</code>, 135 the first one does switch the rounding mode each time, and the second one 136 tries to keep it oriented toward plus infinity. The main purpose of the 137 <code>_opp</code> version is to speed up the computations through the use 138 of the "opposite trick" (see the <a href="#perf">performance notes</a>). 139 This version requires the rounding mode to be upward before entering any 140 computation functions of the class. It guarantees that the rounding mode 141 will still be upward at the exit of the functions.</p> 142 143 <p>Please note that it is really a very bad idea to mix the 144 <code>_opp</code> version with the <code>_std</code> since they do not have 145 compatible properties.</p> 146 147 <p>There is a third version named <code>_exact</code> which computes the 148 functions without changing the rounding mode. It is an "exact" version 149 because it is intended for a base type that produces exact results.</p> 150 151 <p>The last version is the <code>_dummy</code> version. It does not do any 152 computations but still produces compatible results.</p> 153 154 <p>Please note that it is possible to use the "exact" version for an 155 inexact base type, e.g. <code>float</code> or <code>double</code>. In that 156 case, the inclusion property is no longer guaranteed, but this can be 157 useful to speed up the computation when the inclusion property is not 158 desired strictly. For instance, in computer graphics, a small error due to 159 floating-point roundoff is acceptable as long as an approximate version of 160 the inclusion property holds.</p> 161 162 <p>Here comes what each class defines. Later, when they will be described 163 more thoroughly, these members will not be repeated. Please come back here 164 in order to see them. Inheritance is also used to avoid repetitions.</p> 165 <pre> 166template <class T> 167struct rounding_control 168{ 169 typedef ... rounding_mode; 170 void set_rounding_mode(rounding_mode); 171 void get_rounding_mode(rounding_mode&); 172 void downward (); 173 void upward (); 174 void to_nearest(); 175 T to_int(T); 176 T force_rounding(T); 177}; 178 179template <class T, class Rounding> 180struct rounded_arith_... : Rounding 181{ 182 void init(); 183 T add_down(T, T); 184 T add_up (T, T); 185 T sub_down(T, T); 186 T sub_up (T, T); 187 T mul_down(T, T); 188 T mul_up (T, T); 189 T div_down(T, T); 190 T div_up (T, T); 191 T sqrt_down(T); 192 T sqrt_up (T); 193 T median(T, T); 194 T int_down(T); 195 T int_up (T); 196}; 197 198template <class T, class Rounding> 199struct rounded_transc_... : Rounding 200{ 201 T exp_down(T); 202 T exp_up (T); 203 T log_down(T); 204 T log_up (T); 205 T cos_down(T); 206 T cos_up (T); 207 T tan_down(T); 208 T tan_up (T); 209 T asin_down(T); 210 T asin_up (T); 211 T acos_down(T); 212 T acos_up (T); 213 T atan_down(T); 214 T atan_up (T); 215 T sinh_down(T); 216 T sinh_up (T); 217 T cosh_down(T); 218 T cosh_up (T); 219 T tanh_down(T); 220 T tanh_up (T); 221 T asinh_down(T); 222 T asinh_up (T); 223 T acosh_down(T); 224 T acosh_up (T); 225 T atanh_down(T); 226 T atanh_up (T); 227}; 228 229template <class Rounding> 230struct save_state_... : Rounding 231{ 232 save_state_...(); 233 ~save_state_...(); 234 typedef ... unprotected_rounding; 235}; 236</pre> 237 238 <h2>Synopsis</h2> 239 <pre> 240namespace boost { 241namespace numeric { 242namespace interval_lib { 243 244<span style="color: #FF0000">/* basic rounding control */</span> 245template <class T> struct rounding_control; 246 247<span style="color: #FF0000">/* arithmetic functions rounding */</span> 248template <class T, class Rounding = rounding_control<T> > struct rounded_arith_exact; 249template <class T, class Rounding = rounding_control<T> > struct rounded_arith_std; 250template <class T, class Rounding = rounding_control<T> > struct rounded_arith_opp; 251 252<span style="color: #FF0000">/* transcendental functions rounding */</span> 253template <class T, class Rounding> struct rounded_transc_dummy; 254template <class T, class Rounding = rounded_arith_exact<T> > struct rounded_transc_exact; 255template <class T, class Rounding = rounded_arith_std<T> > struct rounded_transc_std; 256template <class T, class Rounding = rounded_arith_opp<T> > struct rounded_transc_opp; 257 258<span style="color: #FF0000">/* rounding-state-saving classes */</span> 259template <class Rounding> struct save_state; 260template <class Rounding> struct save_state_nothing; 261 262<span style="color: #FF0000">/* default policy for type T */</span> 263template <class T> struct rounded_math; 264template <> struct rounded_math<float>; 265template <> struct rounded_math<double>; 266 267<span style= 268"color: #FF0000">/* some metaprogramming to convert a protected to unprotected rounding */</span> 269template <class I> struct unprotect; 270 271} // namespace interval_lib 272} // namespace numeric 273} // namespace boost 274</pre> 275 276 <h2>Description of the provided classes</h2> 277 278 <p>We now describe each class in the order they appear in the definition of 279 a rounding policy (this outermost-to-innermost order is the reverse order 280 from the synopsis).</p> 281 282 <h3 id="Protection">Protection control</h3> 283 284 <p>Protection refers to the fact that the interval operations will be 285 surrounded by rounding mode controls. Unprotecting a class means to remove 286 all the rounding controls. Each rounding policy provides a type 287 <code>unprotected_rounding</code>. The required type 288 <code>unprotected_rounding</code> gives another rounding class that enables 289 to work when nested inside rounding. For example, the first three lines 290 below should all produce the same result (because the first operation is 291 the rounding constructor, and the last is its destructor, which take care 292 of setting the rounding modes); and the last line is allowed to have an 293 undefined behavior (since no rounding constructor or destructor is ever 294 called).</p> 295 <pre> 296T c; { rounding rnd; c = rnd.add_down(a, b); } 297T c; { rounding rnd1; { rounding rnd2; c = rnd2.add_down(a, b); } } 298T c; { rounding rnd1; { rounding::unprotected_rounding rnd2; c = rnd2.add_down(a, b); } } 299T d; { rounding::unprotected_rounding rnd; d = rnd.add_down(a, b); } 300</pre> 301 302 <p>Naturally <code>rounding::unprotected_rounding</code> may simply be 303 <code>rounding</code> itself. But it can improve performance if it is a 304 simplified version with empty constructor and destructor. In order to avoid 305 undefined behaviors, in the library, an object of type 306 <code>rounding::unprotected_rounding</code> is guaranteed to be created 307 only when an object of type <code>rounding</code> is already alive. See the 308 <a href="#perf">performance notes</a> for some additional details.</p> 309 310 <p>The support library defines a metaprogramming class template 311 <code>unprotect</code> which takes an interval type <code>I</code> and 312 returns an interval type <code>unprotect<I>::type</code> where the 313 rounding policy has been unprotected. Some information about the types: 314 <code>interval<T, interval_lib::policies<Rounding, _> 315 >::traits_type::rounding</code> <b>is</b> the same type as 316 <code>Rounding</code>, and <code>unprotect<interval<T, 317 interval_lib::policies<Rounding, _> > >::type</code> <b>is</b> 318 the same type as <code>interval<T, 319 interval_lib::policies<Rounding::unprotected, _> ></code>.</p> 320 321 <h3>State saving</h3> 322 323 <p>First comes <code>save_state</code>. This class is responsible for 324 saving the current rounding mode and calling init in its constructor, and 325 for restoring the saved rounding mode in its destructor. This class also 326 defines the <code>unprotected_rounding</code> type.</p> 327 328 <p>If the rounding mode does not require any state-saving or 329 initialization, <code>save_state_nothing</code> can be used instead of 330 <code>save_state</code>.</p> 331 332 <h3>Transcendental functions</h3> 333 334 <p>The classes <code>rounded_transc_exact</code>, 335 <code>rounded_transc_std</code> and <code>rounded_transc_opp</code> expect 336 the std namespace to provide the functions exp log cos tan acos asin atan 337 cosh sinh tanh acosh asinh atanh. For the <code>_std</code> and 338 <code>_opp</code> versions, all these functions should respect the current 339 rounding mode fixed by a call to downward or upward.</p> 340 341 <p><strong>Please note:</strong> Unfortunately, the latter is rarely the 342 case. It is the reason why a class <code>rounded_transc_dummy</code> is 343 provided which does not depend on the functions from the std namespace. 344 There is no magic, however. The functions of 345 <code>rounded_transc_dummy</code> do not compute anything. They only return 346 valid values. For example, <code>cos_down</code> always returns -1. In this 347 way, we do verify the inclusion property for the default implementation, 348 even if this has strictly no value for the user. In order to have useful 349 values, another policy should be used explicitely, which will most likely 350 lead to a violation of the inclusion property. In this way, we ensure that 351 the violation is clearly pointed out to the user who then knows what he 352 stands against. This class could have been used as the default 353 transcendental rounding class, but it was decided it would be better for 354 the compilation to fail due to missing declarations rather than succeed 355 thanks to valid but unusable functions.</p> 356 357 <h3>Basic arithmetic functions</h3> 358 359 <p>The classes <code>rounded_arith_std</code> and 360 <code>rounded_arith_opp</code> expect the operators + - * / and the 361 function <code>std::sqrt</code> to respect the current rounding mode.</p> 362 363 <p>The class <code>rounded_arith_exact</code> requires 364 <code>std::floor</code> and <code>std::ceil</code> to be defined since it 365 can not rely on <code>to_int</code>.</p> 366 367 <h3>Rounding control</h3> 368 369 <p>The functions defined by each of the previous classes did not need any 370 explanation. For example, the behavior of <code>add_down</code> is to 371 compute the sum of two numbers rounded downward. For 372 <code>rounding_control</code>, the situation is a bit more complex.</p> 373 374 <p>The basic function is <code>force_rounding</code> which returns its 375 argument correctly rounded accordingly to the current rounding mode if it 376 was not already the case. This function is necessary to handle delayed 377 rounding. Indeed, depending on the way the computations are done, the 378 intermediate results may be internally stored in a more precise format and 379 it can lead to a wrong rounding. So the function enforces the rounding. 380 <a href="#extreg">Here</a> is an example of what happens when the rounding 381 is not enforced.</p> 382 383 <p>The function <code>get_rounding_mode</code> returns the current rounding 384 mode, <code>set_rounding_mode</code> sets the rounding mode back to a 385 previous value returned by <code>get_rounding_mode</code>. 386 <code>downward</code>, <code>upward</code> and <code>to_nearest</code> sets 387 the rounding mode in one of the three directions. This rounding mode should 388 be global to all the functions that use the type <code>T</code>. For 389 example, after a call to <code>downward</code>, 390 <code>force_rounding(x+y)</code> is expected to return the sum rounded 391 toward -∞.</p> 392 393 <p>The function <code>to_int</code> computes the nearest integer 394 accordingly to the current rounding mode.</p> 395 396 <p>The non-specialized version of <code>rounding_control</code> does not do 397 anything. The functions for the rounding mode are empty, and 398 <code>to_int</code> and <code>force_rounding</code> are identity functions. 399 The <code>pi_</code> constant functions return suitable integers (for 400 example, <code>pi_up</code> returns <code>T(4)</code>).</p> 401 402 <p>The class template <code>rounding_control</code> is specialized for 403 <code>float</code>, <code>double</code> and <code>long double</code> in 404 order to best use the floating point unit of the computer.</p> 405 406 <h2>Template class <tt>rounded_math</tt></h2> 407 408 <p>The default policy (aka <code>rounded_math<T></code>) is simply 409 defined as:</p> 410 <pre> 411template <class T> struct rounded_math<T> : save_state_nothing<rounded_arith_exact<T> > {}; 412</pre> 413 414 <p>and the specializations for <code>float</code>, <code>double</code> and 415 <code>long double</code> use <code>rounded_arith_opp</code>, as in:</p> 416 <pre> 417template <> struct rounded_math<float> : save_state<rounded_arith_opp<float> > {}; 418template <> struct rounded_math<double> : save_state<rounded_arith_opp<double> > {}; 419template <> struct rounded_math<long double> : save_state<rounded_arith_opp<long double> > {}; 420</pre> 421 422 <h2 id="perf">Performance Issues</h2> 423 424 <p>This paragraph deals mostly with the performance of the library with 425 intervals using the floating-point unit (FPU) of the computer. Let's 426 consider the sum of [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] as an 427 example. The result is [<code>down</code>(<i>a</i>+<i>c</i>), 428 <code>up</code>(<i>b</i>+<i>d</i>)], where <code>down</code> and 429 <code>up</code> indicate the rounding mode needed.</p> 430 431 <h3>Rounding Mode Switch</h3> 432 433 <p>If the FPU is able to use a different rounding mode for each operation, 434 there is no problem. For example, it's the case for the Alpha processor: 435 each floating-point instruction can specify a different rounding mode. 436 However, the IEEE-754 Standard does not require such a behavior. So most of 437 the FPUs only provide some instructions to set the rounding mode for all 438 subsequent operations. And generally, these instructions need to flush the 439 pipeline of the FPU.</p> 440 441 <p>In this situation, the time needed to sum [<i>a</i>,<i>b</i>] and 442 [<i>c</i>,<i>d</i>] is far worse than the time needed to calculate 443 <i>a</i>+<i>b</i> and <i>c</i>+<i>d</i> since the two additions cannot be 444 parallelized. Consequently, the objective is to diminish the number of 445 rounding mode switches.</p> 446 447 <p>If this library is not used to provide exact computations, but only for 448 pair arithmetic, the solution is quite simple: do not use rounding. In that 449 case, doing the sum [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] will be as 450 fast as computing <i>a</i>+<i>b</i> and <i>c</i>+<i>d</i>. Everything is 451 perfect.</p> 452 453 <p>However, if exact computations are required, such a solution is totally 454 unthinkable. So, are we penniless? No, there is still a trick available. 455 Indeed, down(<i>a</i>+<i>c</i>) = -up(-<i>a</i>-<i>c</i>) if the unary 456 minus is an exact operation. It is now possible to calculate the whole sum 457 with the same rounding mode. Generally, the cost of the mode switching is 458 worse than the cost of the sign changes.</p> 459 460 <h4>Speeding up consecutive operations</h4> 461 462 <p>The interval addition is not the only operation; most of the interval 463 operations can be computed by setting the rounding direction of the FPU 464 only once. So the operations of the floating point rounding policy assume 465 that the direction is correctly set. This assumption is usually not true in 466 a program (the user and the standard library expect the rounding direction 467 to be to nearest), so these operations have to be enclosed in a shell that 468 sets the floating point environment. This protection is done by the 469 constructor and destructor of the rounding policy.</p> 470 471 <p>Les us now consider the case of two consecutive interval additions: 472 [<i>a</i>,<i>b</i>] + [<i>c</i>,<i>d</i>] + [<i>e</i>,<i>f</i>]. The 473 generated code should look like:</p> 474 <pre> 475init_rounding_mode(); // rounding object construction during the first addition 476t1 = -(-a - c); 477t2 = b + d; 478restore_rounding_mode(); // rounding object destruction 479init_rounding_mode(); // rounding object construction during the second addition 480x = -(-t1 - e); 481y = t2 + f; 482restore_rounding_mode(); // rounding object destruction 483// the result is the interval [x,y] 484</pre> 485 486 <p>Between the two operations, the rounding direction is restored, and then 487 initialized again. Ideally, compilers should be able to optimize this 488 useless code away. But unfortunately they are not, and this slows the code 489 down by an order of magnitude. In order to avoid this bottleneck, the user 490 can tell to the interval operations that they do not need to be protected 491 anymore. It will then be up to the user to protect the interval 492 computations. The compiler will then be able to generate such a code:</p> 493 <pre> 494init_rounding_mode(); // done by the user 495x = -(-a - c - e); 496y = b + d + f; 497restore_rounding_mode(); // done by the user 498</pre> 499 500 <p>The user will have to create a rounding object. And as long as this 501 object is alive, unprotected versions of the interval operations can be 502 used. They are selected by using an interval type with a specific rounding 503 policy. If the initial interval type is <code>I</code>, then 504 <code>I::traits_type::rounding</code> is the type of the rounding object, 505 and <code>interval_lib::unprotect<I>::type</code> is the type of the 506 unprotected interval type.</p> 507 508 <p>Because the rounding mode of the FPU is changed during the life of the 509 rounding object, any arithmetic floating point operation that does not 510 involve the interval library can lead to unexpected results. And 511 reciprocally, using unprotected interval operation when no rounding object 512 is alive will produce intervals that are not guaranteed anymore to contain 513 the real result.</p> 514 515 <h4 id="perfexp">Example</h4> 516 517 <p>Here is an example of Horner's scheme to compute the value of a polynom. 518 The rounding mode switches are disabled for the whole computation.</p> 519 <pre> 520// I is an interval class, the polynom is a simple array 521template<class I> 522I horner(const I& x, const I p[], int n) { 523 524 // save and initialize the rounding mode 525 typename I::traits_type::rounding rnd; 526 527 // define the unprotected version of the interval type 528 typedef typename boost::numeric::interval_lib::unprotect<I>::type R; 529 530 const R& a = x; 531 R y = p[n - 1]; 532 for(int i = n - 2; i >= 0; i--) { 533 y = y * a + (const R&)(p[i]); 534 } 535 return y; 536 537 // restore the rounding mode with the destruction of rnd 538} 539</pre> 540 541 <p>Please note that a rounding object is specially created in order to 542 protect all the interval computations. Each interval of type I is converted 543 in an interval of type R before any operations. If this conversion is not 544 done, the result is still correct, but the interest of this whole 545 optimization has disappeared. Whenever possible, it is good to convert to 546 <code>const R&</code> instead of <code>R</code>: indeed, the function 547 could already be called inside an unprotection block so the types 548 <code>R</code> and <code>I</code> would be the same interval, no need for a 549 conversion.</p> 550 551 <h4>Uninteresting remark</h4> 552 553 <p>It was said at the beginning that the Alpha processors can use a 554 specific rounding mode for each operation. However, due to the instruction 555 format, the rounding toward plus infinity is not available. Only the 556 rounding toward minus infinity can be used. So the trick using the change 557 of sign becomes essential, but there is no need to save and restore the 558 rounding mode on both sides of an operation.</p> 559 560 <h3 id="extreg">Extended Registers</h3> 561 562 <p>There is another problem besides the cost of the rounding mode switch. 563 Some FPUs use extended registers (for example, float computations will be 564 done with double registers, or double computations with long double 565 registers). Consequently, many problems can arise.</p> 566 567 <p>The first one is due to to the extended precision of the mantissa. The 568 rounding is also done on this extended precision. And consequently, we 569 still have down(<i>a</i>+<i>b</i>) = -up(-<i>a</i>-<i>b</i>) in the 570 extended registers. But back to the standard precision, we now have 571 down(<i>a</i>+<i>b</i>) < -up(-<i>a</i>-<i>b</i>) instead of an 572 equality. A solution could be not to use this method. But there still are 573 other problems, with the comparisons between numbers for example.</p> 574 575 <p>Naturally, there is also a problem with the extended precision of the 576 exponent. To illustrate this problem, let <i>m</i> be the biggest number 577 before +<i>inf</i>. If we calculate 2*[<i>m</i>,<i>m</i>], the answer 578 should be [<i>m</i>,<i>inf</i>]. But due to the extended registers, the FPU 579 will first store [<i>2m</i>,<i>2m</i>] and then convert it to 580 [<i>inf</i>,<i>inf</i>] at the end of the calculus (when the rounding mode 581 is toward +<i>inf</i>). So the answer is no more accurate.</p> 582 583 <p>There is only one solution: to force the FPU to convert the extended 584 values back to standard precision after each operation. Some FPUs provide 585 an instruction able to do this conversion (for example the PowerPC 586 processors). But for the FPUs that do not provide it (the x86 processors), 587 the only solution is to write the values to memory and read them back. Such 588 an operation is obviously very expensive.</p> 589 590 <h2>Some Examples</h2> 591 592 <p>Here come several cases:</p> 593 594 <ul> 595 <li>if you need precise computations with the <code>float</code> or 596 <code>double</code> types, use the default 597 <code>rounded_math<T></code>;</li> 598 599 <li>for fast wide intervals without any rounding nor precision, use 600 <code>save_state_nothing<rounded_transc_exact<T> 601 ></code>;</li> 602 603 <li>for an exact type (like int or rational with a little help for 604 infinite and NaN values) without support for transcendental functions, 605 the solution could be 606 <code>save_state_nothing<rounded_transc_dummy<T, 607 rounded_arith_exact<T> > ></code> or directly 608 <code>save_state_nothing<rounded_arith_exact<T> 609 ></code>;</li> 610 611 <li>if it is a more complex case than the previous ones, the best thing 612 is probably to directly define your own policy.</li> 613 </ul> 614 <hr> 615 616 <p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src= 617 "../../../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional" 618 height="31" width="88"></a></p> 619 620 <p>Revised 621 <!--webbot bot="Timestamp" s-type="EDITED" s-format="%Y-%m-%d" startspan -->2006-12-24<!--webbot bot="Timestamp" endspan i-checksum="12172" --></p> 622 623 <p><i>Copyright © 2002 Guillaume Melquiond, Sylvain Pion, Hervé 624 Brönnimann, Polytechnic University<br> 625 Copyright © 2004-2005 Guillaume Melquiond, ENS Lyon</i></p> 626 627 <p><i>Distributed under the Boost Software License, Version 1.0. (See 628 accompanying file <a href="../../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> 629 or copy at <a href= 630 "http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p> 631</body> 632</html> 633