[/
    Boost.Optional

    Copyright (c) 2003-2007 Fernando Luis Cacciola Carballal

    Distributed under the Boost Software License, Version 1.0.
    (See accompanying file LICENSE_1_0.txt or copy at
    http://www.boost.org/LICENSE_1_0.txt)
]


[section Definitions]

[section Introduction]

This section provides definitions of terms used in the Numeric Conversion library.

[blurb [*Notation]
[_underlined text] denotes terms defined in the C++ standard.

[*bold face] denotes terms defined here but not in the standard.
]

[endsect]

[section Types and Values]

As defined by the [_C++ Object Model] (§1.7) the [_storage] or memory on which a
C++ program runs is a contiguous sequence of [_bytes] where each byte is a
contiguous sequence of bits.

An [_object] is a region of storage (§1.8) and has a type (§3.9).

A [_type] is a discrete set of values.

An object of type `T` has an [_object representation] which is the sequence of
bytes stored in the object (§3.9/4).

An object of type `T` has a [_value representation] which is the set of
bits that determine the ['value] of an object of that type (§3.9/4).
For [_POD] types (§3.9/10), this bitset is given by the object representation,
but not all the bits in the storage need to participate in the value
representation (except for character types): for example, some bits might
be used for padding or there may be trap-bits.

__SPACE__

The [*typed value] that is held by an object is the value which is determined
by its value representation.

An [*abstract value] (untyped) is the conceptual information that is
represented in a type (e.g. the number π).

The [*intrinsic value] of an object is the binary value of the sequence of
unsigned characters which form its object representation.

__SPACE__

['Abstract] values can be [*represented] in a given type.

To [*represent] an abstract value `V` in a type `T` is to obtain a typed value
`v` which corresponds to the abstract value `V`.

The operation is denoted using the `rep()` operator, as in: `v=rep(V)`.
`v` is the [*representation] of `V` in the type `T`.

For example, the abstract value π can be represented in the type
`double` as the `double value M_PI` and in the type `int` as the
`int value 3`.

__SPACE__

Conversely, ['typed values] can be [*abstracted].

To [*abstract] a typed value `v` of type `T` is to obtain the abstract value `V`
whose representation in `T` is `v`.

The operation is denoted using the `abt()` operator, as in: `V=abt(v)`.

`V` is the [*abstraction] of `v` of type `T`.

Abstraction is a purely conceptual operation (a program cannot perform it); it is
defined nevertheless because it is used to give the definitions in the
rest of this document.

[endsect]

[section C++ Arithmetic Types]

The C++ language defines [_fundamental types] (§3.9.1). The following subsets of
the fundamental types are intended to represent ['numbers]:

[variablelist
[[[_signed integer types] (§3.9.1/2):][
`{signed char, signed short int, signed int, signed long int}`
Can be used to represent general integer numbers (both negative and positive).
]]
[[[_unsigned integer types] (§3.9.1/3):][
`{unsigned char, unsigned short int, unsigned int, unsigned long int}`
Can be used to represent non-negative integer numbers with modulo-arithmetic.
]]
[[[_floating-point types] (§3.9.1/8):][
`{float,double,long double}`
Can be used to represent real numbers.
]]
[[[_integral or integer types] (§3.9.1/7):][
`{{signed integers},{unsigned integers}, bool, char and wchar_t}`
]]
[[[_arithmetic types] (§3.9.1/8):][
`{{integer types},{floating types}}`
]]
]

The integer types are required to have a ['binary] value representation.
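
The type sets listed above can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler and the standard `<type_traits>` and `<limits>` headers; the helper names `is_integral_or_floating` and `wrap_increment` are hypothetical and are not part of this library.

```cpp
#include <limits>
#include <type_traits>

// bool and wchar_t are integral types, even though they belong to neither
// the signed nor the unsigned integer types listed above.
static_assert(std::is_integral<bool>::value, "bool is an integral type");
static_assert(std::is_integral<wchar_t>::value, "wchar_t is an integral type");
static_assert(std::is_floating_point<long double>::value,
              "long double is a floating-point type");

// Hypothetical helper: arithmetic types = integral types + floating-point types.
template <class T>
constexpr bool is_integral_or_floating()
{
    return std::is_integral<T>::value || std::is_floating_point<T>::value;
}

static_assert(is_integral_or_floating<int>() == std::is_arithmetic<int>::value,
              "matches the standard trait for int");

// Hypothetical helper: unsigned arithmetic is modulo 2^n, so adding 1 to the
// highest value wraps around to 0 instead of overflowing.
constexpr unsigned int wrap_increment(unsigned int u)
{
    return u + 1u;
}

static_assert(wrap_increment(std::numeric_limits<unsigned int>::max()) == 0u,
              "unsigned arithmetic wraps around");
```

All checks are `static_assert`s, so a translation unit containing this block compiles only if the classification holds on the implementation at hand.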

Additionally, the signed/unsigned integer types of the same base type
(`short`, `int` or `long`) are required to have the same value representation,
that is:

    int i = -3 ; // suppose value representation is: 10011 (sign bit + 4 magnitude bits)
    unsigned int u = i ; // u is required to have the same 10011 as its value representation.

In other words, the integer types signed/unsigned X use the same value
representation but a different ['interpretation] of it; that is, their
['typed values] might differ.

Another consequence of this is that the range of non-negative values of signed X
is always a subrange of the range of unsigned X, as required by §3.9.1/3.

[note
Always remember that unsigned types, unlike signed types, have modulo-arithmetic;
that is, they do not overflow.
This means that:

[*-] Always be extra careful when mixing signed/unsigned types.

[*-] Use unsigned types only when you need modulo arithmetic or very large
numbers. Don't use unsigned types just because you intend to deal with
positive values only (you can do this with signed types as well).
]


[endsect]

[section Numeric Types]

This section introduces the following definitions intended to integrate
arithmetic types with user-defined types which behave like numbers.
Some definitions are purposely broad in order to include a vast variety of
user-defined number types.

Within this library, the term ['number] refers to an abstract numeric value.

A type is [*numeric] if:

* It is an arithmetic type, or,
* It is a user-defined type which
    * Represents numeric abstract values (i.e. numbers).
    * Can be converted (either implicitly or explicitly) to/from at least one arithmetic type.
    * Has [link boost_numericconversion.definitions.range_and_precision range] (possibly unbounded)
    and [link boost_numericconversion.definitions.range_and_precision precision] (possibly dynamic or
    unlimited).
    * Provides a specialization of `std::numeric_limits`.

A numeric type is [*signed] if the abstract values it represents include negative numbers.

A numeric type is [*unsigned] if the abstract values it represents exclude negative numbers.

A numeric type is [*modulo] if it has modulo-arithmetic (does not overflow).

A numeric type is [*integer] if the abstract values it represents are whole numbers.

A numeric type is [*floating] if the abstract values it represents are real numbers.

An [*arithmetic value] is the typed value of an arithmetic type.

A [*numeric value] is the typed value of a numeric type.

These definitions simply generalize the standard notions of arithmetic types and
values by introducing a superset called [*numeric]. All arithmetic types and values are
numeric types and values, but not vice versa, since user-defined numeric types are not
arithmetic types.

The following examples clarify the differences between arithmetic and numeric
types (and values):


    // A numeric type which is not an arithmetic type (is user-defined)
    // and which is intended to represent integer numbers (i.e., an 'integer' numeric type)
    class MyInt
    {
    public:
        MyInt ( long long v ) ;
        long long to_builtin() const ;
    } ;
    namespace std {
    template<> class numeric_limits<MyInt> { ... } ;
    }

    // A 'floating' numeric type (double) which is also an arithmetic type (built-in),
    // with a floating numeric value.
    double pi = M_PI ;

    // A 'floating' numeric type with a whole numeric value.
    // NOTE: numeric values are typed values, hence, they are, for instance,
    // integer or floating, despite the value itself being whole or including
    // a fractional part.
    double two = 2.0 ;

    // An integer numeric type with an integer numeric value.
    MyInt i(1234);


[endsect]

[section Range and Precision]

Given a number set `N`, some of its elements are representable in a numeric type `T`.

The set of representable values of type `T`, or numeric set of `T`, is a set of numeric
values whose elements are the representation of some subset of `N`.

For example, the interval of `int` values `[INT_MIN,INT_MAX]` is the set of representable
values of type `int`, i.e. the `int` numeric set, and corresponds to the representation
of the elements of the interval of abstract values `[abt(INT_MIN),abt(INT_MAX)]` from
the integer numbers.

Similarly, the interval of `double` values `[-DBL_MAX,DBL_MAX]` is the `double`
numeric set, which corresponds to the subset of the real numbers from `abt(-DBL_MAX)` to
`abt(DBL_MAX)`.

__SPACE__

Let [*`next(x)`] denote the lowest numeric value greater than `x`.

Let [*`prev(x)`] denote the highest numeric value lower than `x`.

Let [*`v=prev(next(V))`] and [*`v=next(prev(V))`] be identities that relate a numeric
typed value `v` with a number `V`.

Two numeric values `x`,`y` s.t. `x<y` are [*consecutive] iff `next(x)==y`.

The abstract distance between consecutive numeric values is usually referred to as a
[_Unit in the Last Place], or [*ulp] for short. A ulp is a quantity whose abstract
magnitude is relative to the numeric values it corresponds to: If the numeric set
is not evenly distributed, that is, if the abstract distance between consecutive
numeric values varies along the set (as is the case with the floating-point types),
the magnitude of 1ulp after the numeric value `x` might be (usually is) different
from the magnitude of 1ulp after the numeric value `y` for `x!=y`.
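
For the type `double`, `next(x)` and `prev(x)` can be approximated with the standard function `std::nextafter` (C++11). The sketch below is only an illustration of the definitions above; the names `next_value`, `prev_value` and `ulp_after` are hypothetical and are not part of this library.

```cpp
#include <cmath>
#include <limits>

// next(x): the lowest double greater than x.
inline double next_value(double x)
{
    return std::nextafter(x, std::numeric_limits<double>::infinity());
}

// prev(x): the highest double lower than x.
inline double prev_value(double x)
{
    return std::nextafter(x, -std::numeric_limits<double>::infinity());
}

// Distance between x and next(x), i.e. the magnitude of 1ulp after x.
inline double ulp_after(double x)
{
    return next_value(x) - x;
}
```

With these helpers, `ulp_after(1.0)` equals `std::numeric_limits<double>::epsilon()`, while `ulp_after(1.0e10)` is larger, which shows that the magnitude of 1ulp varies along the `double` numeric set.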

Since numbers are inherently ordered, a [*numeric set] of type `T` is an ordered sequence
of numeric values (of type `T`) of the form:

    REP(T)={l,next(l),next(next(l)),...,prev(prev(h)),prev(h),h}

where `l` and `h` are respectively the lowest and highest values of type `T`, called
the boundary values of type `T`.

__SPACE__

A numeric set is discrete. It has a [*size] which is the number of numeric values in the set,
a [*width] which is the abstract difference between the highest and lowest boundary values:
`[abt(h)-abt(l)]`, and a [*density] which is the relation between its size and width:
`density=size/width`.

The integer types have density 1, which means that there are no unrepresentable integer
numbers between `abt(l)` and `abt(h)` (i.e. there are no gaps). On the other hand,
floating types have density much smaller than 1, which means that there are real numbers
unrepresented between consecutive floating values (i.e. there are gaps).

__SPACE__

The interval of ['abstract values] `[abt(l),abt(h)]` is the range of the type `T`,
denoted `R(T)`.

A range is a set of abstract values and not a set of numeric values. In other
documents, such as the C++ standard, the word `range` is ['sometimes] used as a synonym
for `numeric set`, that is, as the ordered sequence of numeric values from `l` to `h`.
In this document, however, a range is an abstract interval which subtends the
numeric set.

For example, the sequence `[-DBL_MAX,DBL_MAX]` is the numeric set of the type
`double`, and the real interval `[abt(-DBL_MAX),abt(DBL_MAX)]` is its range.

Notice, for instance, that the range of a floating-point type is ['continuous]
unlike its numeric set.

This definition was chosen because:

* [*(a)] The discrete set of numeric values is already given by the numeric set.
* [*(b)] Abstract intervals are easier to compare and overlap since only boundary
values need to be considered.

This definition allows for a concise definition of `subranged` as given in the last section.

The width of a numeric set, as defined, is exactly equivalent to the width of a range.

__SPACE__

The [*precision] of a type is given by the width or density of the numeric set.

For integer types, which have density 1, the precision is conceptually equivalent
to the range and is determined by the number of bits used in the value representation:
the higher the number of bits, the bigger the size of the numeric set, the wider the
range, and the higher the precision.

For floating types, which have density <<1, the precision is given not by the width
of the range but by the density. In a typical implementation, the range is determined
by the number of bits used in the exponent, and the precision by the number of bits
used in the mantissa (giving the maximum number of significant digits that can be
exactly represented). The higher the number of exponent bits, the wider the range,
while the higher the number of mantissa bits, the higher the precision.

[endsect]

[section Exact, Correctly Rounded and Out-Of-Range Representations]

Given an abstract value `V` and a type `T` with its corresponding range `[abt(l),abt(h)]`:

If `V < abt(l)` or `V > abt(h)`, `V` is [*not representable] (cannot be represented) in
the type `T`, or, equivalently, its representation in the type `T` is [*out of range],
or [*overflows].

* If `V < abt(l)`, the [*overflow is negative].
* If `V > abt(h)`, the [*overflow is positive].

If `V >= abt(l)` and `V <= abt(h)`, `V` is [*representable] (can be represented) in the
type `T`, or, equivalently, its representation in the type `T` is [*in range], or
[*does not overflow].
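
The three cases above can be sketched for `T = int`, with the abstract value `V` approximated by a `double`. This is only an illustration under stated assumptions: the enumeration `RangeResult` and the function `classify_for_int` are hypothetical names, and the boundary values of `int` are assumed to be exactly representable in `double` (which holds for a 32-bit `int` and an IEEE 754 `double`).

```cpp
#include <limits>

// Hypothetical names for the three outcomes defined above.
enum class RangeResult { InRange, NegativeOverflow, PositiveOverflow };

// Classifies V against R(int) = [abt(INT_MIN), abt(INT_MAX)].
// NaN is not handled: both comparisons are false, so it is reported as InRange.
inline RangeResult classify_for_int(double V)
{
    const double lo = static_cast<double>(std::numeric_limits<int>::min()); // abt(l)
    const double hi = static_cast<double>(std::numeric_limits<int>::max()); // abt(h)

    if (V < lo) return RangeResult::NegativeOverflow;
    if (V > hi) return RangeResult::PositiveOverflow;
    return RangeResult::InRange;
}
```

Note that `classify_for_int(1.5)` reports `InRange`: being representable only requires `V` to lie within the range, not to be exactly representable.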

Notice that a numeric type, such as a C++ unsigned type, can define that any `V` does
not overflow by always representing not `V` itself but the abstract value
`U = [ V % (abt(h)+1) ]`, which is always in range.

Given an abstract value `V` represented in the type `T` as `v`, the [*roundoff] error
of the representation is the abstract difference: `(abt(v)-V)`.

Notice that a representation is an ['operation], hence, the roundoff error corresponds
to the representation operation and not to the numeric value itself
(i.e. numeric values do not have any error themselves).

* If the roundoff is 0, the representation is [*exact], and `V` is exactly representable
in the type `T`.
* If the roundoff is not 0, the representation is [*inexact], and `V` is inexactly
representable in the type `T`.

If a representation `v` in a type `T`, either exact or inexact, is any of the adjacents
of `V` in that type, that is, if `v==prev` or `v==next`, the representation is
[*faithfully rounded]. If the choice between `prev` and `next` matches a given
[*rounding direction], it is [*correctly rounded].

All exact representations are correctly rounded, but not all inexact representations are.
In particular, C++ requires numeric conversions (described below) and the result of
arithmetic operations (not covered by this document) to be correctly rounded, but
batch operations propagate roundoff, thus final results are usually incorrectly
rounded, that is, the numeric value `r` which is the computed result is neither of
the adjacents of the abstract value `R` which is the theoretical result.

Because a correctly rounded representation is always one of the adjacents of the abstract
value being represented, the roundoff is guaranteed to be at most 1ulp.

The following examples summarize the given definitions.
Consider:

* A numeric type `Int` representing integer numbers with a
['numeric set]: `{-2,-1,0,1,2}` and
['range]: `[-2,2]`
* A numeric type `Cardinal` representing integer numbers with a
['numeric set]: `{0,1,2,3,4,5,6,7,8,9}` and
['range]: `[0,9]` (no modulo-arithmetic here)
* A numeric type `Real` representing real numbers with a
['numeric set]: `{-2.0,-1.5,-1.0,-0.5,-0.0,+0.0,+0.5,+1.0,+1.5,+2.0}` and
['range]: `[-2.0,+2.0]`
* A numeric type `Whole` representing real numbers with a
['numeric set]: `{-2.0,-1.0,0.0,+1.0,+2.0}` and
['range]: `[-2.0,+2.0]`

First, notice that the types `Real` and `Whole` both represent real numbers,
have the same range, but different precision.

* The integer number `1` (an abstract value) can be exactly represented
in any of these types.
* The integer number `-1` can be exactly represented in `Int`, `Real` and `Whole`,
but cannot be represented in `Cardinal`, yielding negative overflow.
* The real number `1.5` can be exactly represented in `Real`, and inexactly
represented in the other types.
* If `1.5` is represented as either `1` or `2` in any of the types (except `Real`),
the representation is correctly rounded.
* If `0.5` is represented as `+1.5` in the type `Real`, it is incorrectly rounded.
* `(-2.0,-1.5)` are the `Real` adjacents of any real number in the interval
`[-2.0,-1.5]`, yet there are no `Real` adjacents for `x < -2.0`, nor for `x > +2.0`.

[endsect]

[section Standard (numeric) Conversions]

The C++ language defines [_Standard Conversions] (§4) some of which are conversions
between arithmetic types.

These are [_Integral promotions] (§4.5), [_Integral conversions] (§4.7),
[_Floating point promotions] (§4.6), [_Floating point conversions] (§4.8) and
[_Floating-integral conversions] (§4.9).

In the sequel, integral and floating point promotions are called [*arithmetic promotions],
and these plus integral, floating-point and floating-integral conversions are called
[*arithmetic conversions] (i.e., promotions are conversions).

Promotions, both Integral and Floating point, are ['value-preserving], which means that
the typed value is not changed with the conversion.

In the sequel, consider a source typed value `s` of type `S`, the source abstract
value `N=abt(s)`, a destination type `T`; and whenever possible, a result typed value
`t` of type `T`.


Integer to integer conversions are always defined:

* If `T` is unsigned, the abstract value which is effectively represented is not
`N` but `M=[ N % ( abt(h) + 1 ) ]`, where `h` is the highest unsigned typed
value of type `T`.
* If `T` is signed and `N` is not directly representable, the result `t` is
[_implementation-defined], which means that the C++ implementation is required to
produce a value `t` even if it is totally unrelated to `s`.


Floating to Floating conversions are defined only if `N` is representable;
if it is not, the conversion has [_undefined behavior].

* If `N` is exactly representable, `t` is required to be the exact representation.
* If `N` is inexactly representable, `t` is required to be one of the two
adjacents, with an implementation-defined choice of rounding direction;
that is, the conversion is required to be correctly rounded.


Floating to Integer conversions represent not `N` but `M=trunc(N)`, where
`trunc()` is to truncate: i.e. to remove the fractional part, if any.

* If `M` is not representable in `T`, the conversion has [_undefined behavior]
(unless `T` is `bool`, see §4.12).


Integer to Floating conversions are always defined.

* If `N` is exactly representable, `t` is required to be the exact representation.
* If `N` is inexactly representable, `t` is required to be one of the
two adjacents, with an implementation-defined choice of rounding direction;
that is, the conversion is required to be correctly rounded.

[endsect]

[section Subranged Conversion Direction, Subtype and Supertype]

Given a source type `S` and a destination type `T`, there is a
[*conversion direction] denoted: `S->T`.

For any two ranges the following ['range relation] can be defined:
A range `X` can be ['entirely contained] in a range `Y`, in which case
it is said that `X` is enclosed by `Y`.

[: [*Formally:] `R(S)` is enclosed by `R(T)` iff `(R(S) intersection R(T)) == R(S)`.]

If the source type range, `R(S)`, is not enclosed in the target type range,
`R(T)`; that is, if `(R(S) & R(T)) != R(S)`, the conversion direction is said
to be [*subranged], which means that `R(S)` is not entirely contained in `R(T)`
and therefore there is some portion of the source range which falls outside
the target range. In other words, if a conversion direction `S->T` is subranged,
there are values in `S` which cannot be represented in `T` because they are
out of range.
Notice that for `S->T`, the adjective subranged applies to `T`.

Examples:

Given the following numeric types all representing real numbers:

* `X` with numeric set `{-2.0,-1.0,0.0,+1.0,+2.0}` and range `[-2.0,+2.0]`
* `Y` with numeric set `{-2.0,-1.5,-1.0,-0.5,0.0,+0.5,+1.0,+1.5,+2.0}` and range `[-2.0,+2.0]`
* `Z` with numeric set `{-1.0,0.0,+1.0}` and range `[-1.0,+1.0]`

For:

[variablelist
[[(a) X->Y:][
`R(X) & R(Y) == R(X)`, then `X->Y` is not subranged.
Thus, all values of type `X` are representable in the type `Y`.
]]
[[(b) Y->X:][
`R(Y) & R(X) == R(Y)`, then `Y->X` is not subranged.
Thus, all values of type `Y` are representable in the type `X`, but in this case,
some values are ['inexactly] representable (all the halves).
(note: it is to permit this case that a range is an interval of abstract values and
not an interval of typed values).
]]
[[(c) X->Z:][
`R(X) & R(Z) != R(X)`, then `X->Z` is subranged.
Thus, some values of type `X` are not representable in the type `Z`, they fall
out of range `(-2.0 and +2.0)`.
]]
]

It is possible that `R(S)` is not enclosed by `R(T)`, while neither is `R(T)` enclosed
by `R(S)`; for example, `UNSIG=[0,255]` is not enclosed by `SIG=[-128,127]`;
neither is `SIG` enclosed by `UNSIG`.
This implies that it is possible that a conversion direction is subranged both ways.
This occurs when a mixture of signed/unsigned types are involved and indicates that
in both directions there are values which can fall out of range.

Given the range relation (subranged or not) of a conversion direction `S->T`, it
is possible to classify `S` and `T` as [*supertype] and [*subtype]:
If the conversion is subranged, which means that `T` cannot represent all possible
values of type `S`, `S` is the supertype and `T` the subtype; otherwise, `T` is the
supertype and `S` the subtype.

For example:

[: `R(float)=[-FLT_MAX,FLT_MAX]` and `R(double)=[-DBL_MAX,DBL_MAX]` ]

If `FLT_MAX < DBL_MAX`:

* `double->float` is subranged and `supertype=double`, `subtype=float`.
* `float->double` is not subranged and `supertype=double`, `subtype=float`.

Notice that while `double->float` is subranged, `float->double` is not,
which yields the same supertype, subtype for both directions.
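
The range relation for built-in types can be sketched by comparing boundary values obtained from `std::numeric_limits` (C++11). The function `direction_is_subranged` below is a hypothetical helper written for this illustration, not this library's own trait; it assumes that `long double` can hold the boundary values of both types accurately enough for the comparison.

```cpp
#include <limits>

// Hypothetical helper: reports whether the direction S->T is subranged,
// i.e. whether R(S) is NOT enclosed by R(T).
// Assumes long double is wide enough to compare the boundary values.
template <class S, class T>
bool direction_is_subranged()
{
    const long double s_lo = std::numeric_limits<S>::lowest(); // abt(l) of S
    const long double s_hi = std::numeric_limits<S>::max();    // abt(h) of S
    const long double t_lo = std::numeric_limits<T>::lowest(); // abt(l) of T
    const long double t_hi = std::numeric_limits<T>::max();    // abt(h) of T

    // Some portion of R(S) falls below or above R(T).
    return s_lo < t_lo || s_hi > t_hi;
}
```

On an implementation where `FLT_MAX < DBL_MAX`, `direction_is_subranged<double, float>()` is `true` and `direction_is_subranged<float, double>()` is `false`, so `double` is the supertype in both directions.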

Now consider:

[: `R(int)=[INT_MIN,INT_MAX]` and `R(unsigned int)=[0,UINT_MAX]` ]

A C++ implementation is required to have `UINT_MAX >= INT_MAX` (§3.9.1/3); assuming,
as is typical, that `UINT_MAX > INT_MAX`:

* `int->unsigned` is subranged (negative values fall out of range)
and `supertype=int`, `subtype=unsigned`.
* `unsigned->int` is ['also] subranged (high positive values fall out of range)
and `supertype=unsigned`, `subtype=int`.

In this case, the conversion is subranged in both directions and the
supertype, subtype pairs are not invariant (under inversion of direction).
This indicates that neither of the types can represent all the values of the other.

When the supertype is the same for both `S->T` and `T->S`, it is effectively
indicating a type which can represent all the values of the subtype.
Consequently, if a conversion `X->Y` is not subranged, but the opposite `(Y->X)` is,
so that the supertype is always `Y`, it is said that the direction `X->Y` is [*correctly
rounded value preserving], meaning that all such conversions are guaranteed to
produce results in range and correctly rounded (even if inexact).
For example, all integer to floating conversions are correctly rounded value preserving.

[endsect]

[endsect]

