==============
File Time Type
==============

.. contents::
   :local:

.. _file-time-type-motivation:

Motivation
==========

The filesystem library provides interfaces for getting and setting the last
write time of a file or directory. The interfaces use the ``file_time_type``
type, which is a specialization of ``chrono::time_point`` for the
"filesystem clock". According to [fs.filesystem.syn]

  trivial-clock is an implementation-defined type that satisfies the
  Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
  representing and measuring file time values. Implementations should ensure
  that the resolution and range of file_time_type reflect the operating
  system dependent resolution and range of file time values.


On POSIX systems, file times are represented using the ``timespec`` struct,
which is defined as follows:

.. code-block:: cpp

  struct timespec {
    time_t tv_sec;
    long   tv_nsec;
  };

To represent the range and resolution of ``timespec``, we need to (A) have
nanosecond resolution, and (B) use more than 64 bits (assuming a 64 bit ``time_t``).

As the standard requires us to use the ``chrono`` interface, we have to define
our own filesystem clock which specifies the period and representation of
the time points and duration it provides. It will look like this:

.. code-block:: cpp

  struct _FilesystemClock {
    using period = nano;
    using rep = TBD; // What is this?

    using duration = chrono::duration<rep, period>;
    using time_point = chrono::time_point<_FilesystemClock>;

    // ... //
  };

  using file_time_type = _FilesystemClock::time_point;


To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
But what type can we use as the arithmetic representation that is capable
of representing the range of the ``timespec`` struct?
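
To put rough numbers on the size problem, here is a small sketch of the
arithmetic for a signed 64-bit tick count. The helper ``years_of_range`` is a
hypothetical name used only for illustration, and the figure of 31556952
seconds per year assumes the average Gregorian year (365.2425 days).

.. code-block:: cpp

  #include <cstdint>
  #include <limits>

  // Whole years a signed 64-bit tick count can reach on one side of the
  // epoch, for a clock that ticks `ticks_per_second` times per second.
  // (Hypothetical helper, for illustration only.)
  constexpr std::int64_t years_of_range(std::int64_t ticks_per_second) {
    // Assumption: an average Gregorian year of 365.2425 days.
    constexpr std::int64_t seconds_per_year = 31556952;
    return std::numeric_limits<std::int64_t>::max() / ticks_per_second /
           seconds_per_year;
  }

  // Nanosecond ticks: roughly 292 years on either side of the epoch.
  static_assert(years_of_range(1000000000) == 292, "");
  // Microsecond ticks: roughly 292 thousand years.
  static_assert(years_of_range(1000000) == 292277, "");
  // One-second ticks (the range of a 64-bit time_t): roughly 292 billion years.
  static_assert(years_of_range(1) > 292000000000, "");

Covering the full range of a 64-bit ``time_t`` at nanosecond resolution
multiplies the largest second count by 10^9, which needs about 30 bits more
than the 63 value bits a 64-bit signed integer has.
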

Problems To Consider
====================

Before considering solutions, let's consider the problems they should solve,
and how important it is to solve each of them:


Having a Smaller Range than ``timespec``
----------------------------------------

One solution to the range problem is to simply reduce the resolution of
``file_time_type`` to be less than that of nanoseconds. This is what libc++'s
initial implementation of ``file_time_type`` did; it's also what
``std::system_clock`` does. As a result, it can represent time points about
292 thousand years on either side of the epoch (the range of a 64-bit count of
microseconds), as opposed to only 292 years at nanosecond resolution.

``timespec`` can represent time points +/- 292 billion years from the epoch
(just in case you needed a time point 200 billion years before the big bang,
and with nanosecond resolution).

To come close to that range with a 64-bit representation, we would need to
drop our resolution all the way to seconds.

This raises the question: is the range problem "really a problem"? Sane usages
of file time stamps shouldn't exceed +/- 300 years, so should we care to support it?

I believe the answer is yes. We're not designing the filesystem time API, we're
providing glorified C++ wrappers for it. If the underlying API supports
a value, then we should too. Our wrappers should not place artificial restrictions
on users that are not present in the underlying filesystem.

Having a smaller range than the underlying filesystem forces the
implementation to report ``value_too_large`` errors when it encounters a time
point that it can't represent. This can cause the call to ``last_write_time``
to throw in cases where the user was confident the call should succeed (see
the example below).


.. code-block:: cpp

  #include <cassert>
  #include <filesystem>
  #include <limits>

  #include <fcntl.h>    // AT_FDCWD
  #include <sys/stat.h> // utimensat

  using namespace std::filesystem;

  // Set the times using the system interface.
  void set_file_times(const char* path, struct timespec ts) {
    timespec both_times[2];
    both_times[0] = ts;
    both_times[1] = ts;
    int result = ::utimensat(AT_FDCWD, path, both_times, 0);
    assert(result != -1);
  }

  // Called elsewhere to set the file time to something insane, and way
  // out of the 300 year range we might expect.
  void some_bad_persons_code() {
    struct timespec new_times;
    new_times.tv_sec = std::numeric_limits<time_t>::max();
    new_times.tv_nsec = 0;
    set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
  }

  int main() {
    path p = "/tmp/foo";
    file_status st = status(p);
    if (!exists(st) || !is_regular_file(st))
      return 1;
    if ((st.permissions() & perms::others_read) == perms::none)
      return 1;
    // It seems reasonable to assume this call should succeed.
    file_time_type tp = last_write_time(p); // BAD! Throws value_too_large.
  }


Having a Smaller Resolution than ``timespec``
---------------------------------------------

As mentioned in the previous section, one way to solve the range problem
is by reducing the resolution. But matching the range of ``timespec`` using a
64 bit representation requires limiting the resolution to seconds.

So we might ask: Do users "need" nanosecond precision? Is seconds not good enough?
I limit my consideration of the point to this: Why was it not good enough for
the underlying system interfaces? If it wasn't good enough for them, then it
isn't good enough for us. Our job is to match the filesystem's range and
representation, not design it.


Having a Larger Range than ``timespec``
----------------------------------------

We should also consider the opposite problem of having a ``file_time_type``
that is able to represent a larger range than ``timespec``.
At least in this case ``last_write_time`` can be used to get and set all
possible values supported by the underlying filesystem, meaning
``last_write_time(p)`` will never throw an overflow error when retrieving a
value.

However, this introduces a new problem, where users are allowed to attempt to
create a time point beyond what the filesystem can represent. Two particular
values which cause this are ``file_time_type::min()`` and
``file_time_type::max()``. As a result, the following code would throw:

.. code-block:: cpp

  void test() {
    last_write_time("/tmp/foo", file_time_type::max()); // Throws
    last_write_time("/tmp/foo", file_time_type::min()); // Throws.
  }

Apart from cases explicitly using ``min`` and ``max``, I don't see users taking
a valid time point, adding a couple hundred billion years in error,
and then trying to update a file's write time to that value very often.

Compared to having a smaller range, this problem seems preferable. At least
now we can represent any time point the filesystem can, so users won't be forced
to fall back to system interfaces to avoid limitations in the C++ STL.

I posit that we should only consider this concern *after* we have something
with at least the same range and resolution of the underlying filesystem. The
first two problems are much more important to solve.

Potential Solutions And Their Complications
===========================================

Source Code Portability Across Implementations
-----------------------------------------------

As we've discussed, ``file_time_type`` needs a representation that uses more
than 64 bits. The possible solutions include using ``__int128_t``, emulating a
128 bit integer using a class, or potentially defining a ``timespec``-like
arithmetic type.
All three will allow us to, at minimum, match the range
and resolution, and the last one might even allow us to match them exactly.

But when considering these potential solutions we need to consider more than
just the values they can represent. We need to consider the effects they will
have on users and their code. For example, each of them breaks the following
code in some way:

.. code-block:: cpp

  #include <chrono>
  #include <cstdint>
  #include <filesystem>
  #include <iostream>

  using namespace std;
  using namespace std::chrono;
  using namespace std::filesystem;

  // Bug caused by an unexpected 'rep' type returned by count.
  void print_time(path p) {
    // __int128_t doesn't have streaming operators, and neither would our
    // custom arithmetic types.
    cout << last_write_time(p).time_since_epoch().count() << endl;
  }

  // Overflow during creation bug.
  file_time_type timespec_to_file_time_type(struct timespec ts) {
    // Whoops! chrono::seconds and chrono::nanoseconds use a 64 bit representation,
    // so this may overflow before it's converted to a file_time_type.
    auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
    // This is the correct version of the above example, where we
    // avoid using the chrono typedefs as they're not sufficient.
    // Can we expect users to avoid this bug?
    using fs_seconds = chrono::duration<file_time_type::rep>;
    using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
    auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  // Implicit truncation during conversion bug.
  intmax_t get_time_in_seconds(path p) {
    using fs_seconds = duration<file_time_type::rep, ratio<1, 1> >;
    auto tp = last_write_time(p);

    // This works with truncation for __int128_t, but what does it do for
    // our custom arithmetic types?
    return duration_cast<fs_seconds>(tp.time_since_epoch()).count();
  }


Each of the above examples would require a user to adjust their filesystem code
to the particular eccentricities of the representation, hopefully only in such
a way that the code is still portable across implementations.

At least some of the above issues are unavoidable, no matter what
representation we choose. But some representations may be quirkier than others,
and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
provides the least aberrant behavior.


Chrono and ``timespec`` Emulation
----------------------------------

One of the options we've considered is using something akin to ``timespec``
to represent the ``file_time_type``. It only seems natural seeing as that's
what the underlying system uses, and because it might allow us to match
the range and resolution exactly. But would it work with chrono? And could
it still act at all like a ``timespec`` struct?

To make this concrete, let's consider what the implementation might
look like.

.. code-block:: cpp

  struct fs_timespec_rep {
    fs_timespec_rep(long long v)
      : tv_sec(v / nano::den), tv_nsec(v % nano::den)
    { }
  private:
    time_t tv_sec;
    long tv_nsec;
  };
  bool operator==(fs_timespec_rep, fs_timespec_rep);
  fs_timespec_rep operator+(fs_timespec_rep, fs_timespec_rep);
  // ... arithmetic operators ... //

The first thing to notice is that we can't construct ``fs_timespec_rep`` like
a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to
constructing it from a single 64 bit integer.

We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
directly. A ``chrono::duration`` represents its value as a tick period and a
number of ticks stored using ``rep``.
The representation is unaware of the
tick period it is being used to represent, but ``timespec`` is set up to assume
a nanosecond tick period, which is the only case where the names ``tv_sec``
and ``tv_nsec`` match the values they store.

When we convert a nanosecond duration to seconds, ``fs_timespec_rep`` will
use ``tv_sec`` to represent the number of gigaseconds, and ``tv_nsec`` the
remaining seconds. Let's consider how this might cause a bug were users allowed
to manipulate the fields directly.

.. code-block:: cpp

  template <class Period>
  timespec convert_to_timespec(duration<fs_timespec_rep, Period> dur) {
    fs_timespec_rep rep = dur.count();
    return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
  }

  template <class Duration>
  Duration convert_to_duration(timespec ts) {
    Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
    return dur;
  }

  time_t extract_seconds(file_time_type tp) {
    // Converting to seconds is a silly bug, but I could see it happening.
    using Secs = chrono::duration<file_time_type::rep, ratio<1, 1>>;
    auto secs = duration_cast<Secs>(tp.time_since_epoch());
    // tv_sec is now representing gigaseconds.
    return secs.count().tv_sec; // Oops!
  }

Despite ``fs_timespec_rep`` not being usable in any manner resembling
``timespec``, it still might buy us our goal of matching its range exactly,
right?

Sort of. Chrono provides a specialization point which specifies the minimum
and maximum values for a custom representation. It looks like this:

.. code-block:: cpp

  template <>
  struct duration_values<fs_timespec_rep> {
    static fs_timespec_rep zero();
    static fs_timespec_rep min();
    static fs_timespec_rep max() { // assume friendship.
      fs_timespec_rep val;
      val.tv_sec = numeric_limits<time_t>::max();
      val.tv_nsec = nano::den - 1;
      return val;
    }
  };

Notice that ``duration_values`` doesn't tell the representation what tick
period it's actually representing. This would indeed correctly limit the range
of ``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
nanoseconds isn't the only tick period it will be used to represent. For
example:

.. code-block:: cpp

  void test() {
    using rep = file_time_type::rep;
    using fs_nsec = duration<rep, nano>;
    using fs_sec = duration<rep>;
    fs_nsec nsecs(fs_sec::max()); // Truncates
  }

Though the above example may appear silly, I think it follows from the incorrect
notion that using a ``timespec`` rep in chrono actually makes it act as if it
were an actual ``timespec``.

Interactions with 32 bit ``time_t``
-----------------------------------

Up until now we've only been considering cases where ``time_t`` is 64 bits, but what
about 32 bit systems/builds where ``time_t`` is 32 bits? (This is the common case
for 32 bit builds.)

When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using 64-bit
``long long``. There is no need to get either ``__int128_t`` or ``timespec`` emulation
involved. Nor should we, as it would suffer from the numerous complications
described in this document.

Obviously our implementation for 32-bit builds should act as similarly to the
64-bit build as possible. Code which compiles in one should compile in the other.
This consideration is important when choosing between ``__int128_t`` and
emulating ``timespec``. The solution which provides the most uniformity with
the least eccentricity is the preferable one.

Summary
=======

The ``file_time_type`` time point is used to represent the write times for files.
Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
underlying filesystem uses the ``timespec`` struct for the same purpose.

However, the initial implementation of ``file_time_type`` could not represent
either the range or resolution of ``timespec``, making it unsuitable. Fixing
this requires an implementation which uses more than 64 bits to store the
time point.

We primarily considered two solutions: Using ``__int128_t`` and using an
arithmetic emulation of ``timespec``. Each has its pros and cons, and both
come with more than one complication.

The Potential Solutions
-----------------------

``long long`` - The Status Quo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* As a type ``long long`` plays the nicest with others:

  * It works with streaming operators and other library entities which support
    builtin integer types, but don't support ``__int128_t``.
  * It's the representation used by chrono's ``nanoseconds`` and ``seconds`` typedefs.

Cons:

* It cannot provide the same resolution as ``timespec`` unless we limit it
  to a range of +/- 300 years from the epoch.
* It cannot provide the same range as ``timespec`` unless we limit its resolution
  to seconds.
* ``last_write_time`` has to report an error when the time reported by the filesystem
  is unrepresentable.

__int128_t
~~~~~~~~~~~

Pros:

* It is an integer type.
* It makes the implementation simple and efficient.
* It acts exactly like other arithmetic types.
* It can be implicitly converted to a builtin integer type by the user.

  * This is important for doing things like:

    .. code-block:: cpp

      void c_interface_using_time_t(const char* p, time_t);

      void foo(path p) {
        file_time_type tp = last_write_time(p);
        time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
        c_interface_using_time_t(p.c_str(), secs);
      }

Cons:

* It isn't always available (but on 64 bit machines, it normally is).
* It causes ``file_time_type`` to have a larger range than ``timespec``.
* It doesn't always act the same as other builtin integer types, for example
  with ``cout`` or ``to_string``.
* It can be implicitly converted to a builtin integer type (such as a 64 bit
  integer) by the user, silently truncating its value.

Arithmetic ``timespec`` Emulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* It has the exact same range and resolution as ``timespec`` when representing
  a nanosecond tick period.
* It's always available, unlike ``__int128_t``.

Cons:

* It has a larger range when representing any period longer than a nanosecond.
* It doesn't actually allow users to use it like a ``timespec``.
* The required representation of using ``tv_sec`` to store the giga tick count
  and ``tv_nsec`` to store the remainder adds nothing over a 128 bit integer,
  but complicates a lot.
* It isn't a builtin integer type, and can't be used anything like one.
* Chrono can be made to work with it, but not nicely.
* Emulating arithmetic types with classes comes with its own host of problems
  regarding overload resolution (each operator needs three SFINAE constrained
  versions of it in order to act like builtin integer types).
* It offers little over simply using ``__int128_t``.
* It behaves the most differently from implementations using an actual integer type,
  which has a high chance of breaking source compatibility.
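
To illustrate the ``__int128_t`` con about ``cout`` and ``to_string`` listed
above, here is a minimal sketch of the kind of helper a user might have to
write to print a 128 bit tick count. The name ``int128_to_string`` is
hypothetical, and the sketch assumes a compiler that provides ``__int128_t``
and ``__uint128_t`` (as GCC and Clang normally do on 64 bit targets).

.. code-block:: cpp

  #include <algorithm>
  #include <string>

  // Hypothetical helper: format a 128 bit signed integer in decimal, since
  // the standard streams and std::to_string may not accept __int128_t.
  inline std::string int128_to_string(__int128_t value) {
    if (value == 0)
      return "0";
    const bool negative = value < 0;
    // Work on the magnitude as an unsigned value so that negating the most
    // negative number does not overflow.
    __uint128_t magnitude = negative ? -static_cast<__uint128_t>(value)
                                     : static_cast<__uint128_t>(value);
    std::string digits;
    while (magnitude != 0) {
      // Collect digits from least to most significant.
      digits.push_back(static_cast<char>('0' + static_cast<int>(magnitude % 10)));
      magnitude /= 10;
    }
    if (negative)
      digits.push_back('-');
    std::reverse(digits.begin(), digits.end());
    return digits;
  }

With such a helper, the ``count()`` of a 128 bit duration could be printed
with ``cout << int128_to_string(dur.count())``. The need for it is exactly the
kind of source incompatibility the list above describes.
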


Selected Solution - Using ``__int128_t``
=========================================

The solution I selected for libc++ is using ``__int128_t`` when available,
and otherwise falling back to using ``long long`` with nanosecond precision.

When ``__int128_t`` is available, or when ``time_t`` is 32 bits, the implementation
provides the same resolution and a greater range than ``timespec``. Otherwise
it still provides the same resolution, but is limited to a range of +/- 300
years. This final case should be rather rare, as ``__int128_t``
is normally available in 64-bit builds, and ``time_t`` is normally 32 bits
during 32-bit builds.

Although falling back to ``long long`` and nanosecond precision is less than
ideal, it also happens to be the implementation provided by both libstdc++
and MSVC. (So that makes it better, right?)

Although the ``timespec`` emulation solution is feasible and would largely
do what we want, it comes with too many complications, potential problems
and discrepancies when compared to "normal" chrono time points and durations.

An emulation of a builtin arithmetic type using a class is never going to act
exactly the same, and the difference will be felt by users. It's not reasonable
to expect them to tolerate and work around these differences. And once
we commit to an ABI it will be too late to change. Committing to this seems
risky.

Therefore, ``__int128_t`` seems like the better solution.
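
As a rough illustration of the selected approach, the sketch below picks the
representation at compile time and converts a ``timespec`` without going
through the 64 bit chrono typedefs. It is not the actual libc++
implementation: the names ``sketch_fs_rep``, ``sketch_filesystem_clock`` and
``timespec_to_time_point`` are hypothetical, and it assumes that the
``__SIZEOF_INT128__`` macro (predefined by GCC and Clang when 128 bit integers
are supported) is an adequate availability check.

.. code-block:: cpp

  #include <chrono>
  #include <ratio>
  #include <time.h> // timespec

  // Assumption: __SIZEOF_INT128__ signals that __int128_t is available.
  #if defined(__SIZEOF_INT128__)
  using sketch_fs_rep = __int128_t;
  #else
  using sketch_fs_rep = long long; // fallback: about +/- 292 years
  #endif

  // Hypothetical stand-in for the filesystem clock described in this document.
  struct sketch_filesystem_clock {
    using rep = sketch_fs_rep;
    using period = std::nano;
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<sketch_filesystem_clock>;
  };

  // Build the time point using durations of the clock's own rep, so the
  // seconds-to-nanoseconds multiplication happens in the wide type.
  inline sketch_filesystem_clock::time_point
  timespec_to_time_point(const timespec& ts) {
    using fs_seconds = std::chrono::duration<sketch_fs_rep>;
    using fs_nanoseconds = sketch_filesystem_clock::duration;
    fs_nanoseconds dur =
        std::chrono::duration_cast<fs_nanoseconds>(fs_seconds(ts.tv_sec)) +
        fs_nanoseconds(ts.tv_nsec);
    return sketch_filesystem_clock::time_point(dur);
  }

User code written against ``file_time_type::rep`` in this style compiles the
same way with either representation, which is the uniformity argued for above.
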