==============
File Time Type
==============

.. contents::
   :local:

.. _file-time-type-motivation:

Motivation
==========

The filesystem library provides interfaces for getting and setting the last
write time of a file or directory. The interfaces use the ``file_time_type``
type, which is a specialization of ``chrono::time_point`` for the
"filesystem clock". According to [fs.filesystem.syn]:

  trivial-clock is an implementation-defined type that satisfies the
  Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
  representing and measuring file time values. Implementations should ensure
  that the resolution and range of file_time_type reflect the operating
  system dependent resolution and range of file time values.

On POSIX systems, file times are represented using the ``timespec`` struct,
which is defined as follows:

.. code-block:: cpp

  struct timespec {
    time_t tv_sec;
    long   tv_nsec;
  };

To represent the range and resolution of ``timespec``, we need to (A) have
nanosecond resolution, and (B) use more than 64 bits (assuming a 64-bit ``time_t``).
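
Requirement (B) can be made concrete. Once a 64-bit ``tv_sec`` is scaled to
nanoseconds, it overflows a signed 64-bit integer as soon as its magnitude
exceeds about 9.2 billion seconds (roughly 292 years). The following is a
minimal sketch of that check. The function name is illustrative and not part
of any library.

.. code-block:: cpp

  #include <cstdint>
  #include <limits>

  constexpr std::int64_t kNanosPerSec = 1000000000;

  // Returns true if `tv_sec` whole seconds, expressed as a count of
  // nanoseconds, fits in a signed 64-bit integer. Dividing the limits
  // (rather than multiplying tv_sec) avoids the very overflow being tested.
  constexpr bool seconds_fit_in_int64_nanos(std::int64_t tv_sec) {
    return tv_sec <= std::numeric_limits<std::int64_t>::max() / kNanosPerSec &&
           tv_sec >= std::numeric_limits<std::int64_t>::min() / kNanosPerSec;
  }

  // The largest 64-bit time_t value is far outside that range.
  static_assert(!seconds_fit_in_int64_nanos(std::numeric_limits<std::int64_t>::max()),
                "a 64-bit time_t needs more than 64 bits at nanosecond resolution");
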

As the standard requires us to use the ``chrono`` interface, we have to define
our own filesystem clock, which specifies the period and representation of
the time points and durations it provides. It will look like this:

.. code-block:: cpp

  struct _FilesystemClock {
    using period = nano;
    using rep = TBD; // What is this?

    using duration = chrono::duration<rep, period>;
    using time_point = chrono::time_point<_FilesystemClock>;

    // ... //
  };

  using file_time_type = _FilesystemClock::time_point;

To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
But what type can we use as the arithmetic representation that is capable
of representing the range of the ``timespec`` struct?

Problems To Consider
====================

Before considering solutions, let's consider the problems they should solve,
and how important it is to solve each of them:


Having a Smaller Range than ``timespec``
----------------------------------------

One solution to the range problem is to simply make the resolution of
``file_time_type`` coarser than nanoseconds. This is what libc++'s
initial implementation of ``file_time_type`` did. It's also what libc++'s
``std::chrono::system_clock`` does (it uses microseconds). As a result, it can
represent time points about 292 thousand years on either side of the epoch,
as opposed to only 292 years at nanosecond resolution.

``timespec`` can represent time points +/- 292 billion years from the epoch
(just in case you needed a time point 200 billion years before the big bang,
and with nanosecond resolution).

To come close to that range with a 64-bit representation, we would need to drop
our resolution all the way to seconds.
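
The year figures above follow from dividing the largest signed 64-bit tick
count by the number of ticks in a year. The compile-time sketch below
reproduces them for nanosecond and microsecond ticks. The helper name is
illustrative, and a year is taken to be the average Gregorian year of
365.2425 days.

.. code-block:: cpp

  #include <chrono>
  #include <cstdint>
  #include <ratio>

  constexpr std::int64_t kSecsPerYear = 31556952;  // 365.2425 * 86400

  // Whole years representable on either side of the epoch by a duration
  // with a signed 64-bit rep and a period of 1/N seconds.
  template <class Period>
  constexpr std::int64_t years_each_side() {
    static_assert(Period::num == 1, "sketch only handles periods of 1/N seconds");
    return std::chrono::duration<std::int64_t, Period>::max().count() /
           Period::den / kSecsPerYear;
  }

  static_assert(years_each_side<std::nano>() == 292, "nanoseconds");
  static_assert(years_each_side<std::micro>() == 292277, "microseconds");

With ``std::ratio<1>`` (one-second ticks), the same helper gives roughly 292
billion years. This is why only second resolution brings a 64-bit
representation close to the range of ``timespec``.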

This raises the question: is the range problem "really a problem"? Sane usages
of file time stamps shouldn't exceed +/- 300 years, so should we care to support
anything beyond that?

I believe the answer is yes. We're not designing the filesystem time API; we're
providing glorified C++ wrappers for it. If the underlying API supports
a value, then we should too. Our wrappers should not place artificial restrictions
on users that are not present in the underlying filesystem.

Having a smaller range than the underlying filesystem forces the
implementation to report ``value_too_large`` errors when it encounters a time
point that it can't represent. This can cause the call to ``last_write_time``
to throw in cases where the user was confident the call should succeed. (See below.)


.. code-block:: cpp

  #include <cassert>
  #include <ctime>
  #include <filesystem>
  #include <limits>
  #include <fcntl.h>     // AT_FDCWD
  #include <sys/stat.h>  // utimensat (POSIX)
  using namespace std::filesystem;

  // Set the times using the system interface.
  void set_file_times(const char* path, struct timespec ts) {
    timespec both_times[2];
    both_times[0] = ts;
    both_times[1] = ts;
    int result = ::utimensat(AT_FDCWD, path, both_times, 0);
    assert(result != -1);
  }

  // Called elsewhere to set the file time to something insane, and way
  // out of the 300-year range we might expect.
  void some_bad_persons_code() {
    struct timespec new_times;
    new_times.tv_sec = std::numeric_limits<time_t>::max();
    new_times.tv_nsec = 0;
    set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
  }

  int main() {
    path p = "/tmp/foo";
    file_status st = status(p);
    if (!exists(st) || !is_regular_file(st))
      return 1;
    if ((st.permissions() & perms::others_read) == perms::none)
      return 1;
    // It seems reasonable to assume this call should succeed.
    // BAD! With a 64-bit nanosecond rep, this throws value_too_large.
    file_time_type tp = last_write_time(p);
  }
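
The error in that example comes from a conversion like the one sketched below.
The conversion turns a ``timespec`` into a 64-bit count of nanoseconds and
reports ``std::errc::value_too_large`` instead of overflowing. The helper name
``timespec_to_ns64`` is hypothetical. This is an illustration under those
assumptions, not the libc++ implementation.

.. code-block:: cpp

  #include <cassert>
  #include <chrono>
  #include <ctime>
  #include <limits>
  #include <system_error>

  using ns64 = std::chrono::duration<long long, std::nano>;

  // Hypothetical helper: convert a POSIX timespec to 64-bit nanoseconds.
  // On overflow, sets `ec` to value_too_large and returns zero.
  ns64 timespec_to_ns64(const timespec& ts, std::error_code& ec) {
    constexpr long long kNanosPerSec = 1000000000;
    constexpr long long kMax = std::numeric_limits<long long>::max();
    constexpr long long kMin = std::numeric_limits<long long>::min();
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < kNanosPerSec);
    ec.clear();
    const long long sec = ts.tv_sec;
    // The multiplication is only evaluated once sec is known to be in range.
    // Slightly conservative: a few representable values within one second
    // of kMin are rejected as well.
    if (sec > kMax / kNanosPerSec || sec < kMin / kNanosPerSec ||
        sec * kNanosPerSec > kMax - ts.tv_nsec) {
      ec = std::make_error_code(std::errc::value_too_large);
      return ns64::zero();
    }
    return ns64(sec * kNanosPerSec + ts.tv_nsec);
  }
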


Having a Smaller Resolution than ``timespec``
---------------------------------------------

As mentioned in the previous section, one way to solve the range problem
is by reducing the resolution. But matching the range of ``timespec`` using a
64-bit representation requires limiting the resolution to seconds.

So we might ask: Do users "need" nanosecond precision? Is second resolution not
good enough? I limit my consideration of the point to this: Why was it not good
enough for the underlying system interfaces? If it wasn't good enough for them,
then it isn't good enough for us. Our job is to match the filesystem's range and
representation, not design it.
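
The standard ``chrono`` types show exactly what a coarser representation
discards. The sketch below (the helper name is hypothetical) converts a
nanosecond count to a coarser duration with ``std::chrono::duration_cast``,
which truncates toward zero. It then converts the result back and reports how
many nanoseconds were lost.

.. code-block:: cpp

  #include <chrono>

  // Nanoseconds discarded when `ns` is stored in the coarser duration type
  // `Coarse` (e.g. microseconds or seconds) and read back.
  template <class Coarse>
  constexpr long long nanos_lost(long long ns) {
    return (std::chrono::nanoseconds(ns) -
            std::chrono::duration_cast<Coarse>(std::chrono::nanoseconds(ns)))
        .count();
  }

  static_assert(nanos_lost<std::chrono::microseconds>(123456789) == 789, "");
  static_assert(nanos_lost<std::chrono::seconds>(123456789) == 123456789, "");
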


Having a Larger Range than ``timespec``
----------------------------------------

We should also consider the opposite problem of having a ``file_time_type``
that is able to represent a larger range than ``timespec``. At least in
this case ``last_write_time`` can be used to get and set all possible values
supported by the underlying filesystem, meaning ``last_write_time(p)`` will
never throw an overflow error when retrieving a value.

However, this introduces a new problem: users are allowed to attempt to
create a time point beyond what the filesystem can represent. Two particular
values which cause this are ``file_time_type::min()`` and
``file_time_type::max()``. As a result, the following code would throw:

.. code-block:: cpp

  #include <filesystem>
  using namespace std::filesystem;

  void test() {
    last_write_time("/tmp/foo", file_time_type::max()); // Throws.
    last_write_time("/tmp/foo", file_time_type::min()); // Throws.
  }

Apart from cases explicitly using ``min`` and ``max``, I don't expect users to
often take a valid time point, add a couple hundred billion years by mistake,
and then try to update a file's write time to that value.
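
Those throws come from a conversion that such an implementation must perform
before calling into the system. A wide nanosecond count has to be split back
into a ``timespec``, and values like ``min()`` and ``max()`` do not fit. This is
a minimal sketch that assumes GCC or Clang support for ``__int128``.
``ns128_to_timespec`` is a hypothetical name, not libc++'s code.

.. code-block:: cpp

  #include <cassert>
  #include <ctime>
  #include <limits>

  __extension__ typedef __int128 i128;  // GCC/Clang extension

  // Hypothetical helper: split a 128-bit nanosecond count into a timespec.
  // Returns false when the seconds part does not fit in time_t.
  bool ns128_to_timespec(i128 ns, timespec& out) {
    constexpr i128 kNanosPerSec = 1000000000;
    i128 sec = ns / kNanosPerSec;
    i128 nsec = ns % kNanosPerSec;
    if (nsec < 0) {  // normalize so that 0 <= tv_nsec < 1e9
      nsec += kNanosPerSec;
      sec -= 1;
    }
    if (sec > std::numeric_limits<time_t>::max() ||
        sec < std::numeric_limits<time_t>::min())
      return false;  // the caller would report an error here
    out.tv_sec = static_cast<time_t>(sec);
    out.tv_nsec = static_cast<long>(nsec);
    assert(out.tv_nsec >= 0 && out.tv_nsec < 1000000000);
    return true;
  }
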

Compared to having a smaller range, this problem seems preferable. At least
now we can represent any time point the filesystem can, so users won't be forced
to fall back to system interfaces to avoid limitations in the C++ STL.

I posit that we should only consider this concern *after* we have something
with at least the same range and resolution as the underlying filesystem. The
first two problems (a smaller range and a smaller resolution) are much more
important to solve.

Potential Solutions And Their Complications
===========================================

Source Code Portability Across Implementations
-----------------------------------------------

As we've discussed, ``file_time_type`` needs a representation that uses more
than 64 bits. The possible solutions include using ``__int128_t``, emulating a
128-bit integer using a class, or potentially defining a ``timespec``-like
arithmetic type. All three will allow us to, at minimum, match the range
and resolution, and the last one might even allow us to match them exactly.

But when considering these potential solutions we need to consider more than
just the values they can represent. We need to consider the effects they will
have on users and their code. For example, each of them breaks the following
code in some way:

.. code-block:: cpp

  #include <chrono>
  #include <cstdint>
  #include <ctime>
  #include <filesystem>
  #include <iostream>
  using namespace std;
  using namespace std::chrono;
  using namespace std::filesystem;

  // Bug caused by an unexpected 'rep' type returned by count.
  void print_time(path p) {
    // __int128_t doesn't have streaming operators, and neither would our
    // custom arithmetic types.
    cout << last_write_time(p).time_since_epoch().count() << endl;
  }

  // Overflow during creation bug.
  file_time_type timespec_to_file_time_type(struct timespec ts) {
    // Whoops! chrono::seconds and chrono::nanoseconds use a 64-bit
    // representation, so this may overflow before it's converted to a
    // file_time_type.
    auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
    // This is the correct version of the above example, where we
    // avoid using the chrono typedefs as they're not sufficient.
    // Can we expect users to avoid this bug?
    using fs_seconds = chrono::duration<file_time_type::rep>;
    using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
    auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  // Implicit truncation during conversion bug.
  intmax_t get_time_in_seconds(path p) {
    using fs_seconds = duration<file_time_type::rep, ratio<1, 1> >;
    auto tp = last_write_time(p);

    // This works with truncation for __int128_t, but what does it do for
    // our custom arithmetic types?
    return duration_cast<fs_seconds>(tp.time_since_epoch()).count();
  }


Each of the above examples would require a user to adjust their filesystem code
to the particular eccentricities of the representation, ideally in a way that
keeps the code portable across implementations.

At least some of the above issues are unavoidable, no matter what
representation we choose. But some representations may be quirkier than others,
and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
provides the least aberrant behavior.
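
Here is a self-contained illustration of the
``correct_timespec_to_file_time_type`` pattern. It uses a stand-in 128-bit
rep, because the real ``file_time_type::rep`` differs between standard
libraries. The sketch assumes GCC or Clang support for ``__int128``, and
``fs_rep`` and ``timespec_to_fs_duration`` are hypothetical names.

.. code-block:: cpp

  #include <cassert>
  #include <chrono>
  #include <ctime>
  #include <limits>
  #include <ratio>

  __extension__ typedef __int128 fs_rep;  // stand-in for a 128-bit rep

  using fs_seconds = std::chrono::duration<fs_rep>;
  using fs_nanoseconds = std::chrono::duration<fs_rep, std::nano>;

  // A 128-bit rep can hold any tv_sec scaled by 10^9 only if time_t has
  // at most 63 value bits (2^63 * 10^9 needs about 93 bits).
  static_assert(std::numeric_limits<time_t>::digits <= 63, "time_t too wide");

  // Builds the duration entirely in the 128-bit rep, so even the largest
  // 64-bit tv_sec is scaled to nanoseconds without overflowing.
  fs_nanoseconds timespec_to_fs_duration(const timespec& ts) {
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000);
    return fs_seconds(ts.tv_sec) + fs_nanoseconds(ts.tv_nsec);
  }
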


Chrono and ``timespec`` Emulation
----------------------------------

One of the options we've considered is using something akin to ``timespec``
to represent the ``file_time_type``. It seems natural, since that's
what the underlying system uses, and because it might allow us to match
the range and resolution exactly. But would it work with chrono? And could
it still act at all like a ``timespec`` struct?

To make the question concrete, let's consider what the implementation might
look like.

.. code-block:: cpp

  struct fs_timespec_rep {
    fs_timespec_rep(long long v)
      : tv_sec(v / nano::den), tv_nsec(v % nano::den)
    { }
  private:
    time_t tv_sec;
    long tv_nsec;
  };
  bool operator==(fs_timespec_rep, fs_timespec_rep);
  fs_timespec_rep operator+(fs_timespec_rep, fs_timespec_rep);
  // ... arithmetic operators ... //

The first thing to notice is that we can't construct ``fs_timespec_rep`` like
a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to
constructing it from a single 64-bit integer.

We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
directly. A ``chrono::duration`` represents its value as a tick period and a
number of ticks stored using ``rep``. The representation is unaware of the
tick period it is being used to represent, but ``timespec`` is set up to assume
a nanosecond tick period, which is the only case where the names ``tv_sec``
and ``tv_nsec`` match the values they store.
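
You can see that a ``rep`` is unaware of its period without writing a full
arithmetic class. The sketch below uses a hypothetical ``toy_timespec_rep``
that performs only the split an ``fs_timespec_rep`` would store. For a
nanosecond period the fields mean what their names say. For a one-second
period, ``tv_sec`` ends up counting gigaseconds.

.. code-block:: cpp

  #include <chrono>
  #include <cstdint>
  #include <ratio>

  // Hypothetical stand-in: only the (ticks / 1e9, ticks % 1e9) split.
  struct toy_timespec_rep {
    std::int64_t tv_sec;
    std::int64_t tv_nsec;
  };

  // The split depends only on the tick count, never on Period.
  template <class Period>
  constexpr toy_timespec_rep split(std::chrono::duration<std::int64_t, Period> d) {
    return toy_timespec_rep{d.count() / 1000000000, d.count() % 1000000000};
  }

  // 1.5 s as nanoseconds: tv_sec == 1, tv_nsec == 500000000 (as intended).
  static_assert(split(std::chrono::duration<std::int64_t, std::nano>(1500000000)).tv_sec == 1, "");
  // 3e9 s as seconds: tv_sec == 3, i.e. 3 gigaseconds, not 3 seconds.
  static_assert(split(std::chrono::duration<std::int64_t>(3000000000)).tv_sec == 3, "");
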

When we convert a nanosecond duration to seconds, ``fs_timespec_rep`` will
use ``tv_sec`` to represent the number of gigaseconds, and ``tv_nsec`` the
remaining seconds. Let's consider how this might cause a bug were users allowed
to manipulate the fields directly.

.. code-block:: cpp

  // Assume, for the sake of argument, that tv_sec and tv_nsec are public.
  template <class Period>
  timespec convert_to_timespec(duration<fs_timespec_rep, Period> dur) {
    fs_timespec_rep rep = dur.count();
    return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
  }

  template <class Duration>
  Duration convert_to_duration(timespec ts) {
    Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
    return dur;
  }

  time_t extract_seconds(file_time_type tp) {
    // Converting to seconds is a silly bug, but I could see it happening.
    using SecsT = chrono::duration<file_time_type::rep, ratio<1, 1>>;
    auto secs = duration_cast<SecsT>(tp.time_since_epoch());
    // tv_sec is now representing gigaseconds.
    return secs.count().tv_sec; // Oops!
  }

Despite ``fs_timespec_rep`` not being usable in any manner resembling
``timespec``, it still might buy us our goal of matching its range exactly,
right?

Sort of. Chrono provides a specialization point which specifies the minimum
and maximum values for a custom representation. It looks like this:

.. code-block:: cpp

  template <>
  struct duration_values<fs_timespec_rep> {
    static fs_timespec_rep zero();
    static fs_timespec_rep min();
    static fs_timespec_rep max() { // assume friendship.
      fs_timespec_rep val;
      val.tv_sec = numeric_limits<time_t>::max();
      val.tv_nsec = nano::den - 1;
      return val;
    }
  };

Notice that ``duration_values`` doesn't tell the representation what tick
period it's actually representing. This would indeed correctly limit the range
of ``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
nanoseconds isn't the only tick period it will be used to represent. For
example:

.. code-block:: cpp

  void test() {
    using rep = file_time_type::rep;
    using fs_nsec = duration<rep, nano>;
    using fs_sec = duration<rep>;
    fs_nsec nsecs(fs_sec::max()); // Overflows: max() seconds can't fit in nanoseconds.
  }

Though the above example may appear silly, I think it follows from the incorrect
notion that using a ``timespec`` rep in chrono actually makes it act as if it
were an actual ``timespec``.

Interactions with 32-bit ``time_t``
-----------------------------------

Up until now we've only been considering cases where ``time_t`` is 64 bits, but
what about 32-bit systems/builds where ``time_t`` is 32 bits? (This is the common
case for 32-bit builds.)

When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using a
64-bit ``long long``. There is no need to get either ``__int128_t`` or ``timespec``
emulation involved. Nor should we, as it would suffer from the numerous
complications described by this paper.
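
This claim can be confirmed at compile time. The sketch below models a 32-bit
``time_t`` with ``std::int32_t`` (an assumption so that the check compiles on
any build) and uses a hypothetical helper name. Even the extreme ``timespec``
values scale to nanoseconds with roughly a factor of four to spare.

.. code-block:: cpp

  #include <cstdint>
  #include <limits>

  // Hypothetical helper: nanoseconds since the epoch for a timespec whose
  // tv_sec is 32 bits wide. The result's magnitude is at most about 2.1e18,
  // well below the 64-bit limit of about 9.2e18, so it cannot overflow.
  constexpr std::int64_t to_nanos(std::int32_t tv_sec, std::int32_t tv_nsec) {
    return static_cast<std::int64_t>(tv_sec) * 1000000000 + tv_nsec;
  }

  constexpr std::int64_t kLatest =
      to_nanos(std::numeric_limits<std::int32_t>::max(), 999999999);
  constexpr std::int64_t kEarliest =
      to_nanos(std::numeric_limits<std::int32_t>::min(), 0);

  static_assert(std::numeric_limits<std::int64_t>::max() / kLatest == 4, "");
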

Obviously our implementation for 32-bit builds should act as similarly to the
64-bit build as possible. Code which compiles in one should compile in the other.
This consideration is important when choosing between ``__int128_t`` and
emulating ``timespec``. The solution which provides the most uniformity with
the least eccentricity is the preferable one.

Summary
=======

The ``file_time_type`` time point is used to represent the write times for files.
Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
underlying filesystem uses the ``timespec`` struct for the same purpose.

However, the initial implementation of ``file_time_type`` could not represent
either the range or resolution of ``timespec``, making it unsuitable. Fixing
this requires an implementation which uses more than 64 bits to store the
time point.

We primarily considered two solutions: using ``__int128_t`` and using an
arithmetic emulation of ``timespec``. Each has its pros and cons, and both
come with more than one complication.

The Potential Solutions
-----------------------

``long long`` - The Status Quo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* As a type, ``long long`` plays the nicest with others:

  * It works with streaming operators and other library entities which support
    builtin integer types, but don't support ``__int128_t``.
  * It's the representation used by chrono's ``nanoseconds`` and ``seconds`` typedefs.

Cons:

* It cannot provide the same resolution as ``timespec`` unless we limit it
  to a range of about +/- 292 years from the epoch.
* It cannot provide the same range as ``timespec`` unless we limit its resolution
  to seconds.
* ``last_write_time`` has to report an error when the time reported by the filesystem
  is unrepresentable.

``__int128_t``
~~~~~~~~~~~~~~

Pros:

* It is an integer type.
* It makes the implementation simple and efficient.
* It acts exactly like other arithmetic types.
* It can be implicitly converted to a builtin integer type by the user.

  * This is important for doing things like:

    .. code-block:: cpp

      void c_interface_using_time_t(const char* p, time_t);

      void foo(path p) {
        file_time_type tp = last_write_time(p);
        time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
        c_interface_using_time_t(p.c_str(), secs);
      }

Cons:

* It isn't always available (but on 64-bit machines, it normally is).
* It causes ``file_time_type`` to have a larger range than ``timespec``.
* It doesn't always act the same as other builtin integer types, for example
  with ``cout`` or ``to_string``.
* It allows implicit truncation to 64-bit integers: when the user implicitly
  converts it to a builtin integer type, its value may be truncated.

Arithmetic ``timespec`` Emulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* It has exactly the same range and resolution as ``timespec`` when representing
  a nanosecond tick period.
* It's always available, unlike ``__int128_t``.

Cons:

* It has a larger range when representing any period longer than a nanosecond.
* It doesn't actually allow users to use it like a ``timespec``.
* The required representation of using ``tv_sec`` to store the giga tick count
  and ``tv_nsec`` to store the remainder adds nothing over a 128-bit integer,
  but adds considerable complexity.
* It isn't a builtin integer type, and can't be used anything like one.
* Chrono can be made to work with it, but not nicely.
* Classes emulating arithmetic types come with their own host of problems regarding
  overload resolution (each operator needs three SFINAE-constrained versions of
  it in order to act like builtin integer types).
* It offers little over simply using ``__int128_t``.
* It behaves the most differently from implementations using an actual integer type,
  which has a high chance of breaking source compatibility.


Selected Solution - Using ``__int128_t``
=========================================

The solution I selected for libc++ is using ``__int128_t`` when available,
and otherwise falling back to using ``long long`` with nanosecond precision.
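
A minimal sketch of that selection logic follows. It is not libc++'s
``_FilesystemClock``. The names are hypothetical, the ``__SIZEOF_INT128__``
test assumes a GCC or Clang compiler, and ``now()`` assumes POSIX
``clock_gettime``.

.. code-block:: cpp

  #include <cassert>
  #include <chrono>
  #include <ctime>
  #include <ratio>

  #if defined(__SIZEOF_INT128__)
  __extension__ typedef __int128 sketch_fs_rep;  // 128 bits when available
  #else
  typedef long long sketch_fs_rep;  // fallback: about +/- 292 years
  #endif

  struct sketch_fs_clock {
    using rep = sketch_fs_rep;
    using period = std::nano;  // nanosecond resolution in both cases
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<sketch_fs_clock, duration>;
    static constexpr bool is_steady = false;

    // Converts a POSIX timespec to a time point. With the 64-bit fallback
    // this can overflow outside about +/- 292 years; a real implementation
    // would have to detect that and report value_too_large.
    static time_point from_timespec(const timespec& ts) {
      assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000);
      return time_point(std::chrono::duration<rep>(ts.tv_sec) +
                        duration(ts.tv_nsec));
    }

    static time_point now() {
      timespec ts;
      ::clock_gettime(CLOCK_REALTIME, &ts);
      return from_timespec(ts);
    }
  };

User code written against ``sketch_fs_clock`` compiles unchanged whichever
``rep`` is selected. Only the representable range differs, which is the kind of
uniformity between builds argued for earlier.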

When ``__int128_t`` is available, or when ``time_t`` is 32 bits, the implementation
provides the same resolution and a greater range than ``timespec``. Otherwise
it still provides the same resolution, but is limited to a range of about +/- 292
years. This final case should be rather rare, as ``__int128_t``
is normally available in 64-bit builds, and ``time_t`` is normally 32 bits
during 32-bit builds.

Although falling back to ``long long`` and nanosecond precision is less than
ideal, it also happens to be the implementation provided by both libstdc++
and MSVC. (So that makes it better, right?)

Although the ``timespec`` emulation solution is feasible and would largely
do what we want, it comes with too many complications, potential problems
and discrepancies when compared to "normal" chrono time points and durations.

An emulation of a builtin arithmetic type using a class is never going to act
exactly the same, and the difference will be felt by users. It's not reasonable
to expect them to tolerate and work around these differences. And once
we commit to an ABI it will be too late to change. Committing to this seems
risky.

Therefore, ``__int128_t`` seems like the better solution.