# I/O (Input / Output)

Wuffs per se doesn't have the ability to read or write to files or network
connections. Recall that Wuffs is a programming language for writing libraries,
not applications, and that having fewer capabilities means that it's trivial to
prove that you can't misuse a capability, even when given malicious input.

Instead, the code that calls into Wuffs libraries is responsible for
interfacing with e.g. the file system or the network system. An `io_buffer` is
the mechanism for transferring data into and out of Wuffs libraries. For
example, when decompressing gzip, there are two `io_buffer`s: the caller fills
a source buffer with e.g. the compressed file's contents and the callee (the
Wuffs library) reads compressed bytes from that source buffer and writes
decompressed bytes to a destination buffer.


## I/O Buffers

An `io_buffer` is a [slice](/doc/note/slices-and-tables.md) of bytes (the data,
a `ptr` and `len`) with additional fields (the metadata): a read index (`ri`),
a write index (`wi`), a position (`pos`) and whether or not it is `closed`.


## Read Index and Write Index

Writing to an `io_buffer`, e.g. copying from a file to a buffer, increments
`wi`. The buffer is full for writing (no more can be written) when `wi` equals
`len`. Writing does not have to fill a buffer before further processing.

Reading from an `io_buffer`, e.g. copying from a buffer to a file, increments
`ri`. The buffer is empty for reading (no more can be read) when `ri` equals
`wi`. Reading does not have to empty a buffer before further processing.

An invariant condition is that `((0 <= ri) and (ri <= wi) and (wi <= len))`.
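Here is a minimal sketch in C, the language that Wuffs transpiles to. It models the fields above with a hypothetical `demo_io_buffer` struct and checks the invariant. This is an illustration, not the actual Wuffs C API. Wuffs' own C code has a corresponding `wuffs_base__io_buffer` type, whose exact layout may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_buffer (not the real Wuffs C type). */
typedef struct {
  uint8_t* ptr;  /* data: pointer to the first byte of the slice */
  size_t len;    /* data: number of bytes in the slice */
  size_t ri;     /* metadata: read index */
  size_t wi;     /* metadata: write index */
  uint64_t pos;  /* metadata: stream position of ptr[0] */
  bool closed;   /* metadata: no further writes are expected */
} demo_io_buffer;

/* Reports whether ((0 <= ri) and (ri <= wi) and (wi <= len)) holds. The
 * indexes are unsigned, so (0 <= ri) is always true and needs no check. */
static bool demo_io_buffer_is_valid(const demo_io_buffer* b) {
  return (b->ri <= b->wi) && (b->wi <= b->len);
}

/* Aborts (when assertions are enabled) if application code broke the
 * invariant. */
static void demo_io_buffer_check(const demo_io_buffer* b) {
  assert(demo_io_buffer_is_valid(b));
}
```

Application code that adjusts the fields directly could call `demo_io_buffer_check` afterwards as a cheap debugging aid.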

Having separate read and write indexes simplifies connecting a sequence of
filters or processors with `io_buffer`s, similar to connecting Unix processes
with pipes. Each filter reads from the previous buffer and writes to the next
buffer. Each buffer is written to by the previous filter and is read from by
the next filter. There's no need to flip a buffer between reading and writing
modes. Nonetheless, `io_buffer`s are generally not thread-safe.
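A filter stage in such a chain only consumes from its source, which advances that buffer's `ri`. It only produces into its destination, which advances that buffer's `wi`. The C sketch below uses the same hypothetical `demo_io_buffer` stand-in. `demo_upper_filter` is an illustrative filter that uppercases ASCII letters; it is not part of Wuffs. The filter makes as much progress as both buffers allow and then returns, so one buffer can be the destination of one stage and the source of the next.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_buffer (not the real Wuffs C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} demo_io_buffer;

/* Moves as many bytes as possible from src's readable region data[ri:wi] to
 * dst's writable region data[wi:len], uppercasing ASCII letters on the way.
 * Only src->ri and dst->wi change. Returns the number of bytes moved. */
static size_t demo_upper_filter(demo_io_buffer* dst, demo_io_buffer* src) {
  assert((src->ri <= src->wi) && (dst->wi <= dst->len));
  size_t n_readable = src->wi - src->ri;
  size_t n_writable = dst->len - dst->wi;
  size_t n = (n_readable < n_writable) ? n_readable : n_writable;
  for (size_t i = 0; i < n; i++) {
    dst->ptr[dst->wi + i] = (uint8_t)toupper(src->ptr[src->ri + i]);
  }
  src->ri += n;
  dst->wi += n;
  return n;
}
```

In a real pipeline, the application would loop over these steps:

1. Refill the first buffer.
2. Run each stage.
3. Drain the last buffer.
4. Compact buffers as needed.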

Continuing the "decompressing gzip" example, the application would write to the
source buffer by copying from e.g. `/dev/stdin`. The Wuffs library would read
from the source buffer and write to the destination buffer. The application
would read from the destination buffer by copying to e.g. `/dev/stdout`. Buffer
space can be re-used, via compaction (see below), so that neither the source
(`/dev/stdin`) nor the destination (`/dev/stdout`) data needs to be entirely in
memory at any point.

For example, an `io_buffer` with `len`, `wi` and `ri` equal to 8, 7 and 3 has 4
bytes available to read (`wi - ri`) and 1 byte available to write
(`len - wi`). If 1 more byte were written, there would then be 5 bytes available
to read (and 0 available to write). Visually:

```
[.. .. .. .. .. .. .. ..]
 |<- ri ->|           |  |
 |<------- wi ------->|  |
 |<-------- len -------->|
```
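The arithmetic in that example can be checked with a short C sketch. As before, `demo_io_buffer` is a hypothetical stand-in for `io_buffer`. `demo_reader_length`, `demo_writer_length` and `demo_write_u8` are illustrative helper names, not Wuffs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_buffer (not the real Wuffs C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} demo_io_buffer;

/* Number of bytes available to read: the length of data[ri:wi]. */
static size_t demo_reader_length(const demo_io_buffer* b) {
  assert(b->ri <= b->wi);
  return b->wi - b->ri;
}

/* Number of bytes available to write: the length of data[wi:len]. */
static size_t demo_writer_length(const demo_io_buffer* b) {
  assert(b->wi <= b->len);
  return b->len - b->wi;
}

/* Writes one byte at data[wi] and increments wi. Returns false, writing
 * nothing, if the buffer is full for writing (wi == len). */
static bool demo_write_u8(demo_io_buffer* b, uint8_t x) {
  if (demo_writer_length(b) == 0) {
    return false;
  }
  b->ptr[b->wi++] = x;
  return true;
}
```

With `len`, `wi` and `ri` of 8, 7 and 3, the helpers report 4 readable bytes and 1 writable byte. After one `demo_write_u8` call, they report 5 and 0, and a second call returns false.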


## Position

An `io_buffer` is a sliding window into a stream of bytes. Its position (`pos`)
is the number of bytes in the stream prior to the first element of the slice.
The total numbers of bytes read from and written to the stream are therefore
`(pos + ri)` and `(pos + wi)` respectively.

While every slice element is in-memory, the stream's prior bytes do not
necessarily have to be in-memory now, or have been in-memory in the past. It is
valid to open a file, seek to byte offset 1000 and start copying from there to
an `io_buffer`, provided that `pos` was also initialized to 1000.
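Here is a C sketch of the "seek to byte offset 1000" case, using standard `<stdio.h>` calls. The hypothetical helper `demo_fill_from_offset` seeks a file, fills a hypothetical `demo_io_buffer` from there and sets `pos` to match. As a result, `(pos + ri)` and `(pos + wi)` are stream offsets, even though the bytes before offset 1000 were never in memory. The names are illustrative, not Wuffs APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an io_buffer (not the real Wuffs C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} demo_io_buffer;

/* Seeks f to offset, then copies up to len bytes into the buffer. pos is set
 * to offset, since that many stream bytes precede ptr[0]. Returns false if
 * the seek fails (e.g. f is not seekable). */
static bool demo_fill_from_offset(demo_io_buffer* b, FILE* f, long offset) {
  assert(offset >= 0);
  if (fseek(f, offset, SEEK_SET) != 0) {
    return false;
  }
  b->ri = 0;
  b->wi = fread(b->ptr, 1, b->len, f);
  b->pos = (uint64_t)offset;
  b->closed = feof(f) != 0;  /* hit EOF: no further writes are expected */
  return true;
}

/* Total bytes read from, and written to, the stream so far. */
static uint64_t demo_total_read(const demo_io_buffer* b) {
  return b->pos + b->ri;
}

static uint64_t demo_total_written(const demo_io_buffer* b) {
  return b->pos + b->wi;
}
```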


## Closed-ness

The `closed` field indicates that no further writes are expected to the
`io_buffer`. When copying from a file to a buffer, `closed` means that we have
reached EOF (End Of File).

For example, decoding a particular file format might, at some point, expect at
least another 4 bytes of data, but only 3 are available to read. If `closed` is
false, this isn't necessarily an error, since an `io_buffer` holds only a
partial view of the underlying data stream, and more data might be forthcoming
but not yet buffered. If `closed` is true, it is definitely an error.
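The decision in that example can be sketched in C as follows. The `demo_status` values and `demo_need` are hypothetical names for illustration; Wuffs has its own status values. `demo_io_buffer` is the same hypothetical stand-in for `io_buffer`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_buffer (not the real Wuffs C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} demo_io_buffer;

typedef enum {
  DEMO_OK,              /* at least n bytes are available to read */
  DEMO_SHORT_READ,      /* not yet: refill the buffer and try again */
  DEMO_UNEXPECTED_EOF,  /* never: the stream ended too early (an error) */
} demo_status;

/* Checks whether at least n more bytes can be read from b. */
static demo_status demo_need(const demo_io_buffer* b, size_t n) {
  assert(b->ri <= b->wi);
  if ((b->wi - b->ri) >= n) {
    return DEMO_OK;
  }
  return b->closed ? DEMO_UNEXPECTED_EOF : DEMO_SHORT_READ;
}
```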


## Undoing Reads and Writes

It is possible to decrement `ri` or `wi`, undoing previous reads or writes,
provided that the invariant `((0 <= ri) and (ri <= wi) and (wi <= len))` holds.
For example, on 64-bit (8-byte) systems, if buffer space is available, it can
be faster to write 8 bytes and then undo 1 byte than to write exactly 7 bytes.
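Here is a C sketch of the "write 8, undo 1" trick, using the same hypothetical `demo_io_buffer` stand-in. `demo_write_7` (a made-up name) appends 7 bytes. When there is room, it does a fixed-size 8-byte `memcpy`, which compilers may lower to a single 64-bit store, and then decrements `wi` by 1. The extra byte lands in `data[wi:len]`, whose contents are undefined anyway. Whether this is actually faster depends on the compiler and CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an io_buffer (not the real Wuffs C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} demo_io_buffer;

/* Appends src[0:7] to b. src must point to at least 8 readable bytes, since
 * the fast path copies 8. Returns false if fewer than 7 bytes of room remain. */
static bool demo_write_7(demo_io_buffer* b, const uint8_t* src) {
  assert(b->wi <= b->len);
  size_t room = b->len - b->wi;
  if (room >= 8) {
    memcpy(b->ptr + b->wi, src, 8);  /* write 8 bytes... */
    b->wi += 8;
    b->wi -= 1;                      /* ...then undo the last one */
    return true;
  } else if (room >= 7) {
    memcpy(b->ptr + b->wi, src, 7);  /* exact-length fallback */
    b->wi += 7;
    return true;
  }
  return false;
}
```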

The Wuffs compiler enforces that, during a Wuffs function, `ri` and `wi` will
never be decremented (by an undo operation) to be less than their initial values
at the time of the call. When considering a function as a 'black box', the two
indexes can only travel forward, and it is up to the application code (not the
Wuffs library code) to rewind the indexes (e.g. by compaction).

Even though `ri` cannot drop below its initial value, Wuffs code can still
read the contents of the slice before `ri` (in sub-slice notation,
`data[0:ri]`), and it should still contain the `(pos + 0)`th, `(pos + 1)`th,
etc. byte of the stream.

The contents of the slice at or after `wi` (in sub-slice notation,
`data[wi:len]`) are undefined, and code should not rely on their values. When
passing an `io_buffer` into a function, that function is free to modify
anything in `data[wi:len]`, taking `wi` as its value either before or after
the function returns.


## Compaction

Compacting an `io_buffer` moves any written but unread bytes (those in
`data[ri:wi]`) to the start of the buffer, and updates the metadata fields
`ri`, `wi` and `pos`. Equivalently, it moves the sliding window that is the
`io_buffer` as far forward as possible along the stream.

This generally increases `(len - wi)`, the number of bytes available for
writing, allowing for re-using the allocated buffer memory (the data slice).

Suppose that the underlying data stream's `i`th byte has value `i`, and that
`ri`, `wi` and `pos` start as `3`, `7` and `20`. Compaction will subtract 3
(the old `ri`) from the first two and add 3 to the last, so that the new `ri`,
`wi` and `pos` are `0`, `4` and `23`. Note that `len`, `(pos + ri)` and
`(pos + wi)` are all unchanged.

Here are two equivalent visualizations of before and after compaction. The `xx`
means a byte whose value is undefined (as it is at or past `wi`).

In the first visualization, the slice is fixed and its contents (its view of
the stream) move relative to the slice:

```
Before:
[20 21 22 23 24 25 26 xx]
 |<- ri ->|           |  |
 |<------- wi ------->|  |
 |<-------- len -------->|

After:
[23 24 25 26 xx xx xx xx]
 ||          |           |
 |<-- wi --->|           |
 |<-------- len -------->|
```

In the second visualization, the stream (and its contents) is fixed and the
slice (the sliding window) moves relative to the stream:

```
                           pos+ri      pos+wi
                           |           |
Before:          [20 21 22 23 24 25 26 xx]
Stream: ... 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ...
After:                    [23 24 25 26 xx xx xx xx]
                           |           |
                           pos+ri      pos+wi
```
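Compaction can be sketched in C with `memmove`, which, unlike `memcpy`, allows the source and destination ranges to overlap. `demo_compact` and `demo_io_buffer` are hypothetical names, not the Wuffs C API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an io_buffer (not the real Wuffs C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} demo_io_buffer;

/* Moves data[ri:wi] to the start of the slice and adjusts the metadata so
 * that len, (pos + ri) and (pos + wi) are unchanged. */
static void demo_compact(demo_io_buffer* b) {
  assert((b->ri <= b->wi) && (b->wi <= b->len));
  if (b->ri == 0) {
    return;  /* the window is already as far forward as possible */
  }
  size_t n = b->wi - b->ri;
  memmove(b->ptr, b->ptr + b->ri, n);  /* the ranges may overlap */
  b->pos += b->ri;
  b->wi = n;
  b->ri = 0;
}
```

Applied to the example above, where `ri`, `wi` and `pos` start as 3, 7 and 20, it yields 0, 4 and 23.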


## Seeking and I/O Positions

Recall that Wuffs code has limited capabilities, and cannot seek in the
underlying I/O data streams per se. When it needs to seek (e.g. when jumping
between video frames), it will typically provide an "I/O position", a
`uint64_t` value, via some package-specific API. The application (the caller of
the Wuffs code) is then responsible for configuring an `io_buffer` whose
`(pos + ri)` or `(pos + wi)` value, depending on whether we're reading or
writing, is at that "I/O position".

If the underlying file (or equivalent) isn't seekable, e.g. it's `/dev/stdin`
instead of a regular file, then the request cannot be satisfied. The
application should then decide whether that error is recoverable or fatal. This
is the application's responsibility, not the library's, as the application
usually has more context to make that decision.

If that "I/O position" is already within the sliding window, it might not be
necessary to seek in the underlying file. For the reading case, it may be
possible to simply move `ri` (anywhere from `0` to `wi`) so that `(pos + ri)`
reaches the target. Otherwise, the typical process is:

1. Set `ri`, `wi` and `pos` to `0`, `0` and that "I/O position". This discards
   any buffered data (but does not free the buffer's memory).
2. Seek in the underlying file to that same "I/O position".
3. Copy from the underlying file to the `io_buffer`, incrementing `wi`.

Whether or not it was necessary to seek and copy from the underlying file,
when the application calls back into the Wuffs library, the library typically
checks that the `io_buffer`'s `(pos + ri)` is now at the expected
"I/O position".
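For the reading case, the fast path and the three steps above can be sketched in C using standard `<stdio.h>` calls. `demo_seek_for_reading` and `demo_io_buffer` are hypothetical names. A real application would plug in its own file abstraction and its own handling of non-seekable inputs. A failing `fseek` is how a non-seekable stream, such as a pipe, typically shows up.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an io_buffer (not the real Wuffs C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} demo_io_buffer;

/* Arranges for (pos + ri) to equal target. Returns false if that needed a
 * seek that failed, e.g. because f is a pipe. */
static bool demo_seek_for_reading(demo_io_buffer* b, FILE* f, uint64_t target) {
  assert((b->ri <= b->wi) && (b->wi <= b->len));
  /* Already within the window data[0:wi]? Then just move ri. */
  if ((target >= b->pos) && ((target - b->pos) <= b->wi)) {
    b->ri = (size_t)(target - b->pos);
    return true;
  }
  /* 1. Discard buffered data (keeping the memory) and set pos. */
  b->ri = 0;
  b->wi = 0;
  b->pos = target;
  b->closed = false;
  /* 2. Seek in the underlying file to that same position. */
  if ((target > (uint64_t)LONG_MAX) ||
      (fseek(f, (long)target, SEEK_SET) != 0)) {
    return false;
  }
  /* 3. Copy from the file into the buffer, incrementing wi. */
  b->wi = fread(b->ptr, 1, b->len, f);
  b->closed = feof(f) != 0;
  return true;
}
```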


## I/O Reader and I/O Writer

An `io_buffer` is the mechanism for transferring data between the application
and the Wuffs library. Application code can manipulate an `io_buffer`'s fields
as it wishes (but is responsible for maintaining the invariant condition).
Wuffs library code places a further restriction that `io_buffer`s are used
exclusively either for reading or for writing, as optimizing incremental access
to an `io_buffer`'s data, while enforcing invariants, is simpler when only one
of `ri` and `wi` can vary.

Wuffs code therefore refers to either a `base.io_reader` or `base.io_writer`,
both of which are essentially the same type (an `io_buffer`) with different
methods. Wuffs code does not reference an `io_buffer` directly.


## Binding

TODO: discuss `io_bind`, which temporarily adapts a slice of bytes into an
`io_buffer`.