This document describes some things to know about the Ogg format, as well
as implementation details in GStreamer.

INTRODUCTION
============

ogg and the granulepos
----------------------

An ogg stream contains pages with a serial number and a granulepos.
The granulepos is a 64 bit signed integer. It is a value that in some way
represents a time since the start of the stream.
How it maps to time is however both codec-specific and stream-specific.

ogg has no notion of time: it only knows about bytes and granulepos values
on pages.

The granule position is just a number; the only guarantee for a valid ogg
stream is that within a logical stream, this number never decreases.

While logically a granulepos value can be constructed for every ogg packet,
the page is marked with only one granulepos value: the granulepos of the
last packet to end on that page.

theora and the granulepos
-------------------------

The granulepos in theora is an encoding of the frame number of the last
key frame ("i frame"), and the number of frames since the last key frame
("p frame"). The granulepos is constructed as the sum of the first number,
shifted to the left for granuleshift bits, and the second number:
granulepos = (iframe << granuleshift) + pframe

(This means that given a framenumber or a timestamp, one cannot generate
 the one and only granulepos for that page; several granulepos possibilities
 correspond to this frame number. You also need the last keyframe, as well
 as the granuleshift.
 However, given a granulepos, the theora codec can still map that to a
 unique timestamp and frame number for that theora stream)

 Note: currently theora stores the "presentation time" as the granulepos;
       ie. a first data page with one packet contains one video frame and
       will be marked with 0/0.
       Changing that to be 1/0 (so that it
       represents the number of decodable frames up to that point, like
       for Vorbis) is being discussed.

vorbis and granulepos
---------------------

In Vorbis, the granulepos represents the number of samples that can be
decoded from all packets up to that point.

In GStreamer, the vorbisenc element produces a stream where:
- OFFSET is the time corresponding to the granulepos
  (rather than the number of bytes produced before)
- OFFSET_END is the granulepos of the produced vorbis buffer
- TIMESTAMP is the timestamp matching the beginning of the buffer
- DURATION is set to the length in time of the buffer

Ogg media mapping
-----------------

Ogg defines a mapping for each media type that it embeds.

For Vorbis:

  - 3 header packets, on pages with granulepos 0.
    - 1 page with 1 packet header identification
    - N pages with 2 packets comments and codebooks
  - granulepos is samplenumber of next page
  - one packet can contain a variable number of samples but one frame
    that should be handed to the vorbis decoder.

For Theora:

  - 3 header packets, on pages with granulepos 0.
    - 1 page with 1 packet header identification
    - N pages with 2 packets comments and codebooks
  - granulepos is framenumber of last packet in page, where framenumber
    is a combination of keyframe number and p frames since keyframe.
  - one packet contains 1 frame




DEMUXING
========

ogg demuxer
-----------

This ogg demuxer has two modes of operation, which both share a significant
amount of code. The first mode is the streaming mode which is automatically
selected when the demuxer is connected to a non-getrange based element. When
connected to a getrange based element the ogg demuxer can do full seeking
with great efficiency.

1) the streaming mode.

In this mode, the ogg demuxer receives buffers in the _chain() function which
are then simply submitted to the ogg sync layer.
Pages are then processed when
the sync layer detects them, pads are created for new chains and packets are
sent to the peer elements of the pads.

In this mode, no seeking is possible. This is the typical case when the
stream is read from a network source.

In this mode, no setup is done at startup; the pages are just read and decoded.
A new logical chain is detected when one of the pages has the BOS flag set. At
this point the existing pads are removed and new pads are created for all the
logical streams in this new chain.


2) the random access mode.

In this mode, the ogg file is first scanned to detect the position and length
of all chains. This scanning is performed using a recursive binary search
algorithm that is explained below.

  find_chains(start, end)
  {
    ret1 = read_next_pages (start);
    ret2 = read_prev_page (end);

    if (WAS_HEADER (ret1)) {
    }
    else {
    }

  }

  a) read first and last pages

        start                                                      end
         V                                                          V
         +-----------------------+-------------+--------------------+
         |       111             |     222     |      333           |
        BOS                     BOS           BOS                  EOS


     after reading start, serial 111, BOS, chain[0] = 111
     after reading end, serial 333, EOS

     start serialno != end serialno, binary search start, (end-start)/2

        start                       bisect                         end
         V                            V                             V
         +-----------------------+-------------+--------------------+
         |       111             |     222     |      333           |


     after reading start, serial 111, BOS, chain[0] = 111
     after reading end, serial 222, EOS

     while (



testcases
---------

 a) stream without BOS

        +----------------------------------------------------------+
                111                                                |
                                                                  EOS

 b) chained stream, first chain without BOS

        +-------------------+--------------------------------------+
                111         |       222                            |
                           BOS                                    EOS


 c) chained stream

        +-------------------+--------------------------------------+
        |       111         |       222                            |
       BOS                 BOS                                    EOS


 d) chained stream, second without BOS

        +-------------------+--------------------------------------+
        |       111         |       222                            |
       BOS                                                        EOS

What can an ogg demuxer do?
---------------------------

An ogg demuxer can read pages and get the granulepos from them.
It can ask the decoder elements to convert a granulepos to time.

An ogg demuxer can also get the granulepos of the first and the last page of a
stream to get the start and end timestamp of that stream.
It can also get the length in bytes of the stream
(when the peer is seekable, that is).

An ogg demuxer is therefore basically able to seek to any byte position and
timestamp.

When asked to seek to a given granulepos, the ogg demuxer should always convert
the value to a timestamp using the peer decoder element conversion function. It
can then binary search the file to eventually end up on the page with the given
granulepos or a granulepos with the same timestamp.

Seeking in ogg currently
------------------------

When seeking in an ogg, the decoders can choose to forward the seek event as a
granulepos or a timestamp to the ogg demuxer.

In the case of a granulepos, the ogg demuxer will seek back to the beginning of
the stream and skip pages until it finds one with the requested granulepos.

In the case of a timestamp, the ogg demuxer also seeks back to the beginning of
the stream. For each page it reads, it asks the decoder element to convert the
granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until
the page has a timestamp bigger or equal to the requested one.

It is therefore important that the decoder elements in vorbis can either
convert a granulepos into a timestamp, or never seek on timestamp on the
oggdemuxer.
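
The timestamp seek described above can be sketched in C as follows. This is
only an illustration under stated assumptions, not the oggdemux code: the
function names are hypothetical, the page list is modelled as a plain array
of granulepos values, and the Vorbis conversion stands in for the decoder's
conversion function. The Theora helper assumes the "presentation time"
convention from the introduction, so the frame number is taken as the sum
of both fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NS_PER_SECOND 1000000000ULL

/* Vorbis: the granulepos is a sample count, so time = samples / rate.
 * Whole seconds and remainder are scaled separately to avoid overflow. */
static uint64_t
vorbis_granulepos_to_ns (int64_t granulepos, uint32_t rate)
{
  assert (granulepos >= 0 && rate > 0);
  uint64_t gp = (uint64_t) granulepos;
  return (gp / rate) * NS_PER_SECOND + (gp % rate) * NS_PER_SECOND / rate;
}

/* Theora: split the granulepos into the key frame number and the number of
 * frames since that key frame; their sum is used as the frame number. */
static uint64_t
theora_granulepos_to_frame (int64_t granulepos, int granuleshift)
{
  assert (granulepos >= 0 && granuleshift > 0 && granuleshift < 63);
  uint64_t gp = (uint64_t) granulepos;
  uint64_t iframe = gp >> granuleshift;
  uint64_t pframe = gp & ((1ULL << granuleshift) - 1);
  return iframe + pframe;
}

/* Linear scan from the start of the stream: return the index of the first
 * page whose time is bigger or equal to target_ns, or n_pages if there is
 * none. Pages with granulepos -1 (no packet ends on them) are skipped. */
static size_t
scan_pages_for_time (const int64_t *page_granulepos, size_t n_pages,
    uint32_t rate, uint64_t target_ns)
{
  for (size_t i = 0; i < n_pages; i++) {
    if (page_granulepos[i] < 0)
      continue;
    if (vorbis_granulepos_to_ns (page_granulepos[i], rate) >= target_ns)
      return i;
  }
  return n_pages;
}
```

With pages marked 0, -1, 22050, 44100 and 66150 at 44100 Hz, a seek to 0.6 s
stops at the page marked 44100, the first page whose time (1.0 s) is not
below the target.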

The default format on the oggdemuxer source pads is currently defined as the
granulepos of the packets; it is also the value of the OFFSET field in the
GstBuffer.

MUXING
======

Oggmux
------

The ogg muxer's job is to output complete Ogg pages such that the absolute
time represented by the valid (ie, not -1) granulepos values on those pages
never decreases. This has to be true for all logical streams in the group at
the same time.

To achieve this, encoders are required to pass along the exact time that the
granulepos represents for each ogg packet that they push to the ogg muxer.
This is ESSENTIAL: without this exact time representation of the granulepos,
the muxer can not produce valid streams.

The ogg muxer has a packet queue per sink pad. From this queue a page can
be flushed when:
  - total byte size of queued packets exceeds a given value
  - total time duration of queued packets exceeds a given value
  - total byte size of queued packets exceeds maximum Ogg page size
  - eos of the pad
  - encoder sent a command to flush out an ogg page after this new packet
    (in 0.8, through a flush event; in 0.10, with a GstOggBuffer)
  - muxer wants a flush to happen (so it can output pages)

The ogg muxer also has a page queue per sink pad. This queue collects
Ogg pages from the corresponding packet queue. Each page is also marked
with the timestamp that the granulepos in the header represents.
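
The packet queue flush conditions listed above amount to a simple predicate.
The sketch below is one way to write it down; the struct and field names are
hypothetical and do not come from oggmux, and the "maximum Ogg page size" is
assumed here to mean the largest page body (255 segments of 255 bytes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest Ogg page body: 255 lacing segments of 255 bytes each. */
#define OGG_MAX_PAGE_BODY (255 * 255)

/* Hypothetical snapshot of one sink pad's packet queue. */
typedef struct {
  size_t   queued_bytes;        /* total byte size of queued packets */
  uint64_t queued_duration_ns;  /* total time duration of queued packets */
  bool     eos;                 /* the pad received eos */
  bool     encoder_flush;       /* encoder asked for a page flush */
  bool     muxer_flush;         /* muxer wants a page to be produced */
} PacketQueueState;

/* Hypothetical configurable limits ("a given value" in the list above). */
typedef struct {
  size_t   max_bytes;
  uint64_t max_duration_ns;
} FlushLimits;

/* True when any one of the listed conditions holds. */
static bool
packet_queue_should_flush (const PacketQueueState *q, const FlushLimits *lim)
{
  return q->queued_bytes > lim->max_bytes
      || q->queued_duration_ns > lim->max_duration_ns
      || q->queued_bytes > OGG_MAX_PAGE_BODY
      || q->eos
      || q->encoder_flush
      || q->muxer_flush;
}
```

The conditions are independent, so a queue holding very little data is still
flushed at eos or on an explicit request.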

A page can be flushed from this collection of page queues when:
- ideally, every page queue has at least one page with a valid granulepos
  -> choose the page, from all queues, with the lowest timestamp value
- if not, muxer can wait if the following limits aren't reached:
  - total byte size of any page queue exceeds a limit
  - total time duration of any page queue exceeds a limit
- if one of these limits is reached, then:
  - request a page flush from packet queue to page queue for each queue
    that does not have pages
  - now take the page from all queues with the lowest timestamp value
  - make sure all later-coming data is marked as old, either to be still
    output (but producing an invalid stream, though it can be fixed later)
    or dropped (which means it's gone forever)

The oggmuxer uses the offset fields to fill in the granulepos in the pages.

GStreamer implementation details
--------------------------------
As said before, the basic rule is that the ogg muxer needs an exact time
representation for each granulepos. This needs to be provided by the encoder.

Potential problems are:
 - initial offsets for a raw stream need to be preserved somehow. Example:
   if the first audio sample has time 0.5 s, the granulepos in the vorbis
   encoder needs to be adjusted to take this into account.
 - initial offsets may need to be on rate boundaries. Example:
   if the framerate is 5 fps, and the first video frame has time 0.1 s, the
   granulepos cannot correctly represent this timestamp.
   This can be handled out-of-band (initial offset in another muxing format,
   skeleton track with initial offsets, ...)

Given that the basic rule for muxing is that the muxer needs an exact timestamp
matching the granulepos, we need some way of communicating this time value
from encoders to the Ogg muxer. So we need a mechanism to communicate
a granulepos and its time representation for each GstBuffer.
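
The exact time per page matters because of the page queue rule given earlier
in this section: the muxer picks, from all queues, the page with the lowest
timestamp. A minimal sketch of that selection follows. The names are
hypothetical, and ignoring queues that are still empty once a limit has been
reached is an assumption of this sketch, not something the text prescribes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the first page in one sink pad's page queue. */
typedef struct {
  int      has_page;      /* non-zero if a page with valid granulepos waits */
  uint64_t timestamp_ns;  /* time that the page's granulepos represents */
} PageQueueHead;

/* Return the index of the queue whose head page has the lowest timestamp.
 * Return -1 when some queue has no page and no limit was reached, meaning
 * the muxer should keep waiting, or when no queue has a page at all. */
static int
select_next_page (const PageQueueHead *heads, size_t n_queues,
    int limit_reached)
{
  int best = -1;

  for (size_t i = 0; i < n_queues; i++) {
    if (!heads[i].has_page) {
      if (!limit_reached)
        return -1;        /* ideal case not met yet: wait for this queue */
      continue;           /* limit reached: go on without this queue */
    }
    if (best < 0 || heads[i].timestamp_ns < heads[(size_t) best].timestamp_ns)
      best = (int) i;
  }
  return best;
}
```

If an encoder reported an inexact time for a granulepos, this comparison could
pick pages in the wrong order, which is how the muxer would end up producing
an invalid stream.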

(This is an instance of a more generic problem - having a way to attach
 more fields to a GstBuffer)

Possible ways:
- setting TIMESTAMP to this value: bad - this value represents the end time
  of the buffer, and thus conflicts with GStreamer's idea of what TIMESTAMP
  is. This would cause problems muxing the encoded stream in other muxing
  formats, or for streaming. Note that this is what was done in GStreamer 0.8
- setting DURATION to GP_TIME - TIMESTAMP: bad - this breaks the concept of
  duration for this frame. Take the video example above; each buffer would
  have a correct timestamp, but always a 0.1 s duration as opposed to the
  correct 0.2 s duration
- subclassing GstBuffer: clean, but requires a common header used between
  ogg muxer and all encoders that can be muxed into ogg. Also, what if
  a format can be muxed into more than one container, and they each have
  their own "extra" info to communicate?
- adding key/value pairs to GstBuffer: clean, but requires changes to
  core. Also, the overhead of allocating e.g. a GstStructure for *each* buffer
  may be expensive.
- "cheating":
  - abuse OFFSET to store the timestamp matching this granulepos
  - abuse OFFSET_END to store the granulepos value
  The drawback here is that before, it made sense to use OFFSET and OFFSET_END
  to store a byte count. Given that this is not used for anything critical
  (you can't store a raw theora or vorbis stream in a file anyway),
  this is what's being done for now.
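
To make the "cheating" scheme concrete, the sketch below fills the four
fields for one Vorbis packet. SketchBuffer is a stand-in with plain integer
fields, not the real GstBuffer, and the function names are hypothetical. It
assumes the Vorbis convention from the introduction, where the granulepos is
the number of samples decodable up to and including this packet.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SECOND 1000000000ULL

/* Stand-in for the four GstBuffer fields discussed above. */
typedef struct {
  uint64_t timestamp;   /* TIMESTAMP: time of the start of the buffer, ns */
  uint64_t duration;    /* DURATION: length in time of the buffer, ns */
  uint64_t offset;      /* OFFSET: abused as the time of the granulepos, ns */
  uint64_t offset_end;  /* OFFSET_END: abused as the granulepos value */
} SketchBuffer;

/* Convert a sample count to nanoseconds without 64-bit overflow. */
static uint64_t
samples_to_ns (uint64_t samples, uint32_t rate)
{
  assert (rate > 0);
  return (samples / rate) * NS_PER_SECOND
      + (samples % rate) * NS_PER_SECOND / rate;
}

/* Mark a packet holding samples_in_packet samples, given the number of
 * samples already produced by all earlier packets. */
static SketchBuffer
mark_vorbis_buffer (uint64_t samples_before, uint64_t samples_in_packet,
    uint32_t rate)
{
  SketchBuffer buf;
  uint64_t granulepos = samples_before + samples_in_packet;

  buf.timestamp = samples_to_ns (samples_before, rate);
  buf.offset = samples_to_ns (granulepos, rate);
  buf.duration = buf.offset - buf.timestamp;
  buf.offset_end = granulepos;
  return buf;
}
```

In this sketch TIMESTAMP + DURATION equals the time stored in OFFSET, so
TIMESTAMP and DURATION keep their usual meaning while the muxer still gets
the granulepos and its exact time.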

In practice
-----------
- all encoders of formats that can be muxed into Ogg produce a stream where:
  - OFFSET is abused to be the timestamp corresponding exactly to the
    granulepos
  - OFFSET_END is abused to be the granulepos of the encoded buffer
  - TIMESTAMP is the timestamp matching the beginning of the buffer
  - DURATION is the length in time of the buffer

- initial delays should be handled in the GStreamer encoders by mangling
  the granulepos of the encoded packet to take the delay into account as
  best as possible and store that in OFFSET_END;
  this then brings TIMESTAMP + DURATION to within less
  than a frame period of the granulepos's time representation.
  The ogg muxer will then create new ogg packets with this OFFSET_END as
  the granulepos. So in effect, the granulepos produced by the encoders
  does not get used directly.

TODO
----
- decide on a proper mechanism for communicating extra per-buffer fields
- the ogg muxer sets timestamp and duration on outgoing ogg pages based on
  timestamp/duration of incoming ogg packets.
  Note that:
  - since the ogg muxer *has* to output pages sorted by gp time, representing
    end time of the page, this means that the buffer's timestamps are not
    necessarily monotonically increasing
  - timestamp + duration of buffers don't match up; the duration represents
    the length of the ogg page *for that stream*. Hence, for a normal
    two-stream file, the sum of all durations is twice the length of the
    muxed file.

TESTING
-------
Proper muxing can be tested by generating test files with command lines like:
- video and audio start from 0:
gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

- video starts after audio:
gst-launch -v videotestsrc timestamp-offset=500000000 ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

- audio starts after video:
gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc timestamp-offset=500000000 ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

The resulting files can be verified with oggz-validate for correctness.