<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}
</style>

</head>

<body>

<div id="xiphlogo">
  <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
</div>

<h1>Ogg bitstream overview</h1>

<p>This document serves as a starting point for understanding the design
and implementation of the Ogg container format. If you're new to Ogg
or merely want a high-level technical overview, start reading here.
Other documents linked from the <a href="index.html">index page</a>
give distilled technical descriptions and references of the container
mechanisms. This document is intended to aid understanding.</p>

<h2>Container format design points</h2>

<p>Ogg is intended to be the simplest possible container, concerned only
with framing, ordering, and interleave.
It can be used as a stream delivery
mechanism, for media file storage, or as a building block toward
implementing a more complex, non-linear container (for example, see
the <a href="skeleton.html">Skeleton</a> or <a
href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).

<p>The Ogg container is not intended to be a monolithic
'kitchen-sink'. It exists only to frame and deliver in-order stream
data and as such is vastly simpler than most other containers.
Elementary and multiplexed streams are both constructed entirely from a
single building block (an Ogg page) composed of a page header (eight
fields totalling twenty-seven bytes), a list of packet lengths (up to
255 bytes), and payload data (up to 65025 bytes). The structure
of every page is the same. There are no optional fields or alternate
encodings.

<p>Stream and media metadata is contained in Ogg streams and not built into
the Ogg container itself. Metadata is thus compartmentalized and
layered rather than part of a monolithic design, an especially good
idea as no two groups seem able to agree on what a complete or
complete-enough metadata set should be. In this way, the container and
container implementation are isolated from unnecessary design flux.

<h3>Streaming</h3>

<p>The Ogg container is primarily a streaming format,
encapsulating chronological, time-linear mixed media into a single
delivery stream or file. The design is such that an application can
always encode and/or decode all features of a bitstream in one pass
with no seeking and minimal buffering. Seeking to provide optimized
encoding (such as two-pass encoding) or interactive decoding (such as
scrubbing or instant replay) is neither disallowed nor discouraged;
however, no container feature requires nonlinear access to the bitstream.
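<p>As a rough illustration of the page layout described under 'Container
format design points', the following Python sketch works through the size
arithmetic. It assumes the field widths listed in the framing
specification, which sum to 27 bytes, and the lacing scheme used for the
packet length list, where each entry is a single byte. The constant and
function names are hypothetical and are not taken from libogg.</p>

<pre>
```python
# Sketch only: names are illustrative, not taken from libogg.

# Fixed header fields and their widths in bytes, per the framing spec.
HEADER_FIELDS = [
    ("capture_pattern", 4),   # the bytes "OggS"
    ("version", 1),
    ("header_type", 1),       # continued / begin-of-stream / end-of-stream flags
    ("granule_position", 8),
    ("serial_number", 4),
    ("page_sequence", 4),
    ("checksum", 4),
    ("segment_count", 1),     # number of entries in the packet length list
]

HEADER_BYTES = sum(width for _, width in HEADER_FIELDS)
MAX_SEGMENTS = 255          # most entries the packet length list can hold
MAX_SEGMENT_BYTES = 255     # most payload bytes one entry can describe
MAX_PAYLOAD_BYTES = MAX_SEGMENTS * MAX_SEGMENT_BYTES
MAX_PAGE_BYTES = HEADER_BYTES + MAX_SEGMENTS + MAX_PAYLOAD_BYTES


def lacing_values(packet_length):
    """Return the length-list entries that encode one packet.

    A packet is written as a run of 255s followed by one value below 255,
    so a packet whose length is an exact multiple of 255 ends with a 0.
    """
    full, rest = divmod(packet_length, MAX_SEGMENT_BYTES)
    return [MAX_SEGMENT_BYTES] * full + [rest]


if __name__ == "__main__":
    print(HEADER_BYTES, MAX_PAYLOAD_BYTES, MAX_PAGE_BYTES)
    print(lacing_values(0), lacing_values(300), lacing_values(510))
```
</pre>

<p>Under these assumptions the largest possible page is the header plus a
full length list plus a full payload, which stays just below 64kB.</p>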

<h3>Variable Bit Rate, Variable Payload Size</h3>

<p>Ogg is designed to contain any size data payload with bounded,
predictable efficiency. Ogg packets have no maximum size and a
zero-byte minimum size. There is no restriction on size changes from
packet to packet. Variable size packets do not require the use of any
optional or additional container features. There is no suggested
optimal packet size, though special care was taken to ensure that
50-200 byte packets are no less efficient than larger packet
sizes. The original design criterion was a 2% overhead at 50 byte
packets, dropping to a maximum working overhead of 1% with larger
packets, and a typical working overhead of 0.5-0.7% for most practical
uses.

<h3>Simple pagination</h3>

<p>Ogg is a byte-aligned container with no context-dependent, optional
or variable-length fields. Ogg requires no repacking of codec data.
The page structure is written out in-line as packet data is submitted
to the streaming abstraction. In addition, it is possible to
implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
is done in the Tremor sourcebase).

<h3>Capture</h3>

<p>Ogg is designed for efficient and immediate stream capture with
high confidence. Although packets have no size limit in Ogg, pages
are a maximum of just under 64kB, meaning that any Ogg stream can be
captured with confidence after seeing 128kB of data or less (worst
case; the typical figure is 6kB) from any random starting point in the
stream.

<h3>Seeking</h3>

<p>Ogg implements simple coarse- and fine-grained seeking by design.

<p>Coarse seeking may be performed by simply 'moving the tone arm' to a
new position and 'dropping the needle'. Rapid capture with
accompanying timecode from any location in an Ogg file is guaranteed
by the stream design.
From the acquisition of the first timecode,
all data needed to play back from that timecode forward is ahead of
the stream cursor.

<p>Ogg implements full sample-granularity seeking using an
interpolated bisection search built on the capture and timecode
mechanisms used by coarse seeking. As above, once a search finds
the desired timecode, all data needed to play back from that timecode
forward is ahead of the stream cursor.

<p>Both coarse and fine seeking use the page structure and sequencing
inherent to the Ogg format. All Ogg streams are fully seekable from
creation; seekability is unaffected by truncation or missing data, and
is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor
heuristic.

<p>Seeking without use of an index is a major point of the Ogg
design. There are several reasons why Ogg forgoes an index:

<ul>

<li>It must be possible to create an Ogg stream in a single pass, and
an index requires either two passes to create, or the index must be
tacked onto the end of a live stream after the stream is finished.
Both methods run afoul of other design constraints.

<li>An index is only marginally useful in Ogg for the complexity
added; it adds no new functionality and seldom improves performance
noticeably. Empirical testing shows that indexless interpolation
search does not require many more seeks in practice than using an
index would.

<li>'Optional' indexes encourage lazy implementations that can seek
only when indexes are present, or that implement indexless seeking
only by building an internal index after reading the entire file
from beginning to end. This has been the fate of other containers that
specify optional indexing.

</ul>

<h3>Simple multiplexing</h3>

<p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
multiplexed stream in time order.
The multiplexed pages are not
altered. Muxing an Ogg AV stream out of separate audio,
video and data streams is akin to shuffling several decks of cards
together into a single deck; the cards themselves remain unchanged.
Demultiplexing is similarly simple (as the cards are marked).

<p>The goal of this design is to make the mux/demux operation as
trivial as possible to allow live streaming systems to build and
rebuild streams on the fly with minimal CPU usage and no additional
storage or latency requirements.

<h3>Continuous and Discontinuous Media</h3>

<p>Ogg streams belong to one of two categories: 'Continuous' streams and
'Discontinuous' streams.

<p>A stream that provides a gapless, time-continuous media type with a
fine-grained timebase is considered to be 'Continuous'. A continuous
stream should never be starved of data. Examples of continuous data
types include broadcast audio and video.

<p>A stream that delivers data in a potentially irregular pattern or
with widely spaced timing gaps is considered to be 'Discontinuous'. A
discontinuous stream may be best thought of as data representing
scattered events; although they happen in order, they are typically
unconnected data often located far apart. One example of a
discontinuous stream type is captioning, such as <a
href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
possible to design captions as a continuous stream type, it's most
natural to think of captions as widely spaced pieces of text with
little happening between.

<p>The fundamental reason for the distinction between continuous and
discontinuous streams concerns buffering.

<h3>Buffering</h3>

<p>A continuous stream is, by definition, gapless.
Ogg buffering is based
on the simple premise of never allowing an active continuous stream
to starve for data during decode; buffering works ahead until all
continuous streams in a physical stream have data ready, and no further.

<p>Discontinuous stream data is not assumed to be predictable. The
buffering design takes discontinuous data 'as it comes' rather than
working ahead to look for future discontinuous data for a potentially
unbounded period. Thus, the buffering process makes no attempt to fill
discontinuous stream buffers; their pages simply 'fall out' of the
stream when continuous streams are handled properly.

<p>Buffering requirements in this design need not be explicitly
declared or managed in the encoded stream. The decoder simply reads as
much data as is necessary to keep all continuous stream types gapless
and no more, with discontinuous data processed as it arrives alongside the
continuous data. Buffering is implicitly optimal for the given
stream. Because all pages of all data types are stamped with absolute
timing information within the stream, inter-stream synchronization
timing is always maintained without the need for explicitly declared
buffer-ahead hinting.

<h3>Codec metadata</h3>

<p>Ogg does not replicate codec-specific metadata into the mux layer
in an attempt to make the mux and codec layer implementations 'fully
separable'. Things like specific timebase, keyframing strategy, frame
duration, etc., do not appear in the Ogg container. The mux layer is,
instead, expected to query a codec through a standardized interface,
left to the implementation, for this data when it is needed.
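<p>One way to picture this arrangement is a small interface that the mux
layer calls whenever it needs a codec-specific answer. The Python sketch
below is hypothetical: the names <code>CodecMapping</code>,
<code>is_continuous</code>, <code>position_to_seconds</code> and
<code>next_stream_to_write</code>, as well as the two fixed-rate codec
stubs, are assumptions made for illustration and are not an interface
defined by Ogg or libogg.</p>

<pre>
```python
from fractions import Fraction
from typing import Dict, Protocol


class CodecMapping(Protocol):
    """Questions a mux layer might ask of a codec (hypothetical interface)."""

    def is_continuous(self) -> bool: ...

    def position_to_seconds(self, position: int) -> Fraction: ...


class FixedRateAudio:
    """Toy codec stub: a position counts samples at a constant rate."""

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate

    def is_continuous(self) -> bool:
        return True

    def position_to_seconds(self, position: int) -> Fraction:
        return Fraction(position, self.sample_rate)


class FixedRateVideo:
    """Toy codec stub: a position counts frames at a rational frame rate."""

    def __init__(self, fps_num: int, fps_den: int) -> None:
        self.fps_num = fps_num
        self.fps_den = fps_den

    def is_continuous(self) -> bool:
        return True

    def position_to_seconds(self, position: int) -> Fraction:
        return Fraction(position * self.fps_den, self.fps_num)


def next_stream_to_write(pending: Dict[str, int],
                         codecs: Dict[str, CodecMapping]) -> str:
    """Pick the stream whose pending page carries the earliest time.

    The mux layer holds only opaque positions; each codec turns its own
    position into seconds so the pages can be compared.
    """
    return min(pending,
               key=lambda name: codecs[name].position_to_seconds(pending[name]))


if __name__ == "__main__":
    codecs = {"audio": FixedRateAudio(48000),
              "video": FixedRateVideo(30000, 1001)}
    print(next_stream_to_write({"audio": 96000, "video": 59}, codecs))
```
</pre>

<p>In this sketch the mux layer never learns what a position means for
either stream; it only compares the times reported by the codec stubs.</p>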

<p>Though modern design wisdom usually prefers to predict all possible
needs of current and future codecs then embed these dependencies and
the required metadata into the container itself, this strategy
increases container specification complexity, fragility, and rigidity.
The mux and codec implementations become more independent, but the
specifications become less independent. A codec can't do what a
container hasn't already provided for. New codecs are harder to
support, and you can do fewer useful things with the ones you've
already got (e.g., try to make a good splitter without using any codecs;
you're stuck splitting at keyframes only, or building yet another new
mechanism into the container layer to mark which frames to skip
displaying).

<p>Ogg's design goes in the opposite direction, where the specification
is to be as simple, easy to understand, and 'proofed' against novel
codecs as possible. When an Ogg mux layer requires codec-specific
information, it queries the codec (or a codec stub). This trades a
more complex implementation for a simpler, more flexible
specification.

<h3>Stream structure metadata</h3>

<p>The Ogg container itself does not define a metadata system for
declaring the structure and interrelations between multiple media
types in a muxed stream. That is, the Ogg container itself does not
specify data like 'which stream is the subtitle stream?' or 'which
video stream is the primary angle?'. This metadata still exists, but
is stored in the Ogg container rather than being built into the Ogg
container. Xiph specifies the 'Skeleton' metadata format for Ogg
streams, but this decoupling of container and stream structure
metadata means it is possible to use Ogg with any metadata
specification without altering the container itself, or without stream
structure metadata at all.

<h3>Frame-accurate absolute position</h3>

<p>Every Ogg page is stamped with a 64-bit 'granule position' that
serves as an absolute timestamp for mux and seeking. A few nifty
little tricks are usually also embedded in the granule position state, but
we'll leave those aside for the moment (strictly speaking, they're
part of each codec's mapping, not Ogg).

<p>As mentioned above, granule positions are mapped into
absolute timestamps by the codec, rather than being a hard timestamp.
This allows maximally efficient use of the available 64 bits to
address every sample/frame position without approximation while
supporting new and previously unknown timebase encodings without
needing to extend or update the mux layer. When a codec needs a novel
timebase, it simply brings the code for that mapping along with it.
This is not a theoretical curiosity; new, wholly novel timebases were
deployed with the adoption of both Theora and Dirac. "Rolling INTRA"
(keyframeless video) also benefits from novel use of the granule
position.

<h2>Ogg stream arrangement</h2>

<h3>Packets, pages, and bitstreams</h3>

<p>Ogg codecs use <em>packets</em>. Packets are octet payloads of
raw, compressed data, containing the data needed for a single
decompressed unit, e.g., one video frame. Packets have no maximum size
and may be zero length. They do not have any high-level structure or
boundary information; strung together, the unframed packets form a
<em>logical bitstream</em> of apparently random bytes with no internal
landmarks.

<p>Logical bitstream packets are grouped and framed into Ogg pages
along with a unique stream <em>serial number</em> to produce a
<em>physical bitstream</em>. An <em>elementary stream</em> is a
physical bitstream containing only the pages framing a single logical
bitstream.
Each page is a self-contained entity, although a packet may
be split and encoded across one or more pages. The page decode
mechanism is designed to recognize, verify and handle one page at
a time from the overall bitstream.

<p><a href="framing.html">Ogg Bitstream Framing</a> specifies
the page format of an Ogg bitstream, the packet coding process
and elementary bitstreams in detail.

<h3>Multiplexed bitstreams</h3>

<p>Multiple logical/elementary bitstreams can be combined into a single
<em>multiplexed bitstream</em> by interleaving whole pages from each
contributing elementary stream in time order. The result is a single
physical stream that multiplexes and frames multiple logical streams.
Each logical stream is identified by the unique stream serial number
stamped in its pages. A physical stream may include a 'meta-header'
(such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
own Ogg page at the beginning of the physical stream. A decoder
recovers the original logical/elementary bitstreams out of the
physical bitstream by taking the pages in order from the physical
bitstream and redirecting them into the appropriate logical decoding
entity.

<p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
proper multiplexing of an Ogg bitstream in detail.

<h3>Chaining</h3>

<p>Multiple Ogg physical bitstreams may be concatenated into a single new
stream; this is <em>chaining</em>.
The bitstreams do not overlap; the
final page of a given logical bitstream is immediately followed by the
initial page of the next.</p>

<p>Each logical bitstream in a chain must have a unique serial number
within the scope of the full physical bitstream, not only within a
particular <em>link</em> or <em>segment</em> of the chain.</p>

<h3>Continuous and discontinuous streams</h3>

<p>Within Ogg, each stream must be declared (by the codec) to be
continuous- or discontinuous-time. Most codecs treat all streams they
use as either inherently continuous- or discontinuous-time, although
this is not a requirement. A codec may, as part of its mapping, choose
according to data in the initial header.

<p>Continuous-time pages are stamped by end-time; discontinuous pages
are stamped by begin-time. Pages in a multiplexed stream are
interleaved in order of the time stamp regardless of stream type.
Both continuous and discontinuous logical streams are used to seek
within a physical stream; however, only continuous streams are used to
determine buffering depth. Because discontinuous streams are stamped
by start time, they will always 'fall out' in time when buffering
tracks only the continuous streams. See 'Examples' for an
illustration of the buffering mechanism.

<h2>Mapping Requirements</h2>

<p>Each codec is allowed some freedom in deciding how its logical
bitstream is encapsulated into an Ogg bitstream (even if it is a
trivial mapping, e.g., 'plop the packets in and go'). This is the
codec's <em>mapping</em>. Ogg imposes a few mapping requirements
on any codec.

<p>The <a href="framing.html">framing specification</a> defines
'beginning of stream' and 'end of stream' page markers via a header
flag (it is possible for a stream to consist of a single page).
A
correct stream always consists of an integer number of pages, an easy
requirement given the variable-size nature of pages.</p>

<p>The first page of an elementary Ogg bitstream consists of a single,
small 'initial header' packet that must include sufficient information
to identify the exact CODEC type. From this initial header, the codec
must also be able to determine its timebase and whether it is a
continuous- or discontinuous-time stream. The initial header must fit
on a single page. If a codec makes use of auxiliary headers (for
example, Vorbis uses two auxiliary headers), these headers must follow
the initial header immediately. The last header finishes its page;
data begins on a fresh page.

<p>As an example, Ogg Vorbis places the name and revision of the
Vorbis CODEC, the audio rate and the audio quality into this initial
header. Comments and detailed codec setup appear in the larger
auxiliary headers.</p>

<h2>Multiplexing Requirements</h2>

<p>Multiplexing requirements within Ogg are straightforward. When
constructing a single-link (unchained) physical bitstream consisting
of multiple elementary streams:

<ol>

<li> The initial header for each stream appears in sequence, each
header on a single page. All initial headers must appear with no
intervening data (no auxiliary header pages or packets, no data pages
or packets). Order of the initial headers is unspecified. The
'beginning of stream' flag is set on each initial header.

<li> All auxiliary headers for all streams must follow. Order
is unspecified. The final auxiliary header of each stream must flush
its page.

<li>Data pages for each stream follow, interleaved in time order.

<li>The final page of each stream sets the 'end of stream' flag.
Unlike initial pages, terminal pages for the logical bitstreams need
not occur contiguously; indeed it may not be possible for them to do so.
</ol>

<p>Each grouped bitstream must have a unique serial number within the
scope of the physical bitstream.</p>

<h3>Chaining and multiplexing</h3>

<p>Multiplexed and/or unmultiplexed bitstreams may be chained
consecutively. Such a physical bitstream obeys all the rules of both
chained and multiplexed streams. Each link, when unchained, must
stand on its own as a valid physical bitstream. Chained streams do
not mix; a new segment may not begin until all streams in the
preceding segment have terminated.</p>

<h2>Examples</h2>

<p><em>[More to come shortly; this section is currently being revised and expanded]</em></p>

<p>Below, we present an example of a multiplexed and chained bitstream:</p>

<p><img src="stream.png" alt="stream"/></p>

<p>In this example, we see pages from five total logical bitstreams
multiplexed into a physical bitstream. Note the following
characteristics:</p>

<ol>
<li>Multiplexed bitstreams in a given link begin together; all of the
initial pages must appear before any data pages. When concurrently
multiplexed groups are chained, the new group does not begin until all
the bitstreams in the previous group have terminated.</li>

<li>The ordering of pages of concurrently multiplexed bitstreams is
governed by timestamp (not shown here); there is no regular
interleaving order. Pages within a logical bitstream appear in
sequence order.</li>
</ol>

<div id="copyright">
  The Xiph Fish Logo is a
  trademark (&trade;) of Xiph.Org.<br/>

  These pages &copy; 1994 - 2010 Xiph.Org. All rights reserved.
</div>

</body>
</html>