page.title=Data Formats
@jd:body

<!--
    Copyright 2015 The Android Open Source Project

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-->

<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>
    <ol id="auto-toc">
    </ol>
  </div>
</div>

<p>
Android uses a wide variety of audio
<a href="http://en.wikipedia.org/wiki/Data_format">data formats</a>
internally, and exposes a subset of these in public APIs,
<a href="http://en.wikipedia.org/wiki/Audio_file_format">file formats</a>,
and the
<a href="https://en.wikipedia.org/wiki/Hardware_abstraction">Hardware Abstraction Layer</a> (HAL).
</p>

<h2 id="properties">Properties</h2>

<p>
The audio data formats are classified by their properties:
</p>

<dl>

  <dt><a href="https://en.wikipedia.org/wiki/Data_compression">Compression</a></dt>
  <dd>
    <a href="http://en.wikipedia.org/wiki/Raw_data">Uncompressed</a>,
    <a href="http://en.wikipedia.org/wiki/Lossless_compression">lossless compressed</a>, or
    <a href="http://en.wikipedia.org/wiki/Lossy_compression">lossy compressed</a>.
    PCM is the most common uncompressed audio format. FLAC is a lossless compressed
    format, while MP3 and AAC are lossy compressed formats.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Audio_bit_depth">Bit depth</a></dt>
  <dd>
    Number of significant bits per audio sample.
  </dd>

  <dt><a href="https://en.wikipedia.org/wiki/Sizeof">Container size</a></dt>
  <dd>
    Number of bits used to store or transmit a sample. Usually
    this is the same as the bit depth, but sometimes additional
    padding bits are allocated for alignment. For example, a
    24-bit sample could be contained within a 32-bit word.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Data_structure_alignment">Alignment</a></dt>
  <dd>
    If the container size is exactly equal to the bit depth, the
    representation is called <em>packed</em>. Otherwise the representation is
    <em>unpacked</em>. The significant bits of the sample are typically
    aligned with either the leftmost (most significant) or rightmost
    (least significant) bit of the container. It is conventional to use
    the terms <em>packed</em> and <em>unpacked</em> only when the bit
    depth is not a
    <a href="http://en.wikipedia.org/wiki/Power_of_two">power of two</a>.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Signedness">Signedness</a></dt>
  <dd>
    Whether samples are signed or unsigned.
  </dd>

  <dt>Representation</dt>
  <dd>
    Either fixed point or floating point; see below.
  </dd>

</dl>

<h2 id="fixed">Fixed point representation</h2>

<p>
<a href="http://en.wikipedia.org/wiki/Fixed-point_arithmetic">Fixed point</a>
is the most common representation for uncompressed PCM audio data,
especially at hardware interfaces.
</p>

<p>
A fixed-point number has a fixed (constant) number of digits
before and after the <a href="https://en.wikipedia.org/wiki/Radix_point">radix point</a>.
All of our representations use
<a href="https://en.wikipedia.org/wiki/Binary_number">base 2</a>,
so we substitute <em>bit</em> for <em>digit</em>,
and <em>binary point</em> or simply <em>point</em> for <em>radix point</em>.
The bits to the left of the point are the integer part,
and the bits to the right of the point are the
<a href="https://en.wikipedia.org/wiki/Fractional_part">fractional part</a>.
</p>

<p>
We speak of <em>integer PCM</em>, because fixed-point values
are usually stored and manipulated as integer values.
The interpretation as fixed-point is implicit.
</p>

<p>
We use <a href="https://en.wikipedia.org/wiki/Two%27s_complement">two's complement</a>
for all signed fixed-point representations,
so the following holds where all values are in units of one
<a href="https://en.wikipedia.org/wiki/Least_significant_bit">LSB</a>:
</p>
<pre>
|largest negative value| = |largest positive value| + 1
</pre>

<h3 id="q">Q and U notation</h3>

<p>
There are various
<a href="https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Notation">notations</a>
for fixed-point representation in an integer.
We use <a href="https://en.wikipedia.org/wiki/Q_(number_format)">Q notation</a>:
Q<em>m</em>.<em>n</em> means <em>m</em> integer bits and <em>n</em> fractional bits.
The "Q" counts as one bit (the sign), and the value is expressed in two's complement.
The total number of bits is <em>m</em> + <em>n</em> + 1.
</p>

<p>
U<em>m</em>.<em>n</em> is for unsigned numbers:
<em>m</em> integer bits and <em>n</em> fractional bits,
and the "U" counts as zero bits.
The total number of bits is <em>m</em> + <em>n</em>.
</p>
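
<p>
As a rough illustration of the bit-count rules above, the following Python sketch
computes the total width and the representable range of a Q or U format.
The function names are hypothetical and are not part of any Android API.
</p>

```python
def q_total_bits(m, n):
    """Signed Qm.n: m integer bits + n fractional bits + 1 bit for the "Q"."""
    return m + n + 1


def u_total_bits(m, n):
    """Unsigned Um.n: m integer bits + n fractional bits."""
    return m + n


def q_range(m, n):
    """Smallest and largest real values of a two's complement Qm.n number."""
    bits = q_total_bits(m, n)
    lowest = -(1 << (bits - 1))      # most negative integer code
    highest = (1 << (bits - 1)) - 1  # most positive integer code
    scale = 1 << n                   # integer code that represents 1.0
    return lowest / scale, highest / scale
```

<p>
For example, <code>q_range(0, 15)</code> gives -1.0 and 32767/32768, that is,
+1.0 minus one LSB.
</p>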

<p>
The integer part may be used in the final result, or be temporary.
In the latter case, the bits that make up the integer part are called
<em>guard bits</em>. The guard bits permit an intermediate calculation to overflow,
as long as the final value is within range or can be clamped to be within range.
Note that fixed-point guard bits are at the left, while floating-point unit
<a href="https://en.wikipedia.org/wiki/Guard_digit">guard digits</a>
are used to reduce roundoff error and are on the right.
</p>
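
<p>
A minimal sketch of how guard bits might be used when mixing Q0.15 samples.
Python integers are unbounded, so the accumulator here only stands in for a
wider fixed-width register (for example a 32-bit word); the helper names are
hypothetical.
</p>

```python
Q15_MAX = (1 << 15) - 1   # +1.0 minus one LSB in Q0.15
Q15_MIN = -(1 << 15)      # -1.0 in Q0.15


def clamp_q15(acc):
    """Clamp a wide accumulator value to the Q0.15 range."""
    return max(Q15_MIN, min(Q15_MAX, acc))


def mix_q15(samples):
    """Sum Q0.15 samples in a wider accumulator, then clamp once at the end."""
    acc = 0  # the extra integer bits of the accumulator act as guard bits
    for s in samples:
        acc += s  # may temporarily exceed the Q0.15 range
    return clamp_q15(acc)
```

<p>
In <code>mix_q15([30000, 30000, -30000])</code> the running sum briefly reaches
60000, which does not fit in Q0.15, but the final result of 30000 is in range
and is returned unchanged.
</p>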

<h2 id="floating">Floating point representation</h2>

<p>
<a href="https://en.wikipedia.org/wiki/Floating_point">Floating point</a>
is an alternative to fixed point, in which the location of the point can vary.
The primary advantages of floating-point include:
</p>

<ul>
  <li>Greater <a href="https://en.wikipedia.org/wiki/Headroom_(audio_signal_processing)">headroom</a>
      and <a href="https://en.wikipedia.org/wiki/Dynamic_range">dynamic range</a>;
      floating-point arithmetic tolerates exceeding nominal ranges
      during intermediate computation, and only clamps values at the end
  </li>
  <li>Support for special values such as infinities and NaN</li>
  <li>Easier to use in many cases</li>
</ul>

<p>
Historically, floating-point arithmetic was slower than integer or fixed-point
arithmetic, but now it is common for floating-point to be faster,
provided control flow decisions aren't based on the value of a computation.
</p>

<h2 id="androidFormats">Android formats for audio</h2>

<p>
The major Android formats for audio are listed in the table below:
</p>

<table>

<tr>
  <th></th>
  <th colspan="5"><center>Notation</center></th>
</tr>

<tr>
  <th>Property</th>
  <th>Q0.15</th>
  <th>Q0.7 <sup>1</sup></th>
  <th>Q0.23</th>
  <th>Q0.31</th>
  <th>float</th>
</tr>

<tr>
  <td>Container<br />bits</td>
  <td>16</td>
  <td>8</td>
  <td>24 or 32 <sup>2</sup></td>
  <td>32</td>
  <td>32</td>
</tr>

<tr>
  <td>Significant bits<br />including sign</td>
  <td>16</td>
  <td>8</td>
  <td>24</td>
  <td>24 or 32 <sup>2</sup></td>
  <td>25 <sup>3</sup></td>
</tr>

<tr>
  <td>Headroom<br />in dB</td>
  <td>0</td>
  <td>0</td>
  <td>0</td>
  <td>0</td>
  <td>770 <sup>4</sup></td>
</tr>

<tr>
  <td>Dynamic range<br />in dB</td>
  <td>90</td>
  <td>42</td>
  <td>138</td>
  <td>138 to 186</td>
  <td>900 <sup>5</sup></td>
</tr>

</table>

<p>
All fixed-point formats above have a nominal range of -1.0 to +1.0 minus one LSB.
There is one more negative value than positive value due to the
two's complement representation.
</p>

<p>
Footnotes:
</p>

<ol>

<li>
All formats above express signed sample values.
The 8-bit format is commonly called "unsigned", but
it is actually a signed value with bias of <code>0.10000000</code>.
</li>

<li>
Q0.23 may be packed into 24 bits (three 8-bit bytes), or unpacked
in 32 bits. If unpacked, the significant bits are either right-justified
towards the LSB with sign extension padding towards the MSB (Q8.23),
or left-justified towards the MSB with zero fill towards the LSB
(Q0.31). Q0.31 theoretically permits up to 32 significant bits,
but hardware interfaces that accept Q0.31 rarely use all the bits.
</li>

<li>
Single-precision floating point has 23 explicit bits plus one hidden bit and sign bit,
resulting in 25 significant bits total.
<a href="https://en.wikipedia.org/wiki/Denormal_number">Denormal numbers</a>
have fewer significant bits.
</li>

<li>
Single-precision floating point can express finite values up to about &plusmn;3.4e+38,
which explains the large headroom.
</li>

<li>
The dynamic range shown is for denormals up to the nominal maximum
value &plusmn;1.0.
Note that some architecture-specific floating point implementations such as
<a href="https://en.wikipedia.org/wiki/ARM_architecture#NEON">NEON</a>
don't support denormals.
</li>

</ol>
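
<p>
The dynamic range figures in the table appear consistent with roughly 6.02 dB
per bit, that is, 20&middot;log10(2). The sketch below reproduces them under that
assumption; the function name is hypothetical, and the 149-bit figure for
<code>float</code> assumes the smallest single-precision denormal, 2<sup>-149</sup>,
measured against the nominal maximum of 1.0.
</p>

```python
import math


def dynamic_range_db(bits):
    """Ratio, in dB, between full scale and a step of 2**-bits."""
    return 20 * bits * math.log10(2)


# Fractional bits per format; for float, the exponent of the smallest denormal.
for name, bits in [("Q0.15", 15), ("Q0.7", 7), ("Q0.23", 23),
                   ("Q0.31", 31), ("float", 149)]:
    print(f"{name}: {dynamic_range_db(bits):.1f} dB")
```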

<h2 id="conversions">Conversions</h2>

<p>
This section discusses
<a href="https://en.wikipedia.org/wiki/Data_conversion">data conversions</a>
between various representations.
</p>

<h3 id="floatConversions">Floating point conversions</h3>

<p>
To convert a value from Q<em>m</em>.<em>n</em> format to floating point:
</p>

<ol>
  <li>Convert the value to floating point as if it were an integer (by ignoring the point).</li>
  <li>Multiply by 2<sup>-<em>n</em></sup>.</li>
</ol>

<p>
For example, to convert a Q4.27 internal value to floating point, use:
</p>
<pre>
float = integer * (2 ^ -27)
</pre>
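
<p>
The two steps above can be sketched in Python as follows. The function name is
hypothetical, and Python's <code>float</code> is double precision, so this
illustrates the arithmetic rather than the exact single-precision result.
</p>

```python
def q_to_float(raw, n):
    """Convert a signed Qm.n integer code to a floating-point value."""
    as_float = float(raw)         # step 1: treat the code as a plain integer
    return as_float * 2.0 ** -n   # step 2: multiply by 2^-n


# Q4.27 examples: the integer code for 1.0 is 1 << 27.
one = q_to_float(1 << 27, 27)
lowest = q_to_float(-(1 << 31), 27)
```

<p>
Here <code>one</code> is 1.0 and <code>lowest</code> is -16.0, the most negative
value a 32-bit Q4.27 word can hold.
</p>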

<p>
Conversions from floating point to fixed point follow these rules:
</p>

<ul>

<li>
Single-precision floating point has a nominal range of &plusmn;1.0,
but the full range for intermediate values is about &plusmn;3.4e+38.
Conversion between floating point and fixed point for external representation
(such as output to audio devices) will consider only the nominal range, with
clamping for values that exceed that range.
In particular, when +1.0 is converted
to a fixed-point format, it is clamped to +1.0 minus one LSB.
</li>

<li>
Denormals (subnormals) and both +/- 0.0 are allowed in representation,
but may be silently converted to 0.0 during processing.
</li>

<li>
Infinities will either pass through operations or will be silently hard-limited
to +/- 1.0. Generally the latter is for conversion to a fixed-point format.
</li>

<li>
NaN behavior is undefined: a NaN may propagate as an identical NaN, be
converted to a default NaN, be silently hard-limited to +/- 1.0, be
silently converted to 0.0, or result in an error.
</li>

</ul>
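
<p>
A minimal sketch of a float to Q0.15 conversion that follows the rules above.
Because NaN behavior is undefined, mapping NaN to 0 here is only one of the
permitted outcomes and is an assumption of this example, as are the function
name and the choice of rounding to nearest.
</p>

```python
import math


def float_to_q15(x):
    """Convert a float in the nominal range +/-1.0 to a Q0.15 integer code."""
    if math.isnan(x):
        return 0  # assumption: NaN becomes 0; other behaviors are allowed
    scaled = x * 32768.0  # 2^15; infinities stay infinite here
    if scaled >= 32767.0:
        return 32767      # +1.0 and above clamp to +1.0 minus one LSB
    if scaled <= -32768.0:
        return -32768     # -1.0 and below clamp to -1.0
    return int(round(scaled))
```

<p>
With this sketch, +1.0 and positive infinity both map to 32767, while -1.0 and
negative infinity both map to -32768.
</p>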

<h3 id="fixedConversion">Fixed point conversions</h3>

<p>
Conversions between different Q<em>m</em>.<em>n</em> formats follow these rules:
</p>

<ul>

<li>
When <em>m</em> is increased, sign extend the integer part at left.
</li>

<li>
When <em>m</em> is decreased, clamp the integer part.
</li>

<li>
When <em>n</em> is increased, zero extend the fractional part at right.
</li>

<li>
When <em>n</em> is decreased, either dither, round, or truncate the excess fractional bits at right.
</li>

</ul>

<p>
For example, to convert a Q4.27 value to Q0.15 (without dither or
rounding), right shift the Q4.27 value by 12 bits, and clamp any results
that exceed the 16-bit signed range. This aligns the point of the
Q representation.
</p>
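
<p>
The Q4.27 to Q0.15 example above can be sketched as follows. The function name
is hypothetical; Python's <code>&gt;&gt;</code> on a negative integer is an
arithmetic (sign-preserving) shift, which matches a signed right shift.
</p>

```python
def q4_27_to_q0_15(raw):
    """Convert a Q4.27 integer code to Q0.15 by truncation and clamping."""
    shifted = raw >> 12  # drop 27 - 15 = 12 fractional bits (no rounding)
    return max(-32768, min(32767, shifted))  # clamp to the 16-bit signed range
```

<p>
For instance, the Q4.27 code for 0.5 (<code>1 &lt;&lt; 26</code>) becomes 16384,
and the code for 1.0 (<code>1 &lt;&lt; 27</code>) shifts to 32768 and is clamped
to 32767.
</p>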

<p>To convert Q7.24 to Q7.23, do a signed divide by 2,
or equivalently add the sign bit to the Q7.24 integer quantity, and then signed right shift by 1.
Note that a simple signed right shift is <em>not</em> equivalent to a signed divide by 2:
for negative odd values the shift rounds toward negative infinity, while the divide
rounds toward zero.
</p>
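
<p>
A small sketch of the difference between the two approaches. The function names
are hypothetical, and the sign bit is computed with a comparison because Python
integers have no fixed width.
</p>

```python
def q7_24_to_q7_23(raw):
    """Halve a Q7.24 code with round-toward-zero, like a signed divide by 2."""
    sign_bit = 1 if raw < 0 else 0  # 1 for negative values, else 0
    return (raw + sign_bit) >> 1


def shift_only(raw):
    """Plain signed right shift; rounds toward negative infinity."""
    return raw >> 1
```

<p>
For a code of -3, <code>q7_24_to_q7_23</code> returns -1, while
<code>shift_only</code> returns -2. The two agree for non-negative and for even
inputs.
</p>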

<h3 id="lossyConversion">Lossy and lossless conversions</h3>

<p>
A conversion is <em>lossless</em> if it is
<a href="https://en.wikipedia.org/wiki/Inverse_function">invertible</a>:
a conversion from <code>A</code> to <code>B</code> to
<code>C</code> results in <code>A = C</code>.
Otherwise the conversion is <a href="https://en.wikipedia.org/wiki/Lossy_data_conversion">lossy</a>.
</p>

<p>
Lossless conversions permit
<a href="https://en.wikipedia.org/wiki/Round-trip_format_conversion">round-trip format conversion</a>.
</p>

<p>
Conversions from fixed point representation with 25 or fewer significant bits to floating point are lossless.
Conversions from floating point to any common fixed point representation are lossy.
</p>
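
<p>
The round-trip property can be checked with a short sketch. Python's
<code>float</code> is double precision, so the hypothetical helper
<code>to_float32</code> uses the standard <code>struct</code> module to round a
value to single precision first.
</p>

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def round_trips(raw, n):
    """True if a Qm.n code survives fixed -> float32 -> fixed unchanged."""
    as_float = to_float32(raw * 2.0 ** -n)
    back = int(round(as_float * 2.0 ** n))
    return back == raw
```

<p>
Under these assumptions, every Q0.23 code round-trips, but the largest Q0.31
code, <code>(1 &lt;&lt; 31) - 1</code>, does not: it rounds to 1.0 in single
precision and converts back to a different integer.
</p>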