/* -----------------------------------------------------------------------------
Software License for The Fraunhofer FDK AAC Codec Library for Android

© Copyright 1995 - 2021 Fraunhofer-Gesellschaft zur Förderung der angewandten
Forschung e.V. All rights reserved.

1. INTRODUCTION
The Fraunhofer FDK AAC Codec Library for Android ("FDK AAC Codec") is software
that implements the MPEG Advanced Audio Coding ("AAC") encoding and decoding
scheme for digital audio. This FDK AAC Codec software is intended to be used on
a wide variety of Android devices.

AAC's HE-AAC and HE-AAC v2 versions are regarded as today's most efficient
general perceptual audio codecs. AAC-ELD is considered the best-performing
full-bandwidth communications codec by independent studies and is widely
deployed. AAC has been standardized by ISO and IEC as part of the MPEG
specifications.

Patent licenses for necessary patent claims for the FDK AAC Codec (including
those of Fraunhofer) may be obtained through Via Licensing
(www.vialicensing.com) or through the respective patent owners individually for
the purpose of encoding or decoding bit streams in products that are compliant
with the ISO/IEC MPEG audio standards. Please note that most manufacturers of
Android devices already license these patent claims through Via Licensing or
directly from the patent owners, and therefore FDK AAC Codec software may
already be covered under those patent licenses when it is used for those
licensed purposes only.

Commercially-licensed AAC software libraries, including floating-point versions
with enhanced sound quality, are also available from Fraunhofer. Users are
encouraged to check the Fraunhofer website for additional applications
information and documentation.

2. COPYRIGHT LICENSE

Redistribution and use in source and binary forms, with or without modification,
are permitted without payment of copyright license fees provided that you
satisfy the following conditions:

You must retain the complete text of this software license in redistributions of
the FDK AAC Codec or your modifications thereto in source code form.

You must retain the complete text of this software license in the documentation
and/or other materials provided with redistributions of the FDK AAC Codec or
your modifications thereto in binary form. You must make available free of
charge copies of the complete source code of the FDK AAC Codec and your
modifications thereto to recipients of copies in binary form.

The name of Fraunhofer may not be used to endorse or promote products derived
from this library without prior written permission.

You may not charge copyright license fees for anyone to use, copy or distribute
the FDK AAC Codec software or your modifications thereto.

Your modified versions of the FDK AAC Codec must carry prominent notices stating
that you changed the software and the date of any change. For modified versions
of the FDK AAC Codec, the term "Fraunhofer FDK AAC Codec Library for Android"
must be replaced by the term "Third-Party Modified Version of the Fraunhofer FDK
AAC Codec Library for Android."

3. NO PATENT LICENSE

NO EXPRESS OR IMPLIED LICENSES TO ANY PATENT CLAIMS, including without
limitation the patents of Fraunhofer, ARE GRANTED BY THIS SOFTWARE LICENSE.
Fraunhofer provides no warranty of patent non-infringement with respect to this
software.

You may use this FDK AAC Codec software or modifications thereto only for
purposes that are authorized by appropriate patent licenses.

4. DISCLAIMER

This FDK AAC Codec software is provided by Fraunhofer on behalf of the copyright
holders and contributors "AS IS" and WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES,
including but not limited to the implied warranties of merchantability and
fitness for a particular purpose. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
CONTRIBUTORS BE LIABLE for any direct, indirect, incidental, special, exemplary,
or consequential damages, including but not limited to procurement of substitute
goods or services; loss of use, data, or profits, or business interruption,
however caused and on any theory of liability, whether in contract, strict
liability, or tort (including negligence), arising in any way out of the use of
this software, even if advised of the possibility of such damage.

5. CONTACT INFORMATION

Fraunhofer Institute for Integrated Circuits IIS
Attention: Audio and Multimedia Departments - FDK AAC LL
Am Wolfsmantel 33
91058 Erlangen, Germany

www.iis.fraunhofer.de/amm
amm-info@iis.fraunhofer.de
----------------------------------------------------------------------------- */

/**************************** AAC encoder library ******************************

   Author(s):   M. Lohwasser

   Description:

*******************************************************************************/

/**
 * \file   aacenc_lib.h
 * \brief  FDK AAC Encoder library interface header file.
 *
\mainpage  Introduction

\section Scope

This document describes the high-level interface and usage of the ISO/MPEG-2/4
AAC Encoder library developed by the Fraunhofer Institute for Integrated
Circuits (IIS).

The library implements encoding on the basis of the MPEG-2 and MPEG-4 AAC
Low-Complexity standard, and depending on the library's configuration, MPEG-4
High-Efficiency AAC v2 and/or AAC-ELD standard.

All references to SBR (Spectral Band Replication) are only applicable to HE-AAC
or AAC-ELD versions of the library. All references to PS (Parametric Stereo) are
only applicable to HE-AAC v2 versions of the library.

\section encBasics Encoder Basics

This document can only give a rough overview of the ISO/MPEG-2 and ISO/MPEG-4
AAC audio coding standards. To understand all the terms in this document, you
are encouraged to read the following documents.

- ISO/IEC 13818-7 (MPEG-2 AAC), which defines the syntax of MPEG-2 AAC audio
bitstreams.
- ISO/IEC 14496-3 (MPEG-4 AAC, subparts 1 and 4), which defines the syntax of
MPEG-4 AAC audio bitstreams.
- Lutzky, Schuller, Gayer, Krämer, Wabnik, "A guideline to audio codec
delay", 116th AES Convention, May 8, 2004

MPEG Advanced Audio Coding is based on a time-to-frequency mapping of the
signal. The signal is partitioned into overlapping portions and transformed into
the frequency domain. The spectral components are then quantized and coded. \n
An MPEG-2 or MPEG-4 AAC audio bitstream is composed of frames. Contrary to
MPEG-1/2 Layer-3 (mp3), the length of individual frames is not restricted to a
fixed number of bytes, but can take on any length between 1 and 768 bytes.


\page LIBUSE Library Usage

\section InterfaceDescription API Files

All API header files are located in the folder /include of the release package.
All header files are provided for usage in C/C++ programs. The AAC encoder
library API functions are located in aacenc_lib.h.

\section CallingSequence Calling Sequence

For encoding of ISO/MPEG-2/4 AAC bitstreams the following sequence is mandatory.
Input read and output write functions as well as the corresponding open and
close functions are left out, since they may be implemented differently
according to the user's specific requirements.
The example implementation uses file-based input/output.

-# Call aacEncOpen() to allocate encoder instance with required \ref encOpen
"configuration".
\code
HANDLE_AACENCODER hAacEncoder = NULL;
if ( (ErrorStatus = aacEncOpen(&hAacEncoder,0,0)) != AACENC_OK ) {
  // allocation failed: report the error and stop
}
\endcode
-# Call aacEncoder_SetParam() for each parameter to be set. AOT, samplingrate,
channelMode, bitrate and transport type are \ref encParams "mandatory".
\code
ErrorStatus = aacEncoder_SetParam(hAacEncoder, parameter, value);
\endcode
-# Call aacEncEncode() with NULL parameters to \ref encReconf "initialize"
encoder instance with present parameter set.
\code
ErrorStatus = aacEncEncode(hAacEncoder, NULL, NULL, NULL, NULL);
\endcode
-# Call aacEncInfo() to retrieve a configuration data block to be transmitted
out of band. This is required when using RFC3640 or RFC3016 like transport.
\code
AACENC_InfoStruct encInfo;
aacEncInfo(hAacEncoder, &encInfo);
\endcode
-# Encode input audio data in loop.
\code
do
{
\endcode
Feed \ref feedInBuf "input buffer" with new audio data and provide input/output
\ref bufDes "arguments" to aacEncEncode().
\code
ErrorStatus = aacEncEncode(hAacEncoder, &inBufDesc, &outBufDesc, &inargs,
                           &outargs);
\endcode
Write \ref writeOutData "output data" to file or audio device.
\code
} while (ErrorStatus==AACENC_OK);
\endcode
-# Call aacEncClose() and destroy encoder instance.
\code
aacEncClose(&hAacEncoder);
\endcode


\section encOpen Encoder Instance Allocation

The aacEncOpen() function is very flexible and can be used in the following
ways.
- If the amount of memory consumption is not an issue, the encoder instance can
be allocated for the maximum number of possible audio channels (for example 6 or
8) with the full functional range supported by the library.
This is the default open procedure for the AAC encoder if memory consumption
does not need to be minimized.
\code aacEncOpen(&hAacEncoder,0,0) \endcode
- If the required MPEG-4 AOTs do not call for the full functional range of the
library, encoder modules can be allocated selectively.
\verbatim
------------------------------------------------------
 AAC | SBR |  PS | MD |         FLAGS         | value
-----+-----+-----+----+-----------------------+-------
  X  |  -  |  -  |  - | (0x01)                |  0x01
  X  |  X  |  -  |  - | (0x01|0x02)           |  0x03
  X  |  X  |  X  |  - | (0x01|0x02|0x04)      |  0x07
  X  |  -  |  -  |  X | (0x01          |0x10) |  0x11
  X  |  X  |  -  |  X | (0x01|0x02     |0x10) |  0x13
  X  |  X  |  X  |  X | (0x01|0x02|0x04|0x10) |  0x17
------------------------------------------------------
 - AAC: Allocate AAC Core Encoder module.
 - SBR: Allocate Spectral Band Replication module.
 - PS: Allocate Parametric Stereo module.
 - MD: Allocate Meta Data module within AAC encoder.
\endverbatim
\code aacEncOpen(&hAacEncoder,value,0) \endcode
- Specifying the maximum number of channels to be supported in the encoder
instance can be done as follows.
 - For example allocate an encoder instance which supports 2 channels for all
 supported AOTs. The library itself may be capable of encoding up to 6 or 8
 channels but in this example only 2 channel encoding is required and thus only
 buffers for 2 channels are allocated to save data memory.
 \code aacEncOpen(&hAacEncoder,0,2) \endcode
 - Additionally the maximum number of supported channels in the SBR module can
 be denoted separately.\n In this example the encoder instance provides a maximum
 of 6 channels out of which up to 2 channels support SBR. This encoder instance
 can produce for example 5.1 channel AAC-LC streams or stereo HE-AAC (v2)
 streams. HE-AAC 5.1 multi channel is not possible since only 2 out of 6 channels
 support SBR, which saves data memory.
 \code aacEncOpen(&hAacEncoder,0,6|(2<<8)) \endcode \n

\section bufDes Input/Output Arguments

\subsection allocIOBufs Provide Buffer Descriptors
In the present encoder API, the input and output buffers are described with \ref
AACENC_BufDesc "buffer descriptors". This mechanism allows a flexible handling
of input and output buffers without impact to the actual encoding call. Optional
buffers are necessary e.g. for ancillary data, meta data input or additional
output buffers describing superframing data in DAB+ or DRM+.\n At least one
input buffer for audio input data and one output buffer for bitstream data must
be allocated. The input buffer size can be a user defined multiple of the number
of input channels. PCM input data will be copied from the user defined PCM
buffer to an internal input buffer and so input data can be less than one AAC
audio frame. The output buffer size should be 6144 bits per channel excluding
the LFE channel. If the output data does not fit into the provided buffer, an
AACENC_ERROR will be returned by aacEncEncode().
\code
static INT_PCM inputBuffer[8*2048];
static UCHAR ancillaryBuffer[50];
static AACENC_MetaData metaDataSetup;
static UCHAR outputBuffer[8192];
\endcode

All input and output buffers must be clustered in input and output buffer
arrays.
\code
static void* inBuffer[] = { inputBuffer, ancillaryBuffer, &metaDataSetup };
static INT inBufferIds[] = { IN_AUDIO_DATA, IN_ANCILLRY_DATA,
                             IN_METADATA_SETUP };
static INT inBufferSize[] = { sizeof(inputBuffer), sizeof(ancillaryBuffer),
                              sizeof(metaDataSetup) };
static INT inBufferElSize[] = { sizeof(INT_PCM), sizeof(UCHAR),
                                sizeof(AACENC_MetaData) };

static void* outBuffer[] = { outputBuffer };
static INT outBufferIds[] = { OUT_BITSTREAM_DATA };
static INT outBufferSize[] = { sizeof(outputBuffer) };
static INT outBufferElSize[] = { sizeof(UCHAR) };
\endcode

Allocate buffer descriptors
\code
AACENC_BufDesc inBufDesc;
AACENC_BufDesc outBufDesc;
\endcode

Initialize input buffer descriptor
\code
inBufDesc.numBufs = sizeof(inBuffer)/sizeof(void*);
inBufDesc.bufs = (void**)&inBuffer;
inBufDesc.bufferIdentifiers = inBufferIds;
inBufDesc.bufSizes = inBufferSize;
inBufDesc.bufElSizes = inBufferElSize;
\endcode

Initialize output buffer descriptor
\code
outBufDesc.numBufs = sizeof(outBuffer)/sizeof(void*);
outBufDesc.bufs = (void**)&outBuffer;
outBufDesc.bufferIdentifiers = outBufferIds;
outBufDesc.bufSizes = outBufferSize;
outBufDesc.bufElSizes = outBufferElSize;
\endcode

\subsection argLists Provide Input/Output Argument Lists
The input and output arguments of an aacEncEncode() call are described in
argument structures.
\code
AACENC_InArgs inargs;
AACENC_OutArgs outargs;
\endcode

\section feedInBuf Feed Input Buffer
The input buffer should be handled as a modulo buffer. New audio data in the
form of pulse-code-modulated (PCM) samples must be read from an external source
and be fed to the input buffer depending on its fill level. The required sample
width (represented by the data type INT_PCM which is 16, 24 or 32 bits wide) is
fixed and depends on library configuration (usually 16 bit).
\code
inargs.numInSamples += WAV_InputRead ( wavIn,
                                       &inputBuffer[inargs.numInSamples],
                                       FDKmin(encInfo.inputChannels*encInfo.frameLength,
                                              sizeof(inputBuffer) /
                                              sizeof(INT_PCM)-inargs.numInSamples),
                                       SAMPLE_BITS
                                     );
\endcode

After the encoder's internal buffer is fed with incoming audio samples, and
aacEncEncode() has processed the new input data, update/move remaining samples
in input buffer, simulating a modulo buffer:
\code
if (outargs.numInSamples>0) {
    FDKmemmove( inputBuffer,
                &inputBuffer[outargs.numInSamples],
                sizeof(INT_PCM)*(inargs.numInSamples-outargs.numInSamples) );
    inargs.numInSamples -= outargs.numInSamples;
}
\endcode

\section writeOutData Output Bitstream Data
If any AAC bitstream data is available, write it to output file or device as
follows.
\code
if (outargs.numOutBytes>0) {
    FDKfwrite(outputBuffer, outargs.numOutBytes, 1, pOutFile);
}
\endcode

\section cfgMetaData Meta Data Configuration

If the present library is configured with Metadata support, it is possible to
insert meta data side info into the generated audio bitstream while encoding.

To work with meta data the encoder instance has to be \ref encOpen "allocated"
with meta data support. The meta data mode must be configured with the
::AACENC_METADATA_MODE parameter and aacEncoder_SetParam() function.
\code
aacEncoder_SetParam(hAacEncoder, AACENC_METADATA_MODE, 0-3);
\endcode

This configuration indicates how to embed meta data into the bitstream: either
no insertion, MPEG or ETSI style. The meta data itself must be specified within
the meta data setup structure AACENC_MetaData.

Changing one of the AACENC_MetaData setup parameters can be achieved from
outside the library within ::IN_METADATA_SETUP input buffer. There is no need to
supply meta data setup structure every frame.
If there is no new meta setup data available, the encoder uses the previous
setup or the default configuration in initial state.

In general the audio compressor and limiter within the encoder library can be
configured with the ::AACENC_METADATA_DRC_PROFILE parameter
AACENC_MetaData::drc_profile and AACENC_MetaData::comp_profile.
\n

\section encReconf Encoder Reconfiguration

The encoder library allows reconfiguration of the encoder instance with new
settings continuously between encoding frames. Each parameter to be changed must
be set with a single aacEncoder_SetParam() call. The internal status of each
parameter can be retrieved with an aacEncoder_GetParam() call.\n There is no
stand-alone reconfiguration function available. When parameters are modified
from outside the library, an internal control mechanism triggers the necessary
reconfiguration process which will be applied at the beginning of the following
aacEncEncode() call. This state can be observed from outside via the
AACENC_INIT_STATUS and aacEncoder_GetParam() function. The reconfiguration
process can also be applied immediately when all parameters of an aacEncEncode()
call are NULL with a valid encoder handle.\n\n The internal reconfiguration
process can be controlled from outside with the following access.
\code
aacEncoder_SetParam(hAacEncoder, AACENC_CONTROL_STATE, AACENC_CTRLFLAGS);
\endcode


\section encParams Encoder Parametrization

All parameters listed in ::AACENC_PARAM can be modified within an encoder
instance.

\subsection encMandatory Mandatory Encoder Parameters
The following parameters must be specified when the encoder instance is
initialized.
\code
aacEncoder_SetParam(hAacEncoder, AACENC_AOT, value);
aacEncoder_SetParam(hAacEncoder, AACENC_BITRATE, value);
aacEncoder_SetParam(hAacEncoder, AACENC_SAMPLERATE, value);
aacEncoder_SetParam(hAacEncoder, AACENC_CHANNELMODE, value);
\endcode
Beyond that, there is an internal auto mode which preinitializes the
::AACENC_BITRATE parameter if the parameter was not set from outside. The
bitrate depends on the number of effective channels and sampling rate and is
determined as follows.
\code
AAC-LC (AOT_AAC_LC): 1.5 bits per sample
HE-AAC (AOT_SBR):    0.625 bits per sample (dualrate sbr)
HE-AAC (AOT_SBR):    1.125 bits per sample (downsampled sbr)
HE-AAC v2 (AOT_PS):  0.5 bits per sample
\endcode

\subsection channelMode Channel Mode Configuration
The input audio data is described with the ::AACENC_CHANNELMODE parameter in the
aacEncoder_SetParam() call. It is not possible to use the encoder instance with
a 'number of input channels' argument. Instead, the channelMode must be set as
follows.
\code
aacEncoder_SetParam(hAacEncoder, AACENC_CHANNELMODE, value);
\endcode
The parameter is specified in ::CHANNEL_MODE and can be mapped from the
number of input channels in the following way.
\code
CHANNEL_MODE chMode = MODE_INVALID;

switch (nChannels) {
  case 1:  chMode = MODE_1;          break;
  case 2:  chMode = MODE_2;          break;
  case 3:  chMode = MODE_1_2;        break;
  case 4:  chMode = MODE_1_2_1;      break;
  case 5:  chMode = MODE_1_2_2;      break;
  case 6:  chMode = MODE_1_2_2_1;    break;
  case 7:  chMode = MODE_6_1;        break;
  case 8:  chMode = MODE_7_1_BACK;   break;
  default:
    chMode = MODE_INVALID;
}
return chMode;
\endcode

\subsection peakbitrate Peak Bitrate Configuration
In AAC, the default bitreservoir configuration depends on the chosen bitrate per
frame and the number of effective channels. The size can be determined as below.
\f[
bitreservoir = nEffChannels*6144 - (bitrate*framelength/samplerate)
\f]
Due to audio quality concerns it is not recommended to change the bitreservoir
size to a lower value than the default setting! However, for minimizing the
delay for streaming applications or for achieving a constant size of the
bitstream packages in each frame, it may be necessary to limit the maximum bits
per frame size. This can be done with the ::AACENC_PEAK_BITRATE parameter.
\code
aacEncoder_SetParam(hAacEncoder, AACENC_PEAK_BITRATE, value);
\endcode

To achieve acceptable audio quality with a reduced bitreservoir size setting, at
least 1000 bits per audio channel are recommended. For a multichannel audio file
with 5.1 channels the bitreservoir reduced to 5000 bits results in acceptable
audio quality.


\subsection vbrmode Variable Bitrate Mode
The variable bitrate (VBR) mode coding adapts the bit consumption to the
psychoacoustic requirements of the signal. The encoder ignores the user-defined
bit rate and selects a suitable pre-defined configuration based on the provided
AOT. VBR mode 1 is tuned for HE-AACv2; for VBR mode 2, HE-AACv1 should be
used. VBR modes 3-5 should be used with Low-Complexity AAC. When encoding
AAC-ELD, the best mode is selected automatically.

The bitrates given in the table are averages over time and different encoder
settings. They strongly depend on the type of audio signal. The VBR
configurations can be adjusted with the ::AACENC_BITRATEMODE encoder parameter.
\verbatim
-----------------------------------------------
 VBR_MODE |  Approx. Bitrate in kbps for stereo
          |     AAC-LC    |      AAC-ELD
----------+---------------+--------------------
    VBR_1 | 32 (HE-AACv2) |         48
    VBR_2 | 72 (HE-AACv1) |         56
    VBR_3 |      112      |         72
    VBR_4 |      148      |        148
    VBR_5 |      228      |        224
-----------------------------------------------
\endverbatim
Note that these figures are valid for stereo encoding only. VBR modes 2-5 will
yield much lower bit rates when encoding single-channel input. For
configurations which are making use of downmix modules the AAC core channels
respectively downmix channels shall be considered.

\subsection encQual Audio Quality Considerations
The default encoder configuration is suggested to be used. Encoder tools such as
TNS and PNS are activated by default and are internally controlled (see \ref
BEHAVIOUR_TOOLS).

There is an additional quality parameter called ::AACENC_AFTERBURNER. In the
default configuration this quality switch is deactivated because it would cause
a workload increase which might be significant. If workload is not an issue in
the application we recommend activating this feature.
\code
aacEncoder_SetParam(hAacEncoder, AACENC_AFTERBURNER, 0/1);
\endcode

\subsection encELD ELD Auto Configuration Mode
For ELD configuration a so called auto configurator is available which
configures SBR and the SBR ratio by itself. The configurator is used when the
encoder parameters ::AACENC_SBR_MODE and ::AACENC_SBR_RATIO are not set
explicitly.

Based on sampling rate and chosen bitrate a reasonable SBR configuration will be
used.
\verbatim
------------------------------------------------------------------
 Sampling Rate |   Total Bitrate | No. of | SBR |        SBR Ratio
         [kHz] |         [bit/s] |   Chan |     |
               |                 |        |     |
---------------+-----------------+--------+-----+-----------------
     ]min, 16[ |    min -    max |      1 | off |              ---
---------------+-----------------+--------------+-----------------
          [16] |    min -  27999 |      1 |  on |  downsampled SBR
               |  28000 -    max |      1 | off |              ---
---------------+-----------------+--------------+-----------------
     ]16 - 24] |    min -  39999 |      1 |  on |  downsampled SBR
               |  40000 -    max |      1 | off |              ---
---------------+-----------------+--------------+-----------------
     ]24 - 32] |    min -  27999 |      1 |  on |     dualrate SBR
               |  28000 -  55999 |      1 |  on |  downsampled SBR
               |  56000 -    max |      1 | off |              ---
---------------+-----------------+--------------+-----------------
   ]32 - 44.1] |    min -  63999 |      1 |  on |     dualrate SBR
               |  64000 -    max |      1 | off |              ---
---------------+-----------------+--------------+-----------------
   ]44.1 - 48] |    min -  63999 |      1 |  on |     dualrate SBR
               |  64000 -    max |      1 | off |              ---
               |                 |        |     |
---------------+-----------------+--------+-----+-----------------
     ]min, 16[ |    min -    max |      2 | off |              ---
---------------+-----------------+--------------+-----------------
          [16] |    min -  31999 |      2 |  on |  downsampled SBR
               |  32000 -  63999 |      2 |  on |  downsampled SBR
               |  64000 -    max |      2 | off |              ---
---------------+-----------------+--------------+-----------------
     ]16 - 24] |    min -  47999 |      2 |  on |  downsampled SBR
               |  48000 -  79999 |      2 |  on |  downsampled SBR
               |  80000 -    max |      2 | off |              ---
---------------+-----------------+--------------+-----------------
     ]24 - 32] |    min -  31999 |      2 |  on |     dualrate SBR
               |  32000 -  67999 |      2 |  on |     dualrate SBR
               |  68000 -  95999 |      2 |  on |  downsampled SBR
               |  96000 -    max |      2 | off |              ---
---------------+-----------------+--------------+-----------------
   ]32 - 44.1] |    min -  43999 |      2 |  on |     dualrate SBR
               |  44000 - 127999 |      2 |  on |     dualrate SBR
               | 128000 -    max |      2 | off |              ---
---------------+-----------------+--------------+-----------------
   ]44.1 - 48] |    min -  43999 |      2 |  on |     dualrate SBR
               |  44000 - 127999 |      2 |  on |     dualrate SBR
               | 128000 -    max |      2 | off |              ---
               |                 |        |     |
------------------------------------------------------------------
\endverbatim

\subsection encDsELD Reduced Delay (Downscaled) Mode
The downscaled mode of AAC-ELD reduces the algorithmic delay of AAC-ELD by
virtually increasing the sampling rate. When using the downscaled mode, the
bitrate should be increased for keeping the same audio quality level. For common
signals, the bitrate should be increased by 25% for a downscale factor of 2.

Currently, downscaling factors 2 and 4 are supported.
To enable the downscaled mode in the encoder, the framelength parameter
AACENC_GRANULE_LENGTH must be set accordingly to 256 or 240 for a downscale
factor of 2 or 128 or 120 for a downscale factor of 4. The default values of 512
or 480 mean that no downscaling is applied.
\code
aacEncoder_SetParam(hAacEncoder, AACENC_GRANULE_LENGTH, 256);
aacEncoder_SetParam(hAacEncoder, AACENC_GRANULE_LENGTH, 128);
\endcode

Downscaled bitstreams are fully backwards compatible. However, the legacy
decoder needs to support high sample rate, e.g. 96kHz. The signaled sampling
rate is multiplied by the downscale factor. Although not required, downscaling
should be applied when decoding downscaled bitstreams. It reduces CPU workload
and the output will have the same sampling rate as the input. In an ideal
configuration both encoder and decoder should run with the same downscale
factor.
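
As an illustration only, the relation between granule length, downscale factor
and signaled sampling rate described above can be sketched in plain C. The
helper names eldDownscaleFactor() and eldSignaledSampleRate() are hypothetical
and not part of the encoder API; the sketch assumes only the granule lengths
listed in this section and treats every other value as unsupported.

```c
// Hypothetical helper (not part of the FDK API): derive the downscale factor
// from an AAC-ELD granule length, using only the values named in this section.
// Returns 1 for the default lengths (no downscaling) and 0 for any length
// that this section does not list.
static int eldDownscaleFactor(int granuleLength) {
  switch (granuleLength) {
    case 512:
    case 480:
      return 1;  // default frame lengths, no downscaling
    case 256:
    case 240:
      return 2;  // downscale factor 2
    case 128:
    case 120:
      return 4;  // downscale factor 4
    default:
      return 0;  // not covered by this section
  }
}

// Hypothetical helper: sampling rate signaled in the bitstream, i.e. the
// input sampling rate multiplied by the downscale factor. Returns 0 when the
// granule length is not one of the listed values.
static int eldSignaledSampleRate(int inputSampleRate, int granuleLength) {
  return inputSampleRate * eldDownscaleFactor(granuleLength);
}
```

With these assumptions, a 48 kHz input encoded with a granule length of 256
would be signaled as 96 kHz, which is why a legacy decoder has to support such
high sampling rates.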

The following table shows approximate filter bank delays in ms for common
sampling rates (sr) at framesize (fs) and downscale factor (dsf), based on this
formula: \f[ 1.5 * 1000 * fs / (dsf * sr) \f]

\verbatim
--------------------------------------
      | 512/2 | 512/4 | 480/2 | 480/4
------+-------+-------+-------+-------
22050 | 17.41 |  8.71 | 16.33 |  8.16
32000 | 12.00 |  6.00 | 11.25 |  5.62
44100 |  8.71 |  4.35 |  8.16 |  4.08
48000 |  8.00 |  4.00 |  7.50 |  3.75
--------------------------------------
\endverbatim

\section audiochCfg Audio Channel Configuration
The MPEG standard often refers to the so-called Channel Configuration. This
Channel Configuration is used for a fixed Channel Mapping. The configurations
1-7 and 11,12,14 are predefined in the MPEG standard and used for implicit
signalling within the encoded bitstream. For user defined Configurations the
Channel Configuration is set to 0 and the Channel Mapping must be explicitly
described with an appropriate Program Config Element. The present Encoder
implementation does not allow the user to configure this Channel Configuration
from outside. The Encoder implementation supports fixed Channel Modes which are
mapped to Channel Configuration as follows.
\verbatim
----------------------------------------------------------------------------------------
ChannelMode            | ChCfg | Height | front_El      | side_El  | back_El  | lfe_El
-----------------------+-------+--------+---------------+----------+----------+---------
MODE_1                 |     1 | NORM   | SCE           |          |          |
MODE_2                 |     2 | NORM   | CPE           |          |          |
MODE_1_2               |     3 | NORM   | SCE, CPE      |          |          |
MODE_1_2_1             |     4 | NORM   | SCE, CPE      |          | SCE      |
MODE_1_2_2             |     5 | NORM   | SCE, CPE      |          | CPE      |
MODE_1_2_2_1           |     6 | NORM   | SCE, CPE      |          | CPE      | LFE
MODE_1_2_2_2_1         |     7 | NORM   | SCE, CPE, CPE |          | CPE      | LFE
MODE_6_1               |    11 | NORM   | SCE, CPE      |          | CPE, SCE | LFE
MODE_7_1_BACK          |    12 | NORM   | SCE, CPE      |          | CPE, CPE | LFE
-----------------------+-------+--------+---------------+----------+----------+---------
MODE_7_1_TOP_FRONT     |    14 | NORM   | SCE, CPE      |          | CPE      | LFE
                       |       | TOP    | CPE           |          |          |
-----------------------+-------+--------+---------------+----------+----------+---------
MODE_7_1_REAR_SURROUND |     0 | NORM   | SCE, CPE      |          | CPE, CPE | LFE
MODE_7_1_FRONT_CENTER  |     0 | NORM   | SCE, CPE, CPE |          | CPE      | LFE
----------------------------------------------------------------------------------------
- NORM: Normal Height Layer.     - TOP: Top Height Layer.  - BTM: Bottom Height Layer.
- SCE: Single Channel Element.   - CPE: Channel Pair.      - LFE: Low Frequency Element.
\endverbatim

The Table describes all fixed Channel Elements for each Channel Mode which are
assigned to a speaker arrangement. The arrangement includes front, side, back
and lfe Audio Channel Elements in the normal height layer, possibly followed by
front, side, and back elements in the top and bottom layer (Channel
Configuration 14).
\n This mapping of Audio Channel Elements is defined in the MPEG
standard for Channel Config 1-7 and 11,12,14.\n In case of Channel Config 0 or
writing matrix mixdown coefficients, the encoder enables the writing of Program
Config Element itself as described in \ref encPCE. The configuration used in
Program Config Element refers to the denoted Table.\n Besides the Channel
Element assignment the Channel Modes are responsible for audio input data
channel mapping. The Channel Mapping of the audio data depends on the selected
::AACENC_CHANNELORDER which can be MPEG or WAV like order.\n The following table
describes the complete channel mapping for both Channel Order configurations.
\verbatim
---------------------------------------------------------------------------------------
ChannelMode            |  MPEG-Channelorder            |  WAV-Channelorder
-----------------------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---
MODE_1                 | 0 |   |   |   |   |   |   |   | 0 |   |   |   |   |   |   |
MODE_2                 | 0 | 1 |   |   |   |   |   |   | 0 | 1 |   |   |   |   |   |
MODE_1_2               | 0 | 1 | 2 |   |   |   |   |   | 2 | 0 | 1 |   |   |   |   |
MODE_1_2_1             | 0 | 1 | 2 | 3 |   |   |   |   | 2 | 0 | 1 | 3 |   |   |   |
MODE_1_2_2             | 0 | 1 | 2 | 3 | 4 |   |   |   | 2 | 0 | 1 | 3 | 4 |   |   |
MODE_1_2_2_1           | 0 | 1 | 2 | 3 | 4 | 5 |   |   | 2 | 0 | 1 | 4 | 5 | 3 |   |
MODE_1_2_2_2_1         | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 2 | 6 | 7 | 0 | 1 | 4 | 5 | 3
MODE_6_1               | 0 | 1 | 2 | 3 | 4 | 5 | 6 |   | 2 | 0 | 1 | 4 | 5 | 6 | 3 |
MODE_7_1_BACK          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 2 | 0 | 1 | 6 | 7 | 4 | 5 | 3
MODE_7_1_TOP_FRONT     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 2 | 0 | 1 | 4 | 5 | 3 | 6 | 7
-----------------------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---
MODE_7_1_REAR_SURROUND | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 2 | 0 | 1 | 6 | 7 | 4 | 5 | 3
MODE_7_1_FRONT_CENTER  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 2 | 6 | 7 | 0 | 1 | 4 | 5 | 3
--------------------------------------------------------------------------------------- 643 \endverbatim 644 645 The denoted mapping is important for correct audio channel assignment when using 646 MPEG or WAV ordering. The incoming audio channels are distributed MPEG like 647 starting at the front channels and ending at the back channels. The distribution 648 is used as described in Table concering Channel Config and fix channel elements. 649 Please see the following example for clarification. 650 651 \verbatim 652 Example: MODE_1_2_2_1 - WAV-Channelorder 5.1 653 ------------------------------------------ 654 Input Channel | Coder Channel 655 --------------------+--------------------- 656 2 (front center) | 0 (SCE channel) 657 0 (left center) | 1 (1st of 1st CPE) 658 1 (right center) | 2 (2nd of 1st CPE) 659 4 (left surround) | 3 (1st of 2nd CPE) 660 5 (right surround) | 4 (2nd of 2nd CPE) 661 3 (LFE) | 5 (LFE) 662 ------------------------------------------ 663 \endverbatim 664 665 666 \section suppBitrates Supported Bitrates 667 668 The FDK AAC Encoder provides a wide range of supported bitrates. 669 The minimum and maximum allowed bitrate depends on the Audio Object Type. For 670 AAC-LC the minimum bitrate is the bitrate that is required to write the most 671 basic and minimal valid bitstream. It consists of the bitstream format header 672 information and other static/mandatory information within the AAC payload. The 673 maximum AAC framesize allowed by the MPEG-4 standard determines the maximum 674 allowed bitrate for AAC-LC. For HE-AAC and HE-AAC v2 a library internal look-up 675 table is used. 676 677 A good working point in terms of audio quality, sampling rate and bitrate, is at 678 1 to 1.5 bits/audio sample for AAC-LC, 0.625 bits/audio sample for dualrate 679 HE-AAC, 1.125 bits/audio sample for downsampled HE-AAC and 0.5 bits/audio sample 680 for HE-AAC v2. 
For example, for one channel with a sampling frequency of 48 kHz,
the range from 48 kbit/s to 72 kbit/s achieves reasonable audio quality for
AAC-LC.

For HE-AAC and HE-AAC v2 the lowest possible audio input sampling frequency is
16 kHz because then the AAC-LC core encoder operates in dual rate mode at its
lowest possible sampling frequency, which is 8 kHz. HE-AAC v2 requires stereo
input audio data.

Please note that in HE-AAC or HE-AAC v2 mode the encoder supports much higher
bitrates than are appropriate for HE-AAC or HE-AAC v2. For example, at a bitrate
of more than 64 kbit/s for a stereo audio signal at 44.1 kHz it usually makes
sense to use AAC-LC, which will produce better audio quality at that bitrate
than HE-AAC or HE-AAC v2.

\section reommendedConfig Recommended Sampling Rate and Bitrate Combinations

The following tables provide an overview of recommended encoder configuration
parameters, which we determined by means of numerous listening tests.

\subsection reommendedConfigLC AAC-LC, HE-AAC, HE-AACv2 in Dualrate SBR mode.
\verbatim
-----------------------------------------------------------------------------------
Audio Object Type  |  Bit Rate Range  |       Supported       | Preferred  | No. of
                   |     [bit/s]      |    Sampling Rates     |   Sampl.   | Chan.
                   |                  |         [kHz]         |    Rate    |
                   |                  |                       |   [kHz]    |
-------------------+------------------+-----------------------+------------+-------
AAC LC + SBR + PS  |    8000 -  11999 |          22.05, 24.00 |   24.00    | 2
AAC LC + SBR + PS  |   12000 -  17999 |                 32.00 |   32.00    | 2
AAC LC + SBR + PS  |   18000 -  39999 |   32.00, 44.10, 48.00 |   44.10    | 2
AAC LC + SBR + PS  |   40000 -  64000 |   32.00, 44.10, 48.00 |   48.00    | 2
-------------------+------------------+-----------------------+------------+-------
AAC LC + SBR       |    8000 -  11999 |          22.05, 24.00 |   24.00    | 1
AAC LC + SBR       |   12000 -  17999 |                 32.00 |   32.00    | 1
AAC LC + SBR       |   18000 -  39999 |   32.00, 44.10, 48.00 |   44.10    | 1
AAC LC + SBR       |   40000 -  64000 |   32.00, 44.10, 48.00 |   48.00    | 1
-------------------+------------------+-----------------------+------------+-------
AAC LC + SBR       |   16000 -  27999 |   32.00, 44.10, 48.00 |   32.00    | 2
AAC LC + SBR       |   28000 -  63999 |   32.00, 44.10, 48.00 |   44.10    | 2
AAC LC + SBR       |   64000 - 128000 |   32.00, 44.10, 48.00 |   48.00    | 2
-------------------+------------------+-----------------------+------------+-------
AAC LC + SBR       |   64000 -  69999 |   32.00, 44.10, 48.00 |   32.00    | 5, 5.1
AAC LC + SBR       |   70000 - 239999 |   32.00, 44.10, 48.00 |   44.10    | 5, 5.1
AAC LC + SBR       |  240000 - 319999 |   32.00, 44.10, 48.00 |   48.00    | 5, 5.1
-------------------+------------------+-----------------------+------------+-------
AAC LC             |    8000 -  15999 |  11.025, 12.00, 16.00 |   12.00    | 1
AAC LC             |   16000 -  23999 |                 16.00 |   16.00    | 1
AAC LC             |   24000 -  31999 |   16.00, 22.05, 24.00 |   24.00    | 1
AAC LC             |   32000 -  55999 |                 32.00 |   32.00    | 1
AAC LC             |   56000 - 160000 |   32.00, 44.10, 48.00 |   44.10    | 1
AAC LC             |  160001 - 288000 |                 48.00 |   48.00    | 1
-------------------+------------------+-----------------------+------------+-------
AAC LC             |   16000 -  23999 |  11.025, 12.00, 16.00 |   12.00    | 2
AAC LC             |   24000 -  31999 |                 16.00 |   16.00    | 2
AAC LC             |   32000 -  39999 |   16.00, 22.05, 24.00 |   22.05    | 2
AAC LC             |   40000 -  95999 |                 32.00 |   32.00    | 2
AAC LC             |   96000 - 111999 |   32.00, 44.10, 48.00 |   32.00    | 2
AAC LC             |  112000 - 320001 |   32.00, 44.10, 48.00 |   44.10    | 2
AAC LC             |  320002 - 576000 |                 48.00 |   48.00    | 2
-------------------+------------------+-----------------------+------------+-------
AAC LC             |  160000 - 239999 |                 32.00 |   32.00    | 5, 5.1
AAC LC             |  240000 - 279999 |   32.00, 44.10, 48.00 |   32.00    | 5, 5.1
AAC LC             |  280000 - 800000 |   32.00, 44.10, 48.00 |   44.10    | 5, 5.1
-----------------------------------------------------------------------------------
\endverbatim \n

\subsection reommendedConfigLD AAC-LD, AAC-ELD, AAC-ELD with SBR in Dualrate SBR mode.
Unlike the HE-AAC configuration, SBR is not covered by the ELD audio object
type and needs to be enabled explicitly. Use ::AACENC_SBR_MODE to configure SBR
and the ::AACENC_SBR_RATIO parameter to set its sampling rate ratio.
\verbatim
-----------------------------------------------------------------------------------
Audio Object Type  |  Bit Rate Range  |       Supported       | Preferred  | No. of
                   |     [bit/s]      |    Sampling Rates     |   Sampl.   | Chan.
                   |                  |         [kHz]         |    Rate    |
                   |                  |                       |   [kHz]    |
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |   18000 -  24999 |         32.00 - 44.10 |   32.00    | 1
ELD + SBR          |   25000 -  31999 |         32.00 - 48.00 |   32.00    | 1
ELD + SBR          |   32000 -  64000 |         32.00 - 48.00 |   48.00    | 1
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |   32000 -  51999 |         32.00 - 48.00 |   44.10    | 2
ELD + SBR          |   52000 - 128000 |         32.00 - 48.00 |   48.00    | 2
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |   78000 - 160000 |         32.00 - 48.00 |   48.00    | 3
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |  104000 - 212000 |         32.00 - 48.00 |   48.00    | 4
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |  130000 - 246000 |         32.00 - 48.00 |   48.00    | 5, 5.1
-------------------+------------------+-----------------------+------------+-------
LD, ELD            |   16000 -  19999 |         16.00 - 24.00 |   16.00    | 1
LD, ELD            |   20000 -  39999 |         16.00 - 32.00 |   24.00    | 1
LD, ELD            |   40000 -  49999 |         22.05 - 32.00 |   32.00    | 1
LD, ELD            |   50000 -  61999 |         24.00 - 44.10 |   32.00    | 1
LD, ELD            |   62000 -  84999 |         32.00 - 48.00 |   44.10    | 1
LD, ELD            |   85000 - 192000 |         44.10 - 48.00 |   48.00    | 1
-------------------+------------------+-----------------------+------------+-------
LD, ELD            |   64000 -  75999 |         24.00 - 32.00 |   32.00    | 2
LD, ELD            |   76000 -  97999 |         24.00 - 44.10 |   32.00    | 2
LD, ELD            |   98000 - 135999 |         32.00 - 48.00 |   44.10    | 2
LD, ELD            |  136000 - 384000 |         44.10 - 48.00 |   48.00    | 2
-------------------+------------------+-----------------------+------------+-------
LD, ELD            |   96000 - 113999 |         24.00 - 32.00 |   32.00    | 3
LD, ELD            |  114000 - 146999 |         24.00 - 44.10 |   32.00    | 3
LD, ELD            |  147000 - 203999 |         32.00 - 48.00 |   44.10    | 3
LD, ELD            |  204000 - 576000 |         44.10 - 48.00 |   48.00    | 3
-------------------+------------------+-----------------------+------------+-------
LD, ELD            |  128000 - 151999 |         24.00 - 32.00 |   32.00    | 4
LD, ELD            |  152000 - 195999 |         24.00 - 44.10 |   32.00    | 4
LD, ELD            |  196000 - 271999 |         32.00 - 48.00 |   44.10    | 4
LD, ELD            |  272000 - 768000 |         44.10 - 48.00 |   48.00    | 4
-------------------+------------------+-----------------------+------------+-------
LD, ELD            |  160000 - 189999 |         24.00 - 32.00 |   32.00    | 5, 5.1
LD, ELD            |  190000 - 244999 |         24.00 - 44.10 |   32.00    | 5, 5.1
LD, ELD            |  245000 - 339999 |         32.00 - 48.00 |   44.10    | 5, 5.1
LD, ELD            |  340000 - 960000 |         44.10 - 48.00 |   48.00    | 5, 5.1
-----------------------------------------------------------------------------------
\endverbatim \n

\subsection reommendedConfigELD AAC-ELD with SBR in Downsampled SBR mode.
\verbatim
-----------------------------------------------------------------------------------
Audio Object Type  |  Bit Rate Range  |       Supported       | Preferred  | No. of
                   |     [bit/s]      |    Sampling Rates     |   Sampl.   | Chan.
                   |                  |         [kHz]         |    Rate    |
                   |                  |                       |   [kHz]    |
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |   18000 -  24999 |         16.00 - 22.05 |   22.05    | 1
(downsampled SBR)  |   25000 -  31999 |         16.00 - 24.00 |   24.00    | 1
                   |   32000 -  47999 |         22.05 - 32.00 |   32.00    | 1
                   |   48000 -  64000 |         22.05 - 48.00 |   32.00    | 1
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |   32000 -  51999 |         16.00 - 24.00 |   24.00    | 2
(downsampled SBR)  |   52000 -  59999 |         22.05 - 24.00 |   24.00    | 2
                   |   60000 -  95999 |         22.05 - 32.00 |   32.00    | 2
                   |   96000 - 128000 |         22.05 - 48.00 |   32.00    | 2
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |   78000 -  99999 |         22.05 - 24.00 |   24.00    | 3
(downsampled SBR)  |  100000 - 143999 |         22.05 - 32.00 |   32.00    | 3
                   |  144000 - 159999 |         22.05 - 48.00 |   32.00    | 3
                   |  160000 - 192000 |         32.00 - 48.00 |   32.00    | 3
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |  104000 - 149999 |         22.05 - 24.00 |   24.00    | 4
(downsampled SBR)  |  150000 - 191999 |         22.05 - 32.00 |   32.00    | 4
                   |  192000 - 211999 |         22.05 - 48.00 |   32.00    | 4
                   |  212000 - 256000 |         32.00 - 48.00 |   32.00    | 4
-------------------+------------------+-----------------------+------------+-------
ELD + SBR          |  130000 - 171999 |         22.05 - 24.00 |   24.00    | 5, 5.1
(downsampled SBR)  |  172000 - 239999 |         22.05 - 32.00 |   32.00    | 5, 5.1
                   |  240000 - 320000 |         32.00 - 48.00 |   32.00    | 5, 5.1
-----------------------------------------------------------------------------------
\endverbatim \n

\subsection reommendedConfigELDv2 AAC-ELD v2, AAC-ELD v2 with SBR.
The ELD v2 212 configuration must be selected explicitly by setting the
::AACENC_CHANNELMODE parameter to the MODE_212 value. SBR can be configured
separately through the ::AACENC_SBR_MODE and ::AACENC_SBR_RATIO parameters.
The following configurations apply to both frame lengths 480 and 512. For the
ELD v2 configuration without SBR and frame length 480, the supported sampling
rate is restricted to the range from 16 kHz up to 24 kHz.
\verbatim
-----------------------------------------------------------------------------------
Audio Object Type  |  Bit Rate Range  |       Supported       | Preferred  | No. of
                   |     [bit/s]      |    Sampling Rates     |   Sampl.   | Chan.
                   |                  |         [kHz]         |    Rate    |
                   |                  |                       |   [kHz]    |
-------------------+------------------+-----------------------+------------+-------
ELD-212            |   16000 -  19999 |         16.00 - 24.00 |   16.00    | 2
(without SBR)      |   20000 -  39999 |         16.00 - 32.00 |   24.00    | 2
                   |   40000 -  49999 |         22.05 - 32.00 |   32.00    | 2
                   |   50000 -  61999 |         24.00 - 44.10 |   32.00    | 2
                   |   62000 -  84999 |         32.00 - 48.00 |   44.10    | 2
                   |   85000 - 192000 |         44.10 - 48.00 |   48.00    | 2
-------------------+------------------+-----------------------+------------+-------
ELD-212 + SBR      |   18000 -  20999 |                 32.00 |   32.00    | 2
(dualrate SBR)     |   21000 -  25999 |         32.00 - 44.10 |   32.00    | 2
                   |   26000 -  31999 |         32.00 - 48.00 |   44.10    | 2
                   |   32000 -  64000 |         32.00 - 48.00 |   48.00    | 2
-------------------+------------------+-----------------------+------------+-------
ELD-212 + SBR      |   18000 -  19999 |         16.00 - 22.05 |   22.05    | 2
(downsampled SBR)  |   20000 -  24999 |         16.00 - 24.00 |   22.05    | 2
                   |   25000 -  31999 |         16.00 - 24.00 |   24.00    | 2
                   |   32000 -  64000 |         24.00 - 24.00 |   24.00    | 2
-------------------+------------------+-----------------------+------------+-------
\endverbatim \n

\page ENCODERBEHAVIOUR Encoder Behaviour

\section BEHAVIOUR_BANDWIDTH Bandwidth

The FDK AAC encoder usually does not use the full frequency range of the input
signal, but restricts the bandwidth according to certain library-internal
settings. They can be changed in the table "bandWidthTable" in the file
bandwidth.cpp (if available).

The encoder API provides the ::AACENC_BANDWIDTH parameter to adjust the
bandwidth explicitly.
\code
aacEncoder_SetParam(hAacEncoder, AACENC_BANDWIDTH, value);
\endcode

However, it is not recommended to change these settings, because they are based
on numerous listening tests and careful tweaks to ensure the best overall
encoding quality. Also, the maximum bandwidth that can be set manually by the
user is 20 kHz or fs/2, whichever value is smaller.

Theoretically, a signal sampled at, for example, 48 kHz can contain frequencies
up to 24 kHz, but using this full range in an audio encoder usually does not
make sense. Usually the encoder has a very limited amount of bits to spend
(typically 128 kbit/s for stereo 48 kHz content), and allowing full range
bandwidth would waste a lot of these bits on frequencies the human ear is
hardly able to perceive, if at all. Hence it is wise to use the available bits
for the really important frequency range and just skip the rest. At lower
bitrates (e. g. <= 80 kbit/s for stereo 48 kHz content) the encoder will choose
an even smaller bandwidth, because an encoded signal with smaller bandwidth and
hence fewer artefacts sounds better than a signal with higher bandwidth but
more coding artefacts across all frequencies. These artefacts would occur if
small bitrates and high bandwidths are chosen, because the available bits are
just not enough to encode all frequencies well.

Unfortunately some people evaluate encoding quality based on possible bandwidth
as well, but it is a double-edged sword considering the trade-off described
above.

Another aspect is workload. The higher the allowed bandwidth, the more
frequency lines have to be processed, which in turn increases the workload.
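
The manual bandwidth limit described in this section (20 kHz or fs/2, whichever
is smaller) can be sketched as a small helper. This is only an illustration:
the function name clamp_bandwidth_hz and its parameter names are hypothetical
and not part of the FDK API, and the encoder may apply further restrictions
internally. A request of 0 is passed through unchanged, since 0 lets the
encoder determine the bandwidth itself.

```c
#include <assert.h>

// Hypothetical helper, not part of the FDK API: limits a requested audio
// bandwidth (in Hz) to the documented maximum of min(20 kHz, fs/2).
// A request of 0 is returned unchanged, because 0 selects the
// library-internal bandwidth choice.
static unsigned int clamp_bandwidth_hz(unsigned int sample_rate_hz,
                                       unsigned int requested_hz) {
  unsigned int limit_hz = sample_rate_hz / 2u; // Nyquist limit fs/2

  assert(sample_rate_hz > 0u);

  if (limit_hz > 20000u) {
    limit_hz = 20000u; // documented 20 kHz ceiling
  }
  if (requested_hz == 0u) {
    return 0u; // keep the automatic setting
  }
  return (requested_hz > limit_hz) ? limit_hz : requested_hz;
}
```

The returned value could then be passed as the value argument of
aacEncoder_SetParam(hAacEncoder, AACENC_BANDWIDTH, value).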

\section FRAMESIZES_AND_BIT_RESERVOIR Frame Sizes & Bit Reservoir

For AAC there is a difference between constant bit rate and constant frame
length due to the so-called bit reservoir technique, which allows the encoder to
use fewer bits in an AAC frame for those audio signal sections which are easy to
encode, and then spend them at a later point in time for more complex audio
sections. The extent to which this "bit exchange" is done is limited to allow
for reliable and relatively low delay real time streaming. Therefore, for
AAC-ELD, the bit reservoir is limited. It varies between 500 and 4000
bits/frame, depending on the bitrate/channel.
- For a bitrate of 12 kbps/channel and below, the AAC-ELD bit reservoir is 500
bits/frame.
- For a bitrate of 70 kbps/channel and above, the AAC-ELD bit reservoir is 4000
bits/frame.
- Between 12 kbps/channel and 70 kbps/channel, the AAC-ELD bit reservoir is
increased linearly.
- For AAC-LC, the bitrate is only limited by the maximum AAC frame length. It
is, regardless of the available bit reservoir, defined as 6144 bits per channel.

Over a longer period of time the bitrate will be constant in the AAC constant
bitrate mode, e.g. for ISDN transmission. This means that in AAC each bitstream
frame will in general have a different length in bytes, but over time it
will reach the target bitrate.


One could also make an MPEG compliant
AAC encoder which always produces constant length packages for each AAC frame,
but the audio quality would be considerably worse since the bit reservoir
technique would have to be switched off completely. A higher bit rate would have
to be used to get the same audio quality as with an enabled bit reservoir.

For mp3, by the way, the same bit reservoir technique exists, but there each bit
stream frame has a constant length for a given bit rate (ignoring the
padding byte).
In mp3 there is a so-called "back pointer" which tells
the decoder which bits belong to the current mp3 frame - and in general some or
many bits have been transmitted in an earlier mp3 frame. Basically this leads to
the same "bit exchange between mp3 frames" as in AAC but with virtually constant
length frames.

This variable frame length at "constant bit rate" is not something special
in this Fraunhofer IIS AAC encoder. AAC has been designed in that way.

\subsection BEHAVIOUR_ESTIM_AVG_FRAMESIZES Estimating Average Frame Sizes

An HE-AAC v1 or v2 audio frame contains 2048 PCM samples per channel.

The number of HE-AAC frames \f$N\_FRAMES\f$ per second at 44.1 kHz is:

\f[
N\_FRAMES = 44100 / 2048 = 21.5332
\f]

At a bit rate of 8 kbps the average number of bits per frame
\f$N\_BITS\_PER\_FRAME\f$ is:

\f[
N\_BITS\_PER\_FRAME = 8000 / 21.5332 = 371.52
\f]

which is about 46.44 bytes per encoded frame.

At a bit rate of 32 kbps, which is quite high for single channel HE-AAC v1, it
is:

\f[
N\_BITS\_PER\_FRAME = 32000 / 21.5332 = 1486.08
\f]

which is about 185.76 bytes per encoded frame.

These bits/frame figures are averages; each AAC frame generally has
a different size in bytes. To calculate the same for AAC-LC just use 1024
instead of 2048 PCM samples per frame and channel. For AAC-LD/ELD it is either
480 or 512 PCM samples per frame and channel.


\section BEHAVIOUR_TOOLS Encoder Tools

The AAC encoder supports TNS, PNS, MS, Intensity and activates these tools
depending on the audio signal and the encoder configuration (i.e. bitrate or
AOT). It is not required to configure these tools manually.

PNS improves encoding quality only for certain bitrates.
Therefore it makes
sense to activate PNS only for these bitrates and save the processing power
required for PNS (about 10 % of the encoder) when using other bitrates. This is
done automatically inside the encoder library. PNS is disabled inside the
encoder library if an MPEG-2 AOT is chosen since PNS is an MPEG-4 AAC feature.

If SBR is activated, the encoder automatically deactivates PNS internally. If
TNS is disabled but PNS is allowed, the encoder deactivates PNS calculation
internally.

*/

#ifndef AACENC_LIB_H
#define AACENC_LIB_H

#include "machine_type.h"
#include "FDK_audio.h"

/**
 *  AAC encoder error codes.
 */
typedef enum {
  AACENC_OK = 0x0000, /*!< No error happened. All fine. */

  AACENC_INVALID_HANDLE =
      0x0020, /*!< Handle passed to function call was invalid. */
  AACENC_MEMORY_ERROR = 0x0021,          /*!< Memory allocation failed. */
  AACENC_UNSUPPORTED_PARAMETER = 0x0022, /*!< Parameter not available. */
  AACENC_INVALID_CONFIG = 0x0023,        /*!< Configuration not provided. */

  AACENC_INIT_ERROR = 0x0040,     /*!< General initialization error. */
  AACENC_INIT_AAC_ERROR = 0x0041, /*!< AAC library initialization error. */
  AACENC_INIT_SBR_ERROR = 0x0042, /*!< SBR library initialization error. */
  AACENC_INIT_TP_ERROR = 0x0043, /*!< Transport library initialization error. */
  AACENC_INIT_META_ERROR =
      0x0044, /*!< Meta data library initialization error. */
  AACENC_INIT_MPS_ERROR = 0x0045, /*!< MPS library initialization error. */

  AACENC_ENCODE_ERROR = 0x0060, /*!< The encoding process was interrupted by an
                                   unexpected error. */

  AACENC_ENCODE_EOF = 0x0080 /*!< End of file reached. */

} AACENC_ERROR;

/**
 *  AAC encoder buffer descriptor identifiers.
 *  These identifiers are used within buffer descriptors
 * AACENC_BufDesc::bufferIdentifiers.
 */
typedef enum {
  /* Input buffer identifier. */
  IN_AUDIO_DATA = 0,     /*!< Audio input buffer, interleaved INT_PCM samples. */
  IN_ANCILLRY_DATA = 1,  /*!< Ancillary data to be embedded into bitstream. */
  IN_METADATA_SETUP = 2, /*!< Setup structure for embedding meta data. */

  /* Output buffer identifier. */
  OUT_BITSTREAM_DATA = 3, /*!< Buffer holds bitstream output data. */
  OUT_AU_SIZES =
      4 /*!< Buffer contains sizes of each access unit. This information
             is necessary for superframing. */

} AACENC_BufferIdentifier;

/**
 *  AAC encoder handle.
 */
typedef struct AACENCODER *HANDLE_AACENCODER;

/**
 *  Provides some info about the encoder configuration.
 */
typedef struct {
  UINT maxOutBufBytes; /*!< Maximum number of encoder bitstream bytes within one
                          frame. Size depends on maximum number of supported
                          channels in encoder instance. */

  UINT maxAncBytes; /*!< Maximum number of ancillary data bytes which can be
                       inserted into bitstream within one frame. */

  UINT inBufFillLevel; /*!< Internal input buffer fill level in samples per
                          channel. This parameter will automatically be cleared
                          if samplingrate or channel(Mode/Order) changes. */

  UINT inputChannels; /*!< Number of input channels expected in encoding
                         process. */

  UINT frameLength; /*!< Amount of input audio samples consumed each frame per
                       channel, depending on audio object type configuration. */

  UINT nDelay; /*!< Codec delay in PCM samples/channel. Depends on framelength
                  and AOT. Does not include framing delay for filling up encoder
                  PCM input buffer. */

  UINT nDelayCore; /*!< Codec delay in PCM samples/channel, w/o delay caused by
                      the decoder SBR module. This delay is needed to correctly
                      write edit lists for gapless playback.
                      The decoder may not know how much delay is introduced by
                      SBR, since it may not know if SBR is active at all
                      (implicit signaling), therefore the decoder must take
                      into account any delay caused by the SBR module. */

  UCHAR confBuf[64]; /*!< Configuration buffer in binary format as an
                        AudioSpecificConfig or StreamMuxConfig according to the
                        selected transport type. */

  UINT confSize; /*!< Number of valid bytes in confBuf. */

} AACENC_InfoStruct;

/**
 *  Describes the input and output buffers for an aacEncEncode() call.
 */
typedef struct {
  INT numBufs;            /*!< Number of buffers. */
  void **bufs;            /*!< Pointer to vector containing buffer addresses. */
  INT *bufferIdentifiers; /*!< Identifier of each buffer element. See
                             ::AACENC_BufferIdentifier. */
  INT *bufSizes;          /*!< Size of each buffer in 8-bit bytes. */
  INT *bufElSizes;        /*!< Size of each buffer element in bytes. */

} AACENC_BufDesc;

/**
 *  Defines the input arguments for an aacEncEncode() call.
 */
typedef struct {
  INT numInSamples; /*!< Number of valid input audio samples (multiple of input
                       channels). */
  INT numAncBytes;  /*!< Number of ancillary data bytes to be encoded. */

} AACENC_InArgs;

/**
 *  Defines the output arguments for an aacEncEncode() call.
 */
typedef struct {
  INT numOutBytes;  /*!< Number of valid bitstream bytes generated during
                       aacEncEncode(). */
  INT numInSamples; /*!< Number of input audio samples consumed by the encoder.
                     */
  INT numAncBytes;  /*!< Number of ancillary data bytes consumed by the encoder.
                     */
  INT bitResState;  /*!< State of the bit reservoir in bits. */

} AACENC_OutArgs;

/**
 *  Meta Data Compression Profiles.
 */
typedef enum {
  AACENC_METADATA_DRC_NONE = 0,         /*!< None. */
  AACENC_METADATA_DRC_FILMSTANDARD = 1, /*!< Film standard.
                                         */
  AACENC_METADATA_DRC_FILMLIGHT = 2,     /*!< Film light. */
  AACENC_METADATA_DRC_MUSICSTANDARD = 3, /*!< Music standard. */
  AACENC_METADATA_DRC_MUSICLIGHT = 4,    /*!< Music light. */
  AACENC_METADATA_DRC_SPEECH = 5,        /*!< Speech. */
  AACENC_METADATA_DRC_NOT_PRESENT =
      256 /*!< Disable writing gain factor (used for comp_profile only). */

} AACENC_METADATA_DRC_PROFILE;

/**
 *  Meta Data setup structure.
 */
typedef struct {
  AACENC_METADATA_DRC_PROFILE
  drc_profile; /*!< MPEG DRC compression profile. See
                  ::AACENC_METADATA_DRC_PROFILE. */
  AACENC_METADATA_DRC_PROFILE
  comp_profile; /*!< ETSI heavy compression profile. See
                   ::AACENC_METADATA_DRC_PROFILE. */

  INT drc_TargetRefLevel;  /*!< Used to define expected level to:
                                Scaled with 16 bit. x*2^16. */
  INT comp_TargetRefLevel; /*!< Adjust limiter to avoid overload.
                                Scaled with 16 bit. x*2^16. */

  INT prog_ref_level_present; /*!< Flag, if prog_ref_level is present */
  INT prog_ref_level;         /*!< Programme Reference Level = Dialogue Level:
                                   -31.75dB .. 0 dB ; stepsize: 0.25dB
                                   Scaled with 16 bit. x*2^16.*/

  UCHAR PCE_mixdown_idx_present; /*!< Flag, if dmx-idx should be written in
                                    programme config element */
  UCHAR ETSI_DmxLvl_present;     /*!< Flag, if dmx-lvl should be written in
                                    ETSI-ancData */

  SCHAR centerMixLevel; /*!< Center downmix level (0...7, according to table) */
  SCHAR surroundMixLevel; /*!< Surround downmix level (0...7, according to
                             table) */

  UCHAR
  dolbySurroundMode; /*!< Indication for Dolby Surround Encoding Mode.
                          - 0: Dolby Surround mode not indicated
                          - 1: 2-ch audio part is not Dolby surround encoded
                          - 2: 2-ch audio part is Dolby surround encoded */

  UCHAR drcPresentationMode; /*!< Indication for DRC Presentation Mode.
                                  - 0: Presentation mode not indicated
                                  - 1: Presentation mode 1
                                  - 2: Presentation mode 2 */

  struct {
    /* extended ancillary data */
    UCHAR extAncDataEnable; /*< Indicates if MPEG4_ext_ancillary_data() exists.
                                - 0: No MPEG4_ext_ancillary_data().
                                - 1: Insert MPEG4_ext_ancillary_data(). */

    UCHAR
    extDownmixLevelEnable;   /*< Indicates if ext_downmixing_levels() exists.
                                 - 0: No ext_downmixing_levels().
                                 - 1: Insert ext_downmixing_levels(). */
    UCHAR extDownmixLevel_A; /*< Downmix level index A (0...7, according to
                                table) */
    UCHAR extDownmixLevel_B; /*< Downmix level index B (0...7, according to
                                table) */

    UCHAR dmxGainEnable; /*< Indicates if ext_downmixing_global_gains() exists.
                             - 0: No ext_downmixing_global_gains().
                             - 1: Insert ext_downmixing_global_gains(). */
    INT dmxGain5;        /*< Gain factor for downmix to 5 channels.
                              -15.75dB .. -15.75dB; stepsize: 0.25dB
                              Scaled with 16 bit. x*2^16.*/
    INT dmxGain2;        /*< Gain factor for downmix to 2 channels.
                              -15.75dB .. -15.75dB; stepsize: 0.25dB
                              Scaled with 16 bit. x*2^16.*/

    UCHAR lfeDmxEnable; /*< Indicates if ext_downmixing_lfe_level() exists.
                            - 0: No ext_downmixing_lfe_level().
                            - 1: Insert ext_downmixing_lfe_level(). */
    UCHAR lfeDmxLevel;  /*< Downmix level index for LFE (0..15, according to
                           table) */

  } ExtMetaData;

} AACENC_MetaData;

/**
 * AAC encoder control flags.
 *
 * In interaction with the ::AACENC_CONTROL_STATE parameter it is possible to
 * get information about the internal initialization process. It is also
 * possible to override the internal state externally when necessary.
 */
typedef enum {
  AACENC_INIT_NONE = 0x0000, /*!< Do not trigger initialization. */
  AACENC_INIT_CONFIG =
      0x0001, /*!< Initialize all encoder modules configuration.
               */
  AACENC_INIT_STATES = 0x0002, /*!< Reset all encoder modules history buffer. */
  AACENC_INIT_TRANSPORT =
      0x1000, /*!< Initialize transport lib with new parameters. */
  AACENC_RESET_INBUFFER =
      0x2000,              /*!< Reset fill level of internal input buffer. */
  AACENC_INIT_ALL = 0xFFFF /*!< Initialize all. */
} AACENC_CTRLFLAGS;

/**
 * \brief  AAC encoder setting parameters.
 *
 * Use aacEncoder_SetParam() function to configure, or use aacEncoder_GetParam()
 * function to read the internal status of the following parameters.
 */
typedef enum {
  AACENC_AOT =
      0x0100, /*!< Audio object type. See ::AUDIO_OBJECT_TYPE in FDK_audio.h.
                   - 2: MPEG-4 AAC Low Complexity.
                   - 5: MPEG-4 AAC Low Complexity with Spectral Band Replication
                 (HE-AAC).
                   - 29: MPEG-4 AAC Low Complexity with Spectral Band
                 Replication and Parametric Stereo (HE-AAC v2). This
                 configuration can be used only with stereo input audio data.
                   - 23: MPEG-4 AAC Low-Delay.
                   - 39: MPEG-4 AAC Enhanced Low-Delay. Since there is no
                 ::AUDIO_OBJECT_TYPE for ELD in combination with SBR defined,
                 enable SBR explicitly by ::AACENC_SBR_MODE parameter. The ELD
                 v2 212 configuration can be configured by ::AACENC_CHANNELMODE
                 parameter.
                   - 129: MPEG-2 AAC Low Complexity.
                   - 132: MPEG-2 AAC Low Complexity with Spectral Band
                 Replication (HE-AAC).

                   Please note that the virtual MPEG-2 AOTs basically disable
                 the Perceptual Noise Substitution tool, which does not exist
                 in MPEG-2, in the AAC encoder and control the MPEG_ID flag in
                 the ADTS header. The virtual MPEG-2 AOTs do not prohibit
                 specific transport formats. */

  AACENC_BITRATE = 0x0101, /*!< Total encoder bitrate. This parameter is
                              mandatory and interacts with ::AACENC_BITRATEMODE.
                                - CBR: Bitrate in bits/second.
                                - VBR: Variable bitrate. Bitrate argument will
                              be ignored. See \ref suppBitrates for details.
                            */

  AACENC_BITRATEMODE = 0x0102, /*!< Bitrate mode. The following bitrate
                                  configurations are available:
                                    - 0: Constant bitrate, use bitrate according
                                  to ::AACENC_BITRATE. (default) For non-LD/ELD
                                  ::AUDIO_OBJECT_TYPE values, the CBR mode makes
                                  use of the full allowed bit reservoir. In
                                  contrast, for Low-Delay ::AUDIO_OBJECT_TYPE
                                  values the bit reservoir is kept very small.
                                    - 1: Variable bitrate mode, \ref vbrmode
                                  "very low bitrate".
                                    - 2: Variable bitrate mode, \ref vbrmode
                                  "low bitrate".
                                    - 3: Variable bitrate mode, \ref vbrmode
                                  "medium bitrate".
                                    - 4: Variable bitrate mode, \ref vbrmode
                                  "high bitrate".
                                    - 5: Variable bitrate mode, \ref vbrmode
                                  "very high bitrate". */

  AACENC_SAMPLERATE = 0x0103, /*!< Audio input data sampling rate. Encoder
                                 supports following sampling rates: 8000, 11025,
                                 12000, 16000, 22050, 24000, 32000, 44100,
                                 48000, 64000, 88200, 96000 */

  AACENC_SBR_MODE = 0x0104, /*!< Configure SBR independently of the chosen Audio
                               Object Type ::AUDIO_OBJECT_TYPE. This parameter
                               is for ELD audio object type only.
                                 - -1: Use ELD SBR auto configurator (default).
                                 - 0: Disable Spectral Band Replication.
                                 - 1: Enable Spectral Band Replication. */

  AACENC_GRANULE_LENGTH =
      0x0105, /*!< Core encoder (AAC) audio frame length in samples:
                   - 1024: Default configuration.
                   - 512: Default length in LD/ELD configuration.
                   - 480: Length in LD/ELD configuration.
                   - 256: Length for ELD reduced delay mode (x2).
                   - 240: Length for ELD reduced delay mode (x2).
                   - 128: Length for ELD reduced delay mode (x4).
                   - 120: Length for ELD reduced delay mode (x4). */

  AACENC_CHANNELMODE = 0x0106, /*!< Set explicit channel mode. Channel mode must
                                  match the number of input channels.
                                    - 1-7, 11, 12, 14 and 33, 34: MPEG channel
                                  modes supported, see ::CHANNEL_MODE in
                                  FDK_audio.h.
                   */

  AACENC_CHANNELORDER =
      0x0107, /*!< Input audio data channel ordering scheme:
                   - 0: MPEG channel ordering (e.g. 5.1: C, L, R, SL, SR,
                   LFE). (default)
                   - 1: WAVE file format channel ordering (e.g. 5.1: L, R, C,
                   LFE, SL, SR). */

  AACENC_SBR_RATIO =
      0x0108, /*!< Controls activation of downsampled SBR. With downsampled
                   SBR, the delay will be shorter. On the other hand, for
                   achieving the same quality level, downsampled SBR needs
                   more bits than dual-rate SBR. With downsampled SBR, the
                   AAC encoder will work at the same sampling rate as the SBR
                   encoder (single rate). Downsampled SBR is supported for
                   AAC-ELD and HE-AACv1.
                   - 1: Downsampled SBR (default for ELD).
                   - 2: Dual-rate SBR (default for HE-AAC). */

  AACENC_AFTERBURNER =
      0x0200, /*!< This parameter controls the use of the afterburner
                   feature. The afterburner is a type of
                   analysis-by-synthesis algorithm which increases the audio
                   quality but also the required processing power. It is
                   recommended to always activate this if additional memory
                   and processing power consumption is not a problem. If
                   increased MHz and memory consumption are an issue, then
                   the MHz and memory cost of this optional module needs to
                   be evaluated against the improvement in audio quality on a
                   case-by-case basis.
                   - 0: Disable afterburner (default).
                   - 1: Enable afterburner. */

  AACENC_BANDWIDTH =
      0x0203, /*!< Core encoder audio bandwidth:
                   - 0: Determine audio bandwidth internally (default, see
                   chapter \ref BEHAVIOUR_BANDWIDTH).
                   - 1 to fs/2: Audio bandwidth in Hertz. Limited to 20kHz
                   max. Not usable if SBR is active. This setting is for
                   experts only; it is better not to touch this value, to
                   avoid degraded audio quality. */

  AACENC_PEAK_BITRATE =
      0x0207, /*!< Peak bitrate configuration parameter to adjust maximum
                   bits per audio frame.
                   Bitrate is in bits/second. The peak bitrate will
                   internally be limited to the chosen bitrate
                   ::AACENC_BITRATE as lower limit and the
                   number_of_effective_channels*6144 bit as upper limit.

                   Setting the peak bitrate equal to ::AACENC_BITRATE does
                   not necessarily mean that the audio frames will be of
                   constant size. Since the peak bitrate is in bits/second,
                   the frame sizes can vary by one byte in one or the other
                   direction over various frames. However, it is not
                   recommended to reduce the peak bitrate to
                   ::AACENC_BITRATE - it would disable the bitreservoir,
                   which would affect the audio quality by a large amount. */

  AACENC_TRANSMUX =
      0x0300, /*!< Transport type to be used. See ::TRANSPORT_TYPE in
                   FDK_audio.h. The following types can be configured in the
                   encoder library:
                   - 0: raw access units
                   - 1: ADIF bitstream format
                   - 2: ADTS bitstream format
                   - 6: Audio Mux Elements (LATM) with
                   muxConfigPresent = 1
                   - 7: Audio Mux Elements (LATM) with
                   muxConfigPresent = 0, out of band StreamMuxConfig
                   - 10: Audio Sync Stream (LOAS) */

  AACENC_HEADER_PERIOD =
      0x0301, /*!< Frame count period for sending in-band configuration
                   buffers within LATM/LOAS transport layer. Additionally
                   this parameter configures the PCE repetition period in
                   raw_data_block(). See \ref encPCE.
                   - 0xFF: Auto mode; defaults to 10 for TT_MP4_ADTS,
                   TT_MP4_LOAS and TT_MP4_LATM_MCP1, otherwise 0.
                   - n: Frame count period.
                   */

  AACENC_SIGNALING_MODE =
      0x0302, /*!< Signaling mode of the extension AOT:
                   - 0: Implicit backward compatible signaling (default for
                   non-MPEG-4 based AOTs and for the transport formats ADIF
                   and ADTS)
                       - A stream that uses implicit signaling can be decoded
                       by every AAC decoder, even AAC-LC-only decoders
                       - An AAC-LC-only decoder will only decode the
                       low-frequency part of the stream, resulting in a
                       band-limited output
                       - This method works with all transport formats
                       - This method does not work with downsampled SBR
                   - 1: Explicit backward compatible signaling
                       - A stream that uses explicit backward compatible
                       signaling can be decoded by every AAC decoder, even
                       AAC-LC-only decoders
                       - An AAC-LC-only decoder will only decode the
                       low-frequency part of the stream, resulting in a
                       band-limited output
                       - A decoder not capable of decoding PS will only
                       decode the AAC-LC+SBR part. If the stream contained
                       PS, the result will be a decoded mono downmix
                       - This method does not work with ADIF or ADTS. For
                       LOAS/LATM, it only works with AudioMuxVersion==1
                       - This method does work with downsampled SBR
                   - 2: Explicit hierarchical signaling (default for MPEG-4
                   based AOTs and for all transport formats excluding ADIF
                   and ADTS)
                       - A stream that uses explicit hierarchical signaling
                       can be decoded only by HE-AAC decoders
                       - An AAC-LC-only decoder will not decode a stream that
                       uses explicit hierarchical signaling
                       - A decoder not capable of decoding PS will not decode
                       the stream at all if it contained PS
                       - This method does not work with ADIF or ADTS. It
                       works with LOAS/LATM and the MPEG-4 File format
                       - This method does work with downsampled SBR

                   For making sure that the listener always experiences the
                   best audio quality, explicit hierarchical signaling should
                   be used.
                   This makes sure that only a full HE-AAC-capable decoder
                   will decode those streams. The audio is played at full
                   bandwidth. For best backwards compatibility, it is
                   recommended to encode with implicit SBR signaling. A
                   decoder capable of AAC-LC only will then only decode the
                   AAC part, which means the decoded audio will sound
                   band-limited.

                   For MPEG-2 transport types (ADTS, ADIF), only implicit
                   signaling is possible.

                   For LOAS and LATM, explicit backwards compatible signaling
                   only works together with AudioMuxVersion==1. The reason is
                   that, for explicit backwards compatible signaling,
                   additional information will be appended to the ASC. A
                   decoder that is only capable of decoding AAC-LC will skip
                   this part. Nevertheless, for jumping to the end of the
                   ASC, it needs to know the ASC length. Transmitting the
                   length of the ASC is a feature of AudioMuxVersion==1; it
                   is not possible to transmit the length of the ASC with
                   AudioMuxVersion==0. Therefore an AAC-LC-only decoder will
                   not be able to parse a LOAS/LATM stream that was encoded
                   with AudioMuxVersion==0.

                   For downsampled SBR, explicit signaling is mandatory. The
                   reason for this is that the extension sampling frequency
                   (which in the case of SBR is the sampling frequency of the
                   SBR part) can only be signaled in explicit mode.

                   For AAC-ELD, the SBR information is transmitted in the
                   ELDSpecific Config, which is part of the
                   AudioSpecificConfig. Therefore, the settings here will
                   have no effect on AAC-ELD. */

  AACENC_TPSUBFRAMES =
      0x0303, /*!< Number of sub frames in a transport frame for LOAS/LATM or
                   ADTS (default 1).
                   - ADTS: Maximum number of sub frames restricted to 4.
                   - LOAS/LATM: Maximum number of sub frames restricted to
                   2. */

  AACENC_AUDIOMUXVER =
      0x0304, /*!< AudioMuxVersion to be used for LATM.
                   (AudioMuxVersionA, currently not implemented):
                   - 0: Default, no transmission of tara Buffer fullness, no
                   ASC length and including actual latm Buffer fullness.
                   - 1: Transmission of tara Buffer fullness, ASC length and
                   actual latm Buffer fullness.
                   - 2: Transmission of tara Buffer fullness, ASC length and
                   maximum level of latm Buffer fullness. */

  AACENC_PROTECTION =
      0x0306, /*!< Configure protection in transport layer:
                   - 0: No protection. (default)
                   - 1: CRC active for ADTS transport format. */

  AACENC_ANCILLARY_BITRATE =
      0x0500, /*!< Constant ancillary data bitrate in bits/second.
                   - 0: Either no ancillary data or insert exact number of
                   bytes, denoted via input parameter, numAncBytes in
                   AACENC_InArgs.
                   - else: Insert ancillary data with specified bitrate. */

  AACENC_METADATA_MODE =
      0x0600, /*!< Configure Meta Data. See ::AACENC_MetaData for further
                   details:
                   - 0: Do not embed any metadata.
                   - 1: Embed dynamic_range_info metadata.
                   - 2: Embed dynamic_range_info and ancillary_data metadata.
                   - 3: Embed ancillary_data metadata. */

  AACENC_CONTROL_STATE =
      0xFF00, /*!< There is an automatic process which internally
                   reconfigures the encoder instance when a configuration
                   parameter changed or an error occurred. This parameter
                   allows overwriting or getting the control status of this
                   process. See ::AACENC_CTRLFLAGS. */

  AACENC_NONE = 0xFFFF /*!< ------ */

} AACENC_PARAM;

#ifdef __cplusplus
extern "C" {
#endif

/**
 * \brief Open an instance of the encoder.
 *
 * Allocate memory for an encoder instance with a functional range denoted by
 * the function parameters. Preinitialize encoder instance with default
 * configuration.
 *
 * \param phAacEncoder A pointer to an encoder handle. Initialized on return.
 * \param encModules Specify encoder modules to be supported in this encoder
 * instance:
 *                     - 0x0: Allocate memory for all available encoder
 * modules.
 *                     - else: Select memory allocation regarding encoder
 * modules. Following flags are possible and can be combined.
 *                        - 0x01: AAC module.
 *                        - 0x02: SBR module.
 *                        - 0x04: PS module.
 *                        - 0x08: MPS module.
 *                        - 0x10: Metadata module.
 *                        - example: (0x01|0x02|0x04|0x08|0x10) allocates
 * all modules and is equivalent to the default configuration denoted by 0x0.
 * \param maxChannels Number of channels to be allocated. This parameter can
 * be used in different ways:
 *                     - 0: Allocate maximum number of AAC and SBR channels as
 * supported by the library.
 *                     - nChannels: Use same maximum number of channels for
 * allocating memory in AAC and SBR module.
 *                     - nChannels | (nSbrCh<<8): Number of SBR channels can be
 * different to AAC channels to save data memory.
 *
 * \return
 *          - AACENC_OK, on success.
 *          - AACENC_INVALID_HANDLE, AACENC_MEMORY_ERROR, AACENC_INVALID_CONFIG,
 * on failure.
 */
AACENC_ERROR aacEncOpen(HANDLE_AACENCODER *phAacEncoder, const UINT encModules,
                        const UINT maxChannels);

/**
 * \brief Close the encoder instance.
 *
 * Deallocate encoder instance and free whole memory.
 *
 * \param phAacEncoder Pointer to the encoder handle to be deallocated.
 *
 * \return
 *          - AACENC_OK, on success.
 *          - AACENC_INVALID_HANDLE, on failure.
 */
AACENC_ERROR aacEncClose(HANDLE_AACENCODER *phAacEncoder);

/**
 * \brief Encode audio data.
 *
 * This function is mainly for encoding audio data. In addition the function can
 * be used for an encoder (re)configuration process.
 * - PCM input data will be retrieved from external input buffer until the fill
 * level allows encoding a single frame.
 * This functionality allows an external
 * buffer with reduced size in comparison to the AAC or HE-AAC audio frame
 * length.
 * - If the value of the input samples argument is zero, just internal
 * reinitialization will be applied if it is requested.
 * - At the end of a file the flushing process can be triggered via setting the
 * value of the input samples argument to -1. The encoder delay lines are fully
 * flushed when the encoder returns no valid bitstream data (see
 * AACENC_OutArgs::numOutBytes). Furthermore the end of file is signaled by the
 * return value AACENC_ENCODE_EOF.
 * - If an error occurred in the previous frame or any of the encoder parameters
 * changed, an internal reinitialization process will be applied before encoding
 * the incoming audio samples.
 * - The function can also be used for an independent reconfiguration process
 * without encoding. The first parameter has to be a valid encoder handle and
 * all other parameters can be set to NULL.
 * - If the size of the external bitbuffer in outBufDesc is not sufficient for
 * writing the whole bitstream, an internal error will be the return value and a
 * reconfiguration will be triggered.
 *
 * \param hAacEncoder A valid AAC encoder handle.
 * \param inBufDesc Input buffer descriptor, see AACENC_BufDesc:
 *                     - At least one input buffer with audio data is
 * expected.
 *                     - Optionally a second input buffer with
 * ancillary data can be fed.
 * \param outBufDesc Output buffer descriptor, see AACENC_BufDesc:
 *                     - Provide one output buffer for the encoded
 * bitstream.
 * \param inargs Input arguments, see AACENC_InArgs.
 * \param outargs Output arguments, see AACENC_OutArgs.
 *
 * \return
 *          - AACENC_OK, on success.
 *          - AACENC_INVALID_HANDLE, AACENC_ENCODE_ERROR, on failure in encoding
 * process.
 *          - AACENC_INVALID_CONFIG, AACENC_INIT_ERROR, AACENC_INIT_AAC_ERROR,
 * AACENC_INIT_SBR_ERROR, AACENC_INIT_TP_ERROR, AACENC_INIT_META_ERROR,
 * AACENC_INIT_MPS_ERROR, on failure in encoder initialization.
 *          - AACENC_UNSUPPORTED_PARAMETER, on incorrect input or output buffer
 * descriptor initialization.
 *          - AACENC_ENCODE_EOF, when flushing fully concluded.
 */
AACENC_ERROR aacEncEncode(const HANDLE_AACENCODER hAacEncoder,
                          const AACENC_BufDesc *inBufDesc,
                          const AACENC_BufDesc *outBufDesc,
                          const AACENC_InArgs *inargs, AACENC_OutArgs *outargs);

/**
 * \brief Acquire info about present encoder instance.
 *
 * This function retrieves information about the encoder configuration. In
 * addition to informative internal states, a configuration data block of the
 * current encoder settings will be returned. The format is either Audio
 * Specific Config in case of Raw Packets transport format or StreamMuxConfig in
 * case of LOAS/LATM transport format. The configuration data block is binary
 * coded as specified in ISO/IEC 14496-3 (MPEG-4 audio), to be used directly for
 * MPEG-4 File Format or RFC3016 or RFC3640 applications.
 *
 * \param hAacEncoder A valid AAC encoder handle.
 * \param pInfo Pointer to AACENC_InfoStruct. Filled on return.
 *
 * \return
 *          - AACENC_OK, on success.
 *          - AACENC_INVALID_HANDLE, AACENC_INIT_ERROR, on failure.
 */
AACENC_ERROR aacEncInfo(const HANDLE_AACENCODER hAacEncoder,
                        AACENC_InfoStruct *pInfo);

/**
 * \brief Set one single AAC encoder parameter.
 *
 * This function allows configuration of all encoder parameters specified in
 * ::AACENC_PARAM. Each parameter must be set with a separate function call. An
 * internal validation of the configuration value range will be done and an
 * internal reconfiguration will be signaled.
 * The actual configuration adoption
 * is part of the subsequent aacEncEncode() call.
 *
 * \param hAacEncoder A valid AAC encoder handle.
 * \param param Parameter to be set. See ::AACENC_PARAM.
 * \param value Parameter value. See parameter description in
 * ::AACENC_PARAM.
 *
 * \return
 *          - AACENC_OK, on success.
 *          - AACENC_INVALID_HANDLE, AACENC_UNSUPPORTED_PARAMETER,
 * AACENC_INVALID_CONFIG, on failure.
 */
AACENC_ERROR aacEncoder_SetParam(const HANDLE_AACENCODER hAacEncoder,
                                 const AACENC_PARAM param, const UINT value);

/**
 * \brief Get one single AAC encoder parameter.
 *
 * This function is the complement to aacEncoder_SetParam(). After encoder
 * reinitialization with user defined settings, the internal status of each
 * parameter specified with ::AACENC_PARAM can be obtained.
 *
 * \param hAacEncoder A valid AAC encoder handle.
 * \param param Parameter to be returned. See ::AACENC_PARAM.
 *
 * \return Internal configuration value of specified parameter ::AACENC_PARAM.
 */
UINT aacEncoder_GetParam(const HANDLE_AACENCODER hAacEncoder,
                         const AACENC_PARAM param);

/**
 * \brief Get information about encoder library build.
 *
 * Fill a given LIB_INFO structure with library version information.
 *
 * \param info Pointer to an allocated LIB_INFO struct.
 *
 * \return
 *          - AACENC_OK, on success.
 *          - AACENC_INVALID_HANDLE, AACENC_INIT_ERROR, on failure.
 */
AACENC_ERROR aacEncGetLibInfo(LIB_INFO *info);

#ifdef __cplusplus
}
#endif

#endif /* AACENC_LIB_H */
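
/*
 * Usage sketch (not part of the library): the aacEncOpen() documentation
 * describes encModules as a combination of module flags and maxChannels as
 * either a plain channel count or nChannels | (nSbrCh<<8). The helper below
 * illustrates how such arguments might be assembled. The names ENC_MODULE_*,
 * pack_max_channels() and he_aac_v2_modules() are hypothetical and chosen for
 * illustration only; the header defines no such symbols. The 8-bit range
 * check is an assumption derived from the <<8 packing, not a documented limit.
 */

```c
#include <assert.h>

/* Hypothetical names for the module flag values listed for encModules. */
enum {
  ENC_MODULE_AAC = 0x01,  /* AAC module */
  ENC_MODULE_SBR = 0x02,  /* SBR module */
  ENC_MODULE_PS = 0x04,   /* PS module */
  ENC_MODULE_MPS = 0x08,  /* MPS module */
  ENC_MODULE_META = 0x10  /* Metadata module */
};

/* Pack the AAC and SBR channel counts as nChannels | (nSbrCh << 8).
 * Assumption: each count must fit in 8 bits so the two fields do not overlap. */
static unsigned int pack_max_channels(unsigned int nChannels,
                                      unsigned int nSbrCh) {
  assert(nChannels <= 0xFFu && nSbrCh <= 0xFFu);
  return nChannels | (nSbrCh << 8);
}

/* Module mask for an instance limited to AAC + SBR + PS, which is a plausible
 * choice for HE-AAC v2; whether it suffices in practice is not verified here. */
static unsigned int he_aac_v2_modules(void) {
  return ENC_MODULE_AAC | ENC_MODULE_SBR | ENC_MODULE_PS;
}
```

/*
 * With the real library, the results would be passed on as, for example,
 * aacEncOpen(&handle, he_aac_v2_modules(), pack_max_channels(2, 2)).
 * Passing 0 for both arguments instead requests all modules and the maximum
 * channel count, as documented above.
 */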