page.title=Audio Latency @jd:body

In this document

Audio latency is the time delay as an audio signal passes through a system. For a complete description of audio latency for the purposes of Android compatibility, see Section 5.4 Audio Latency in the Android CDD.

Contributors to Latency

This section focuses on the contributors to output latency, but a similar discussion applies to input latency.

Assuming that the analog circuitry does not contribute significantly. Then the major surface-level contributors to audio latency are the following:

Application
Total number of buffers in pipeline
Size of each buffer, in frames
Additional latency after the app processor, such as from a DSP

As accurate as the above list of contributors may be, it is also misleading. The reason is that buffer count and buffer size are more of an effect than a cause. What usually happens is that a given buffer scheme is implemented and tested, but during testing, an audio underrun is heard as a "click" or "pop". To compensate, the system designer then increases buffer sizes or buffer counts. This has the desired result of eliminating the underruns, but it also has the undesired side effect of increasing latency.

A better approach is to understand the underlying causes of the underruns and then correct those. This eliminates the audible artifacts and may even permit even smaller or fewer buffers and thus reduce latency.

In our experience, the most common causes of underruns include:

Linux CFS (Completely Fair Scheduler)
high-priority threads with SCHED_FIFO scheduling
long scheduling latency
long-running interrupt handlers
long interrupt disable time

Linux CFS and SCHED_FIFO scheduling

The Linux CFS is designed to be fair to competing workloads sharing a common CPU resource. This fairness is represented by a per-thread nice parameter. The nice value ranges from -19 (least nice, or most CPU time allocated) to 20 (nicest, or least CPU time allocated). In general, all threads with a given nice value receive approximately equal CPU time and threads with a numerically lower nice value should expect to receive more CPU time. However, CFS is "fair" only over relatively long periods of observation. Over short-term observation windows, CFS may allocate the CPU resource in unexpected ways. For example, it may take the CPU away from a thread with numerically low niceness onto a thread with a numerically high niceness. In the case of audio, this can result in an underrun.

The obvious solution is to avoid CFS for high-performance audio threads. Beginning with Android 4.1 (Jelly Bean), such threads now use the SCHED_FIFO scheduling policy rather than the SCHED_NORMAL (also called SCHED_OTHER) scheduling policy implemented by CFS.

Though the high-performance audio threads now use SCHED_FIFO, they are still susceptible to other higher priority SCHED_FIFO threads. These are typically kernel worker threads, but there may also be a few non-audio user threads with policy SCHED_FIFO. The available SCHED_FIFO priorities range from 1 to 99. The audio threads run at priority 2 or 3. This leaves priority 1 available for lower priority threads, and priorities 4 to 99 for higher priority threads. We recommend that you use priority 1 whenever possible, and reserve priorities 4 to 99 for those threads that are guaranteed to complete within a bounded amount of time, and are known to not interfere with scheduling of audio threads.

Scheduling latency

Scheduling latency is the time between when a thread becomes ready to run, and when the resulting context switch completes so that the thread actually runs on a CPU. The shorter the latency the better and anything over two milliseconds causes problems for audio. Long scheduling latency is most likely to occur during mode transitions, such as bringing up or shutting down a CPU, switching between a security kernel and the normal kernel, switching from full power to low-power mode, or adjusting the CPU clock frequency and voltage.

Interrupts

In many designs, CPU 0 services all external interrupts. So a long-running interrupt handler may delay other interrupts, in particular audio DMA completion interrupts. Design interrupt handlers to finish quickly and defer any lengthy work to a thread (preferably a CFS thread or SCHED_FIFO thread of priority 1).

Equivalently, disabling interrupts on CPU 0 for a long period has the same result of delaying the servicing of audio interrupts. Long interrupt disable times typically happen while waiting for a kernel spin lock. Review these spin locks to ensure that they are bounded.

Measuring Output Latency

There are several techniques available to measure output latency, with varying degrees of accuracy and ease of running.

LED and oscilloscope test

This test measures latency in relation to the device's LED indicator. If your production device does not have an LED, you can install the LED on a prototype form factor device. For even better accuracy on prototype devices with exposed circuity, connect one oscilloscope probe to the LED directly to bypass the light sensor latency.

If you cannot install an LED on either your production or prototype device, try the following workarounds:

Use a General Purpose Input/Output (GPIO) pin for the same purpose
Use JTAG or another debugging port
Use the screen backlight. This might be risky as the backlight may have a non-neglible latency, and can contribute to an inaccurate latency reading.

To conduct this test:

Run an app that periodically pulses the LED at the same time it outputs audio.
Note: To get useful results, it is crucial to use the correct APIs in the test app so that you're exercising the fast audio output path. See the separate document "Application developer guidelines for reduced audio latency".
Place a light sensor next to the LED.
Connect the probes of a dual-channel oscilloscope to both the wired headphone jack (line output) and light sensor.
Use the oscilloscope to measure the time difference between observing the line output signal versus the light sensor signal.

The difference in time is the approximate audio output latency, assuming that the LED latency and light sensor latency are both zero. Typically, the LED and light sensor each have a relatively low latency on the order of 1 millisecond or less, which is sufficiently low enough to ignore.

Larsen test

One of the easiest latency tests is an audio feedback (Larsen effect) test. This provides a crude measure of combined output and input latency by timing an impulse response loop. This test is not very useful by itself because of the nature of the test, but

To conduct this test:

Run an app that captures audio from the microphone and immediately plays the captured data back over the speaker.
Create a sound externally, such as tapping a pencil by the microphone. This noise generates a feedback loop.
Measure the time between feedback pulses to get the sum of the output latency, input latency, and application overhead.

This method does not break down the component times, which is important when the output latency and input latency are independent, so this method is not recommended for measuring output latency, but might be useful to help measure output latency.

Measuring Input Latency

Input latency is more difficult to measure than output latency. The following tests might help.

One approach is to first determine the output latency using the LED and oscilloscope method and then use the audio feedback (Larsen) test to determine the sum of output latency and input latency. The difference between these two measurements is the input latency.

Another technique is to use a GPIO pin on a prototype device. Externally, pulse a GPIO input at the same time that you present an audio signal to the device. Run an app that compares the difference in arrival times of the GPIO signal and audio data.

Reducing Latency

To achieve low audio latency, pay special attention throughout the system to scheduling, interrupt handling, power management, and device driver design. Your goal is to prevent any part of the platform from blocking a SCHED_FIFO audio thread for more than a couple of milliseconds. By adopting such a systematic approach, you can reduce audio latency and get the side benefit of more predictable performance overall.

Audio underruns, when they do occur, are often detectable only under certain conditions or only at the transitions. Try stressing the system by launching new apps and scrolling quickly through various displays. But be aware that some test conditions are so stressful as to be beyond the design goals. For example, taking a bugreport puts such enormous load on the system that it may be acceptable to have an underrun in that case.

When testing for underruns:

Configure any DSP after the app processor so that it adds minimal latency
Run tests under different conditions such as having the screen on or off, USB plugged in or unplugged, WiFi on or off, Bluetooth on or off, and telephony and data radios on or off.
Select relatively quiet music that you're very familiar with, and which is easy to hear underruns in.
Use wired headphones for extra sensitivity.
Give yourself breaks so that you don't experience "ear fatigue".

Once you find the underlying causes of underruns, reduce the buffer counts and sizes to take advantage of this. The eager approach of reducing buffer counts and sizes before analyzing underruns and fixing the causes of underruns only results in frustration.

Tools

systrace is an excellent general-purpose tool for diagnosing system-level performance glitches.

The output of dumpsys media.audio_flinger also contains a useful section called "simple moving statistics". This has a summary of the variability of elapsed times for each audio mix and I/O cycle. Ideally, all the time measurements should be about equal to the mean or nominal cycle time. If you see a very low minimum or high maximum, this is an indication of a problem, which is probably a high scheduling latency or interrupt disable time. The tail part of the output is especially helpful, as it highlights the variability beyond +/- 3 standard deviations.