• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1page.title=Designing for Performance
2@jd:body
3
4<div id="qv-wrapper">
5<div id="qv">
6
7<h2>In this document</h2>
8<ol>
9  <li><a href="#intro">Introduction</a></li>
10  <li><a href="#optimize_judiciously">Optimize Judiciously</a></li>
11  <li><a href="#object_creation">Avoid Creating Unnecessary Objects</a></li>
12  <li><a href="#myths">Performance Myths</a></li>
13  <li><a href="#prefer_static">Prefer Static Over Virtual</a></li>
14  <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li>
15  <li><a href="#use_final">Use Static Final For Constants</a></li>
16  <li><a href="#foreach">Use Enhanced For Loop Syntax</a></li>
17  <li><a href="#package_inner">Consider Package Instead of Private Access with Inner Classes</a></li>
18  <li><a href="#avoidfloat">Use Floating-Point Judiciously</a> </li>
19  <li><a href="#library">Know And Use The Libraries</a></li>
20  <li><a href="#native_methods">Use Native Methods Judiciously</a></li>
21  <li><a href="#closing_notes">Closing Notes</a></li>
22</ol>
23
24</div>
25</div>
26
27<p>An Android application will run on a mobile device with limited computing
28power and storage, and constrained battery life. Because of
29this, it should be <em>efficient</em>. Battery life is one reason you might
30want to optimize your app even if it already seems to run "fast enough".
31Battery life is important to users, and Android's battery usage breakdown
32means users will know if your app is responsible draining their battery.</p>
33
34<p>Note that although this document primarily covers micro-optimizations,
35these will almost never make or break your software. Choosing the right
36algorithms and data structures should always be your priority, but is
37outside the scope of this document.</p>
38
39<a name="intro" id="intro"></a>
40<h2>Introduction</h2>
41
42<p>There are two basic rules for writing efficient code:</p>
43<ul>
44    <li>Don't do work that you don't need to do.</li>
45    <li>Don't allocate memory if you can avoid it.</li>
46</ul>
47
48<h2 id="optimize_judiciously">Optimize Judiciously</h2>
49
50<p>This document is about Android-specific micro-optimization, so it assumes
51that you've already used profiling to work out exactly what code needs to be
52optimized, and that you already have a way to measure the effect (good or bad)
53of any changes you make. You only have so much engineering time to invest, so
54it's important to know you're spending it wisely.
55
56<p>(See <a href="#closing_notes">Closing Notes</a> for more on profiling and
57writing effective benchmarks.)
58
59<p>This document also assumes that you made the best decisions about data
60structures and algorithms, and that you've also considered the future
61performance consequences of your API decisions. Using the right data
62structures and algorithms will make more difference than any of the advice
63here, and considering the performance consequences of your API decisions will
64make it easier to switch to better implementations later (this is more
65important for library code than for application code).
66
67<p>(If you need that kind of advice, see Josh Bloch's <em>Effective Java</em>,
68item 47.)</p>
69
70<p>One of the trickiest problems you'll face when micro-optimizing an Android
71app is that your app is pretty much guaranteed to be running on multiple
72hardware platforms. Different versions of the VM running on different
73processors running at different speeds. It's not even generally the case
74that you can simply say "device X is a factor F faster/slower than device Y",
75and scale your results from one device to others. In particular, measurement
76on the emulator tells you very little about performance on any device. There
77are also huge differences between devices with and without a JIT: the "best"
78code for a device with a JIT is not always the best code for a device
79without.</p>
80
81<p>If you want to know how your app performs on a given device, you need to
82test on that device.</p>
83
84<a name="object_creation"></a>
85<h2>Avoid Creating Unnecessary Objects</h2>
86
87<p>Object creation is never free. A generational GC with per-thread allocation
88pools for temporary objects can make allocation cheaper, but allocating memory
89is always more expensive than not allocating memory.</p>
90
91<p>If you allocate objects in a user interface loop, you will force a periodic
92garbage collection, creating little "hiccups" in the user experience. The
93concurrent collector introduced in Gingerbread helps, but unnecessary work
94should always be avoided.</p>
95
96<p>Thus, you should avoid creating object instances you don't need to.  Some
97examples of things that can help:</p>
98
99<ul>
100    <li>If you have a method returning a string, and you know that its result
101    will always be appended to a StringBuffer anyway, change your signature
102    and implementation so that the function does the append directly,
103    instead of creating a short-lived temporary object.</li>
104    <li>When extracting strings from a set of input data, try
105    to return a substring of the original data, instead of creating a copy.
106    You will create a new String object, but it will share the char[]
107    with the data. (The trade-off being that if you're only using a small
108    part of the original input, you'll be keeping it all around in memory
109    anyway if you go this route.)</li>
110</ul>
111
112<p>A somewhat more radical idea is to slice up multidimensional arrays into
113parallel single one-dimension arrays:</p>
114
115<ul>
116    <li>An array of ints is a much better than an array of Integers,
117    but this also generalizes to the fact that two parallel arrays of ints
118    are also a <strong>lot</strong> more efficient than an array of (int,int)
119    objects.  The same goes for any combination of primitive types.</li>
120    <li>If you need to implement a container that stores tuples of (Foo,Bar)
121    objects, try to remember that two parallel Foo[] and Bar[] arrays are
122    generally much better than a single array of custom (Foo,Bar) objects.
123    (The exception to this, of course, is when you're designing an API for
124    other code to access;  in those cases, it's usually better to trade
125    good API design for a small hit in speed. But in your own internal
126    code, you should try and be as efficient as possible.)</li>
127</ul>
128
129<p>Generally speaking, avoid creating short-term temporary objects if you
130can.  Fewer objects created mean less-frequent garbage collection, which has
131a direct impact on user experience.</p>
132
133<a name="avoid_enums" id="avoid_enums"></a>
134<a name="myths" id="myths"></a>
135<h2>Performance Myths</h2>
136
137<p>Previous versions of this document made various misleading claims. We
138address some of them here.</p>
139
140<p>On devices without a JIT, it is true that invoking methods via a
141variable with an exact type rather than an interface is slightly more
142efficient. (So, for example, it was cheaper to invoke methods on a
143<code>HashMap map</code> than a <code>Map map</code>, even though in both
144cases the map was a <code>HashMap</code>.) It was not the case that this
145was 2x slower; the actual difference was more like 6% slower. Furthermore,
146the JIT makes the two effectively indistinguishable.</p>
147
148<p>On devices without a JIT, caching field accesses is about 20% faster than
149repeatedly accesssing the field. With a JIT, field access costs about the same
150as local access, so this isn't a worthwhile optimization unless you feel it
151makes your code easier to read. (This is true of final, static, and static
152final fields too.)
153
154<a name="prefer_static" id="prefer_static"></a>
155<h2>Prefer Static Over Virtual</h2>
156
157<p>If you don't need to access an object's fields, make your method static.
158Invocations will be about 15%-20% faster.
159It's also good practice, because you can tell from the method
160signature that calling the method can't alter the object's state.</p>
161
162<a name="internal_get_set" id="internal_get_set"></a>
163<h2>Avoid Internal Getters/Setters</h2>
164
165<p>In native languages like C++ it's common practice to use getters (e.g.
166<code>i = getCount()</code>) instead of accessing the field directly (<code>i
167= mCount</code>). This is an excellent habit for C++, because the compiler can
168usually inline the access, and if you need to restrict or debug field access
169you can add the code at any time.</p>
170
171<p>On Android, this is a bad idea.  Virtual method calls are expensive,
172much more so than instance field lookups.  It's reasonable to follow
173common object-oriented programming practices and have getters and setters
174in the public interface, but within a class you should always access
175fields directly.</p>
176
177<p>Without a JIT, direct field access is about 3x faster than invoking a
178trivial getter. With the JIT (where direct field access is as cheap as
179accessing a local), direct field access is about 7x faster than invoking a
180trivial getter. This is true in Froyo, but will improve in the future when
181the JIT inlines getter methods.</p>
182
183<p>Note that if you're using ProGuard, you can have the best
184of both worlds because ProGuard can inline accessors for you.</p>
185
186<a name="use_final" id="use_final"></a>
187<h2>Use Static Final For Constants</h2>
188
189<p>Consider the following declaration at the top of a class:</p>
190
191<pre>static int intVal = 42;
192static String strVal = "Hello, world!";</pre>
193
194<p>The compiler generates a class initializer method, called
195<code>&lt;clinit&gt;</code>, that is executed when the class is first used.
196The method stores the value 42 into <code>intVal</code>, and extracts a
197reference from the classfile string constant table for <code>strVal</code>.
198When these values are referenced later on, they are accessed with field
199lookups.</p>
200
201<p>We can improve matters with the "final" keyword:</p>
202
203<pre>static final int intVal = 42;
204static final String strVal = "Hello, world!";</pre>
205
206<p>The class no longer requires a <code>&lt;clinit&gt;</code> method,
207because the constants go into static field initializers in the dex file.
208Code that refers to <code>intVal</code> will use
209the integer value 42 directly, and accesses to <code>strVal</code> will
210use a relatively inexpensive "string constant" instruction instead of a
211field lookup. (Note that this optimization only applies to primitive types and
212<code>String</code> constants, not arbitrary reference types. Still, it's good
213practice to declare constants <code>static final</code> whenever possible.)</p>
214
215<a name="foreach" id="foreach"></a>
216<h2>Use Enhanced For Loop Syntax</h2>
217
218<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used
219for collections that implement the Iterable interface and for arrays.
220With collections, an iterator is allocated to make interface calls
221to hasNext() and next(). With an ArrayList, a hand-written counted loop is
222about 3x faster (with or without JIT), but for other collections the enhanced
223for loop syntax will be exactly equivalent to explicit iterator usage.</p>
224
225<p>There are several alternatives for iterating through an array:</p>
226
227<pre>    static class Foo {
228        int mSplat;
229    }
230    Foo[] mArray = ...
231
232    public void zero() {
233        int sum = 0;
234        for (int i = 0; i &lt; mArray.length; ++i) {
235            sum += mArray[i].mSplat;
236        }
237    }
238
239    public void one() {
240        int sum = 0;
241        Foo[] localArray = mArray;
242        int len = localArray.length;
243
244        for (int i = 0; i &lt; len; ++i) {
245            sum += localArray[i].mSplat;
246        }
247    }
248
249    public void two() {
250        int sum = 0;
251        for (Foo a : mArray) {
252            sum += a.mSplat;
253        }
254    }
255</pre>
256
257<p><strong>zero()</strong> is slowest, because the JIT can't yet optimize away
258the cost of getting the array length once for every iteration through the
259loop.</p>
260
261<p><strong>one()</strong> is faster. It pulls everything out into local
262variables, avoiding the lookups. Only the array length offers a performance
263benefit.</p>
264
265<p><strong>two()</strong> is fastest for devices without a JIT, and
266indistinguishable from <strong>one()</strong> for devices with a JIT.
267It uses the enhanced for loop syntax introduced in version 1.5 of the Java
268programming language.</p>
269
270<p>To summarize: use the enhanced for loop by default, but consider a
271hand-written counted loop for performance-critical ArrayList iteration.</p>
272
273<p>(See also <em>Effective Java</em> item 46.)</p>
274
275<a name="package_inner" id="package_inner"></a>
276<h2>Consider Package Instead of Private Access with Private Inner Classes</h2>
277
278<p>Consider the following class definition:</p>
279
280<pre>public class Foo {
281    private class Inner {
282        void stuff() {
283            Foo.this.doStuff(Foo.this.mValue);
284        }
285    }
286
287    private int mValue;
288
289    public void run() {
290        Inner in = new Inner();
291        mValue = 27;
292        in.stuff();
293    }
294
295    private void doStuff(int value) {
296        System.out.println("Value is " + value);
297    }
298}</pre>
299
300<p>The key things to note here are that we define a private inner class
301(<code>Foo$Inner</code>) that directly accesses a private method and a private
302instance field in the outer class. This is legal, and the code prints "Value is
30327" as expected.</p>
304
305<p>The problem is that the VM considers direct access to <code>Foo</code>'s
306private members from <code>Foo$Inner</code> to be illegal because
307<code>Foo</code> and <code>Foo$Inner</code> are different classes, even though
308the Java language allows an inner class to access an outer class' private
309members. To bridge the gap, the compiler generates a couple of synthetic
310methods:</p>
311
312<pre>/*package*/ static int Foo.access$100(Foo foo) {
313    return foo.mValue;
314}
315/*package*/ static void Foo.access$200(Foo foo, int value) {
316    foo.doStuff(value);
317}</pre>
318
319<p>The inner class code calls these static methods whenever it needs to
320access the <code>mValue</code> field or invoke the <code>doStuff</code> method
321in the outer class. What this means is that the code above really boils down to
322a case where you're accessing member fields through accessor methods.
323Earlier we talked about how accessors are slower than direct field
324accesses, so this is an example of a certain language idiom resulting in an
325"invisible" performance hit.</p>
326
327<p>If you're using code like this in a performance hotspot, you can avoid the
328overhead by declaring fields and methods accessed by inner classes to have
329package access, rather than private access. Unfortunately this means the fields
330can be accessed directly by other classes in the same package, so you shouldn't
331use this in public API.</p>
332
333<a name="avoidfloat" id="avoidfloat"></a>
334<h2>Use Floating-Point Judiciously</h2>
335
336<p>As a rule of thumb, floating-point is about 2x slower than integer on
337Android devices. This is true on a FPU-less, JIT-less G1 and a Nexus One with
338an FPU and the JIT. (Of course, absolute speed difference between those two
339devices is about 10x for arithmetic operations.)</p>
340
341<p>In speed terms, there's no difference between <code>float</code> and
342<code>double</code> on the more modern hardware. Space-wise, <code>double</code>
343is 2x larger. As with desktop machines, assuming space isn't an issue, you
344should prefer <code>double</code> to <code>float</code>.</p>
345
346<p>Also, even for integers, some chips have hardware multiply but lack
347hardware divide. In such cases, integer division and modulus operations are
348performed in software &mdash; something to think about if you're designing a
349hash table or doing lots of math.</p>
350
351<a name="library" id="library"></a>
352<h2>Know And Use The Libraries</h2>
353
354<p>In addition to all the usual reasons to prefer library code over rolling
355your own, bear in mind that the system is at liberty to replace calls
356to library methods with hand-coded assembler, which may be better than the
357best code the JIT can produce for the equivalent Java. The typical example
358here is <code>String.indexOf</code> and friends, which Dalvik replaces with
359an inlined intrinsic. Similarly, the <code>System.arraycopy</code> method
360is about 9x faster than a hand-coded loop on a Nexus One with the JIT.</p>
361
362<p>(See also <em>Effective Java</em> item 47.)</p>
363
364<a name="native_methods" id="native_methods"></a>
365<h2>Use Native Methods Judiciously</h2>
366
367<p>Native code isn't necessarily more efficient than Java. For one thing,
368there's a cost associated with the Java-native transition, and the JIT can't
369optimize across these boundaries. If you're allocating native resources (memory
370on the native heap, file descriptors, or whatever), it can be significantly
371more difficult to arrange timely collection of these resources. You also
372need to compile your code for each architecture you wish to run on (rather
373than rely on it having a JIT). You may even have to compile multiple versions
374for what you consider the same architecture: native code compiled for the ARM
375processor in the G1 can't take full advantage of the ARM in the Nexus One, and
376code compiled for the ARM in the Nexus One won't run on the ARM in the G1.</p>
377
378<p>Native code is primarily useful when you have an existing native codebase
379that you want to port to Android, not for "speeding up" parts of a Java app.</p>
380
381<p>If you do need to use native code, you should read our
382<a href="{@docRoot}guide/practices/jni.html">JNI Tips</a>.</p>
383
384<p>(See also <em>Effective Java</em> item 54.)</p>
385
386<a name="closing_notes" id="closing_notes"></a>
387<h2>Closing Notes</h2>
388
389<p>One last thing: always measure. Before you start optimizing, make sure you
390have a problem. Make sure you can accurately measure your existing performance,
391or you won't be able to measure the benefit of the alternatives you try.</p>
392
393<p>Every claim made in this document is backed up by a benchmark. The source
394to these benchmarks can be found in the <a href="http://code.google.com/p/dalvik/source/browse/#svn/trunk/benchmarks">code.google.com "dalvik" project</a>.</p>
395
396<p>The benchmarks are built with the
397<a href="http://code.google.com/p/caliper/">Caliper</a> microbenchmarking
398framework for Java. Microbenchmarks are hard to get right, so Caliper goes out
399of its way to do the hard work for you, and even detect some cases where you're
400not measuring what you think you're measuring (because, say, the VM has
401managed to optimize all your code away). We highly recommend you use Caliper
402to run your own microbenchmarks.</p>
403
404<p>You may also find
405<a href="{@docRoot}tools/debugging/debugging-tracing.html">Traceview</a> useful
406for profiling, but it's important to realize that it currently disables the JIT,
407which may cause it to misattribute time to code that the JIT may be able to win
408back. It's especially important after making changes suggested by Traceview
409data to ensure that the resulting code actually runs faster when run without
410Traceview.
411