• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1page.title=Designing for Performance
2@jd:body
3
4<div id="qv-wrapper">
5<div id="qv">
6
7<h2>In this document</h2>
8<ol>
9  <li><a href="#intro">Introduction</a></li>
10  <li><a href="#optimize_judiciously">Optimize Judiciously</a></li>
11  <li><a href="#object_creation">Avoid Creating Objects</a></li>
12  <li><a href="#myths">Performance Myths</a></li>
13  <li><a href="#prefer_static">Prefer Static Over Virtual</a></li>
14  <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li>
15  <li><a href="#use_final">Use Static Final For Constants</a></li>
16  <li><a href="#foreach">Use Enhanced For Loop Syntax</a></li>
17  <li><a href="#avoid_enums">Avoid Enums Where You Only Need Ints</a></li>
18  <li><a href="#package_inner">Use Package Scope with Inner Classes</a></li>
19  <li><a href="#avoidfloat">Use Floating-Point Judiciously</a> </li>
20  <li><a href="#library">Know And Use The Libraries</a></li>
21  <li><a href="#native_methods">Use Native Methods Judiciously</a></li>
22  <li><a href="#closing_notes">Closing Notes</a></li>
23</ol>
24
25</div>
26</div>
27
28<p>An Android application will run on a mobile device with limited computing
29power and storage, and constrained battery life. Because of
30this, it should be <em>efficient</em>. Battery life is one reason you might
31want to optimize your app even if it already seems to run "fast enough".
32Battery life is important to users, and Android's battery usage breakdown
33means users will know if your app is responsible draining their battery.</p>
34
35<p>Note that although this document primarily covers micro-optimizations,
36these will almost never make or break your software. Choosing the right
37algorithms and data structures should always be your priority, but is
38outside the scope of this document.</p>
39
40<a name="intro" id="intro"></a>
41<h2>Introduction</h2>
42
43<p>There are two basic rules for writing efficient code:</p>
44<ul>
45    <li>Don't do work that you don't need to do.</li>
46    <li>Don't allocate memory if you can avoid it.</li>
47</ul>
48
49<h2 id="optimize_judiciously">Optimize Judiciously</h2>
50
51<p>This document is about Android-specific micro-optimization, so it assumes
52that you've already used profiling to work out exactly what code needs to be
53optimized, and that you already have a way to measure the effect (good or bad)
54of any changes you make. You only have so much engineering time to invest, so
55it's important to know you're spending it wisely.
56
57<p>(See <a href="#closing_notes">Closing Notes</a> for more on profiling and
58writing effective benchmarks.)
59
60<p>This document also assumes that you made the best decisions about data
61structures and algorithms, and that you've also considered the future
62performance consequences of your API decisions. Using the right data
63structures and algorithms will make more difference than any of the advice
64here, and considering the performance consequences of your API decisions will
65make it easier to switch to better implementations later (this is more
66important for library code than for application code).
67
68<p>(If you need that kind of advice, see Josh Bloch's <em>Effective Java</em>,
69item 47.)</p>
70
71<p>One of the trickiest problems you'll face when micro-optimizing an Android
72app is that your app is pretty much guaranteed to be running on multiple
73hardware platforms. Different versions of the VM running on different
74processors running at different speeds. It's not even generally the case
75that you can simply say "device X is a factor F faster/slower than device Y",
76and scale your results from one device to others. In particular, measurement
77on the emulator tells you very little about performance on any device. There
78are also huge differences between devices with and without a JIT: the "best"
79code for a device with a JIT is not always the best code for a device
80without.</p>
81
82<p>If you want to know how your app performs on a given device, you need to
83test on that device.</p>
84
85<a name="object_creation"></a>
86<h2>Avoid Creating Objects</h2>
87
88<p>Object creation is never free. A generational GC with per-thread allocation
89pools for temporary objects can make allocation cheaper, but allocating memory
90is always more expensive than not allocating memory.</p>
91
92<p>If you allocate objects in a user interface loop, you will force a periodic
93garbage collection, creating little "hiccups" in the user experience.</p>
94
95<p>Thus, you should avoid creating object instances you don't need to.  Some
96examples of things that can help:</p>
97
98<ul>
99    <li>When extracting strings from a set of input data, try
100    to return a substring of the original data, instead of creating a copy.
101    You will create a new String object, but it will share the char[]
102    with the data.</li>
103    <li>If you have a method returning a string, and you know that its result
104    will always be appended to a StringBuffer anyway, change your signature
105    and implementation so that the function does the append directly,
106    instead of creating a short-lived temporary object.</li>
107</ul>
108
109<p>A somewhat more radical idea is to slice up multidimensional arrays into
110parallel single one-dimension arrays:</p>
111
112<ul>
113    <li>An array of ints is a much better than an array of Integers,
114    but this also generalizes to the fact that two parallel arrays of ints
115    are also a <strong>lot</strong> more efficient than an array of (int,int)
116    objects.  The same goes for any combination of primitive types.</li>
117    <li>If you need to implement a container that stores tuples of (Foo,Bar)
118    objects, try to remember that two parallel Foo[] and Bar[] arrays are
119    generally much better than a single array of custom (Foo,Bar) objects.
120    (The exception to this, of course, is when you're designing an API for
121    other code to access;  in those cases, it's usually better to trade
122    correct API design for a small hit in speed. But in your own internal
123    code, you should try and be as efficient as possible.)</li>
124</ul>
125
126<p>Generally speaking, avoid creating short-term temporary objects if you
127can.  Fewer objects created mean less-frequent garbage collection, which has
128a direct impact on user experience.</p>
129
130<a name="myths" id="myths"></a>
131<h2>Performance Myths</h2>
132
133<p>Previous versions of this document made various misleading claims. We
134address some of them here.</p>
135
136<p>On devices without a JIT, it is true that invoking methods via a
137variable with an exact type rather than an interface is slightly more
138efficient. (So, for example, it was cheaper to invoke methods on a
139<code>HashMap map</code> than a <code>Map map</code>, even though in both
140cases the map was a <code>HashMap</code>.) It was not the case that this
141was 2x slower; the actual difference was more like 6% slower. Furthermore,
142the JIT makes the two effectively indistinguishable.</p>
143
144<p>On devices without a JIT, caching field accesses is about 20% faster than
145repeatedly accesssing the field. With a JIT, field access costs about the same
146as local access, so this isn't a worthwhile optimization unless you feel it
147makes your code easier to read. (This is true of final, static, and static
148final fields too.)
149
150<a name="prefer_static" id="prefer_static"></a>
151<h2>Prefer Static Over Virtual</h2>
152
153<p>If you don't need to access an object's fields, make your method static.
154Invocations will be about 15%-20% faster.
155It's also good practice, because you can tell from the method
156signature that calling the method can't alter the object's state.</p>
157
158<a name="internal_get_set" id="internal_get_set"></a>
159<h2>Avoid Internal Getters/Setters</h2>
160
161<p>In native languages like C++ it's common practice to use getters (e.g.
162<code>i = getCount()</code>) instead of accessing the field directly (<code>i
163= mCount</code>). This is an excellent habit for C++, because the compiler can
164usually inline the access, and if you need to restrict or debug field access
165you can add the code at any time.</p>
166
167<p>On Android, this is a bad idea.  Virtual method calls are expensive,
168much more so than instance field lookups.  It's reasonable to follow
169common object-oriented programming practices and have getters and setters
170in the public interface, but within a class you should always access
171fields directly.</p>
172
173<p>Without a JIT, direct field access is about 3x faster than invoking a
174trivial getter. With the JIT (where direct field access is as cheap as
175accessing a local), direct field access is about 7x faster than invoking a
176trivial getter. This is true in Froyo, but will improve in the future when
177the JIT inlines getter methods.</p>
178
179<a name="use_final" id="use_final"></a>
180<h2>Use Static Final For Constants</h2>
181
182<p>Consider the following declaration at the top of a class:</p>
183
184<pre>static int intVal = 42;
185static String strVal = "Hello, world!";</pre>
186
187<p>The compiler generates a class initializer method, called
188<code>&lt;clinit&gt;</code>, that is executed when the class is first used.
189The method stores the value 42 into <code>intVal</code>, and extracts a
190reference from the classfile string constant table for <code>strVal</code>.
191When these values are referenced later on, they are accessed with field
192lookups.</p>
193
194<p>We can improve matters with the "final" keyword:</p>
195
196<pre>static final int intVal = 42;
197static final String strVal = "Hello, world!";</pre>
198
199<p>The class no longer requires a <code>&lt;clinit&gt;</code> method,
200because the constants go into static field initializers in the dex file.
201Code that refers to <code>intVal</code> will use
202the integer value 42 directly, and accesses to <code>strVal</code> will
203use a relatively inexpensive "string constant" instruction instead of a
204field lookup. (Note that this optimization only applies to primitive types and
205<code>String</code> constants, not arbitrary reference types. Still, it's good
206practice to declare constants <code>static final</code> whenever possible.)</p>
207
208<a name="foreach" id="foreach"></a>
209<h2>Use Enhanced For Loop Syntax</h2>
210
211<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used
212for collections that implement the Iterable interface and for arrays.
213With collections, an iterator is allocated to make interface calls
214to hasNext() and next(). With an ArrayList, a hand-written counted loop is
215about 3x faster (with or without JIT), but for other collections the enhanced
216for loop syntax will be exactly equivalent to explicit iterator usage.</p>
217
218<p>There are several alternatives for iterating through an array:</p>
219
220<pre>    static class Foo {
221        int mSplat;
222    }
223    Foo[] mArray = ...
224
225    public void zero() {
226        int sum = 0;
227        for (int i = 0; i &lt; mArray.length; ++i) {
228            sum += mArray[i].mSplat;
229        }
230    }
231
232    public void one() {
233        int sum = 0;
234        Foo[] localArray = mArray;
235        int len = localArray.length;
236
237        for (int i = 0; i &lt; len; ++i) {
238            sum += localArray[i].mSplat;
239        }
240    }
241
242    public void two() {
243        int sum = 0;
244        for (Foo a : mArray) {
245            sum += a.mSplat;
246        }
247    }
248</pre>
249
250<p><strong>zero()</strong> is slowest, because the JIT can't yet optimize away
251the cost of getting the array length once for every iteration through the
252loop.</p>
253
254<p><strong>one()</strong> is faster. It pulls everything out into local
255variables, avoiding the lookups. Only the array length offers a performance
256benefit.</p>
257
258<p><strong>two()</strong> is fastest for devices without a JIT, and
259indistinguishable from <strong>one()</strong> for devices with a JIT.
260It uses the enhanced for loop syntax introduced in version 1.5 of the Java
261programming language.</p>
262
263<p>To summarize: use the enhanced for loop by default, but consider a
264hand-written counted loop for performance-critical ArrayList iteration.</p>
265
266<p>(See also <em>Effective Java</em> item 46.)</p>
267
268<a name="avoid_enums" id="avoid_enums"></a>
269<h2>Avoid Enums Where You Only Need Ints</h2>
270
271<p>Enums are very convenient, but unfortunately can be painful when size
272and speed matter.  For example, this:</p>
273
274<pre>public enum Shrubbery { GROUND, CRAWLING, HANGING }</pre>
275
276<p>adds 740 bytes to your .dex file compared to the equivalent class
277with three public static final ints. On first use, the
278class initializer invokes the &lt;init&gt; method on objects representing each
279of the enumerated values. Each object gets its own static field, and the full
280set is stored in an array (a static field called "$VALUES"). That's a lot of
281code and data, just for three integers. Additionally, this:</p>
282
283<pre>Shrubbery shrub = Shrubbery.GROUND;</pre>
284
285<p>causes a static field lookup.  If "GROUND" were a static final int,
286the compiler would treat it as a known constant and inline it.</p>
287
288<p>The flip side, of course, is that with enums you get nicer APIs and
289some compile-time value checking.  So, the usual trade-off applies: you should
290by all means use enums for public APIs, but try to avoid them when performance
291matters.</p>
292
293<p>If you're using <code>Enum.ordinal</code>, that's usually a sign that you
294should be using ints instead. As a rule of thumb, if an enum doesn't have a
295constructor and doesn't define its own methods, and it's used in
296performance-critical code, you should consider <code>static final int</code>
297constants instead.</p>
298
299<a name="package_inner" id="package_inner"></a>
300<h2>Use Package Scope with Inner Classes</h2>
301
302<p>Consider the following class definition:</p>
303
304<pre>public class Foo {
305    private int mValue;
306
307    public void run() {
308        Inner in = new Inner();
309        mValue = 27;
310        in.stuff();
311    }
312
313    private void doStuff(int value) {
314        System.out.println("Value is " + value);
315    }
316
317    private class Inner {
318        void stuff() {
319            Foo.this.doStuff(Foo.this.mValue);
320        }
321    }
322}</pre>
323
324<p>The key things to note here are that we define an inner class (Foo$Inner)
325that directly accesses a private method and a private instance field
326in the outer class.  This is legal, and the code prints "Value is 27" as
327expected.</p>
328
329<p>The problem is that the VM considers direct access to Foo's private members
330from Foo$Inner to be illegal because Foo and Foo$Inner are different classes,
331even though the Java language allows an inner class to access an outer class'
332private members. To bridge the gap, the compiler generates a couple of
333synthetic methods:</p>
334
335<pre>/*package*/ static int Foo.access$100(Foo foo) {
336    return foo.mValue;
337}
338/*package*/ static void Foo.access$200(Foo foo, int value) {
339    foo.doStuff(value);
340}</pre>
341
342<p>The inner-class code calls these static methods whenever it needs to
343access the "mValue" field or invoke the "doStuff" method in the outer
344class. What this means is that the code above really boils down to a case
345where you're accessing member fields through accessor methods instead of
346directly.  Earlier we talked about how accessors are slower than direct field
347accesses, so this is an example of a certain language idiom resulting in an
348"invisible" performance hit.</p>
349
350<p>We can avoid this problem by declaring fields and methods accessed
351by inner classes to have package scope, rather than private scope.
352This runs faster and removes the overhead of the generated methods.
353(Unfortunately it also means the fields could be accessed directly by other
354classes in the same package, which runs counter to the standard
355practice of making all fields private. Once again, if you're
356designing a public API you might want to carefully consider using this
357optimization.)</p>
358
359<a name="avoidfloat" id="avoidfloat"></a>
360<h2>Use Floating-Point Judiciously</h2>
361
362<p>As a rule of thumb, floating-point is about 2x slower than integer on
363Android devices. This is true on a FPU-less, JIT-less G1 and a Nexus One with
364an FPU and the JIT. (Of course, absolute speed difference between those two
365devices is about 10x for arithmetic operations.)</p>
366
367<p>In speed terms, there's no difference between <code>float</code> and
368<code>double</code> on the more modern hardware. Space-wise, <code>double</code>
369is 2x larger. As with desktop machines, assuming space isn't an issue, you
370should prefer <code>double</code> to <code>float</code>.</p>
371
372<p>Also, even for integers, some chips have hardware multiply but lack
373hardware divide. In such cases, integer division and modulus operations are
374performed in software &mdash; something to think about if you're designing a
375hash table or doing lots of math.</p>
376
377<a name="library" id="library"></a>
378<h2>Know And Use The Libraries</h2>
379
380<p>In addition to all the usual reasons to prefer library code over rolling
381your own, bear in mind that the system is at liberty to replace calls
382to library methods with hand-coded assembler, which may be better than the
383best code the JIT can produce for the equivalent Java. The typical example
384here is <code>String.indexOf</code> and friends, which Dalvik replaces with
385an inlined intrinsic. Similarly, the <code>System.arraycopy</code> method
386is about 9x faster than a hand-coded loop on a Nexus One with the JIT.</p>
387
388<p>(See also <em>Effective Java</em> item 47.)</p>
389
390<a name="native_methods" id="native_methods"></a>
391<h2>Use Native Methods Judiciously</h2>
392
393<p>Native code isn't necessarily more efficient than Java. For one thing,
394there's a cost associated with the Java-native transition, and the JIT can't
395optimize across these boundaries. If you're allocating native resources (memory
396on the native heap, file descriptors, or whatever), it can be significantly
397more difficult to arrange timely collection of these resources. You also
398need to compile your code for each architecture you wish to run on (rather
399than rely on it having a JIT). You may even have to compile multiple versions
400for what you consider the same architecture: native code compiled for the ARM
401processor in the G1 can't take full advantage of the ARM in the Nexus One, and
402code compiled for the ARM in the Nexus One won't run on the ARM in the G1.</p>
403
404<p>Native code is primarily useful when you have an existing native codebase
405that you want to port to Android, not for "speeding up" parts of a Java app.</p>
406
407<p>(See also <em>Effective Java</em> item 54.)</p>
408
409<a name="closing_notes" id="closing_notes"></a>
410<h2>Closing Notes</h2>
411
412<p>One last thing: always measure. Before you start optimizing, make sure you
413have a problem. Make sure you can accurately measure your existing performance,
414or you won't be able to measure the benefit of the alternatives you try.</p>
415
416<p>Every claim made in this document is backed up by a benchmark. The source
417to these benchmarks can be found in the <a href="http://code.google.com/p/dalvik/source/browse/#svn/trunk/benchmarks">code.google.com "dalvik" project</a>.</p>
418
419<p>The benchmarks are built with the
420<a href="http://code.google.com/p/caliper/">Caliper</a> microbenchmarking
421framework for Java. Microbenchmarks are hard to get right, so Caliper goes out
422of its way to do the hard work for you, and even detect some cases where you're
423not measuring what you think you're measuring (because, say, the VM has
424managed to optimize all your code away). We highly recommend you use Caliper
425to run your own microbenchmarks.</p>
426
427<p>You may also find
428<a href="{@docRoot}guide/developing/tools/traceview.html">Traceview</a> useful
429for profiling, but it's important to realize that it currently disables the JIT,
430which may cause it to misattribute time to code that the JIT may be able to win
431back. It's especially important after making changes suggested by Traceview
432data to ensure that the resulting code actually runs faster when run without
433Traceview.
434