• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1page.title=Designing for Performance
2@jd:body
3
4<p>An Android application should be fast. Well, it's probably more accurate to
5say that it should be <em>efficient</em>. That is, it should execute as
6efficiently as possible in the mobile device environment, with its limited
7computing power and data storage, smaller screen, and constrained battery life.
8
9<p>As you develop your application, keep in mind that, while the application may
10perform well enough in your emulator, running on your dual-core development
11computer, it will not perform that well when run a mobile device &mdash; even
12the most powerful mobile device can't match the capabilities of a typical
13desktop system. For that reason, you should strive to write efficient code, to
14ensure the best possible performance on a variety of mobile devices.</p>
15
16<p>Generally speaking, writing fast or efficient code means keeping memory
17allocations to a minimum, writing tight code, and avoiding certain language and
18programming idioms that can subtly cripple performance. In object-oriented
19terms, most of this work takes place at the <em>method</em> level, on the order of
20actual lines of code, loops, and so on.</p>
21
22<p>This document covers these topics: </p>
23<ul>
24    <li><a href="#intro">Introduction</a></li>
25    <li><a href="#object_creation">Avoid Creating Objects</a></li>
26    <li><a href="#native_methods">Use Native Methods</a></li>
27    <li><a href="#prefer_virtual">Prefer Virtual Over Interface</a></li>
28    <li><a href="#prefer_static">Prefer Static Over Virtual</a></li>
29    <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li>
30    <li><a href="#cache_fields">Cache Field Lookups</a></li>
31    <li><a href="#use_final">Declare Constants Final</a></li>
32    <li><a href="#foreach">Use Enhanced For Loop Syntax With Caution</a></li>
33    <li><a href="#avoid_enums">Avoid Enums</a></li>
34    <li><a href="#package_inner">Use Package Scope with Inner Classes</a></li>
35    <li><a href="#avoidfloat">Avoid Float</a> </li>
36    <li><a href="#samples">Some Sample Performance Numbers</a> </li>
37    <li><a href="#closing_notes">Closing Notes</a></li>
38</ul>
39
40<a name="intro" id="intro"></a>
41<h2>Introduction</h2>
42
43<p>There are two basic rules for resource-constrained systems:</p>
44
45<ul>
46    <li>Don't do work that you don't need to do.</li>
47    <li>Don't allocate memory if you can avoid it.</li>
48</ul>
49
50<p>All the tips below follow from these two basic tenets.</p>
51
52<p>Some would argue that much of the advice on this page amounts to "premature
53optimization." While it's true that micro-optimizations sometimes make it
54harder to develop efficient data structures and algorithms, on embedded
55devices like handsets you often simply have no choice.  For instance, if you
56bring your assumptions about VM performance on desktop machines to Android,
57you're quite likely to write code that exhausts system memory.  This will bring
58your application to a crawl &mdash; let alone what it will do to other programs
59running on the system!</p>
60
61<p>That's why these guidelines are important.  Android's success depends on
62the user experience that your applications provide, and that user experience
63depends in part on whether your code is responsive and snappy, or slow and
64aggravating.  Since all our applications will run on the same devices, we're
65all in this together, in a way.  Think of this document as like the rules of
66the road you had to learn when you got your driver's license:  things run
67smoothly when everybody follows them, but when you don't, you get your car
68smashed up.</p>
69
70<p>Before we get down to brass tacks, a brief observation: nearly all issues
71described below are valid whether or not the VM features a JIT compiler. If I
72have two methods that accomplish the same thing, and the interpreted execution
73of foo() is faster than bar(), then the compiled version of foo() will
74probably be as fast or faster than compiled bar(). It is unwise to rely on a
75compiler to "save" you and make your code fast enough.</p>
76
77<a name="object_creation"></a>
78<h2>Avoid Creating Objects</h2>
79
80<p>Object creation is never free. A generational GC with per-thread allocation
81pools for temporary objects can make allocation cheaper, but allocating memory
82is always more expensive than not allocating memory.</p>
83
84<p>If you allocate objects in a user interface loop, you will force a periodic
85garbage collection, creating little "hiccups" in the user experience.</p>
86
87<p>Thus, you should avoid creating object instances you don't need to.  Some
88examples of things that can help:</p>
89
90<ul>
91    <li>When extracting strings from a set of input data, try
92    to return a substring of the original data, instead of creating a copy.
93    You will create a new String object, but it will share the char[]
94    with the data.</li>
95    <li>If you have a method returning a string, and you know that its result
96    will always be appended to a StringBuffer anyway, change your signature
97    and implementation so that the function does the append directly,
98    instead of creating a short-lived temporary object.</li>
99</ul>
100
101<p>A somewhat more radical idea is to slice up multidimensional arrays into parallel
102single one-dimension arrays:</p>
103
104<ul>
105    <li>An array of ints is a much better than an array of Integers,
106    but this also generalizes to the fact that two parallel arrays of ints
107    are also a <strong>lot</strong> more efficient than an array of (int,int)
108    objects.  The same goes for any combination of primitive types.</li>
109    <li>If you need to implement a container that stores tuples of (Foo,Bar)
110    objects, try to remember that two parallel Foo[] and Bar[] arrays are
111    generally much better than a single array of custom (Foo,Bar) objects.
112    (The exception to this, of course, is when you're designing an API for
113    other code to access;  in those cases, it's usually better to trade
114    correct API design for a small hit in speed. But in your own internal
115    code, you should try and be as efficient as possible.)</li>
116</ul>
117
118<p>Generally speaking, avoid creating short-term temporary objects if you
119can.  Fewer objects created mean less-frequent garbage collection, which has
120a direct impact on user experience.</p>
121
122<a name="native_methods" id="native_methods"></a>
123<h2>Use Native Methods</h2>
124
125<p>When processing strings, don't hesitate to use specialty methods like
126String.indexOf(), String.lastIndexOf(), and their cousins. These are typically
127implemented in C/C++ code that easily runs 10-100x faster than doing the same
128thing in a Java loop.</p>
129
130<p>The flip side of that advice is that punching through to a native
131method is more expensive than calling an interpreted method. Don't use native
132methods for trivial computation, if you can avoid it.</p>
133
134<a name="prefer_virtual" id="prefer_virtual"></a>
135<h2>Prefer Virtual Over Interface</h2>
136
137<p>Suppose you have a HashMap object.  You can declare it as a HashMap or as
138a generic Map:</p>
139
140<pre>Map myMap1 = new HashMap();
141HashMap myMap2 = new HashMap();</pre>
142
143<p>Which is better?</p>
144
145<p>Conventional wisdom says that you should prefer Map, because it
146allows you to change the underlying implementation to anything that
147implements the Map interface.  Conventional wisdom is correct for
148conventional programming, but isn't so great for embedded systems.  Calling
149through an interface reference can take 2x longer than a virtual
150method call through a concrete reference.</p>
151
152<p>If you have chosen a HashMap because it fits what you're doing, there
153is little value in calling it a Map.  Given the availability of
154IDEs that refactor your code for you, there's not much value in calling
155it a Map even if you're not sure where the code is headed. (Again, though,
156public APIs are an exception:  a good API usually trumps small performance
157concerns.)</p>
158
159<a name="prefer_static" id="prefer_static"></a>
160<h2>Prefer Static Over Virtual</h2>
161
162<p>If you don't need to access an object's fields, make your method static.  It can
163be called faster, because it doesn't require a virtual method table
164indirection.  It's also good practice, because you can tell from the method
165signature that calling the method can't alter the object's state.</p>
166
167<a name="internal_get_set" id="internal_get_set"></a>
168<h2>Avoid Internal Getters/Setters</h2>
169
170<p>In native languages like C++ it's common practice to use getters (e.g.
171<code>i = getCount()</code>) instead of accessing the field directly (<code>i
172= mCount</code>). This is an excellent habit for C++, because the compiler can
173usually inline the access, and if you need to restrict or debug field access
174you can add the code at any time.</p>
175
176<p>On Android, this is a bad idea.  Virtual method calls are expensive,
177much more so than instance field lookups.  It's reasonable to follow
178common object-oriented programming practices and have getters and setters
179in the public interface, but within a class you should always access
180fields directly.</p>
181
182<a name="cache_fields" id="cache_fields"></a>
183<h2>Cache Field Lookups</h2>
184
185<p>Accessing object fields is much slower than accessing local variables.
186Instead of writing:</p>
187<pre>for (int i = 0; i &lt; this.mCount; i++)
188      dumpItem(this.mItems[i]);</pre>
189
190<p>You should write:</p>
191<pre>  int count = this.mCount;
192  Item[] items = this.mItems;
193
194  for (int i = 0; i &lt; count; i++)
195      dumpItems(items[i]);
196</pre>
197
198<p>(We're using an explicit "this" to make it clear that these are
199member variables.)</p>
200
201<p>A similar guideline is never call a method in the second clause of a "for"
202statement.  For example, the following code will execute the getCount() method
203once per iteration, which is a huge waste when you could have simply cached
204the value as an int:</p>
205
206<pre>for (int i = 0; i &lt; this.getCount(); i++)
207    dumpItems(this.getItem(i));
208</pre>
209
210<p>It's also usually a good idea to create a local variable if you're going to be
211accessing an instance field more than once.  For example:</p>
212
213<pre>
214    protected void drawHorizontalScrollBar(Canvas canvas, int width, int height) {
215        if (isHorizontalScrollBarEnabled()) {
216            int size = <strong>mScrollBar</strong>.getSize(<em>false</em>);
217            if (size &lt;= 0) {
218                size = mScrollBarSize;
219            }
220            <strong>mScrollBar</strong>.setBounds(0, <em>height</em> - size, width, height);
221            <strong>mScrollBar</strong>.setParams(
222                    computeHorizontalScrollRange(),
223                    computeHorizontalScrollOffset(),
224                    computeHorizontalScrollExtent(), <em>false</em>);
225            <strong>mScrollBar</strong>.draw(canvas);
226        }
227    }</pre>
228
229<p>That's four separate lookups of the member field <code>mScrollBar</code>.
230By caching mScrollBar in a local stack variable, the four member field lookups
231become four stack variable references, which are much more efficient.</p>
232
233<p>Incidentally, method arguments have the same performance characteristics
234as local variables.</p>
235
236<a name="use_final" id="use_final"></a>
237<h2>Declare Constants Final</h2>
238
239<p>Consider the following declaration at the top of a class:</p>
240
241<pre>static int intVal = 42;
242static String strVal = "Hello, world!";</pre>
243
244<p>The compiler generates a class initializer method, called
245<code>&lt;clinit&gt;</code>, that is executed when the class is first used.
246The method stores the value 42 into <code>intVal</code>, and extracts a
247reference from the classfile string constant table for <code>strVal</code>.
248When these values are referenced later on, they are accessed with field
249lookups.</p>
250
251<p>We can improve matters with the "final" keyword:</p>
252
253<pre>static final int intVal = 42;
254static final String strVal = "Hello, world!";</pre>
255
256<p>The class no longer requires a <code>&lt;clinit&gt;</code> method,
257because the constants go into classfile static field initializers, which are
258handled directly by the VM.  Code accessing <code>intVal</code> will use
259the integer value 42 directly, and accesses to <code>strVal</code> will
260use a relatively inexpensive "string constant" instruction instead of a
261field lookup.</p>
262
263<p>Declaring a method or class "final" does not confer any immediate
264performance benefits, but it does allow certain optimizations. For example, if
265the compiler knows that a "getter" method can't be overridden by a sub-class,
266it can inline the method call.</p>
267
268<p>You can also declare local variables final. However, this has no definitive
269performance benefits. For local variables, only use "final" if it makes the
270code clearer (or you have to, e.g. for use in an anonymous inner class).</p>
271
272<a name="foreach" id="foreach"></a>
273<h2>Use Enhanced For Loop Syntax With Caution</h2>
274
275<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface.
276With these objects, an iterator is allocated to make interface calls
277to hasNext() and next(). With an ArrayList, you're better off walking through
278it directly, but for other collections the enhanced for loop syntax will be equivalent
279to explicit iterator usage.</p>
280
281<p>Nevertheless, the following code shows an acceptable use of the enhanced for loop:</p>
282
283<pre>public class Foo {
284    int mSplat;
285    static Foo mArray[] = new Foo[27];
286
287    public static void zero() {
288        int sum = 0;
289        for (int i = 0; i &lt; mArray.length; i++) {
290            sum += mArray[i].mSplat;
291        }
292    }
293
294    public static void one() {
295        int sum = 0;
296        Foo[] localArray = mArray;
297        int len = localArray.length;
298
299        for (int i = 0; i &lt; len; i++) {
300            sum += localArray[i].mSplat;
301        }
302    }
303
304    public static void two() {
305        int sum = 0;
306        for (Foo a: mArray) {
307            sum += a.mSplat;
308        }
309    }
310}</pre>
311
312<p><strong>zero()</strong> retrieves the static field twice and gets the array
313length once for every iteration through the loop.</p>
314
315<p><strong>one()</strong> pulls everything out into local variables, avoiding
316the lookups.</p>
317
318<p><strong>two()</strong> uses the enhanced for loop syntax introduced in version 1.5 of
319the Java programming language. The code generated by the compiler takes care
320of copying the array reference and the array length to local variables, making
321it a good choice for walking through all elements of an array. It does
322generate an extra local load/store in the main loop (apparently preserving
323"a"), making it a teensy bit slower and 4 bytes longer than one().</p>
324
325<p>To summarize all that a bit more clearly: enhanced for loop syntax performs well
326with arrays, but be cautious when using it with Iterable objects since there is
327additional object creation.</p>
328
329<a name="avoid_enums" id="avoid_enums"></a>
330<h2>Avoid Enums</h2>
331
332<p>Enums are very convenient, but unfortunately can be painful when size
333and speed matter.  For example, this:</p>
334
335<pre>public class Foo {
336   public enum Shrubbery { GROUND, CRAWLING, HANGING }
337}</pre>
338
339<p>turns into a 900 byte .class file (Foo$Shrubbery.class). On first use, the
340class initializer invokes the &lt;init&gt; method on objects representing each
341of the enumerated values. Each object gets its own static field, and the full
342set is stored in an array (a static field called "$VALUES"). That's a lot of
343code and data, just for three integers.</p>
344
345<p>This:</p>
346
347<pre>Shrubbery shrub = Shrubbery.GROUND;</pre>
348
349<p>causes a static field lookup.  If "GROUND" were a static final int,
350the compiler would treat it as a known constant and inline it.</p>
351
352<p>The flip side, of course, is that with enums you get nicer APIs and
353some compile-time value checking.  So, the usual trade-off applies: you should
354by all means use enums for public APIs, but try to avoid them when performance
355matters.</p>
356
357<p>In some circumstances it can be helpful to get enum integer values
358through the <code>ordinal()</code> method.  For example, replace:</p>
359
360<pre>for (int n = 0; n &lt; list.size(); n++) {
361    if (list.items[n].e == MyEnum.VAL_X)
362       // do stuff 1
363    else if (list.items[n].e == MyEnum.VAL_Y)
364       // do stuff 2
365}</pre>
366
367<p>with:</p>
368
369<pre>   int valX = MyEnum.VAL_X.ordinal();
370   int valY = MyEnum.VAL_Y.ordinal();
371   int count = list.size();
372   MyItem items = list.items();
373
374   for (int  n = 0; n &lt; count; n++)
375   {
376        int  valItem = items[n].e.ordinal();
377
378        if (valItem == valX)
379          // do stuff 1
380        else if (valItem == valY)
381          // do stuff 2
382   }</pre>
383
384<p>In some cases, this will be faster, though this is not guaranteed.</p>
385
386<a name="package_inner" id="package_inner"></a>
387<h2>Use Package Scope with Inner Classes</h2>
388
389<p>Consider the following class definition:</p>
390
391<pre>public class Foo {
392    private int mValue;
393
394    public void run() {
395        Inner in = new Inner();
396        mValue = 27;
397        in.stuff();
398    }
399
400    private void doStuff(int value) {
401        System.out.println("Value is " + value);
402    }
403
404    private class Inner {
405        void stuff() {
406            Foo.this.doStuff(Foo.this.mValue);
407        }
408    }
409}</pre>
410
411<p>The key things to note here are that we define an inner class (Foo$Inner)
412that directly accesses a private method and a private instance field
413in the outer class.  This is legal, and the code prints "Value is 27" as
414expected.</p>
415
416<p>The problem is that Foo$Inner is technically (behind the scenes) a totally
417separate class, which makes direct access to Foo's private
418members illegal.  To bridge that gap, the compiler generates a
419couple of synthetic methods:</p>
420
421<pre>/*package*/ static int Foo.access$100(Foo foo) {
422    return foo.mValue;
423}
424/*package*/ static void Foo.access$200(Foo foo, int value) {
425    foo.doStuff(value);
426}</pre>
427
428<p>The inner-class code calls these static methods whenever it needs to
429access the "mValue" field or invoke the "doStuff" method in the outer
430class. What this means is that the code above really boils down to a case
431where you're accessing member fields through accessor methods instead of
432directly.  Earlier we talked about how accessors are slower than direct field
433accesses, so this is an example of a certain language idiom resulting in an
434"invisible" performance hit.</p>
435
436<p>We can avoid this problem by declaring fields and methods accessed
437by inner classes to have package scope, rather than private scope.
438This runs faster and removes the overhead of the generated methods.
439(Unfortunately it also means the fields could be accessed directly by other
440classes in the same package, which runs counter to the standard OO
441practice of making all fields private. Once again, if you're
442designing a public API you might want to carefully consider using this
443optimization.)</p>
444
445<a name="avoidfloat" id="avoidfloat"></a>
446<h2>Avoid Float</h2>
447
448<p>Before the release of the Pentium CPU, it was common for game authors to do
449as much as possible with integer math. With the Pentium, the floating point
450math co-processor became a built-in feature, and by interleaving integer and
451floating-point operations your game would actually go faster than it would
452with purely integer math. The common practice on desktop systems is to use
453floating point freely.</p>
454
455<p>Unfortunately, embedded processors frequently do not have hardware floating
456point support, so all operations on "float" and "double" are performed in
457software. Some basic floating point operations can take on the order of a
458millisecond to complete.</p>
459
460<p>Also, even for integers, some chips have hardware multiply but lack
461hardware divide. In such cases, integer division and modulus operations are
462performed in software &mdash; something to think about if you're designing a
463hash table or doing lots of math.</p>
464
465<a name="samples" id="samples"></a>
466<h2>Some Sample Performance Numbers</h2>
467
468<p>To illustrate some of our ideas, here is a table listing the approximate
469run times for a few basic actions. Note that these values should NOT be taken
470as absolute numbers: they are a combination of CPU and wall clock time, and
471will change as improvements are made to the system. However, it is worth
472noting how these values apply relative to each other &mdash; for example,
473adding a member variable currently takes about four times as long as adding a
474local variable.</p>
475
476<table>
477    <tr>
478        <th>Action</th>
479        <th>Time</th>
480    </tr>
481    <tr>
482        <td>Add a local variable </td>
483        <td>1</td>
484    </tr>
485    <tr class="alt">
486        <td>Add a member variable </td>
487        <td>4</td>
488    </tr>
489    <tr>
490        <td>Call String.length()</td>
491        <td>5</td>
492    </tr>
493    <tr class="alt">
494        <td>Call empty static native method</td>
495        <td>5</td>
496    </tr>
497    <tr>
498        <td>Call empty static method </td>
499        <td>12</td>
500    </tr>
501    <tr class="alt">
502        <td>Call empty virtual method </td>
503        <td>12.5</td>
504    </tr>
505    <tr>
506        <td>Call empty interface method </td>
507        <td>15</td>
508    </tr>
509    <tr class="alt">
510        <td>Call Iterator:next() on a HashMap </td>
511        <td>165</td>
512    </tr>
513    <tr>
514        <td>Call put() on a HashMap</td>
515        <td>600</td>
516    </tr>
517    <tr class="alt">
518        <td>Inflate 1 View from XML </td>
519        <td>22,000</td>
520    </tr>
521    <tr>
522        <td>Inflate 1 LinearLayout containing 1 TextView </td>
523        <td>25,000</td>
524    </tr>
525    <tr class="alt">
526        <td>Inflate 1 LinearLayout containing 6 View objects </td>
527        <td>100,000</td>
528    </tr>
529    <tr>
530        <td>Inflate 1 LinearLayout containing 6 TextView objects </td>
531        <td>135,000</td>
532    </tr>
533    <tr class="alt">
534        <td>Launch an empty activity </td>
535        <td>3,000,000</td>
536    </tr>
537</table>
538
539<a name="closing_notes" id="closing_notes"></a>
540<h2>Closing Notes</h2>
541
542<p>The best way to write good, efficient code for embedded systems is to
543understand what the code you write really does. If you really want to allocate
544an iterator, by all means use enhanced for loop syntax on a List; just make it a
545deliberate choice, not an inadvertent side effect.</p>
546
547<p>Forewarned is forearmed!  Know what you're getting into!  Insert your
548favorite maxim here, but always think carefully about what your code is doing,
549and be on the lookout for ways to speed it up.</p>
550