1page.title=Designing for Performance 2@jd:body 3 4<p>An Android application should be fast. Well, it's probably more accurate to 5say that it should be <em>efficient</em>. That is, it should execute as 6efficiently as possible in the mobile device environment, with its limited 7computing power and data storage, smaller screen, and constrained battery life. 8 9<p>As you develop your application, keep in mind that, while the application may 10perform well enough in your emulator, running on your dual-core development 11computer, it will not perform that well when run a mobile device — even 12the most powerful mobile device can't match the capabilities of a typical 13desktop system. For that reason, you should strive to write efficient code, to 14ensure the best possible performance on a variety of mobile devices.</p> 15 16<p>Generally speaking, writing fast or efficient code means keeping memory 17allocations to a minimum, writing tight code, and avoiding certain language and 18programming idioms that can subtly cripple performance. In object-oriented 19terms, most of this work takes place at the <em>method</em> level, on the order of 20actual lines of code, loops, and so on.</p> 21 22<p>This document covers these topics: </p> 23<ul> 24 <li><a href="#intro">Introduction</a></li> 25 <li><a href="#object_creation">Avoid Creating Objects</a></li> 26 <li><a href="#native_methods">Use Native Methods</a></li> 27 <li><a href="#prefer_virtual">Prefer Virtual Over Interface</a></li> 28 <li><a href="#prefer_static">Prefer Static Over Virtual</a></li> 29 <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li> 30 <li><a href="#cache_fields">Cache Field Lookups</a></li> 31 <li><a href="#use_final">Declare Constants Final</a></li> 32 <li><a href="#foreach">Use Enhanced For Loop Syntax With Caution</a></li> 33 <li><a href="#avoid_enums">Avoid Enums</a></li> 34 <li><a href="#package_inner">Use Package Scope with Inner Classes</a></li> 35 <li><a href="#avoidfloat">Avoid Float</a> </li> 36 <li><a href="#samples">Some Sample Performance Numbers</a> </li> 37 <li><a href="#closing_notes">Closing Notes</a></li> 38</ul> 39 40<a name="intro" id="intro"></a> 41<h2>Introduction</h2> 42 43<p>There are two basic rules for resource-constrained systems:</p> 44 45<ul> 46 <li>Don't do work that you don't need to do.</li> 47 <li>Don't allocate memory if you can avoid it.</li> 48</ul> 49 50<p>All the tips below follow from these two basic tenets.</p> 51 52<p>Some would argue that much of the advice on this page amounts to "premature 53optimization." While it's true that micro-optimizations sometimes make it 54harder to develop efficient data structures and algorithms, on embedded 55devices like handsets you often simply have no choice. For instance, if you 56bring your assumptions about VM performance on desktop machines to Android, 57you're quite likely to write code that exhausts system memory. This will bring 58your application to a crawl — let alone what it will do to other programs 59running on the system!</p> 60 61<p>That's why these guidelines are important. Android's success depends on 62the user experience that your applications provide, and that user experience 63depends in part on whether your code is responsive and snappy, or slow and 64aggravating. Since all our applications will run on the same devices, we're 65all in this together, in a way. Think of this document as like the rules of 66the road you had to learn when you got your driver's license: things run 67smoothly when everybody follows them, but when you don't, you get your car 68smashed up.</p> 69 70<p>Before we get down to brass tacks, a brief observation: nearly all issues 71described below are valid whether or not the VM features a JIT compiler. If I 72have two methods that accomplish the same thing, and the interpreted execution 73of foo() is faster than bar(), then the compiled version of foo() will 74probably be as fast or faster than compiled bar(). It is unwise to rely on a 75compiler to "save" you and make your code fast enough.</p> 76 77<a name="object_creation"></a> 78<h2>Avoid Creating Objects</h2> 79 80<p>Object creation is never free. A generational GC with per-thread allocation 81pools for temporary objects can make allocation cheaper, but allocating memory 82is always more expensive than not allocating memory.</p> 83 84<p>If you allocate objects in a user interface loop, you will force a periodic 85garbage collection, creating little "hiccups" in the user experience.</p> 86 87<p>Thus, you should avoid creating object instances you don't need to. Some 88examples of things that can help:</p> 89 90<ul> 91 <li>When extracting strings from a set of input data, try 92 to return a substring of the original data, instead of creating a copy. 93 You will create a new String object, but it will share the char[] 94 with the data.</li> 95 <li>If you have a method returning a string, and you know that its result 96 will always be appended to a StringBuffer anyway, change your signature 97 and implementation so that the function does the append directly, 98 instead of creating a short-lived temporary object.</li> 99</ul> 100 101<p>A somewhat more radical idea is to slice up multidimensional arrays into parallel 102single one-dimension arrays:</p> 103 104<ul> 105 <li>An array of ints is a much better than an array of Integers, 106 but this also generalizes to the fact that two parallel arrays of ints 107 are also a <strong>lot</strong> more efficient than an array of (int,int) 108 objects. The same goes for any combination of primitive types.</li> 109 <li>If you need to implement a container that stores tuples of (Foo,Bar) 110 objects, try to remember that two parallel Foo[] and Bar[] arrays are 111 generally much better than a single array of custom (Foo,Bar) objects. 112 (The exception to this, of course, is when you're designing an API for 113 other code to access; in those cases, it's usually better to trade 114 correct API design for a small hit in speed. But in your own internal 115 code, you should try and be as efficient as possible.)</li> 116</ul> 117 118<p>Generally speaking, avoid creating short-term temporary objects if you 119can. Fewer objects created mean less-frequent garbage collection, which has 120a direct impact on user experience.</p> 121 122<a name="native_methods" id="native_methods"></a> 123<h2>Use Native Methods</h2> 124 125<p>When processing strings, don't hesitate to use specialty methods like 126String.indexOf(), String.lastIndexOf(), and their cousins. These are typically 127implemented in C/C++ code that easily runs 10-100x faster than doing the same 128thing in a Java loop.</p> 129 130<p>The flip side of that advice is that punching through to a native 131method is more expensive than calling an interpreted method. Don't use native 132methods for trivial computation, if you can avoid it.</p> 133 134<a name="prefer_virtual" id="prefer_virtual"></a> 135<h2>Prefer Virtual Over Interface</h2> 136 137<p>Suppose you have a HashMap object. You can declare it as a HashMap or as 138a generic Map:</p> 139 140<pre>Map myMap1 = new HashMap(); 141HashMap myMap2 = new HashMap();</pre> 142 143<p>Which is better?</p> 144 145<p>Conventional wisdom says that you should prefer Map, because it 146allows you to change the underlying implementation to anything that 147implements the Map interface. Conventional wisdom is correct for 148conventional programming, but isn't so great for embedded systems. Calling 149through an interface reference can take 2x longer than a virtual 150method call through a concrete reference.</p> 151 152<p>If you have chosen a HashMap because it fits what you're doing, there 153is little value in calling it a Map. Given the availability of 154IDEs that refactor your code for you, there's not much value in calling 155it a Map even if you're not sure where the code is headed. (Again, though, 156public APIs are an exception: a good API usually trumps small performance 157concerns.)</p> 158 159<a name="prefer_static" id="prefer_static"></a> 160<h2>Prefer Static Over Virtual</h2> 161 162<p>If you don't need to access an object's fields, make your method static. It can 163be called faster, because it doesn't require a virtual method table 164indirection. It's also good practice, because you can tell from the method 165signature that calling the method can't alter the object's state.</p> 166 167<a name="internal_get_set" id="internal_get_set"></a> 168<h2>Avoid Internal Getters/Setters</h2> 169 170<p>In native languages like C++ it's common practice to use getters (e.g. 171<code>i = getCount()</code>) instead of accessing the field directly (<code>i 172= mCount</code>). This is an excellent habit for C++, because the compiler can 173usually inline the access, and if you need to restrict or debug field access 174you can add the code at any time.</p> 175 176<p>On Android, this is a bad idea. Virtual method calls are expensive, 177much more so than instance field lookups. It's reasonable to follow 178common object-oriented programming practices and have getters and setters 179in the public interface, but within a class you should always access 180fields directly.</p> 181 182<a name="cache_fields" id="cache_fields"></a> 183<h2>Cache Field Lookups</h2> 184 185<p>Accessing object fields is much slower than accessing local variables. 186Instead of writing:</p> 187<pre>for (int i = 0; i < this.mCount; i++) 188 dumpItem(this.mItems[i]);</pre> 189 190<p>You should write:</p> 191<pre> int count = this.mCount; 192 Item[] items = this.mItems; 193 194 for (int i = 0; i < count; i++) 195 dumpItems(items[i]); 196</pre> 197 198<p>(We're using an explicit "this" to make it clear that these are 199member variables.)</p> 200 201<p>A similar guideline is never call a method in the second clause of a "for" 202statement. For example, the following code will execute the getCount() method 203once per iteration, which is a huge waste when you could have simply cached 204the value as an int:</p> 205 206<pre>for (int i = 0; i < this.getCount(); i++) 207 dumpItems(this.getItem(i)); 208</pre> 209 210<p>It's also usually a good idea to create a local variable if you're going to be 211accessing an instance field more than once. For example:</p> 212 213<pre> 214 protected void drawHorizontalScrollBar(Canvas canvas, int width, int height) { 215 if (isHorizontalScrollBarEnabled()) { 216 int size = <strong>mScrollBar</strong>.getSize(<em>false</em>); 217 if (size <= 0) { 218 size = mScrollBarSize; 219 } 220 <strong>mScrollBar</strong>.setBounds(0, <em>height</em> - size, width, height); 221 <strong>mScrollBar</strong>.setParams( 222 computeHorizontalScrollRange(), 223 computeHorizontalScrollOffset(), 224 computeHorizontalScrollExtent(), <em>false</em>); 225 <strong>mScrollBar</strong>.draw(canvas); 226 } 227 }</pre> 228 229<p>That's four separate lookups of the member field <code>mScrollBar</code>. 230By caching mScrollBar in a local stack variable, the four member field lookups 231become four stack variable references, which are much more efficient.</p> 232 233<p>Incidentally, method arguments have the same performance characteristics 234as local variables.</p> 235 236<a name="use_final" id="use_final"></a> 237<h2>Declare Constants Final</h2> 238 239<p>Consider the following declaration at the top of a class:</p> 240 241<pre>static int intVal = 42; 242static String strVal = "Hello, world!";</pre> 243 244<p>The compiler generates a class initializer method, called 245<code><clinit></code>, that is executed when the class is first used. 246The method stores the value 42 into <code>intVal</code>, and extracts a 247reference from the classfile string constant table for <code>strVal</code>. 248When these values are referenced later on, they are accessed with field 249lookups.</p> 250 251<p>We can improve matters with the "final" keyword:</p> 252 253<pre>static final int intVal = 42; 254static final String strVal = "Hello, world!";</pre> 255 256<p>The class no longer requires a <code><clinit></code> method, 257because the constants go into classfile static field initializers, which are 258handled directly by the VM. Code accessing <code>intVal</code> will use 259the integer value 42 directly, and accesses to <code>strVal</code> will 260use a relatively inexpensive "string constant" instruction instead of a 261field lookup.</p> 262 263<p>Declaring a method or class "final" does not confer any immediate 264performance benefits, but it does allow certain optimizations. For example, if 265the compiler knows that a "getter" method can't be overridden by a sub-class, 266it can inline the method call.</p> 267 268<p>You can also declare local variables final. However, this has no definitive 269performance benefits. For local variables, only use "final" if it makes the 270code clearer (or you have to, e.g. for use in an anonymous inner class).</p> 271 272<a name="foreach" id="foreach"></a> 273<h2>Use Enhanced For Loop Syntax With Caution</h2> 274 275<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface. 276With these objects, an iterator is allocated to make interface calls 277to hasNext() and next(). With an ArrayList, you're better off walking through 278it directly, but for other collections the enhanced for loop syntax will be equivalent 279to explicit iterator usage.</p> 280 281<p>Nevertheless, the following code shows an acceptable use of the enhanced for loop:</p> 282 283<pre>public class Foo { 284 int mSplat; 285 static Foo mArray[] = new Foo[27]; 286 287 public static void zero() { 288 int sum = 0; 289 for (int i = 0; i < mArray.length; i++) { 290 sum += mArray[i].mSplat; 291 } 292 } 293 294 public static void one() { 295 int sum = 0; 296 Foo[] localArray = mArray; 297 int len = localArray.length; 298 299 for (int i = 0; i < len; i++) { 300 sum += localArray[i].mSplat; 301 } 302 } 303 304 public static void two() { 305 int sum = 0; 306 for (Foo a: mArray) { 307 sum += a.mSplat; 308 } 309 } 310}</pre> 311 312<p><strong>zero()</strong> retrieves the static field twice and gets the array 313length once for every iteration through the loop.</p> 314 315<p><strong>one()</strong> pulls everything out into local variables, avoiding 316the lookups.</p> 317 318<p><strong>two()</strong> uses the enhanced for loop syntax introduced in version 1.5 of 319the Java programming language. The code generated by the compiler takes care 320of copying the array reference and the array length to local variables, making 321it a good choice for walking through all elements of an array. It does 322generate an extra local load/store in the main loop (apparently preserving 323"a"), making it a teensy bit slower and 4 bytes longer than one().</p> 324 325<p>To summarize all that a bit more clearly: enhanced for loop syntax performs well 326with arrays, but be cautious when using it with Iterable objects since there is 327additional object creation.</p> 328 329<a name="avoid_enums" id="avoid_enums"></a> 330<h2>Avoid Enums</h2> 331 332<p>Enums are very convenient, but unfortunately can be painful when size 333and speed matter. For example, this:</p> 334 335<pre>public class Foo { 336 public enum Shrubbery { GROUND, CRAWLING, HANGING } 337}</pre> 338 339<p>turns into a 900 byte .class file (Foo$Shrubbery.class). On first use, the 340class initializer invokes the <init> method on objects representing each 341of the enumerated values. Each object gets its own static field, and the full 342set is stored in an array (a static field called "$VALUES"). That's a lot of 343code and data, just for three integers.</p> 344 345<p>This:</p> 346 347<pre>Shrubbery shrub = Shrubbery.GROUND;</pre> 348 349<p>causes a static field lookup. If "GROUND" were a static final int, 350the compiler would treat it as a known constant and inline it.</p> 351 352<p>The flip side, of course, is that with enums you get nicer APIs and 353some compile-time value checking. So, the usual trade-off applies: you should 354by all means use enums for public APIs, but try to avoid them when performance 355matters.</p> 356 357<p>In some circumstances it can be helpful to get enum integer values 358through the <code>ordinal()</code> method. For example, replace:</p> 359 360<pre>for (int n = 0; n < list.size(); n++) { 361 if (list.items[n].e == MyEnum.VAL_X) 362 // do stuff 1 363 else if (list.items[n].e == MyEnum.VAL_Y) 364 // do stuff 2 365}</pre> 366 367<p>with:</p> 368 369<pre> int valX = MyEnum.VAL_X.ordinal(); 370 int valY = MyEnum.VAL_Y.ordinal(); 371 int count = list.size(); 372 MyItem items = list.items(); 373 374 for (int n = 0; n < count; n++) 375 { 376 int valItem = items[n].e.ordinal(); 377 378 if (valItem == valX) 379 // do stuff 1 380 else if (valItem == valY) 381 // do stuff 2 382 }</pre> 383 384<p>In some cases, this will be faster, though this is not guaranteed.</p> 385 386<a name="package_inner" id="package_inner"></a> 387<h2>Use Package Scope with Inner Classes</h2> 388 389<p>Consider the following class definition:</p> 390 391<pre>public class Foo { 392 private int mValue; 393 394 public void run() { 395 Inner in = new Inner(); 396 mValue = 27; 397 in.stuff(); 398 } 399 400 private void doStuff(int value) { 401 System.out.println("Value is " + value); 402 } 403 404 private class Inner { 405 void stuff() { 406 Foo.this.doStuff(Foo.this.mValue); 407 } 408 } 409}</pre> 410 411<p>The key things to note here are that we define an inner class (Foo$Inner) 412that directly accesses a private method and a private instance field 413in the outer class. This is legal, and the code prints "Value is 27" as 414expected.</p> 415 416<p>The problem is that Foo$Inner is technically (behind the scenes) a totally 417separate class, which makes direct access to Foo's private 418members illegal. To bridge that gap, the compiler generates a 419couple of synthetic methods:</p> 420 421<pre>/*package*/ static int Foo.access$100(Foo foo) { 422 return foo.mValue; 423} 424/*package*/ static void Foo.access$200(Foo foo, int value) { 425 foo.doStuff(value); 426}</pre> 427 428<p>The inner-class code calls these static methods whenever it needs to 429access the "mValue" field or invoke the "doStuff" method in the outer 430class. What this means is that the code above really boils down to a case 431where you're accessing member fields through accessor methods instead of 432directly. Earlier we talked about how accessors are slower than direct field 433accesses, so this is an example of a certain language idiom resulting in an 434"invisible" performance hit.</p> 435 436<p>We can avoid this problem by declaring fields and methods accessed 437by inner classes to have package scope, rather than private scope. 438This runs faster and removes the overhead of the generated methods. 439(Unfortunately it also means the fields could be accessed directly by other 440classes in the same package, which runs counter to the standard OO 441practice of making all fields private. Once again, if you're 442designing a public API you might want to carefully consider using this 443optimization.)</p> 444 445<a name="avoidfloat" id="avoidfloat"></a> 446<h2>Avoid Float</h2> 447 448<p>Before the release of the Pentium CPU, it was common for game authors to do 449as much as possible with integer math. With the Pentium, the floating point 450math co-processor became a built-in feature, and by interleaving integer and 451floating-point operations your game would actually go faster than it would 452with purely integer math. The common practice on desktop systems is to use 453floating point freely.</p> 454 455<p>Unfortunately, embedded processors frequently do not have hardware floating 456point support, so all operations on "float" and "double" are performed in 457software. Some basic floating point operations can take on the order of a 458millisecond to complete.</p> 459 460<p>Also, even for integers, some chips have hardware multiply but lack 461hardware divide. In such cases, integer division and modulus operations are 462performed in software — something to think about if you're designing a 463hash table or doing lots of math.</p> 464 465<a name="samples" id="samples"></a> 466<h2>Some Sample Performance Numbers</h2> 467 468<p>To illustrate some of our ideas, here is a table listing the approximate 469run times for a few basic actions. Note that these values should NOT be taken 470as absolute numbers: they are a combination of CPU and wall clock time, and 471will change as improvements are made to the system. However, it is worth 472noting how these values apply relative to each other — for example, 473adding a member variable currently takes about four times as long as adding a 474local variable.</p> 475 476<table> 477 <tr> 478 <th>Action</th> 479 <th>Time</th> 480 </tr> 481 <tr> 482 <td>Add a local variable </td> 483 <td>1</td> 484 </tr> 485 <tr class="alt"> 486 <td>Add a member variable </td> 487 <td>4</td> 488 </tr> 489 <tr> 490 <td>Call String.length()</td> 491 <td>5</td> 492 </tr> 493 <tr class="alt"> 494 <td>Call empty static native method</td> 495 <td>5</td> 496 </tr> 497 <tr> 498 <td>Call empty static method </td> 499 <td>12</td> 500 </tr> 501 <tr class="alt"> 502 <td>Call empty virtual method </td> 503 <td>12.5</td> 504 </tr> 505 <tr> 506 <td>Call empty interface method </td> 507 <td>15</td> 508 </tr> 509 <tr class="alt"> 510 <td>Call Iterator:next() on a HashMap </td> 511 <td>165</td> 512 </tr> 513 <tr> 514 <td>Call put() on a HashMap</td> 515 <td>600</td> 516 </tr> 517 <tr class="alt"> 518 <td>Inflate 1 View from XML </td> 519 <td>22,000</td> 520 </tr> 521 <tr> 522 <td>Inflate 1 LinearLayout containing 1 TextView </td> 523 <td>25,000</td> 524 </tr> 525 <tr class="alt"> 526 <td>Inflate 1 LinearLayout containing 6 View objects </td> 527 <td>100,000</td> 528 </tr> 529 <tr> 530 <td>Inflate 1 LinearLayout containing 6 TextView objects </td> 531 <td>135,000</td> 532 </tr> 533 <tr class="alt"> 534 <td>Launch an empty activity </td> 535 <td>3,000,000</td> 536 </tr> 537</table> 538 539<a name="closing_notes" id="closing_notes"></a> 540<h2>Closing Notes</h2> 541 542<p>The best way to write good, efficient code for embedded systems is to 543understand what the code you write really does. If you really want to allocate 544an iterator, by all means use enhanced for loop syntax on a List; just make it a 545deliberate choice, not an inadvertent side effect.</p> 546 547<p>Forewarned is forearmed! Know what you're getting into! Insert your 548favorite maxim here, but always think carefully about what your code is doing, 549and be on the lookout for ways to speed it up.</p> 550