page.title=RenderScript
parent.title=Computation
parent.link=index.html

@jd:body

<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>

    <ol>
      <li><a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a></li>
      <li><a href="#access-rs-apis">Accessing RenderScript APIs from Java</a>
        <ol>
          <li><a href="#ide-setup">Setting Up Your Development Environment</a></li>
        </ol>
      </li>
      <li><a href="#using-rs-from-java">Using RenderScript from Java Code</a></li>
      <li><a href="#single-source-rs">Single-Source RenderScript</a></li>
      <li><a href="#reduction-in-depth">Reduction Kernels in Depth</a>
        <ol>
          <li><a href="#writing-reduction-kernel">Writing a reduction kernel</a></li>
          <li><a href="#calling-reduction-kernel">Calling a reduction kernel from Java code</a></li>
          <li><a href="#more-example">More example reduction kernels</a></li>
        </ol>
      </li>
    </ol>

    <h2>Related Samples</h2>

    <ol>
      <li><a class="external-link" href="https://github.com/android/platform_development/tree/master/samples/RenderScript/HelloCompute">Hello
      Compute</a></li>
    </ol>
  </div>
</div>

<p>RenderScript is a framework for running computationally intensive tasks at high performance on
Android. RenderScript is primarily oriented toward data-parallel computation, although serial
workloads can benefit as well. The RenderScript runtime parallelizes
work across processors available on a device, such as multi-core CPUs and GPUs. This allows
you to focus on expressing algorithms rather than scheduling work. RenderScript is
especially useful for applications performing image processing, computational photography, or
computer vision.</p>

<p>To begin with RenderScript, there are two main concepts you should understand:</p>
<ul>

<li>The <em>language</em> itself is a C99-derived language for writing high-performance compute
code.
<a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a> describes
how to use it to write compute kernels.</li>

<li>The <em>control API</em> is used for managing the lifetime of RenderScript resources and
controlling kernel execution. It is available in three different languages: Java, C++ in the Android
NDK, and the C99-derived kernel language itself.
<a href="#using-rs-from-java">Using RenderScript from Java Code</a> and
<a href="#single-source-rs">Single-Source RenderScript</a> describe the first and the third
options, respectively.</li>
</ul>

<h2 id="writing-an-rs-kernel">Writing a RenderScript Kernel</h2>

<p>A RenderScript kernel typically resides in a <code>.rs</code> file in the
<code>&lt;project_root&gt;/src/</code> directory; each <code>.rs</code> file is called a
<i>script</i>. Every script contains its own set of kernels, functions, and variables. A script can
contain:</p>

<ul>
<li>A pragma declaration (<code>#pragma version(1)</code>) that declares the version of the
RenderScript kernel language used in this script. Currently, 1 is the only valid value.</li>

<li>A pragma declaration (<code>#pragma rs java_package_name(com.example.app)</code>) that
declares the package name of the Java classes reflected from this script.
Note that your <code>.rs</code> file must be part of your application package, and not in a
library project.</li>

<li>Zero or more <strong><i>invokable functions</i></strong>. An invokable function is a single-threaded RenderScript
function that you can call from your Java code with arbitrary arguments. These are often useful for
initial setup or serial computations within a larger processing pipeline.</li>

<li><p>Zero or more <strong><i>script globals</i></strong>. A script global is equivalent to a global variable in C.
You can
access script globals from Java code, and these are often used for parameter passing to RenderScript
kernels.</p></li>

<li><p>Zero or more <strong><i>compute kernels</i></strong>. A compute kernel is a function
or collection of functions that you can direct the RenderScript runtime to execute in parallel
across a collection of data. There are two kinds of compute
kernels: <i>mapping</i> kernels (also called <i>foreach</i> kernels)
and <i>reduction</i> kernels.</p>

<p>A <em>mapping kernel</em> is a parallel function that operates on a collection of {@link
  android.renderscript.Allocation Allocations} of the same dimensions. By default, it executes
  once for every coordinate in those dimensions. It is typically (but not exclusively) used to
  transform a collection of input {@link android.renderscript.Allocation Allocations} to an
  output {@link android.renderscript.Allocation} one {@link android.renderscript.Element} at a
  time.</p>

<ul>
<li><p>Here is an example of a simple <strong>mapping kernel</strong>:</p>

<pre>uchar4 RS_KERNEL invert(uchar4 in, uint32_t x, uint32_t y) {
  uchar4 out = in;
  out.r = 255 - in.r;
  out.g = 255 - in.g;
  out.b = 255 - in.b;
  return out;
}</pre>

<p>In most respects, this is identical to a standard C
  function. The <a href="#RS_KERNEL"><code>RS_KERNEL</code></a> property applied to the
  function prototype specifies that the function is a RenderScript mapping kernel instead of an
  invokable function. The <code>in</code> argument is automatically filled in based on the
  input {@link android.renderscript.Allocation} passed to the kernel launch. The
  arguments <code>x</code> and <code>y</code> are
  discussed <a href="#special-arguments">below</a>. The value returned from the kernel is
  automatically written to the appropriate location in the output {@link
  android.renderscript.Allocation}.
By default, this kernel is run across its entire input
  {@link android.renderscript.Allocation}, with one execution of the kernel function per {@link
  android.renderscript.Element} in the {@link android.renderscript.Allocation}.</p>

<p>A mapping kernel may have one or more input {@link android.renderscript.Allocation
  Allocations}, a single output {@link android.renderscript.Allocation}, or both. The
  RenderScript runtime checks to ensure that all input and output Allocations have the same
  dimensions, and that the {@link android.renderscript.Element} types of the input and output
  Allocations match the kernel's prototype; if either of these checks fails, RenderScript
  throws an exception.</p>

<p class="note"><strong>NOTE:</strong> Before Android 6.0 (API level 23), a mapping kernel may
  not have more than one input {@link android.renderscript.Allocation}.</p>

<p>If you need more input or output {@link android.renderscript.Allocation Allocations} than
  the kernel has, those objects should be bound to <code>rs_allocation</code> script globals
  and accessed from a kernel or invokable function
  via <code>rsGetElementAt_<i>type</i>()</code> or <code>rsSetElementAt_<i>type</i>()</code>.</p>

<p><strong>NOTE:</strong> <a id="RS_KERNEL"><code>RS_KERNEL</code></a> is a macro
  defined automatically by RenderScript for your convenience:</p>
<pre>
#define RS_KERNEL __attribute__((kernel))
</pre>
</li>
</ul>

<p>A <em>reduction kernel</em> is a family of functions that operates on a collection of input
  {@link android.renderscript.Allocation Allocations} of the same dimensions. By default,
  its <a href="#accumulator-function">accumulator function</a> executes once for every
  coordinate in those dimensions.
It is typically (but not exclusively) used to "reduce" a
  collection of input {@link android.renderscript.Allocation Allocations} to a single
  value.</p>

<ul>
<li><p>Here is an <a id="example-addint">example</a> of a simple <strong>reduction
kernel</strong> that adds up the {@link android.renderscript.Element Elements} of its
input:</p>

<pre>#pragma rs reduce(addint) accumulator(addintAccum)

static void addintAccum(int *accum, int val) {
  *accum += val;
}</pre>

<p>A reduction kernel consists of one or more user-written functions.
<code>#pragma rs reduce</code> is used to define the kernel by specifying its name
(<code>addint</code>, in this example) and the names and roles of the functions that make
up the kernel (an <code>accumulator</code> function <code>addintAccum</code>, in this
example). All such functions must be <code>static</code>. A reduction kernel always
requires an <code>accumulator</code> function; it may also have other functions, depending
on what you want the kernel to do.</p>

<p>A reduction kernel accumulator function must return <code>void</code> and must have at least
two arguments. The first argument (<code>accum</code>, in this example) is a pointer to
an <i>accumulator data item</i> and the second (<code>val</code>, in this example) is
automatically filled in based on the input {@link android.renderscript.Allocation} passed to
the kernel launch. The accumulator data item is created by the RenderScript runtime; by
default, it is initialized to zero. By default, this kernel is run across its entire input
{@link android.renderscript.Allocation}, with one execution of the accumulator function per
{@link android.renderscript.Element} in the {@link android.renderscript.Allocation}. By
default, the final value of the accumulator data item is treated as the result of the
reduction, and is returned to Java.
The RenderScript runtime checks to ensure that the {@link
android.renderscript.Element} type of the input Allocation matches the accumulator function's
prototype; if it does not match, RenderScript throws an exception.</p>

<p>A reduction kernel has one or more input {@link android.renderscript.Allocation
Allocations} but no output {@link android.renderscript.Allocation Allocations}.</p>

<p>Reduction kernels are explained in more detail <a href="#reduction-in-depth">here</a>.</p>

<p>Reduction kernels are supported in Android 7.0 (API level 24) and later.</p>
</li>
</ul>

<p>A mapping kernel function or a reduction kernel accumulator function may access the coordinates
of the current execution using the <a id="special-arguments">special arguments</a> <code>x</code>,
<code>y</code>, and <code>z</code>, which must be of type <code>int</code> or <code>uint32_t</code>.
These arguments are optional.</p>

<p>A mapping kernel function or a reduction kernel accumulator
function may also take the optional special argument
<code>context</code> of type <a
href='reference/rs_for_each.html#android_rs:rs_kernel_context'>rs_kernel_context</a>.
It is needed by a family of runtime APIs that are used to query
certain properties of the current execution, for example, <a
href='reference/rs_for_each.html#android_rs:rsGetDimX'>rsGetDimX</a>.
(The <code>context</code> argument is available in Android 6.0 (API level 23) and later.)</p>
</li>

<li>An optional <code>init()</code> function. An <code>init()</code> function is a special type of
invokable function that RenderScript runs when the script is first instantiated. This allows for some
computation to occur automatically at script creation.</li>

<li>Zero or more <strong><i>static script globals and functions</i></strong>. A static script global is equivalent to a
script global except that it cannot be accessed from Java code.
A static function is a standard C
function that can be called from any kernel or invokable function in the script but is not exposed
to the Java API. If a script global or function does not need to be accessed from Java code, it is
highly recommended that it be declared <code>static</code>.</li>
</ul>

<h4>Setting floating point precision</h4>

<p>You can control the required level of floating point precision in a script. This is useful if
the full IEEE 754-2008 standard (used by default) is not required. The following pragmas can set a
different level of floating point precision:</p>

<ul>

<li><code>#pragma rs_fp_full</code> (default if nothing is specified): For apps that require
  floating point precision as outlined by the IEEE 754-2008 standard.

</li>

  <li><code>#pragma rs_fp_relaxed</code>: For apps that don’t require strict IEEE 754-2008
  compliance and can tolerate less precision. This mode enables flush-to-zero for denorms and
  round-towards-zero.

</li>

  <li><code>#pragma rs_fp_imprecise</code>: For apps that don’t have stringent precision
  requirements. This mode enables everything in <code>rs_fp_relaxed</code> along with the
  following:

<ul>

  <li>Operations resulting in -0.0 can return +0.0 instead.</li>
  <li>Operations on INF and NAN are undefined.</li>
</ul>
</li>
</ul>

<p>Most applications can use <code>rs_fp_relaxed</code> without any side effects.
This may be very
beneficial on some architectures due to additional optimizations only available with relaxed
precision (such as SIMD CPU instructions).</p>


<h2 id="access-rs-apis">Accessing RenderScript APIs from Java</h2>

<p>When developing an Android application that uses RenderScript, you can access its API from Java in
  one of two ways:</p>

<ul>
  <li><strong>{@link android.renderscript}</strong> - The APIs in this package are
    available on devices running Android 3.0 (API level 11) and higher. </li>
  <li><strong>{@link android.support.v8.renderscript}</strong> - The APIs in this package are
    available through a <a href="{@docRoot}tools/support-library/features.html#v8">Support
    Library</a>, which allows you to use them on devices running Android 2.3 (API level 9) and
    higher.</li>
</ul>

<p>Here are the tradeoffs:</p>

<ul>
<li>If you use the Support Library APIs, the RenderScript portion of your application will be
  compatible with devices running Android 2.3 (API level 9) and higher, regardless of which RenderScript
  features you use. This allows your application to work on more devices than if you use the
  native (<strong>{@link android.renderscript}</strong>) APIs.</li>
<li>Certain RenderScript features are not available through the Support Library APIs.</li>
<li>If you use the Support Library APIs, you will get (possibly significantly) larger APKs than
if you use the native (<strong>{@link android.renderscript}</strong>) APIs.</li>
</ul>

<h3 id="ide-setup">Using the RenderScript Support Library APIs</h3>

<p>In order to use the Support Library RenderScript APIs, you must configure your development
  environment to be able to access them.
The following Android SDK tools are required for using
  these APIs:</p>

<ul>
  <li>Android SDK Tools revision 22.2 or higher</li>
  <li>Android SDK Build-tools revision 18.1.0 or higher</li>
</ul>

<p>You can check and update the installed version of these tools in the
  <a href="{@docRoot}tools/help/sdk-manager.html">Android SDK Manager</a>.</p>


<p>To use the Support Library RenderScript APIs:</p>

<ol>
  <li>Make sure you have the required Android SDK version and Build Tools version installed.</li>
  <li>Update the settings for the Android build process to include the RenderScript settings:

    <ul>
      <li>Open the {@code build.gradle} file in the app folder of your application module. </li>
      <li>Add the following RenderScript settings to the file:

<pre>
android {
    compileSdkVersion 23
    buildToolsVersion "23.0.3"

    defaultConfig {
        minSdkVersion 9
        targetSdkVersion 19
<strong>
        renderscriptTargetApi 18
        renderscriptSupportModeEnabled true
</strong>
    }
}
</pre>


        <p>The settings listed above control specific behavior in the Android build process:</p>

        <ul>
          <li>{@code renderscriptTargetApi} - Specifies the bytecode version to be generated. We
          recommend you set this value to the lowest API level able to provide all the functionality
          you are using and set {@code renderscriptSupportModeEnabled} to {@code true}.
          Valid values for this setting are any integer value
          from 11 to the most recently released API level. If your minimum SDK version specified in your
          application manifest is set to a different value, that value is ignored and the target value
          in the build file is used to set the minimum SDK version.</li>
          <li>{@code renderscriptSupportModeEnabled} - Specifies that the generated bytecode should fall
          back to a compatible version if the device it is running on does not support the target
          version.
          </li>
          <li>{@code buildToolsVersion} - The version of the Android SDK build tools to use. This value
          should be set to {@code 18.1.0} or higher. If this option is not specified, the highest
          installed build tools version is used. You should always set this value to ensure the
          consistency of builds across development machines with different configurations.</li>
        </ul>
      </li>
    </ul>
  </li>

  <li>In your application classes that use RenderScript, add an import for the Support Library
    classes:

<pre>
import android.support.v8.renderscript.*;
</pre>

  </li>

</ol>

<h2 id="using-rs-from-java">Using RenderScript from Java Code</h2>

<p>Using RenderScript from Java code relies on the API classes located in the
{@link android.renderscript} or the {@link android.support.v8.renderscript} package. Most
applications follow the same basic usage pattern:</p>

<ol>

<li><strong>Initialize a RenderScript context.</strong> The {@link
android.renderscript.RenderScript} context, created with {@link
android.renderscript.RenderScript#create}, ensures that RenderScript can be used and provides an
object to control the lifetime of all subsequent RenderScript objects. You should consider context
creation to be a potentially long-running operation, since it may create resources on different
pieces of hardware; it should not be in an application's critical path if at all
possible. Typically, an application will have only a single RenderScript context at a time.</li>

<li><strong>Create at least one {@link android.renderscript.Allocation} to be passed to a
script.</strong> An {@link android.renderscript.Allocation} is a RenderScript object that provides
storage for a fixed amount of data.
Kernels in scripts take {@link android.renderscript.Allocation}
objects as their input and output, and {@link android.renderscript.Allocation} objects can be
accessed in kernels using <code>rsGetElementAt_<i>type</i>()</code> and
<code>rsSetElementAt_<i>type</i>()</code> when bound as script globals. {@link
android.renderscript.Allocation} objects allow arrays to be passed from Java code to RenderScript
code and vice versa. {@link android.renderscript.Allocation} objects are typically created using
{@link android.renderscript.Allocation#createTyped createTyped()} or {@link
android.renderscript.Allocation#createFromBitmap createFromBitmap()}.</li>

<li><strong>Create whatever scripts are necessary.</strong> There are two types of scripts available
to you when using RenderScript:

<ul>

<li><strong>ScriptC</strong>: These are the user-defined scripts as described in <a
href="#writing-an-rs-kernel"><i>Writing a RenderScript Kernel</i></a> above. Every script has a Java class
reflected by the RenderScript compiler in order to make it easy to access the script from Java code;
this class has the name <code>ScriptC_<i>filename</i></code>. For example, if the mapping kernel
above were located in <code>invert.rs</code> and a RenderScript context were already stored in
<code>mRenderScript</code>, the Java code to instantiate the script would be:

<pre>ScriptC_invert invert = new ScriptC_invert(mRenderScript);</pre></li>

<li><strong>ScriptIntrinsic</strong>: These are built-in RenderScript kernels for common operations,
such as Gaussian blur, convolution, and image blending.
For more information, see the subclasses of
{@link android.renderscript.ScriptIntrinsic}.</li>

</ul></li>

<li><strong>Populate Allocations with data.</strong> Except for Allocations created with {@link
android.renderscript.Allocation#createFromBitmap createFromBitmap()}, an Allocation is populated with empty data when it is
first created. To populate an Allocation, use one of the "copy" methods in {@link
android.renderscript.Allocation}. The "copy" methods are <a href="#asynchronous-model">synchronous</a>.</li>

<li><strong>Set any necessary script globals.</strong> You may set globals using methods in the
  same <code>ScriptC_<i>filename</i></code> class named <code>set_<i>globalname</i></code>. For
  example, in order to set an <code>int</code> variable named <code>threshold</code>, use the
  Java method <code>set_threshold(int)</code>; and in order to set
  an <code>rs_allocation</code> variable named <code>lookup</code>, use the Java
  method <code>set_lookup(Allocation)</code>. The <code>set</code> methods
  are <a href="#asynchronous-model">asynchronous</a>.</li>

<li id="launching_kernels"><strong>Launch the appropriate kernels and invokable functions.</strong>
<p>Methods to launch a given kernel are
reflected in the same <code>ScriptC_<i>filename</i></code> class with methods named
<code>forEach_<i>mappingKernelName</i>()</code>
or <code>reduce_<i>reductionKernelName</i>()</code>.
These launches are <a href="#asynchronous-model">asynchronous</a>.
Depending on the arguments to the kernel, the
method takes one or more Allocations, all of which must have the same dimensions.
By default, a
kernel executes over every coordinate in those dimensions; to execute a kernel over a subset of those coordinates,
pass an appropriate {@link
android.renderscript.Script.LaunchOptions} as the last argument to the <code>forEach</code> or <code>reduce</code> method.</p>

<p>Launch invokable functions using the <code>invoke_<i>functionName</i></code> methods
reflected in the same <code>ScriptC_<i>filename</i></code> class.
These launches are <a href="#asynchronous-model">asynchronous</a>.</p></li>

<li><strong>Retrieve data from {@link android.renderscript.Allocation} objects
and <i><a href="#javaFutureType">javaFutureType</a></i> objects.</strong>
To access the data in an {@link android.renderscript.Allocation} from Java code, you must copy that data
back to Java using one of the "copy" methods in {@link
android.renderscript.Allocation}.
To obtain the result of a reduction kernel, you must use the <code><i>javaFutureType</i>.get()</code> method.
The "copy" and <code>get()</code> methods are <a href="#asynchronous-model">synchronous</a>.</li>

<li><strong>Tear down the RenderScript context.</strong> You can destroy the RenderScript context
with {@link android.renderscript.RenderScript#destroy} or by allowing the RenderScript context
object to be garbage collected. This causes any further use of any object belonging to that
context to throw an exception.</li>
</ol>

<h3 id="asynchronous-model">Asynchronous execution model</h3>

<p>The reflected <code>forEach</code>, <code>invoke</code>, <code>reduce</code>,
  and <code>set</code> methods are asynchronous: each may return to Java before completing the
  requested action. However, the individual actions are serialized in the order in which they are launched.</p>

<p>The {@link android.renderscript.Allocation} class provides "copy" methods to copy data to
  and from Allocations.
A "copy" method is synchronous, and is serialized with respect to any
  of the asynchronous actions above that touch the same Allocation.</p>

<p>The reflected <i><a href="#javaFutureType">javaFutureType</a></i> classes provide
  a <code>get()</code> method to obtain the result of a reduction. <code>get()</code> is
  synchronous, and is serialized with respect to the reduction (which is asynchronous).</p>

<h2 id="single-source-rs">Single-Source RenderScript</h2>

<p>Android 7.0 (API level 24) introduces a new programming feature called <em>Single-Source
RenderScript</em>, in which kernels are launched from the script where they are defined, rather than
from Java. This approach is currently limited to mapping kernels, which are simply referred to as "kernels"
in this section for conciseness. This new feature also supports creating allocations of type
<a href="{@docRoot}guide/topics/renderscript/reference/rs_object_types.html#android_rs:rs_allocation">
<code>rs_allocation</code></a> from inside the script. It is now possible to
implement a whole algorithm solely within a script, even if multiple kernel launches are required.
The benefit is twofold: more readable code, because it keeps the implementation of an algorithm in
one language; and potentially faster code, because of fewer transitions between Java and
RenderScript across multiple kernel launches.</p>

<p>In Single-Source RenderScript, you write kernels as described in <a href="#writing-an-rs-kernel">
Writing a RenderScript Kernel</a>. You then write an invokable function that calls
<a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rsForEach">
<code>rsForEach()</code></a> to launch them. That API takes a kernel function as the first
parameter, followed by input and output allocations.
A similar API
<a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rsForEachWithOptions">
<code>rsForEachWithOptions()</code></a> takes an extra argument of type
<a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rs_script_call_t">
<code>rs_script_call_t</code></a>, which specifies a subset of the elements from the input and
output allocations for the kernel function to process.</p>

<p>To start RenderScript computation, you call the invokable function from Java.
Follow the steps in <a href="#using-rs-from-java">Using RenderScript from Java Code</a>.
In the step <a href="#launching_kernels">launch the appropriate kernels</a>, call
the invokable function using <code>invoke_<i>function_name</i>()</code>, which will start the
whole computation, including launching kernels.</p>

<p>Allocations are often needed to save and pass
intermediate results from one kernel launch to another. You can create them using
<a href="{@docRoot}guide/topics/renderscript/reference/rs_allocation_create.html#android_rs:rsCreateAllocation">
rsCreateAllocation()</a>. One easy-to-use form of that API is <code>
rsCreateAllocation_&lt;T&gt;&lt;W&gt;(…)</code>, where <i>T</i> is the data type for an
element, and <i>W</i> is the vector width for the element. The API takes the sizes in
dimensions X, Y, and Z as arguments. For 1D or 2D allocations, the size for dimension Y or Z can
be omitted. For example, <code>rsCreateAllocation_uchar4(16384)</code> creates a 1D allocation of
16384 elements, each of which is of type <code>uchar4</code>.</p>

<p>Allocations are managed by the system automatically. You
do not have to explicitly release or free them.
However, you can call
<a href="{@docRoot}guide/topics/renderscript/reference/rs_object_info.html#android_rs:rsClearObject">
<code>rsClearObject(rs_allocation* alloc)</code></a> to indicate you no longer need the handle
<code>alloc</code> to the underlying allocation,
so that the system can free up resources as early as possible.</p>

<p>The <a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a> section contains an example
kernel that inverts an image. The example below expands that to apply more than one effect to an image,
using Single-Source RenderScript. It includes another kernel, <code>greyscale</code>, which turns a
color image into a greyscale one. An invokable function <code>process()</code> then applies those two kernels
consecutively to an input image, and produces an output image. Allocations for both the input and
the output are passed in as arguments of type
<a href="{@docRoot}guide/topics/renderscript/reference/rs_object_types.html#android_rs:rs_allocation">
<code>rs_allocation</code></a>.</p>

<pre>
// File: singlesource.rs

#pragma version(1)
#pragma rs java_package_name(com.android.rssample)

static const float4 weight = {0.299f, 0.587f, 0.114f, 0.0f};

uchar4 RS_KERNEL invert(uchar4 in, uint32_t x, uint32_t y) {
  uchar4 out = in;
  out.r = 255 - in.r;
  out.g = 255 - in.g;
  out.b = 255 - in.b;
  return out;
}

uchar4 RS_KERNEL greyscale(uchar4 in) {
  const float4 inF = rsUnpackColor8888(in);
  // Luminance of the pixel (alpha has weight 0.0f). Copy it into R, G, and B
  // and keep the input alpha; a brace initializer with a single value would
  // set only R and zero the remaining components.
  const float grey = dot(inF, weight);
  const float4 outF = (float4){ grey, grey, grey, inF.a };
  return rsPackColorTo8888(outF);
}

void process(rs_allocation inputImage, rs_allocation outputImage) {
  const uint32_t imageWidth = rsAllocationGetDimX(inputImage);
  const uint32_t imageHeight = rsAllocationGetDimY(inputImage);
  rs_allocation tmp = rsCreateAllocation_uchar4(imageWidth, imageHeight);
  rsForEach(invert, inputImage, tmp);
  rsForEach(greyscale, tmp, outputImage);
}
</pre>

<p>You can call the <code>process()</code> function from Java as follows:</p>

<pre>
// File: SingleSource.java

RenderScript RS = RenderScript.create(context);
ScriptC_singlesource script = new ScriptC_singlesource(RS);
Allocation inputAllocation = Allocation.createFromBitmapResource(
    RS, getResources(), R.drawable.image);
Allocation outputAllocation = Allocation.createTyped(
    RS, inputAllocation.getType(),
    Allocation.USAGE_SCRIPT | Allocation.USAGE_IO_OUTPUT);
script.invoke_process(inputAllocation, outputAllocation);
</pre>

<p>This example shows how an algorithm that involves two kernel launches can be implemented completely
in the RenderScript language itself. Without Single-Source
RenderScript, you would have to launch both kernels from the Java code, separating kernel launches
from kernel definitions and making it harder to understand the whole algorithm. Not only is the
Single-Source RenderScript code easier to read, it also eliminates the transitioning
between Java and the script across kernel launches. Some iterative algorithms may launch kernels
hundreds of times, making the overhead of such transitioning considerable.</p>

<h2 id="reduction-in-depth">Reduction Kernels in Depth</h2>

<p><i>Reduction</i> is the process of combining a collection of data into a single
value.
This is a useful primitive in parallel programming, with applications such as the
following:</p>
<ul>
  <li>computing the sum or product over all the data</li>
  <li>computing logical operations (<code>and</code>, <code>or</code>, <code>xor</code>)
  over all the data</li>
  <li>finding the minimum or maximum value within the data</li>
  <li>searching for a specific value or for the coordinate of a specific value within the data</li>
</ul>

<p>In Android 7.0 (API level 24) and later, RenderScript supports <i>reduction kernels</i> to allow
efficient user-written reduction algorithms. You may launch reduction kernels on inputs with
1, 2, or 3 dimensions.</p>

<p>An example above shows a simple <a href="#example-addint">addint</a> reduction kernel.
Here is a more complicated <a id="example-findMinAndMax">findMinAndMax</a> reduction kernel
that finds the locations of the minimum and maximum <code>long</code> values in a
1-dimensional {@link android.renderscript.Allocation}:</p>

<pre>
#define LONG_MAX (long)((1UL << 63) - 1)
#define LONG_MIN (long)(1UL << 63)

#pragma rs reduce(findMinAndMax) \
  initializer(fMMInit) accumulator(fMMAccumulator) \
  combiner(fMMCombiner) outconverter(fMMOutConverter)

// Either a value and the location where it was found, or <a href="#INITVAL">INITVAL</a>.
typedef struct {
  long val;
  int idx;     // -1 indicates <a href="#INITVAL">INITVAL</a>
} IndexedVal;

typedef struct {
  IndexedVal min, max;
} MinAndMax;

// In the discussion below, this initial value { { LONG_MAX, -1 }, { LONG_MIN, -1 } }
// is called <a id="INITVAL">INITVAL</a>.
static void fMMInit(MinAndMax *accum) {
  accum->min.val = LONG_MAX;
  accum->min.idx = -1;
  accum->max.val = LONG_MIN;
  accum->max.idx = -1;
}

//----------------------------------------------------------------------
// In describing the behavior of the accumulator and combiner functions,
// it is helpful to describe hypothetical functions
//   IndexedVal min(IndexedVal a, IndexedVal b)
//   IndexedVal max(IndexedVal a, IndexedVal b)
//   MinAndMax  minmax(MinAndMax a, MinAndMax b)
//   MinAndMax  minmax(MinAndMax accum, IndexedVal val)
//
// The effect of
//   IndexedVal min(IndexedVal a, IndexedVal b)
// is to return the IndexedVal from among the two arguments
// whose val is lesser, except that when an IndexedVal
// has a negative index, that IndexedVal is never less than
// any other IndexedVal; therefore, if exactly one of the
// two arguments has a negative index, the min is the other
// argument. Like ordinary arithmetic min and max, this function
// is commutative and associative; that is,
//
//   min(A, B) == min(B, A)                  // commutative
//   min(A, min(B, C)) == min(min(A, B), C)  // associative
//
// The effect of
//   IndexedVal max(IndexedVal a, IndexedVal b)
// is analogous (greater . . . never greater than).
//
// Then there is
//
//   MinAndMax minmax(MinAndMax a, MinAndMax b) {
//     return MinAndMax(min(a.min, b.min), max(a.max, b.max));
//   }
//
// Like ordinary arithmetic min and max, the above function
// is commutative and associative; that is:
//
//   minmax(A, B) == minmax(B, A)                        // commutative
//   minmax(A, minmax(B, C)) == minmax(minmax(A, B), C)  // associative
//
// Finally define
//
//   MinAndMax minmax(MinAndMax accum, IndexedVal val) {
//     return minmax(accum, MinAndMax(val, val));
//   }
//----------------------------------------------------------------------

// This function can be explained as doing:
//   *accum = minmax(*accum, IndexedVal(in, x))
//
// This function simply computes minimum and maximum values as if
// INITVAL.min were greater than any other minimum value and
// INITVAL.max were less than any other maximum value. Note that if
// *accum is INITVAL, then this function sets both accum->min and
// accum->max to IndexedVal(in, x).
//
// After this function is called, both accum->min.idx and accum->max.idx
// will have nonnegative values:
// - x is always nonnegative, so if this function ever sets one of the
//   idx fields, it will set it to a nonnegative value
// - if one of the idx fields is negative, then the corresponding
//   val field must be LONG_MAX or LONG_MIN, so the function will always
//   set both the val and idx fields
static void fMMAccumulator(MinAndMax *accum, long in, int x) {
  IndexedVal me;
  me.val = in;
  me.idx = x;

  if (me.val <= accum->min.val)
    accum->min = me;
  if (me.val >= accum->max.val)
    accum->max = me;
}

// This function can be explained as doing:
//   *accum = minmax(*accum, *val)
//
// This function simply computes minimum and maximum values as if
// INITVAL.min were greater than any other minimum value and
// INITVAL.max were less than any other maximum value.
// Note that if one of the two accumulator data items is INITVAL,
// then this function sets *accum to the other one.
static void fMMCombiner(MinAndMax *accum,
                        const MinAndMax *val) {
  if ((accum->min.idx < 0) || (val->min.val < accum->min.val))
    accum->min = val->min;
  if ((accum->max.idx < 0) || (val->max.val > accum->max.val))
    accum->max = val->max;
}

static void fMMOutConverter(int2 *result,
                            const MinAndMax *val) {
  result->x = val->min.idx;
  result->y = val->max.idx;
}
</pre>

<p class="note"><strong>NOTE:</strong> There are more example reduction
  kernels <a href="#more-example">here</a>.</p>

<p>To run a reduction kernel, the RenderScript runtime creates <em>one or more</em>
variables called <a id="accumulator-data-items"><strong><i>accumulator data
items</i></strong></a> to hold the state of the reduction process. The RenderScript runtime
picks the number of accumulator data items in such a way as to maximize performance. The type
of the accumulator data items (<i>accumType</i>) is determined by the kernel's <i>accumulator
function</i>: the first argument to that function is a pointer to an accumulator data
item. By default, every accumulator data item is initialized to zero (as if
by <code>memset</code>); however, you may write an <i>initializer function</i> to do something
different.</p>

<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
kernel, the accumulator data items (of type <code>int</code>) are used to add up input
values. There is no initializer function, so each accumulator data item is initialized to
zero.</p>

<p class="note"><strong>Example:</strong> In
the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the accumulator data items
(of type <code>MinAndMax</code>) are used to keep track of the minimum and maximum values
found so far.
There is an initializer function to set these to <code>LONG_MAX</code> and
<code>LONG_MIN</code>, respectively, and to set the locations of these values to -1, indicating that
the values are not actually present in the (empty) portion of the input that has been
processed.</p>

<p>RenderScript calls your accumulator function once for every coordinate in the
input(s). Typically, your function should update the accumulator data item in some way
according to the input.</p>

<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
kernel, the accumulator function adds the value of an input Element to the accumulator
data item.</p>

<p class="note"><strong>Example:</strong> In
the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the accumulator function
checks to see whether the value of an input Element is less than or equal to the minimum
value recorded in the accumulator data item and/or greater than or equal to the maximum
value recorded in the accumulator data item, and updates the accumulator data item
accordingly.</p>

<p>After the accumulator function has been called once for every coordinate in the input(s),
RenderScript must <strong>combine</strong> the <a href="#accumulator-data-items">accumulator
data items</a> together into a single accumulator data item. You may write a <i>combiner
function</i> to do this. If the accumulator function has a single input whose type is the
same as the accumulator data type, and
no <a href="#special-arguments">special arguments</a>, then you do not need to write a combiner
function; RenderScript will use the accumulator function to combine the accumulator data
items. (You may still write a combiner function if this default behavior is not what you
want.)</p>

<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
kernel, there is no combiner function, so the accumulator function will be used.
This is
the correct behavior, because if we split a collection of values into two pieces, and we
add up the values in those two pieces separately, adding up those two sums is the same as
adding up the entire collection.</p>

<p class="note"><strong>Example:</strong> In
the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the combiner function
checks to see whether the minimum value recorded in the "source" accumulator data
item <code>*val</code> is less than the minimum value recorded in the "destination"
accumulator data item <code>*accum</code>, and updates <code>*accum</code>
accordingly. It does similar work for the maximum value. This updates <code>*accum</code>
to the state it would have had if all of the input values had been accumulated into
<code>*accum</code> rather than some into <code>*accum</code> and some into
<code>*val</code>.</p>

<p>After all of the accumulator data items have been combined, RenderScript determines
the result of the reduction to return to Java. You may write an <i>outconverter
function</i> to do this. You do not need to write an outconverter function if you want
the final value of the combined accumulator data items to be the result of the reduction.</p>

<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel,
there is no outconverter function.
The final value of the combined data items is the sum of
all Elements of the input, which is the value we want to return.</p>

<p class="note"><strong>Example:</strong> In
the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the outconverter function
initializes an <code>int2</code> result value to hold the locations of the minimum and
maximum values resulting from the combination of all of the accumulator data items.</p>

<h3 id="writing-reduction-kernel">Writing a reduction kernel</h3>

<p><code>#pragma rs reduce</code> defines a reduction kernel by
specifying its name and the names and roles of the functions that make
up the kernel. All such functions must be
<code>static</code>. A reduction kernel always requires an <code>accumulator</code>
function; you can omit some or all of the other functions, depending on what you want the
kernel to do.</p>

<pre>#pragma rs reduce(<i>kernelName</i>) \
  initializer(<i>initializerName</i>) \
  accumulator(<i>accumulatorName</i>) \
  combiner(<i>combinerName</i>) \
  outconverter(<i>outconverterName</i>)
</pre>

<p>The meaning of the items in the <code>#pragma</code> is as follows:</p>
<ul>

<li><code>reduce(<i>kernelName</i>)</code> (mandatory): Specifies that a reduction kernel is
being defined. A reflected Java method <code>reduce_<i>kernelName</i></code> will launch the
kernel.</li>

<li><p><code>initializer(<i>initializerName</i>)</code> (optional): Specifies the name of the
initializer function for this reduction kernel. When you launch the kernel, RenderScript calls
this function once for each <a href="#accumulator-data-items">accumulator data item</a>.
The
function must be defined like this:</p>

<pre>static void <i>initializerName</i>(<i>accumType</i> *accum) { … }</pre>

<p><code>accum</code> is a pointer to an accumulator data item for this function to
initialize.</p>

<p>If you do not provide an initializer function, RenderScript initializes every accumulator
data item to zero (as if by <code>memset</code>), behaving as if there were an initializer
function that looks like this:</p>
<pre>static void <i>initializerName</i>(<i>accumType</i> *accum) {
  memset(accum, 0, sizeof(*accum));
}</pre>
</li>

<li><p><code><a id="accumulator-function">accumulator(<i>accumulatorName</i>)</a></code>
(mandatory): Specifies the name of the accumulator function for this
reduction kernel. When you launch the kernel, RenderScript calls
this function once for every coordinate in the input(s), to update an
accumulator data item in some way according to the input(s). The function
must be defined like this:</p>

<pre>
static void <i>accumulatorName</i>(<i>accumType</i> *accum,
                                   <i>in1Type</i> in1, <i>…,</i> <i>inNType</i> in<i>N</i>
                                   <i>[, specialArguments]</i>) { … }
</pre>

<p><code>accum</code> is a pointer to an accumulator data item for this function to
modify. <code>in1</code> through <code>in<i>N</i></code> are one <em>or more</em> arguments that
are automatically filled in based on the inputs passed to the kernel launch, one argument
per input. The accumulator function may optionally take any of the <a
href="#special-arguments">special arguments</a>.</p>

<p>An example kernel with multiple inputs is <a href="#dot-product"><code>dotProduct</code></a>.</p>
</li>

<li><p><code><a id="combiner-function">combiner(<i>combinerName</i>)</a></code>
(optional): Specifies the name of the combiner function for this
reduction kernel.
After RenderScript calls the accumulator function
once for every coordinate in the input(s), it calls this function as many
times as necessary to combine all accumulator data items into a single
accumulator data item. The function must be defined like this:</p>

<pre>static void <i>combinerName</i>(<i>accumType</i> *accum, const <i>accumType</i> *other) { … }</pre>

<p><code>accum</code> is a pointer to a "destination" accumulator data item for this
function to modify. <code>other</code> is a pointer to a "source" accumulator data item
for this function to "combine" into <code>*accum</code>.</p>

<p class="note"><strong>NOTE:</strong> It is possible
  that <code>*accum</code>, <code>*other</code>, or both have been initialized but have never
  been passed to the accumulator function; that is, one or both have never been updated
  according to any input data. For example, in
  the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the combiner
  function <code>fMMCombiner</code> explicitly checks for <code>idx < 0</code> because that
  indicates such an accumulator data item, whose value is <a href="#INITVAL">INITVAL</a>.</p>

<p>If you do not provide a combiner function, RenderScript uses the accumulator function in its
place, behaving as if there were a combiner function that looks like this:</p>

<pre>static void <i>combinerName</i>(<i>accumType</i> *accum, const <i>accumType</i> *other) {
  <i>accumulatorName</i>(accum, *other);
}</pre>

<p>A combiner function is mandatory if the kernel has more than one input, if the input data
  type is not the same as the accumulator data type, or if the accumulator function takes one
  or more <a href="#special-arguments">special arguments</a>.</p>
</li>

<li><p><code><a id="outconverter-function">outconverter(<i>outconverterName</i>)</a></code>
(optional): Specifies the name of the outconverter function for this
reduction
kernel. After RenderScript combines all of the accumulator
data items, it calls this function to determine the result of the
reduction to return to Java. The function must be defined like
this:</p>

<pre>static void <i>outconverterName</i>(<i>resultType</i> *result, const <i>accumType</i> *accum) { … }</pre>

<p><code>result</code> is a pointer to a result data item (allocated but not initialized
by the RenderScript runtime) for this function to initialize with the result of the
reduction. <i>resultType</i> is the type of that data item, which need not be the same
as <i>accumType</i>. <code>accum</code> is a pointer to the final accumulator data item
computed by the <a href="#combiner-function">combiner function</a>.</p>

<p>If you do not provide an outconverter function, RenderScript copies the final accumulator
data item to the result data item, behaving as if there were an outconverter function that
looks like this:</p>

<pre>static void <i>outconverterName</i>(<i>accumType</i> *result, const <i>accumType</i> *accum) {
  *result = *accum;
}</pre>

<p>If you want a different result type than the accumulator data type, then the outconverter function is mandatory.</p>
</li>

</ul>

<p>Note that a kernel has input types, an accumulator data item type, and a result type,
  none of which need to be the same. For example, in
  the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the input
  type <code>long</code>, accumulator data item type <code>MinAndMax</code>, and result
  type <code>int2</code> are all different.</p>

<h4 id="assume">What can't you assume?</h4>

<p>You must not rely on the number of accumulator data items created by RenderScript for a
  given kernel launch.
There is no guarantee that two launches of the same kernel with the
  same input(s) will create the same number of accumulator data items.</p>

<p>You must not rely on the order in which RenderScript calls the initializer, accumulator, and
  combiner functions; it may even call some of them in parallel. There is no guarantee that
  two launches of the same kernel with the same input will follow the same order. The only
  guarantee is that only the initializer function will ever see an uninitialized accumulator
  data item. For example:</p>
<ul>
<li>There is no guarantee that all accumulator data items will be initialized before the
  accumulator function is called, although it will only be called on an initialized accumulator
  data item.</li>
<li>There is no guarantee about the order in which input Elements are passed to the accumulator
  function.</li>
<li>There is no guarantee that the accumulator function has been called for all input Elements
  before the combiner function is called.</li>
</ul>

<p>One consequence of this is that the <a href="#example-findMinAndMax">findMinAndMax</a>
  kernel is not deterministic: If the input contains more than one occurrence of the same
  minimum or maximum value, you have no way of knowing which occurrence the kernel will
  find.</p>

<h4 id="guarantee">What must you guarantee?</h4>

<p>Because the RenderScript system can choose to execute a kernel <a href="#assume">in many
  different ways</a>, you must follow certain rules to ensure that your kernel behaves the
  way you want. If you do not follow these rules, you may get incorrect results,
  nondeterministic behavior, or runtime errors.</p>

<p>The rules below often say that two accumulator data items must have "<a id="the-same">the
  same value</a>". What does this mean? That depends on what you want the kernel to do.
For
  a mathematical reduction such as <a href="#example-addint">addint</a>, it usually makes sense
  for "the same" to mean mathematical equality. For a "pick any" search such
  as <a href="#example-findMinAndMax">findMinAndMax</a> ("find the location of minimum and
  maximum input values") where there might be more than one occurrence of identical input
  values, all locations of a given input value must be considered "the same". You could write
  a similar kernel to "find the location of <em>leftmost</em> minimum and maximum input values"
  where (say) a minimum value at location 100 is preferred over an identical minimum value at location
  200; for this kernel, "the same" would mean identical <em>location</em>, not merely
  identical <em>value</em>, and the accumulator and combiner functions would have to be
  different from those for <a href="#example-findMinAndMax">findMinAndMax</a>.</p>

<p><strong>The initializer function must create an <i>identity value</i>.</strong> That is,
  if <code><i>I</i></code> and <code><i>A</i></code> are accumulator data items initialized
  by the initializer function, and <code><i>I</i></code> has never been passed to the
  accumulator function (but <code><i>A</i></code> may have been), then</p>
<ul>
<li><code><i>combinerName</i>(&<i>A</i>, &<i>I</i>)</code> must
  leave <code><i>A</i></code> <a href="#the-same">the same</a></li>
<li><code><i>combinerName</i>(&<i>I</i>, &<i>A</i>)</code> must
  set <code><i>I</i></code> to <a href="#the-same">the same value</a> as <code><i>A</i></code></li>
</ul>
<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
  kernel, an accumulator data item is initialized to zero.
The combiner function for this
  kernel performs addition; zero is the identity value for addition.</p>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a>
  kernel, an accumulator data item is initialized
  to <a href="#INITVAL"><code>INITVAL</code></a>.
<ul>
<li><code>fMMCombiner(&<i>A</i>, &<i>I</i>)</code> leaves <code><i>A</i></code> the same,
  because <code><i>I</i></code> is <code>INITVAL</code>.</li>
<li><code>fMMCombiner(&<i>I</i>, &<i>A</i>)</code> sets <code><i>I</i></code>
  to <code><i>A</i></code>, because <code><i>I</i></code> is <code>INITVAL</code>.</li>
</ul>
Therefore, <code>INITVAL</code> is indeed an identity value.
</p></div>

<p><strong>The combiner function must be <i>commutative</i>.</strong> That is,
  if <code><i>A</i></code> and <code><i>B</i></code> are accumulator data items initialized
  by the initializer function, and that may have been passed to the accumulator function zero
  or more times, then <code><i>combinerName</i>(&<i>A</i>, &<i>B</i>)</code> must
  set <code><i>A</i></code> to <a href="#the-same">the same value</a>
  that <code><i>combinerName</i>(&<i>B</i>, &<i>A</i>)</code>
  sets <code><i>B</i></code>.</p>
<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
  kernel, the combiner function adds the two accumulator data item values; addition is
  commutative.</p>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel,
<pre>
fMMCombiner(&<i>A</i>, &<i>B</i>)
</pre>
is the same as
<pre>
<i>A</i> = minmax(<i>A</i>, <i>B</i>)
</pre>
and <code>minmax</code> is commutative, so <code>fMMCombiner</code> is also.
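To see what commutativity means when "the same" compares values rather than locations, here is a minimal plain-C sketch. It is not RenderScript code: it assumes an ordinary host C compiler, uses <code>int64_t</code> in place of the 64-bit RenderScript <code>long</code>, copies the logic of <code>fMMCombiner</code>, and adds the hypothetical helpers <code>sameIndexedVal</code>, <code>sameMinAndMax</code>, and <code>combinerCommutes</code> (not part of any RenderScript API), which compare only values and treat two <code>INITVAL</code> halves as equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side port of the findMinAndMax types; int64_t stands in for the
   64-bit RenderScript "long". An idx < 0 marks an INITVAL half. */
typedef struct {
    int64_t val;
    int idx;
} IndexedVal;

typedef struct {
    IndexedVal min, max;
} MinAndMax;

/* Same logic as fMMCombiner in the kernel above. */
static void fMMCombiner(MinAndMax *accum, const MinAndMax *val) {
    if ((accum->min.idx < 0) || (val->min.val < accum->min.val))
        accum->min = val->min;
    if ((accum->max.idx < 0) || (val->max.val > accum->max.val))
        accum->max = val->max;
}

/* Hypothetical helper: for a pick-any search, "the same" compares values,
   not locations; an INITVAL half only matches another INITVAL half. */
static bool sameIndexedVal(IndexedVal a, IndexedVal b) {
    if (a.idx < 0 || b.idx < 0)
        return (a.idx < 0) && (b.idx < 0);
    return a.val == b.val;
}

static bool sameMinAndMax(const MinAndMax *a, const MinAndMax *b) {
    return sameIndexedVal(a->min, b->min) && sameIndexedVal(a->max, b->max);
}

/* Hypothetical helper: does combinerName(&A, &B) set A to the same value
   that combinerName(&B, &A) sets B? Works on copies of A and B. */
static bool combinerCommutes(MinAndMax a, MinAndMax b) {
    MinAndMax ab = a;
    MinAndMax ba = b;
    fMMCombiner(&ab, &b);  /* combinerName(&A, &B) updates A */
    fMMCombiner(&ba, &a);  /* combinerName(&B, &A) updates B */
    return sameMinAndMax(&ab, &ba);
}
```

With tied values, for example A = {{3, 0}, {7, 1}} and B = {{3, 5}, {7, 6}}, the two orders keep different locations but the same values, so <code>combinerCommutes(A, B)</code> still returns true. A "leftmost location" kernel would need a stricter comparison and a different combiner.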
</p>
</div>

<p><strong>The combiner function must be <i>associative</i>.</strong> That is,
  if <code><i>A</i></code>, <code><i>B</i></code>, and <code><i>C</i></code> are
  accumulator data items initialized by the initializer function, and that may have been passed
  to the accumulator function zero or more times, then the following two code sequences must
  set <code><i>A</i></code> to <a href="#the-same">the same value</a>:</p>
<ul>
<li><pre>
<i>combinerName</i>(&<i>A</i>, &<i>B</i>);
<i>combinerName</i>(&<i>A</i>, &<i>C</i>);
</pre></li>
<li><pre>
<i>combinerName</i>(&<i>B</i>, &<i>C</i>);
<i>combinerName</i>(&<i>A</i>, &<i>B</i>);
</pre></li>
</ul>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, the
  combiner function adds the two accumulator data item values:
<ul>
<li><pre>
<i>A</i> = <i>A</i> + <i>B</i>
<i>A</i> = <i>A</i> + <i>C</i>
// Same as
//   <i>A</i> = (<i>A</i> + <i>B</i>) + <i>C</i>
</pre></li>
<li><pre>
<i>B</i> = <i>B</i> + <i>C</i>
<i>A</i> = <i>A</i> + <i>B</i>
// Same as
//   <i>A</i> = <i>A</i> + (<i>B</i> + <i>C</i>)
//   <i>B</i> = <i>B</i> + <i>C</i>
</pre></li>
</ul>
Addition is associative, and so the combiner function is also.
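Associativity, together with commutativity and the identity value, is what lets the runtime split the input however it likes. The following plain-C sketch mimics that freedom for <code>addint</code>. It assumes an ordinary host C compiler, assumes the sum fits in an <code>int</code>, and uses a hypothetical driver <code>runAddint</code> (not part of RenderScript): it spreads the input over a chosen number of zero-initialized accumulator data items and then combines them either left to right or right to left, reusing the accumulator as the combiner just as <code>addint</code> does.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-in for the addint accumulator. addint has no initializer
   (items start at zero) and no combiner (the accumulator is reused). */
static void addintAccum(int *accum, int val) {
    *accum += val;
}

/* Hypothetical driver, not RenderScript API: spreads n inputs over
   `items` accumulator data items (1..16), then combines them left to
   right ((A + B) + C ...) or right to left (A + (B + C ...)). */
static int runAddint(const int *in, size_t n, size_t items, int rightToLeft) {
    int accum[16] = {0};  /* as if by memset(accum, 0, ...) */
    assert(items >= 1 && items <= 16);

    /* Accumulate: each input goes to one item; some items may stay empty. */
    for (size_t i = 0; i < n; i++)
        addintAccum(&accum[i * items / n], in[i]);

    /* Combine, using the accumulator in place of a combiner. */
    if (!rightToLeft) {
        for (size_t k = 1; k < items; k++)
            addintAccum(&accum[0], accum[k]);
    } else {
        for (size_t k = items - 1; k > 0; k--)
            addintAccum(&accum[k - 1], accum[k]);
    }
    return accum[0];  /* no outconverter: the combined item is the result */
}
```

For the input 1 through 10, every choice of item count and combine order returns 55, including a split into 16 items where several items never see any input and so remain at the identity value zero.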
</p>
</div>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel,
<pre>
fMMCombiner(&<i>A</i>, &<i>B</i>)
</pre>
is the same as
<pre>
<i>A</i> = minmax(<i>A</i>, <i>B</i>)
</pre>
So the two sequences are
<ul>
<li><pre>
<i>A</i> = minmax(<i>A</i>, <i>B</i>)
<i>A</i> = minmax(<i>A</i>, <i>C</i>)
// Same as
//   <i>A</i> = minmax(minmax(<i>A</i>, <i>B</i>), <i>C</i>)
</pre></li>
<li><pre>
<i>B</i> = minmax(<i>B</i>, <i>C</i>)
<i>A</i> = minmax(<i>A</i>, <i>B</i>)
// Same as
//   <i>A</i> = minmax(<i>A</i>, minmax(<i>B</i>, <i>C</i>))
//   <i>B</i> = minmax(<i>B</i>, <i>C</i>)
</pre></li>
</ul>
<code>minmax</code> is associative, and so <code>fMMCombiner</code> is also.
</p>
</div>

<p><strong>The accumulator function and combiner function together must obey the <i>basic
  folding rule</i>.</strong> That is, if <code><i>A</i></code>
  and <code><i>B</i></code> are accumulator data items, <code><i>A</i></code> has been
  initialized by the initializer function and may have been passed to the accumulator function
  zero or more times, <code><i>B</i></code> has not been initialized, and <i>args</i> is
  the list of input arguments and special arguments for a particular call to the accumulator
  function, then the following two code sequences must set <code><i>A</i></code>
  to <a href="#the-same">the same value</a>:</p>
<ul>
<li><pre>
<i>accumulatorName</i>(&<i>A</i>, <i>args</i>);  // statement 1
</pre></li>
<li><pre>
<i>initializerName</i>(&<i>B</i>);        // statement 2
<i>accumulatorName</i>(&<i>B</i>, <i>args</i>);  // statement 3
<i>combinerName</i>(&<i>A</i>, &<i>B</i>);       // statement 4
</pre></li>
</ul>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, for an input value <i>V</i>:
<ul>
<li>Statement 1 is the same as <code>A += <i>V</i></code></li>
<li>Statement 2 is the same as <code>B = 0</code></li>
<li>Statement 3 is the same as <code>B += <i>V</i></code>, which is the same as <code>B = <i>V</i></code></li>
<li>Statement 4 is the same as <code>A += B</code>, which is the same as <code>A += <i>V</i></code></li>
</ul>
Statements 1 and 4 set <code><i>A</i></code> to the same value, and so this kernel obeys the
basic folding rule.
</p>
</div>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, for an input
  value <i>V</i> at coordinate <i>X</i>:
<ul>
<li>Statement 1 is the same as <code>A = minmax(A, IndexedVal(<i>V</i>, <i>X</i>))</code></li>
<li>Statement 2 is the same as <code>B = <a href="#INITVAL">INITVAL</a></code></li>
<li>Statement 3 is the same as
<pre>
B = minmax(B, IndexedVal(<i>V</i>, <i>X</i>))
</pre>
which, because <i>B</i> is <code>INITVAL</code>, is the same as
<pre>
B = MinAndMax(IndexedVal(<i>V</i>, <i>X</i>), IndexedVal(<i>V</i>, <i>X</i>))
</pre>
</li>
<li>Statement 4 is the same as
<pre>
A = minmax(A, B)
</pre>
which is the same as
<pre>
A = minmax(A, IndexedVal(<i>V</i>, <i>X</i>))
</pre>
</li>
</ul>
Statements 1 and 4 set <code><i>A</i></code> to the same value, and so this kernel obeys the
basic folding rule.
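The same check can be run mechanically. This plain-C sketch assumes an ordinary host C compiler, with <code>int64_t</code> standing in for the RenderScript <code>long</code> and <code>INT64_MAX</code>/<code>INT64_MIN</code> for <code>LONG_MAX</code>/<code>LONG_MIN</code>. It copies the logic of <code>fMMInit</code>, <code>fMMAccumulator</code>, and <code>fMMCombiner</code>; the hypothetical helper <code>obeysFoldingRule</code> (not part of RenderScript) runs statement 1 on one copy of <i>A</i> and statements 2 through 4 on another, then compares only the min and max values, as "the same" requires for this kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side port of findMinAndMax; int64_t stands in for "long". */
typedef struct { int64_t val; int idx; } IndexedVal;
typedef struct { IndexedVal min, max; } MinAndMax;

static void fMMInit(MinAndMax *accum) {
    accum->min.val = INT64_MAX;  /* LONG_MAX */
    accum->min.idx = -1;
    accum->max.val = INT64_MIN;  /* LONG_MIN */
    accum->max.idx = -1;
}

static void fMMAccumulator(MinAndMax *accum, int64_t in, int x) {
    IndexedVal me = { in, x };
    if (me.val <= accum->min.val)
        accum->min = me;
    if (me.val >= accum->max.val)
        accum->max = me;
}

static void fMMCombiner(MinAndMax *accum, const MinAndMax *val) {
    if ((accum->min.idx < 0) || (val->min.val < accum->min.val))
        accum->min = val->min;
    if ((accum->max.idx < 0) || (val->max.val > accum->max.val))
        accum->max = val->max;
}

/* Hypothetical helper: statement 1 on one copy of A, statements 2-4 on
   another; compares values only (locations may differ on ties). */
static bool obeysFoldingRule(MinAndMax a, int64_t v, int x) {
    MinAndMax viaAccumulator = a;
    MinAndMax viaCombiner = a;
    MinAndMax b;

    fMMAccumulator(&viaAccumulator, v, x);  /* statement 1 */
    fMMInit(&b);                            /* statement 2 */
    fMMAccumulator(&b, v, x);               /* statement 3 */
    fMMCombiner(&viaCombiner, &b);          /* statement 4 */

    return viaAccumulator.min.val == viaCombiner.min.val &&
           viaAccumulator.max.val == viaCombiner.max.val;
}
```

When <i>V</i> ties the current minimum (say A = {{5, 2}, {9, 4}} and V = 5 at X = 7), statement 1 records location 7 while statement 4 keeps location 2; the values agree, which is all that "the same" requires for this pick-any kernel.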
</p>
</div>

<h3 id="calling-reduction-kernel">Calling a reduction kernel from Java code</h3>

<p>For a reduction kernel named <i>kernelName</i> defined in the
file <code><i>filename</i>.rs</code>, there are three methods reflected in the
class <code>ScriptC_<i>filename</i></code>:</p>

<pre>
// Method 1
public <i>javaFutureType</i> reduce_<i>kernelName</i>(Allocation ain1, <i>…,</i>
                                        Allocation ain<i>N</i>);

// Method 2
public <i>javaFutureType</i> reduce_<i>kernelName</i>(Allocation ain1, <i>…,</i>
                                        Allocation ain<i>N</i>,
                                        Script.LaunchOptions sc);

// Method 3
public <i>javaFutureType</i> reduce_<i>kernelName</i>(<i><a href="#devec">devecSiIn1Type</a></i>[] in1, …,
                                        <i><a href="#devec">devecSiInNType</a></i>[] in<i>N</i>);
</pre>

<p>Here are some examples of calling the <a href="#example-addint">addint</a> kernel:</p>
<pre>
ScriptC_example script = new ScriptC_example(mRenderScript);

// 1D array
//   and obtain answer immediately
int input1[] = <i>…</i>;
int sum1 = script.reduce_addint(input1).get();  // Method 3

// 2D allocation
//   and do some additional work before obtaining answer
Type.Builder typeBuilder =
  new Type.Builder(mRenderScript, Element.I32(mRenderScript));
typeBuilder.setX(<i>…</i>);
typeBuilder.setY(<i>…</i>);
Allocation input2 = Allocation.createTyped(mRenderScript, typeBuilder.create());
<i>populateSomehow</i>(input2);  // fill in input Allocation with data
ScriptC_example.result_int result2 = script.reduce_addint(input2);  // Method 1
<i>doSomeAdditionalWork</i>(); // might run at the same time as the reduction
int sum2 = result2.get();
</pre>

<p><strong>Method 1</strong> has one input {@link android.renderscript.Allocation} argument for
  every input argument in the kernel's <a href="#accumulator-function">accumulator
  function</a>.
The RenderScript runtime checks to ensure that all of the input Allocations
  have the same dimensions and that the {@link android.renderscript.Element} type of each of
  the input Allocations matches that of the corresponding input argument of the accumulator
  function's prototype. If any of these checks fail, RenderScript throws an exception. The
  kernel executes over every coordinate in those dimensions.</p>

<p><strong>Method 2</strong> is the same as Method 1 except that Method 2 takes an additional
  argument <code>sc</code> that can be used to limit the kernel execution to a subset of the
  coordinates.</p>

<p><strong><a id="reduce-method-3">Method 3</a></strong> is the same as Method 1 except that
  instead of taking Allocation inputs it takes Java array inputs. This is a convenience that
  saves you from having to write code to explicitly create an Allocation and copy data to it
  from a Java array. <em>However, using Method 3 instead of Method 1 does not increase the
  performance of the code</em>. For each input array, Method 3 creates a temporary
  1-dimensional Allocation with the appropriate {@link android.renderscript.Element} type and
  {@link android.renderscript.Allocation#setAutoPadding} enabled, and copies the array to the
  Allocation as if by the appropriate <code>copyFrom()</code> method of {@link
  android.renderscript.Allocation}.
It then calls Method 1, passing those temporary
  Allocations.</p>
<p class="note"><strong>NOTE:</strong> If your application will make multiple kernel calls with
  the same array, or with different arrays of the same dimensions and Element type, you may improve
  performance by explicitly creating, populating, and reusing Allocations yourself, instead of
  using Method 3.</p>
<p><strong><i><a id="javaFutureType">javaFutureType</a></i></strong>,
  the return type of the reflected reduction methods, is a reflected
  static nested class within the <code>ScriptC_<i>filename</i></code>
  class. It represents the future result of a reduction
  kernel run. To obtain the actual result of the run, call
  the <code>get()</code> method of that class, which returns a value
  of type <i>javaResultType</i>. <code>get()</code> is <a href="#asynchronous-model">synchronous</a>.</p>

<pre>
public class ScriptC_<i>filename</i> extends ScriptC {
  public static class <i>javaFutureType</i> {
    public <i>javaResultType</i> get() { … }
  }
}
</pre>

<p><strong><i>javaResultType</i></strong> is determined from the <i>resultType</i> of the
  <a href="#outconverter-function">outconverter function</a>. Unless <i>resultType</i> is an
  unsigned type (scalar, vector, or array), <i>javaResultType</i> is the directly corresponding
  Java type. If <i>resultType</i> is an unsigned type and there is a larger Java signed type,
  then <i>javaResultType</i> is that larger Java signed type; otherwise, it is the directly
  corresponding Java type. For example:</p>
<ul>
<li>If <i>resultType</i> is <code>int</code>, <code>int2</code>, or <code>int[15]</code>,
  then <i>javaResultType</i> is <code>int</code>, <code>Int2</code>,
  or <code>int[]</code>.
All values of <i>resultType</i> can be represented
  by <i>javaResultType</i>.</li>
<li>If <i>resultType</i> is <code>uint</code>, <code>uint2</code>, or <code>uint[15]</code>,
  then <i>javaResultType</i> is <code>long</code>, <code>Long2</code>,
  or <code>long[]</code>. All values of <i>resultType</i> can be represented
  by <i>javaResultType</i>.</li>
<li>If <i>resultType</i> is <code>ulong</code>, <code>ulong2</code>,
  or <code>ulong[15]</code>, then <i>javaResultType</i>
  is <code>long</code>, <code>Long2</code>, or <code>long[]</code>. There are certain values
  of <i>resultType</i> that cannot be represented by <i>javaResultType</i>.</li>
</ul>

<p><strong><i>javaFutureType</i></strong> is the future result type corresponding
  to the <i>resultType</i> of the <a href="#outconverter-function">outconverter
  function</a>.</p>
<ul>
<li>If <i>resultType</i> is not an array type, then <i>javaFutureType</i>
  is <code>result_<i>resultType</i></code>.</li>
<li>If <i>resultType</i> is an array of length <i>Count</i> with members of type <i>memberType</i>,
  then <i>javaFutureType</i> is <code>resultArray<i>Count</i>_<i>memberType</i></code>.</li>
</ul>

<p>For example:</p>

<pre>
public class ScriptC_<i>filename</i> extends ScriptC {
  // for kernels with int result
  public static class result_int {
    public int get() { … }
  }

  // for kernels with int[10] result
  public static class resultArray10_int {
    public int[] get() { … }
  }

  // for kernels with int2 result
  //   note that the Java type name "Int2" is not the same as the script type name "int2"
  public static class result_int2 {
    public Int2 get() { … }
  }

  // for kernels with int2[10] result
  //   note that the Java type name "Int2" is not the same as the script type name "int2"
  public static class resultArray10_int2 {
    public Int2[] get() {
… } 1279 } 1280 1281 // for kernels with uint result 1282 // note that the Java type "long" is a wider signed type than the unsigned script type "uint" 1283 public static class result_uint { 1284 public long get() { … } 1285 } 1286 1287 // for kernels with uint[10] result 1288 // note that the Java type "long" is a wider signed type than the unsigned script type "uint" 1289 public static class resultArray10_uint { 1290 public long[] get() { … } 1291 } 1292 1293 // for kernels with uint2 result 1294 // note that the Java type "Long2" is a wider signed type than the unsigned script type "uint2" 1295 public static class result_uint2 { 1296 public Long2 get() { … } 1297 } 1298 1299 // for kernels with uint2[10] result 1300 // note that the Java type "Long2" is a wider signed type than the unsigned script type "uint2" 1301 public static class resultArray10_uint2 { 1302 public Long2[] get() { … } 1303 } 1304} 1305</pre> 1306 1307<p>If <i>javaResultType</i> is an object type (including an array type), each call 1308 to <code><i>javaFutureType</i>.get()</code> on the same instance will return the same 1309 object.</p> 1310 1311<p>If <i>javaResultType</i> cannot represent all values of type <i>resultType</i>, and a 1312 reduction kernel produces an unrepresentible value, 1313 then <code><i>javaFutureType</i>.get()</code> throws an exception.</p> 1314 1315<h4 id="devec">Method 3 and <i>devecSiInXType</i></h4> 1316 1317<p><strong><i>devecSiInXType</i></strong> is the Java type corresponding to 1318 the <i>inXType</i> of the corresponding argument of 1319 the <a href="#accumulator-function">accumulator function</a>. Unless <i>inXType</i> is an 1320 unsigned type or a vector type, <i>devecSiInXType</i> is the directly corresponding Java 1321 type. If <i>inXType</i> is an unsigned scalar type, then <i>devecSiInXType</i> is the 1322 Java type directly corresponding to the signed scalar type of the same 1323 size. 
If <i>inXType</i> is a signed vector type, then <i>devecSiInXType</i> is the Java
  type directly corresponding to the vector component type. If <i>inXType</i> is an unsigned
  vector type, then <i>devecSiInXType</i> is the Java type directly corresponding to the
  signed scalar type of the same size as the vector component type. For example:</p>
<ul>
<li>If <i>inXType</i> is <code>int</code>, then <i>devecSiInXType</i>
  is <code>int</code>.</li>
<li>If <i>inXType</i> is <code>int2</code>, then <i>devecSiInXType</i>
  is <code>int</code>. The Java array is a <em>flattened</em> representation: It has twice as
  many <em>scalar</em> Elements as the Allocation has 2-component <em>vector</em>
  Elements. This is the same way that the <code>copyFrom()</code> methods of
  {@link android.renderscript.Allocation} work.</li>
<li>If <i>inXType</i> is <code>uint</code>, then <i>devecSiInXType</i>
  is <code>int</code>. A signed value in the Java array is interpreted as an unsigned value of
  the same bit pattern in the Allocation. This is the same way that the <code>copyFrom()</code>
  methods of {@link android.renderscript.Allocation} work.</li>
<li>If <i>inXType</i> is <code>uint2</code>, then <i>devecSiInXType</i>
  is <code>int</code>. This is a combination of the way <code>int2</code> and <code>uint</code>
  are handled: The Java array is a flattened representation, and signed values in the Java
  array are interpreted as RenderScript unsigned Element values.</li>
</ul>

<p>Note that for <a href="#reduce-method-3">Method 3</a>, input types are handled differently
than result types:</p>

<ul>
<li>A script's vector input is flattened on the Java side, whereas a script's vector result is not.</li>
<li>A script's unsigned input is represented as a signed input of the same size on the Java
  side, whereas a script's unsigned result is represented as a widened signed type on the Java
  side (except for <code>ulong</code>, whose result is represented by the same-size signed
  type <code>long</code>).</li>
</ul>

<h3 id="more-example">More example reduction kernels</h3>

<pre id="dot-product">
#pragma rs reduce(dotProduct) \
  accumulator(dotProductAccum) combiner(dotProductSum)

// Note: No initializer function -- therefore,
// each accumulator data item is implicitly initialized to 0.0f.
static void dotProductAccum(float *accum, float in1, float in2) {
  *accum += in1*in2;
}

// combiner function
static void dotProductSum(float *accum, const float *val) {
  *accum += *val;
}
</pre>

<pre>
// Find a zero Element in a 2D allocation; return (-1, -1) if none
#pragma rs reduce(fz2) \
  initializer(fz2Init) \
  accumulator(fz2Accum) combiner(fz2Combine)

static void fz2Init(int2 *accum) { accum->x = accum->y = -1; }

static void fz2Accum(int2 *accum,
                     int inVal,
                     int x /* special arg */,
                     int y /* special arg */) {
  if (inVal==0) {
    accum->x = x;
    accum->y = y;
  }
}

static void fz2Combine(int2 *accum, const int2 *accum2) {
  if (accum2->x >= 0) *accum = *accum2;
}
</pre>

<pre>
// Note that this kernel returns an array to Java
#pragma rs reduce(histogram) \
  accumulator(hsgAccum) combiner(hsgCombine)

#define BUCKETS 256
typedef uint32_t Histogram[BUCKETS];

// Note: No initializer function --
// therefore, each bucket is implicitly initialized to 0.

static void hsgAccum(Histogram *h, uchar in) { ++(*h)[in]; }

static void hsgCombine(Histogram *accum,
                       const Histogram *addend) {
  for (int i = 0; i < BUCKETS; ++i)
    (*accum)[i] += (*addend)[i];
}

// Determines the mode (most frequently occurring value), and returns
// the value and the frequency.
//
// If multiple values have the same highest frequency, returns the lowest
// of those values.
//
// Shares functions with the histogram reduction kernel.
#pragma rs reduce(mode) \
  accumulator(hsgAccum) combiner(hsgCombine) \
  outconverter(modeOutConvert)

static void modeOutConvert(int2 *result, const Histogram *h) {
  uint32_t mode = 0;
  for (int i = 1; i < BUCKETS; ++i)
    if ((*h)[i] > (*h)[mode]) mode = i;
  result->x = mode;
  result->y = (*h)[mode];
}
</pre>
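<p>To check by hand what the <code>histogram</code> and <code>mode</code> kernels above compute,
their accumulator, combiner, and outconverter logic can be mirrored sequentially in plain Java.
This is a minimal sketch, not the reflected <code>ScriptC</code> API: the class and method names
are hypothetical, and splitting the input into a fixed number of contiguous chunks is only an
assumed stand-in for however the runtime actually partitions work. Counts are kept in a
<code>long[]</code>, matching the <i>javaResultType</i> of a <code>uint[256]</code> result, and
each <code>uchar</code> input arrives as a signed <code>byte</code>, following the
<i>devecSiInXType</i> rule.</p>

<pre>
// Hypothetical host-side model of the histogram and mode reduction kernels.
// Not the reflected ScriptC API: it only mirrors hsgAccum, hsgCombine, and
// modeOutConvert sequentially so that results can be checked on the host.
class HistogramModeModel {
    static final int BUCKETS = 256;

    // Mirrors hsgAccum. A uchar input is a signed byte on the Java side,
    // so mask with 0xFF to recover the unsigned bucket index.
    static void accumulate(long[] h, byte in) {
        ++h[in & 0xFF];
    }

    // Mirrors hsgCombine: add one partial histogram into another.
    static void combine(long[] accum, long[] addend) {
        for (int i = 0; i < BUCKETS; ++i)
            accum[i] += addend[i];
    }

    // Accumulates each of 'chunks' contiguous slices into its own partial
    // histogram (an assumed stand-in for parallel accumulators), then
    // combines the partials into the result.
    static long[] histogram(byte[] input, int chunks) {
        long[] result = new long[BUCKETS];  // no initializer: buckets start at 0
        int n = input.length;
        for (int c = 0; c < chunks; ++c) {
            long[] partial = new long[BUCKETS];
            int begin = (int) ((long) n * c / chunks);
            int end = (int) ((long) n * (c + 1) / chunks);
            for (int i = begin; i < end; ++i)
                accumulate(partial, input[i]);
            combine(result, partial);
        }
        return result;
    }

    // Mirrors modeOutConvert: returns {value, frequency}. Ties go to the
    // lowest value because only a strictly greater count replaces the mode.
    static int[] mode(long[] h) {
        int mode = 0;
        for (int i = 1; i < BUCKETS; ++i)
            if (h[i] > h[mode]) mode = i;
        return new int[] { mode, (int) h[mode] };
    }

    public static void main(String[] args) {
        byte[] data = { 3, 7, 7, (byte) 200, 3, (byte) 200, 7, (byte) 200 };
        int[] m = mode(histogram(data, 3));
        System.out.println(m[0] + " " + m[1]);  // prints "7 3" (7 and 200 tie; 7 is lower)
    }
}
</pre>

<p>Because the combiner only adds counts, the result does not depend on how many chunks the
input is split into or in which order the partial histograms are combined.</p>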
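<p>The <i>javaResultType</i> and <i>devecSiInXType</i> rules described earlier in this section
come down to a few bit-level conversions, sketched below in plain Java. These helpers are
hypothetical and are not part of <code>android.renderscript</code>; they only illustrate the
rules. The text above says that <code><i>javaFutureType</i>.get()</code> throws an exception
for an unrepresentable value without naming the exception type, so the choice of
<code>ArithmeticException</code> here is an assumption of this sketch.</p>

<pre>
// Hypothetical helpers illustrating the Java-side type rules for reductions.
// Not part of android.renderscript; all names are invented for this sketch.
class RsTypeMappingSketch {
    // uint result -> long: zero-extend the 32-bit pattern, so every uint fits.
    static long uintResultToLong(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    // ulong result -> long: a value above Long.MAX_VALUE has its top bit set
    // and would read as negative, so it is unrepresentable. The exception
    // type is an assumption; the documented behavior is only "throws".
    static long ulongResultToLong(long bits) {
        if (bits < 0)
            throw new ArithmeticException(
                "ulong value " + Long.toUnsignedString(bits) + " does not fit in long");
        return bits;
    }

    // uint2 input (Method 3): devecSiInXType is int and the Java array is
    // flattened, so Element k occupies indices 2k (x) and 2k+1 (y). Callers
    // pass component values in [0, 2^32); the cast keeps the same low 32 bits.
    static int[] flattenUint2(long[][] pairs) {
        int[] flat = new int[pairs.length * 2];
        for (int k = 0; k < pairs.length; ++k) {
            flat[2 * k] = (int) pairs[k][0];
            flat[2 * k + 1] = (int) pairs[k][1];
        }
        return flat;
    }

    public static void main(String[] args) {
        System.out.println(uintResultToLong(-1));         // prints 4294967295
        int[] flat = flattenUint2(new long[][] { { 1L, 4294967295L } });
        System.out.println(flat.length + " " + flat[1]);  // prints "2 -1"
    }
}
</pre>

<p>Widening <code>uint</code> to <code>long</code> never loses information, whereas
<code>ulong</code> has no wider Java primitive type, which is why only <code>ulong</code>
results can be unrepresentable.</p>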