page.title=RenderScript
parent.title=Computation
parent.link=index.html

@jd:body

<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>

    <ol>
      <li><a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a></li>
      <li><a href="#access-rs-apis">Accessing RenderScript APIs from Java</a>
        <ol>
          <li><a href="#ide-setup">Setting Up Your Development Environment</a></li>
        </ol>
      </li>
      <li><a href="#using-rs-from-java">Using RenderScript from Java Code</a></li>
      <li><a href="#single-source-rs">Single-Source RenderScript</a></li>
      <li><a href="#reduction-in-depth">Reduction Kernels in Depth</a>
        <ol>
          <li><a href="#writing-reduction-kernel">Writing a reduction kernel</a></li>
          <li><a href="#calling-reduction-kernel">Calling a reduction kernel from Java code</a></li>
          <li><a href="#more-example">More example reduction kernels</a></li>
        </ol>
      </li>
    </ol>

    <h2>Related Samples</h2>

    <ol>
      <li><a class="external-link" href="https://github.com/android/platform_development/tree/master/samples/RenderScript/HelloCompute">Hello
      Compute</a></li>
    </ol>
  </div>
</div>
37
<p>RenderScript is a framework for running computationally intensive tasks at high performance on
Android. RenderScript is primarily oriented toward data-parallel computation, although serial
workloads can benefit as well. The RenderScript runtime parallelizes
work across processors available on a device, such as multi-core CPUs and GPUs. This allows
you to focus on expressing algorithms rather than scheduling work. RenderScript is
especially useful for applications performing image processing, computational photography, or
computer vision.</p>

<p>To get started with RenderScript, you should understand two main concepts:</p>
47<ul>
48
49<li>The <em>language</em> itself is a C99-derived language for writing high-performance compute
50code. <a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a> describes
51how to use it to write compute kernels.</li>
52
53<li>The <em>control API</em> is used for managing the lifetime of RenderScript resources and
54controlling kernel execution. It is available in three different languages: Java, C++ in Android
55NDK, and the C99-derived kernel language itself.
56<a href="#using-rs-from-java">Using RenderScript from Java Code</a> and
<a href="#single-source-rs">Single-Source RenderScript</a> describe the first and the third
options, respectively.</li>
59</ul>
60
61<h2 id="writing-an-rs-kernel">Writing a RenderScript Kernel</h2>
62
63<p>A RenderScript kernel typically resides in a <code>.rs</code> file in the
64<code>&lt;project_root&gt;/src/</code> directory; each <code>.rs</code> file is called a
65<i>script</i>. Every script contains its own set of kernels, functions, and variables. A script can
66contain:</p>
67
68<ul>
69<li>A pragma declaration (<code>#pragma version(1)</code>) that declares the version of the
70RenderScript kernel language used in this script. Currently, 1 is the only valid value.</li>
71
72<li>A pragma declaration (<code>#pragma rs java_package_name(com.example.app)</code>) that
73declares the package name of the Java classes reflected from this script.
74Note that your <code>.rs</code> file must be part of your application package, and not in a
75library project.</li>
76
77<li>Zero or more <strong><i>invokable functions</i></strong>. An invokable function is a single-threaded RenderScript
78function that you can call from your Java code with arbitrary arguments. These are often useful for
79initial setup or serial computations within a larger processing pipeline.</li>
80
81<li><p>Zero or more <strong><i>script globals</i></strong>. A script global is equivalent to a global variable in C. You can
82access script globals from Java code, and these are often used for parameter passing to RenderScript
83kernels.</p></li>
84
85<li><p>Zero or more <strong><i>compute kernels</i></strong>. A compute kernel is a function
86or collection of functions that you can direct the RenderScript runtime to execute in parallel
87across a collection of data. There are two kinds of compute
88kernels: <i>mapping</i> kernels (also called <i>foreach</i> kernels)
89and <i>reduction</i> kernels.</p>
90
91<p>A <em>mapping kernel</em> is a parallel function that operates on a collection of {@link
92  android.renderscript.Allocation Allocations} of the same dimensions. By default, it executes
93  once for every coordinate in those dimensions. It is typically (but not exclusively) used to
94  transform a collection of input {@link android.renderscript.Allocation Allocations} to an
95  output {@link android.renderscript.Allocation} one {@link android.renderscript.Element} at a
96  time.</p>
97
98<ul>
99<li><p>Here is an example of a simple <strong>mapping kernel</strong>:</p>
100
<pre>uchar4 RS_KERNEL invert(uchar4 in, uint32_t x, uint32_t y) {
  uchar4 out = in;
  out.r = 255 - in.r;
  out.g = 255 - in.g;
  out.b = 255 - in.b;
  return out;
}</pre>
108
109<p>In most respects, this is identical to a standard C
110  function. The <a href="#RS_KERNEL"><code>RS_KERNEL</code></a> property applied to the
111  function prototype specifies that the function is a RenderScript mapping kernel instead of an
112  invokable function. The <code>in</code> argument is automatically filled in based on the
113  input {@link android.renderscript.Allocation} passed to the kernel launch. The
114  arguments <code>x</code> and <code>y</code> are
115  discussed <a href="#special-arguments">below</a>. The value returned from the kernel is
116  automatically written to the appropriate location in the output {@link
117  android.renderscript.Allocation}. By default, this kernel is run across its entire input
118  {@link android.renderscript.Allocation}, with one execution of the kernel function per {@link
119  android.renderscript.Element} in the {@link android.renderscript.Allocation}.</p>
120
121<p>A mapping kernel may have one or more input {@link android.renderscript.Allocation
122  Allocations}, a single output {@link android.renderscript.Allocation}, or both. The
123  RenderScript runtime checks to ensure that all input and output Allocations have the same
124  dimensions, and that the {@link android.renderscript.Element} types of the input and output
125  Allocations match the kernel's prototype; if either of these checks fails, RenderScript
126  throws an exception.</p>
127
128<p class="note"><strong>NOTE:</strong> Before Android 6.0 (API level 23), a mapping kernel may
129  not have more than one input {@link android.renderscript.Allocation}.</p>
130
131<p>If you need more input or output {@link android.renderscript.Allocation Allocations} than
132  the kernel has, those objects should be bound to <code>rs_allocation</code> script globals
133  and accessed from a kernel or invokable function
134  via <code>rsGetElementAt_<i>type</i>()</code> or <code>rsSetElementAt_<i>type</i>()</code>.</p>
135
136<p><strong>NOTE:</strong> <a id="RS_KERNEL"><code>RS_KERNEL</code></a> is a macro
137  defined automatically by RenderScript for your convenience:</p>
<pre>
#define RS_KERNEL __attribute__((kernel))
</pre>
141</li>
142</ul>
143
144<p>A <em>reduction kernel</em> is a family of functions that operates on a collection of input
145  {@link android.renderscript.Allocation Allocations} of the same dimensions. By default,
146  its <a href="#accumulator-function">accumulator function</a> executes once for every
147  coordinate in those dimensions.  It is typically (but not exclusively) used to "reduce" a
148  collection of input {@link android.renderscript.Allocation Allocations} to a single
149  value.</p>
150
151<ul>
152<li><p>Here is an <a id="example-addint">example</a> of a simple <strong>reduction
153kernel</strong> that adds up the {@link android.renderscript.Element Elements} of its
154input:</p>
155
<pre>#pragma rs reduce(addint) accumulator(addintAccum)

static void addintAccum(int *accum, int val) {
  *accum += val;
}</pre>
161
162<p>A reduction kernel consists of one or more user-written functions.
163<code>#pragma rs reduce</code> is used to define the kernel by specifying its name
164(<code>addint</code>, in this example) and the names and roles of the functions that make
165up the kernel (an <code>accumulator</code> function <code>addintAccum</code>, in this
166example). All such functions must be <code>static</code>. A reduction kernel always
167requires an <code>accumulator</code> function; it may also have other functions, depending
168on what you want the kernel to do.</p>
169
170<p>A reduction kernel accumulator function must return <code>void</code> and must have at least
171two arguments. The first argument (<code>accum</code>, in this example) is a pointer to
172an <i>accumulator data item</i> and the second (<code>val</code>, in this example) is
173automatically filled in based on the input {@link android.renderscript.Allocation} passed to
174the kernel launch. The accumulator data item is created by the RenderScript runtime; by
175default, it is initialized to zero. By default, this kernel is run across its entire input
176{@link android.renderscript.Allocation}, with one execution of the accumulator function per
177{@link android.renderscript.Element} in the {@link android.renderscript.Allocation}. By
178default, the final value of the accumulator data item is treated as the result of the
179reduction, and is returned to Java.  The RenderScript runtime checks to ensure that the {@link
180android.renderscript.Element} type of the input Allocation matches the accumulator function's
181prototype; if it does not match, RenderScript throws an exception.</p>
182
<p>A reduction kernel has one or more input {@link android.renderscript.Allocation
Allocations} but no output {@link android.renderscript.Allocation Allocations}.</p>
185
186<p>Reduction kernels are explained in more detail <a href="#reduction-in-depth">here</a>.</p>
187
188<p>Reduction kernels are supported in Android 7.0 (API level 24) and later.</p>
189</li>
190</ul>
191
192<p>A mapping kernel function or a reduction kernel accumulator function may access the coordinates
193of the current execution using the <a id="special-arguments">special arguments</a> <code>x</code>,
194<code>y</code>, and <code>z</code>, which must be of type <code>int</code> or <code>uint32_t</code>.
195These arguments are optional.</p>
196
197<p>A mapping kernel function or a reduction kernel accumulator
198function may also take the optional special argument
199<code>context</code> of type <a
200href='reference/rs_for_each.html#android_rs:rs_kernel_context'>rs_kernel_context</a>.
201It is needed by a family of runtime APIs that are used to query
202certain properties of the current execution -- for example, <a
203href='reference/rs_for_each.html#android_rs:rsGetDimX'>rsGetDimX</a>.
204(The <code>context</code> argument is available in Android 6.0 (API level 23) and later.)</p>
205</li>
206
207<li>An optional <code>init()</code> function. An <code>init()</code> function is a special type of
208invokable function that RenderScript runs when the script is first instantiated. This allows for some
209computation to occur automatically at script creation.</li>
210
211<li>Zero or more <strong><i>static script globals and functions</i></strong>. A static script global is equivalent to a
212script global except that it cannot be accessed from Java code. A static function is a standard C
213function that can be called from any kernel or invokable function in the script but is not exposed
214to the Java API. If a script global or function does not need to be called from Java code, it is
215highly recommended that it be declared <code>static</code>.</li> </ul>
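<p>The way a mapping kernel is applied across its input can be pictured with a small
plain-Java model. This is only a sketch of the behavior described above, not RenderScript code:
the class <code>MappingKernelSketch</code>, the <code>Kernel</code> interface, and the
<code>0xAARRGGBB</code> pixel packing are all assumptions made for illustration, and the real
runtime is free to run the per-element calls in parallel.</p>

```java
// Plain-Java model only; none of these names belong to the RenderScript API.
final class MappingKernelSketch {
    /** Hypothetical stand-in for a mapping kernel: one input element plus its coordinates. */
    interface Kernel {
        int apply(int in, int x, int y);
    }

    /** Calls the kernel once per coordinate and stores each result at the same coordinate. */
    static int[][] forEach(Kernel kernel, int[][] input) {
        int height = input.length;
        int width = height == 0 ? 0 : input[0].length;
        int[][] output = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                output[y][x] = kernel.apply(input[y][x], x, y);
            }
        }
        return output;
    }

    /** Mirrors the invert kernel for pixels packed as 0xAARRGGBB (an assumption of this sketch). */
    static int invert(int in, int x, int y) {
        int alpha = in & 0xFF000000;      // alpha is copied unchanged
        int rgb = ~in & 0x00FFFFFF;       // same as 255 - c for each 8-bit color channel
        return alpha | rgb;
    }
}
```

<p>In this model, <code>MappingKernelSketch.forEach(MappingKernelSketch::invert, pixels)</code>
plays the role of a kernel launch: the kernel body never loops over the data itself, it only
sees one element and its coordinates at a time.</p>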
216
217<h4>Setting floating point precision</h4>
218
<p>You can control the required level of floating point precision in a script. This is useful if
the full IEEE 754-2008 standard (used by default) is not required. The following pragmas set a
different level of floating point precision:</p>
222
223<ul>
224
225<li><code>#pragma rs_fp_full</code> (default if nothing is specified): For apps that require
226  floating point precision as outlined by the IEEE 754-2008 standard.
227
228</li>
229
230  <li><code>#pragma rs_fp_relaxed</code>: For apps that don’t require strict IEEE 754-2008
231    compliance and can tolerate less precision. This mode enables flush-to-zero for denorms and
232    round-towards-zero.
233
234</li>
235
236  <li><code>#pragma rs_fp_imprecise</code>: For apps that don’t have stringent precision
237    requirements. This mode enables everything in <code>rs_fp_relaxed</code> along with the
238    following:
239
240<ul>
241
242  <li>Operations resulting in -0.0 can return +0.0 instead.</li>
243  <li>Operations on INF and NAN are undefined.</li>
244</ul>
245</li>
246</ul>
247
248<p>Most applications can use <code>rs_fp_relaxed</code> without any side effects. This may be very
249beneficial on some architectures due to additional optimizations only available with relaxed
250precision (such as SIMD CPU instructions).</p>
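<p>To make "flush-to-zero for denorms" concrete, the following plain-Java sketch shows the effect
on individual <code>float</code> values. It is an illustration only: the class and method names
are hypothetical, keeping the sign of the flushed zero is an assumption of this sketch, and in
RenderScript the behavior comes from the compiler and hardware rather than from code like this.</p>

```java
// Illustrative model of flush-to-zero; not part of any RenderScript API.
final class FlushToZeroSketch {
    /** Returns x unchanged unless it is a denormal, in which case a zero with x's sign is returned. */
    static float flushToZero(float x) {
        // Denormals are the nonzero values whose magnitude is below the smallest normal float.
        if (x != 0.0f && Math.abs(x) < Float.MIN_NORMAL) {
            return Math.copySign(0.0f, x);
        }
        return x;
    }
}
```

<p>For example, <code>Float.MIN_VALUE</code> is a denormal and is flushed to zero in this model,
while <code>Float.MIN_NORMAL</code> and larger magnitudes pass through unchanged.</p>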
251
252
253<h2 id="access-rs-apis">Accessing RenderScript APIs from Java</h2>
254
255<p>When developing an Android application that uses RenderScript, you can access its API from Java in
256  one of two ways:</p>
257
258<ul>
  <li><strong>{@link android.renderscript}</strong> - The APIs in this package are
    available on devices running Android 3.0 (API level 11) and higher.</li>
261  <li><strong>{@link android.support.v8.renderscript}</strong> - The APIs in this package are
262    available through a <a href="{@docRoot}tools/support-library/features.html#v8">Support
263    Library</a>, which allows you to use them on devices running Android 2.3 (API level 9) and
264    higher.</li>
265</ul>
266
267<p>Here are the tradeoffs:</p>
268
269<ul>
270<li>If you use the Support Library APIs, the RenderScript portion of your application will be
271  compatible with devices running Android 2.3 (API level 9) and higher, regardless of which RenderScript
272  features you use. This allows your application to work on more devices than if you use the
273  native (<strong>{@link android.renderscript}</strong>) APIs.</li>
274<li>Certain RenderScript features are not available through the Support Library APIs.</li>
275<li>If you use the Support Library APIs, you will get (possibly significantly) larger APKs than
276if you use the native (<strong>{@link android.renderscript}</strong>) APIs.</li>
277</ul>
278
279<h3 id="ide-setup">Using the RenderScript Support Library APIs</h3>
280
281<p>In order to use the Support Library RenderScript APIs, you must configure your development
282  environment to be able to access them. The following Android SDK tools are required for using
283  these APIs:</p>
284
285<ul>
286  <li>Android SDK Tools revision 22.2 or higher</li>
287  <li>Android SDK Build-tools revision 18.1.0 or higher</li>
288</ul>
289
290<p>You can check and update the installed version of these tools in the
291  <a href="{@docRoot}tools/help/sdk-manager.html">Android SDK Manager</a>.</p>
292
293
294<p>To use the Support Library RenderScript APIs:</p>
295
296<ol>
297  <li>Make sure you have the required Android SDK version and Build Tools version installed.</li>
298  <li> Update the settings for the Android build process to include the RenderScript settings:
299
300    <ul>
301      <li>Open the {@code build.gradle} file in the app folder of your application module. </li>
302      <li>Add the following RenderScript settings to the file:
303
<pre>
android {
    compileSdkVersion 23
    buildToolsVersion "23.0.3"

    defaultConfig {
        minSdkVersion 9
        targetSdkVersion 19
<strong>
        renderscriptTargetApi 18
        renderscriptSupportModeEnabled true
</strong>
    }
}
</pre>
319
320
321    <p>The settings listed above control specific behavior in the Android build process:</p>
322
323    <ul>
324      <li>{@code renderscriptTargetApi} - Specifies the bytecode version to be generated. We
325      recommend you set this value to the lowest API level able to provide all the functionality
326      you are using and set {@code renderscriptSupportModeEnabled} to {@code true}.
327      Valid values for this setting are any integer value
328      from 11 to the most recently released API level. If your minimum SDK version specified in your
329      application manifest is set to a different value, that value is ignored and the target value
330      in the build file is used to set the minimum SDK version.</li>
331      <li>{@code renderscriptSupportModeEnabled} - Specifies that the generated bytecode should fall
332      back to a compatible version if the device it is running on does not support the target
333      version.
334      </li>
335      <li>{@code buildToolsVersion} - The version of the Android SDK build tools to use. This value
336      should be set to {@code 18.1.0} or higher. If this option is not specified, the highest
337      installed build tools version is used. You should always set this value to ensure the
338      consistency of builds across development machines with different configurations.</li>
339    </ul>
340    </li>
   </ul>
  </li>
342
343  <li>In your application classes that use RenderScript, add an import for the Support Library
344    classes:
345
<pre>
import android.support.v8.renderscript.*;
</pre>
349
350  </li>
351
352</ol>
353
354<h2 id="using-rs-from-java">Using RenderScript from Java Code</h2>
355
356<p>Using RenderScript from Java code relies on the API classes located in the
357{@link android.renderscript} or the {@link android.support.v8.renderscript} package. Most
358applications follow the same basic usage pattern:</p>
359
360<ol>
361
362<li><strong>Initialize a RenderScript context.</strong> The {@link
363android.renderscript.RenderScript} context, created with {@link
364android.renderscript.RenderScript#create}, ensures that RenderScript can be used and provides an
365object to control the lifetime of all subsequent RenderScript objects. You should consider context
366creation to be a potentially long-running operation, since it may create resources on different
367pieces of hardware; it should not be in an application's critical path if at all
368possible. Typically, an application will have only a single RenderScript context at a time.</li>
369
370<li><strong>Create at least one {@link android.renderscript.Allocation} to be passed to a
371script.</strong> An {@link android.renderscript.Allocation} is a RenderScript object that provides
372storage for a fixed amount of data. Kernels in scripts take {@link android.renderscript.Allocation}
373objects as their input and output, and {@link android.renderscript.Allocation} objects can be
374accessed in kernels using <code>rsGetElementAt_<i>type</i>()</code> and
375<code>rsSetElementAt_<i>type</i>()</code> when bound as script globals. {@link
376android.renderscript.Allocation} objects allow arrays to be passed from Java code to RenderScript
377code and vice-versa. {@link android.renderscript.Allocation} objects are typically created using
378{@link android.renderscript.Allocation#createTyped createTyped()} or {@link
379android.renderscript.Allocation#createFromBitmap createFromBitmap()}.</li>
380
381<li><strong>Create whatever scripts are necessary.</strong> There are two types of scripts available
382to you when using RenderScript:
383
384<ul>
385
386<li><strong>ScriptC</strong>: These are the user-defined scripts as described in <a
387href="#writing-an-rs-kernel"><i>Writing a RenderScript Kernel</i></a> above. Every script has a Java class
388reflected by the RenderScript compiler in order to make it easy to access the script from Java code;
this class has the name <code>ScriptC_<i>filename</i></code>. For example, if the mapping kernel
above were located in <code>invert.rs</code> and a RenderScript context were already held in
<code>mRenderScript</code>, the Java code to instantiate the script would be:

<pre>ScriptC_invert invert = new ScriptC_invert(mRenderScript);</pre></li>
394
395<li><strong>ScriptIntrinsic</strong>: These are built-in RenderScript kernels for common operations,
396such as Gaussian blur, convolution, and image blending. For more information, see the subclasses of
397{@link android.renderscript.ScriptIntrinsic}.</li>
398
399</ul></li>
400
401<li><strong>Populate Allocations with data.</strong> Except for Allocations created with {@link
402android.renderscript.Allocation#createFromBitmap createFromBitmap()}, an Allocation is populated with empty data when it is
403first created. To populate an Allocation, use one of the "copy" methods in {@link
404android.renderscript.Allocation}. The "copy" methods are <a href="#asynchronous-model">synchronous</a>.</li>
405
406<li><strong>Set any necessary script globals.</strong> You may set globals using methods in the
407  same <code>ScriptC_<i>filename</i></code> class named <code>set_<i>globalname</i></code>. For
408  example, in order to set an <code>int</code> variable named <code>threshold</code>, use the
409  Java method <code>set_threshold(int)</code>; and in order to set
410  an <code>rs_allocation</code> variable named <code>lookup</code>, use the Java
411  method <code>set_lookup(Allocation)</code>. The <code>set</code> methods
412  are <a href="#asynchronous-model">asynchronous</a>.</li>
413
414<li><strong>Launch the appropriate kernels and invokable functions.</strong>
415<p>Methods to launch a given kernel are
416reflected in the same <code>ScriptC_<i>filename</i></code> class with methods named
417<code>forEach_<i>mappingKernelName</i>()</code>
418or <code>reduce_<i>reductionKernelName</i>()</code>.
419These launches are <a href="#asynchronous-model">asynchronous</a>.
420Depending on the arguments to the kernel, the
421method takes one or more Allocations, all of which must have the same dimensions. By default, a
422kernel executes over every coordinate in those dimensions; to execute a kernel over a subset of those coordinates,
423pass an appropriate {@link
424android.renderscript.Script.LaunchOptions} as the last argument to the <code>forEach</code> or <code>reduce</code> method.</p>
425
426<p>Launch invokable functions using the <code>invoke_<i>functionName</i></code> methods
427reflected in the same <code>ScriptC_<i>filename</i></code> class.
428These launches are <a href="#asynchronous-model">asynchronous</a>.</p></li>
429
430<li><strong>Retrieve data from {@link android.renderscript.Allocation} objects
431and <i><a href="#javaFutureType">javaFutureType</a></i> objects.</strong>
432In order to
433access data from an {@link android.renderscript.Allocation} from Java code, you must copy that data
434back to Java using one of the "copy" methods in {@link
435android.renderscript.Allocation}.
436In order to obtain the result of a reduction kernel, you must use the <code><i>javaFutureType</i>.get()</code> method.
437The "copy" and <code>get()</code> methods are <a href="#asynchronous-model">synchronous</a>.</li>
438
439<li><strong>Tear down the RenderScript context.</strong> You can destroy the RenderScript context
440with {@link android.renderscript.RenderScript#destroy} or by allowing the RenderScript context
441object to be garbage collected. This causes any further use of any object belonging to that
442context to throw an exception.</li> </ol>
443
444<h3 id="asynchronous-model">Asynchronous execution model</h3>
445
446<p>The reflected <code>forEach</code>, <code>invoke</code>, <code>reduce</code>,
447  and <code>set</code> methods are asynchronous -- each may return to Java before completing the
448  requested action.  However, the individual actions are serialized in the order in which they are launched.</p>
449
450<p>The {@link android.renderscript.Allocation} class provides "copy" methods to copy data to
451  and from Allocations.  A "copy" method is synchronous, and is serialized with respect to any
452  of the asynchronous actions above that touch the same Allocation.</p>
453
454<p>The reflected <i><a href="#javaFutureType">javaFutureType</a></i> classes provide
455  a <code>get()</code> method to obtain the result of a reduction. <code>get()</code> is
456  synchronous, and is serialized with respect to the reduction (which is asynchronous).</p>
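<p>The ordering guarantees above can be modeled in plain Java with a single-threaded work queue.
This is a sketch under stated assumptions, not how RenderScript is implemented: the class
<code>AsyncModelSketch</code> and its methods are hypothetical, <code>launch()</code> stands in
for any of the asynchronous reflected methods, and <code>copy()</code> stands in for a
synchronous "copy" or <code>get()</code> call.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-Java model only; none of these names belong to the RenderScript API.
final class AsyncModelSketch {
    // One worker thread, so queued actions run in the order they were submitted.
    private final ExecutorService queue = Executors.newSingleThreadExecutor();
    // Touched only on the worker thread.
    private final List<String> completed = new ArrayList<>();

    /** Models an asynchronous call (forEach, invoke, reduce, set): enqueue and return at once. */
    void launch(String action) {
        queue.execute(() -> completed.add(action));
    }

    /** Models a synchronous call: blocks until every earlier action has finished. */
    List<String> copy() {
        Callable<List<String>> snapshot = () -> new ArrayList<>(completed);
        try {
            return queue.submit(snapshot).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stops the worker thread once the queued actions have run. */
    void destroy() {
        queue.shutdown();
    }
}
```

<p>A caller can issue several <code>launch()</code> calls back to back without waiting; the
following <code>copy()</code> returns only after all of them have run, and it observes them in
launch order.</p>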
457
458<h2 id="single-source-rs">Single-Source RenderScript</h2>
459
460<p>Android 7.0 (API level 24) introduces a new programming feature called <em>Single-Source
461RenderScript</em>, in which kernels are launched from the script where they are defined, rather than
462from Java. This approach is currently limited to mapping kernels, which are simply referred to as "kernels"
463in this section for conciseness. This new feature also supports creating allocations of type
<a href="{@docRoot}guide/topics/renderscript/reference/rs_object_types.html#android_rs:rs_allocation">
465<code>rs_allocation</code></a> from inside the script. It is now possible to
466implement a whole algorithm solely within a script, even if multiple kernel launches are required.
467The benefit is twofold: more readable code, because it keeps the implementation of an algorithm in
468one language; and potentially faster code, because of fewer transitions between Java and
469RenderScript across multiple kernel launches.</p>
470
471<p>In Single-Source RenderScript, you write kernels as described in <a href="#writing-an-rs-kernel">
472Writing a RenderScript Kernel</a>. You then write an invokable function that calls
473<a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rsForEach">
474<code>rsForEach()</code></a> to launch them. That API takes a kernel function as the first
475parameter, followed by input and output allocations. A similar API
476<a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rsForEachWithOptions">
477<code>rsForEachWithOptions()</code></a> takes an extra argument of type
478<a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rs_script_call_t">
479<code>rs_script_call_t</code></a>, which specifies a subset of the elements from the input and
480output allocations for the kernel function to process.</p>
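<p>What "a subset of the elements" means for a launch can be sketched in plain Java. The model
below is hypothetical and is not the RenderScript API: the class, the <code>Kernel</code>
interface, and the parameter names are made up for illustration, and the sketch assumes
half-open ranges (start inclusive, end exclusive).</p>

```java
// Plain-Java model only; none of these names belong to the RenderScript API.
final class LaunchOptionsSketch {
    /** Hypothetical stand-in for a kernel function that maps one element to one element. */
    interface Kernel {
        int apply(int in);
    }

    /**
     * Runs the kernel only where xStart <= x < xEnd and yStart <= y < yEnd.
     * Output elements outside that range are left untouched.
     */
    static void forEachWithOptions(Kernel kernel, int[][] input, int[][] output,
                                   int xStart, int xEnd, int yStart, int yEnd) {
        for (int y = yStart; y < yEnd; y++) {
            for (int x = xStart; x < xEnd; x++) {
                output[y][x] = kernel.apply(input[y][x]);
            }
        }
    }
}
```

<p>Restricting the range this way is useful when only a region of an image needs to be
recomputed, because the kernel is never invoked for coordinates outside the range.</p>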
481
482<p>To start RenderScript computation, you call the invokable function from Java.
483Follow the steps in <a href="#using-rs-from-java">Using RenderScript from Java Code</a>.
484In the step <a href="#launching_kernels">launch the appropriate kernels</a>, call
485the invokable function using <code>invoke_<i>function_name</i>()</code>, which will start the
486whole computation, including launching kernels.</p>
487
<p>Allocations are often needed to save and pass
intermediate results from one kernel launch to another. You can create them using
<a href="{@docRoot}guide/topics/renderscript/reference/rs_allocation_create.html#android_rs:rsCreateAllocation">
rsCreateAllocation()</a>. One easy-to-use form of that API is <code>
rsCreateAllocation_&lt;T&gt;&lt;W&gt;(&hellip;)</code>, where <i>T</i> is the data type for an
element, and <i>W</i> is the vector width for the element. The API takes the sizes in
dimensions X, Y, and Z as arguments. For 1D or 2D allocations, the size for dimension Y or Z can
be omitted. For example, <code>rsCreateAllocation_uchar4(16384)</code> creates a 1D allocation of
16384 elements, each of which is of type <code>uchar4</code>.</p>
497
498<p>Allocations are managed by the system automatically. You
499do not have to explicitly release or free them. However, you can call
500<a href="{@docRoot}guide/topics/renderscript/reference/rs_object_info.html#android_rs:rsClearObject">
501<code>rsClearObject(rs_allocation* alloc)</code></a> to indicate you no longer need the handle
502<code>alloc</code> to the underlying allocation,
503so that the system can free up resources as early as possible.</p>
504
505<p>The <a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a> section contains an example
506kernel that inverts an image. The example below expands that to apply more than one effect to an image,
507using Single-Source RenderScript. It includes another kernel, <code>greyscale</code>, which turns a
508color image into black-and-white. An invokable function <code>process()</code> then applies those two kernels
509consecutively to an input image, and produces an output image. Allocations for both the input and
510the output are passed in as arguments of type
<a href="{@docRoot}guide/topics/renderscript/reference/rs_object_types.html#android_rs:rs_allocation">
512<code>rs_allocation</code></a>.</p>
513
<pre>
// File: singlesource.rs

#pragma version(1)
#pragma rs java_package_name(com.android.rssample)

static const float4 weight = {0.299f, 0.587f, 0.114f, 0.0f};

uchar4 RS_KERNEL invert(uchar4 in, uint32_t x, uint32_t y) {
  uchar4 out = in;
  out.r = 255 - in.r;
  out.g = 255 - in.g;
  out.b = 255 - in.b;
  return out;
}

uchar4 RS_KERNEL greyscale(uchar4 in) {
  const float4 inF = rsUnpackColor8888(in);
  const float4 outF = (float4){ dot(inF, weight) };
  return rsPackColorTo8888(outF);
}

void process(rs_allocation inputImage, rs_allocation outputImage) {
  const uint32_t imageWidth = rsAllocationGetDimX(inputImage);
  const uint32_t imageHeight = rsAllocationGetDimY(inputImage);
  rs_allocation tmp = rsCreateAllocation_uchar4(imageWidth, imageHeight);
  rsForEach(invert, inputImage, tmp);
  rsForEach(greyscale, tmp, outputImage);
}
</pre>
544
545<p>You can call the <code>process()</code> function from Java as follows:</p>
546
<pre>
// File: SingleSource.java

// Create the context and the reflected script object.
RenderScript RS = RenderScript.create(context);
ScriptC_singlesource script = new ScriptC_singlesource(RS);
// Wrap the source image and create an output Allocation of the same type.
Allocation inputAllocation = Allocation.createFromBitmapResource(
    RS, getResources(), R.drawable.image);
Allocation outputAllocation = Allocation.createTyped(
    RS, inputAllocation.getType(),
    Allocation.USAGE_SCRIPT | Allocation.USAGE_IO_OUTPUT);
// Run the whole pipeline; both kernels are launched from inside process().
script.invoke_process(inputAllocation, outputAllocation);
</pre>
559
<p>This example shows how an algorithm that involves two kernel launches can be implemented completely
in the RenderScript language itself. Without Single-Source
RenderScript, you would have to launch both kernels from the Java code, separating kernel launches
from kernel definitions and making it harder to understand the whole algorithm. Not only is the
Single-Source RenderScript code easier to read, it also eliminates the transitions
between Java and the script across kernel launches. Some iterative algorithms may launch kernels
hundreds of times, making the overhead of such transitions considerable.</p>
567
568<h2 id="reduction-in-depth">Reduction Kernels in Depth</h2>
569
570<p><i>Reduction</i> is the process of combining a collection of data into a single
571value. This is a useful primitive in parallel programming, with applications such as the
572following:</p>
573<ul>
574  <li>computing the sum or product over all the data</li>
575  <li>computing logical operations (<code>and</code>, <code>or</code>, <code>xor</code>)
576  over all the data</li>
577  <li>finding the minimum or maximum value within the data</li>
578  <li>searching for a specific value or for the coordinate of a specific value within the data</li>
579</ul>
580
<p>In Android 7.0 (API level 24) and later, RenderScript supports <i>reduction kernels</i> to allow
efficient user-written reduction algorithms. You may launch reduction kernels on inputs with
1, 2, or 3 dimensions.</p>
584
585<p>An example above shows a simple <a href="#example-addint">addint</a> reduction kernel.
586Here is a more complicated <a id="example-findMinAndMax">findMinAndMax</a> reduction kernel
587that finds the locations of the minimum and maximum <code>long</code> values in a
5881-dimensional {@link android.renderscript.Allocation}:</p>
589
<pre>
#define LONG_MAX (long)((1UL << 63) - 1)
#define LONG_MIN (long)(1UL << 63)

#pragma rs reduce(findMinAndMax) \
  initializer(fMMInit) accumulator(fMMAccumulator) \
  combiner(fMMCombiner) outconverter(fMMOutConverter)

// Either a value and the location where it was found, or <a href="#INITVAL">INITVAL</a>.
typedef struct {
  long val;
  int idx;     // -1 indicates <a href="#INITVAL">INITVAL</a>
} IndexedVal;

typedef struct {
  IndexedVal min, max;
} MinAndMax;

// In discussion below, this initial value { { LONG_MAX, -1 }, { LONG_MIN, -1 } }
// is called <a id="INITVAL">INITVAL</a>.
static void fMMInit(MinAndMax *accum) {
  accum->min.val = LONG_MAX;
  accum->min.idx = -1;
  accum->max.val = LONG_MIN;
  accum->max.idx = -1;
}

//----------------------------------------------------------------------
// In describing the behavior of the accumulator and combiner functions,
// it is helpful to describe hypothetical functions
//   IndexedVal min(IndexedVal a, IndexedVal b)
//   IndexedVal max(IndexedVal a, IndexedVal b)
//   MinAndMax  minmax(MinAndMax a, MinAndMax b)
//   MinAndMax  minmax(MinAndMax accum, IndexedVal val)
//
// The effect of
//   IndexedVal min(IndexedVal a, IndexedVal b)
// is to return the IndexedVal from among the two arguments
// whose val is lesser, except that when an IndexedVal
// has a negative index, that IndexedVal is never less than
// any other IndexedVal; therefore, if exactly one of the
// two arguments has a negative index, the min is the other
// argument. Like ordinary arithmetic min and max, this function
// is commutative and associative; that is,
//
//   min(A, B) == min(B, A)                  // commutative
//   min(A, min(B, C)) == min(min(A, B), C)  // associative
//
// The effect of
//   IndexedVal max(IndexedVal a, IndexedVal b)
// is analogous (greater . . . never greater than).
//
// Then there is
//
//   MinAndMax minmax(MinAndMax a, MinAndMax b) {
//     return MinAndMax(min(a.min, b.min), max(a.max, b.max));
//   }
//
// Like ordinary arithmetic min and max, the above function
// is commutative and associative; that is:
//
//   minmax(A, B) == minmax(B, A)                        // commutative
//   minmax(A, minmax(B, C)) == minmax(minmax(A, B), C)  // associative
//
// Finally define
//
//   MinAndMax minmax(MinAndMax accum, IndexedVal val) {
//     return minmax(accum, MinAndMax(val, val));
//   }
//----------------------------------------------------------------------

// This function can be explained as doing:
//   *accum = minmax(*accum, IndexedVal(in, x))
//
// This function simply computes minimum and maximum values as if
// INITVAL.min were greater than any other minimum value and
// INITVAL.max were less than any other maximum value.  Note that if
// *accum is INITVAL, then this function sets
//   *accum = IndexedVal(in, x)
//
// After this function is called, both accum->min.idx and accum->max.idx
// will have nonnegative values:
// - x is always nonnegative, so if this function ever sets one of the
//   idx fields, it will set it to a nonnegative value
// - if one of the idx fields is negative, then the corresponding
//   val field must be LONG_MAX or LONG_MIN, so the function will always
//   set both the val and idx fields
static void fMMAccumulator(MinAndMax *accum, long in, int x) {
  IndexedVal me;
  me.val = in;
  me.idx = x;

  if (me.val <= accum->min.val)
    accum->min = me;
  if (me.val >= accum->max.val)
    accum->max = me;
}

// This function can be explained as doing:
//   *accum = minmax(*accum, *val)
//
// This function simply computes minimum and maximum values as if
// INITVAL.min were greater than any other minimum value and
// INITVAL.max were less than any other maximum value.  Note that if
// one of the two accumulator data items is INITVAL, then this
// function sets *accum to the other one.
static void fMMCombiner(MinAndMax *accum,
                        const MinAndMax *val) {
  if ((accum->min.idx < 0) || (val->min.val < accum->min.val))
    accum->min = val->min;
  if ((accum->max.idx < 0) || (val->max.val > accum->max.val))
    accum->max = val->max;
}

static void fMMOutConverter(int2 *result,
                            const MinAndMax *val) {
  result->x = val->min.idx;
  result->y = val->max.idx;
}
</pre>

<p class="note"><strong>NOTE:</strong> There are more example reduction
  kernels <a href="#more-example">here</a>.</p>

<p>In order to run a reduction kernel, the RenderScript runtime creates <em>one or more</em>
variables called <a id="accumulator-data-items"><strong><i>accumulator data
items</i></strong></a> to hold the state of the reduction process. The RenderScript runtime
picks the number of accumulator data items in such a way as to maximize performance. The type
of the accumulator data items (<i>accumType</i>) is determined by the kernel's <i>accumulator
function</i> -- the first argument to that function is a pointer to an accumulator data
item. By default, every accumulator data item is initialized to zero (as if
by <code>memset</code>); however, you may write an <i>initializer function</i> to do something
different.</p>

<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
kernel, the accumulator data items (of type <code>int</code>) are used to add up input
values. There is no initializer function, so each accumulator data item is initialized to
zero.</p>

<p class="note"><strong>Example:</strong> In
the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the accumulator data items
(of type <code>MinAndMax</code>) are used to keep track of the minimum and maximum values
found so far. There is an initializer function to set these to <code>LONG_MAX</code> and
<code>LONG_MIN</code>, respectively; and to set the locations of these values to -1, indicating that
the values are not actually present in the (empty) portion of the input that has been
processed.</p>

<p>RenderScript calls your accumulator function once for every coordinate in the
input(s). Typically, your function should update the accumulator data item in some way
according to the input.</p>

<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
kernel, the accumulator function adds the value of an input Element to the accumulator
data item.</p>

<p class="note"><strong>Example:</strong> In
the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the accumulator function
checks to see whether the value of an input Element is less than or equal to the minimum
value recorded in the accumulator data item and/or greater than or equal to the maximum
value recorded in the accumulator data item, and updates the accumulator data item
accordingly.</p>

<p>After the accumulator function has been called once for every coordinate in the input(s),
RenderScript must <strong>combine</strong> the <a href="#accumulator-data-items">accumulator
data items</a> together into a single accumulator data item. You may write a <i>combiner
function</i> to do this. If the accumulator function has a single input and
no <a href="#special-arguments">special arguments</a>, then you do not need to write a combiner
function; RenderScript will use the accumulator function to combine the accumulator data
items. (You may still write a combiner function if this default behavior is not what you
want.)</p>

<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
kernel, there is no combiner function, so the accumulator function will be used. This is
the correct behavior, because if we split a collection of values into two pieces, and we
add up the values in those two pieces separately, adding up those two sums is the same as
adding up the entire collection.</p>

<p class="note"><strong>Example:</strong> In
the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the combiner function
checks to see whether the minimum value recorded in the "source" accumulator data
item <code>*val</code> is less than the minimum value recorded in the "destination"
accumulator data item <code>*accum</code>, and updates <code>*accum</code>
accordingly. It does similar work for the maximum value. This updates <code>*accum</code>
to the state it would have had if all of the input values had been accumulated into
<code>*accum</code> rather than some into <code>*accum</code> and some into
<code>*val</code>.</p>

<p>After all of the accumulator data items have been combined, RenderScript determines
the result of the reduction to return to Java. You may write an <i>outconverter
function</i> to do this. You do not need to write an outconverter function if you want
the final value of the combined accumulator data items to be the result of the reduction.</p>

<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel,
there is no outconverter function.  The final value of the combined data items is the sum of
all Elements of the input, which is the value we want to return.</p>

<p class="note"><strong>Example:</strong> In
the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the outconverter function
initializes an <code>int2</code> result value to hold the locations of the minimum and
maximum values resulting from the combination of all of the accumulator data items.</p>
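<p>The four stages described above (initialize, accumulate, combine, convert) can be sketched
as a sequential, host-side simulation in plain C. This is only an illustration of the
semantics for an addint-style kernel, not how the RenderScript runtime is implemented:
<code>simulate_addint</code> and <code>NUM_ACCUMULATORS</code> are hypothetical names, and the
real runtime chooses the number of accumulator data items itself and may run the stages in
parallel.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Arbitrary for this sketch; the real runtime picks the number itself. */
#define NUM_ACCUMULATORS 4

/* Accumulator function in the style of addint: fold one input into *accum. */
static void addintAccum(int *accum, int val) {
  *accum += val;
}

/* Hypothetical sequential stand-in for launching addint on a 1D input. */
int simulate_addint(const int *in, size_t n) {
  int accums[NUM_ACCUMULATORS];
  size_t i, k;

  assert(in != NULL || n == 0);

  /* No initializer function: every accumulator data item starts at zero. */
  memset(accums, 0, sizeof(accums));

  /* Accumulate: each coordinate is folded into some accumulator data item. */
  for (i = 0; i < n; ++i) {
    addintAccum(&accums[i % NUM_ACCUMULATORS], in[i]);
  }

  /* Combine: no combiner function, so the accumulator function is reused. */
  for (k = 1; k < NUM_ACCUMULATORS; ++k) {
    addintAccum(&accums[0], accums[k]);
  }

  /* No outconverter function: the combined accumulator is the result. */
  return accums[0];
}
```

<p>Changing <code>NUM_ACCUMULATORS</code> or the way inputs are assigned to accumulator data
items does not change the returned sum, which is the property the rest of this section relies
on.</p>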

<h3 id="writing-reduction-kernel">Writing a reduction kernel</h3>

<p><code>#pragma rs reduce</code> defines a reduction kernel by
specifying its name and the names and roles of the functions that make
up the kernel.  All such functions must be
<code>static</code>. A reduction kernel always requires an <code>accumulator</code>
function; you can omit some or all of the other functions, depending on what you want the
kernel to do.</p>

<pre>#pragma rs reduce(<i>kernelName</i>) \
  initializer(<i>initializerName</i>) \
  accumulator(<i>accumulatorName</i>) \
  combiner(<i>combinerName</i>) \
  outconverter(<i>outconverterName</i>)
</pre>

<p>The meaning of the items in the <code>#pragma</code> is as follows:</p>
<ul>

<li><code>reduce(<i>kernelName</i>)</code> (mandatory): Specifies that a reduction kernel is
being defined. A reflected Java method <code>reduce_<i>kernelName</i></code> will launch the
kernel.</li>

<li><p><code>initializer(<i>initializerName</i>)</code> (optional): Specifies the name of the
initializer function for this reduction kernel. When you launch the kernel, RenderScript calls
this function once for each <a href="#accumulator-data-items">accumulator data item</a>. The
function must be defined like this:</p>

<pre>static void <i>initializerName</i>(<i>accumType</i> *accum) { … }</pre>

<p><code>accum</code> is a pointer to an accumulator data item for this function to
initialize.</p>

<p>If you do not provide an initializer function, RenderScript initializes every accumulator
data item to zero (as if by <code>memset</code>), behaving as if there were an initializer
function that looks like this:</p>
<pre>static void <i>initializerName</i>(<i>accumType</i> *accum) {
  memset(accum, 0, sizeof(*accum));
}</pre>
</li>

<li><p><code><a id="accumulator-function">accumulator(<i>accumulatorName</i>)</a></code>
(mandatory): Specifies the name of the accumulator function for this
reduction kernel. When you launch the kernel, RenderScript calls
this function once for every coordinate in the input(s), to update an
accumulator data item in some way according to the input(s). The function
must be defined like this:</p>

<pre>
static void <i>accumulatorName</i>(<i>accumType</i> *accum,
                            <i>in1Type</i> in1, <i>&hellip;,</i> <i>inNType</i> in<i>N</i>
                            <i>[, specialArguments]</i>) { &hellip; }
</pre>

<p><code>accum</code> is a pointer to an accumulator data item for this function to
modify. <code>in1</code> through <code>in<i>N</i></code> are one <em>or more</em> arguments that
are automatically filled in based on the inputs passed to the kernel launch, one argument
per input. The accumulator function may optionally take any of the <a
href="#special-arguments">special arguments</a>.</p>

<p>An example kernel with multiple inputs is <a href="#dot-product"><code>dotProduct</code></a>.</p>
</li>

<li><p><code><a id="combiner-function">combiner(<i>combinerName</i>)</a></code>
(optional): Specifies the name of the combiner function for this
reduction kernel. After RenderScript calls the accumulator function
once for every coordinate in the input(s), it calls this function as many
times as necessary to combine all accumulator data items into a single
accumulator data item. The function must be defined like this:</p>

<pre>static void <i>combinerName</i>(<i>accumType</i> *accum, const <i>accumType</i> *other) { … }</pre>

<p><code>accum</code> is a pointer to a "destination" accumulator data item for this
function to modify. <code>other</code> is a pointer to a "source" accumulator data item
for this function to "combine" into <code>*accum</code>.</p>

<p class="note"><strong>NOTE:</strong> It is possible
  that <code>*accum</code>, <code>*other</code>, or both have been initialized but have never
  been passed to the accumulator function; that is, one or both have never been updated
  according to any input data. For example, in
  the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the combiner
  function <code>fMMCombiner</code> explicitly checks for <code>idx &lt; 0</code> because that
  indicates such an accumulator data item, whose value is <a href="#INITVAL">INITVAL</a>.</p>

<p>If you do not provide a combiner function, RenderScript uses the accumulator function in its
place, behaving as if there were a combiner function that looks like this:</p>

<pre>static void <i>combinerName</i>(<i>accumType</i> *accum, const <i>accumType</i> *other) {
  <i>accumulatorName</i>(accum, *other);
}</pre>

<p>A combiner function is mandatory if the kernel has more than one input, if the input data
  type is not the same as the accumulator data type, or if the accumulator function takes one
  or more <a href="#special-arguments">special arguments</a>.</p>
</li>

<li><p><code><a id="outconverter-function">outconverter(<i>outconverterName</i>)</a></code>
(optional): Specifies the name of the outconverter function for this
reduction kernel. After RenderScript combines all of the accumulator
data items, it calls this function to determine the result of the
reduction to return to Java. The function must be defined like
this:</p>

<pre>static void <i>outconverterName</i>(<i>resultType</i> *result, const <i>accumType</i> *accum) { … }</pre>

<p><code>result</code> is a pointer to a result data item (allocated but not initialized
by the RenderScript runtime) for this function to initialize with the result of the
reduction. <i>resultType</i> is the type of that data item, which need not be the same
as <i>accumType</i>. <code>accum</code> is a pointer to the final accumulator data item
computed by the <a href="#combiner-function">combiner function</a>.</p>

<p>If you do not provide an outconverter function, RenderScript copies the final accumulator
data item to the result data item, behaving as if there were an outconverter function that
looks like this:</p>

<pre>static void <i>outconverterName</i>(<i>accumType</i> *result, const <i>accumType</i> *accum) {
  *result = *accum;
}</pre>

<p>If you want a different result type than the accumulator data type, then the outconverter function is mandatory.</p>
</li>

</ul>

<p>Note that a kernel has input types, an accumulator data item type, and a result type,
  none of which need to be the same. For example, in
  the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the input
  type <code>long</code>, accumulator data item type <code>MinAndMax</code>, and result
  type <code>int2</code> are all different.</p>
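<p>As a further illustration of three distinct types, here is a sketch of a hypothetical
<code>mean</code> kernel written as plain C so that it can be exercised on a host machine. The
kernel name, the <code>SumAndCount</code> type, and the <code>simulate_mean</code> driver are
assumptions made for this example; in a real <code>.rs</code> file the kernel functions would
be named in a <code>#pragma rs reduce</code> line and the driver would not exist. Because the
input type (<code>int</code>) differs from the accumulator type, a combiner is required, and
because the result type (<code>double</code>) differs from the accumulator type, an
outconverter is required.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In a .rs file the kernel functions below might be tied together with:
 *   #pragma rs reduce(mean) accumulator(meanAccum) \
 *     combiner(meanCombine) outconverter(meanOut)
 * No initializer is named, so accumulator data items start zeroed. */

typedef struct {
  int64_t sum;
  int64_t count;
} SumAndCount;  /* accumulator data item type */

/* Input type is int, not SumAndCount, so a combiner must be provided. */
static void meanAccum(SumAndCount *accum, int in) {
  accum->sum += in;
  accum->count += 1;
}

static void meanCombine(SumAndCount *accum, const SumAndCount *other) {
  accum->sum += other->sum;
  accum->count += other->count;
}

/* Result type is double, not SumAndCount, so an outconverter must be provided. */
static void meanOut(double *result, const SumAndCount *accum) {
  *result = accum->count ? (double)accum->sum / (double)accum->count : 0.0;
}

/* Sequential stand-in for a launch that happens to use two accumulator data items. */
double simulate_mean(const int *in, size_t n) {
  SumAndCount a = {0, 0}, b = {0, 0};  /* zeroed, as if by memset */
  size_t half = n / 2, i;
  double result;

  assert(in != NULL || n == 0);
  for (i = 0; i < half; ++i) meanAccum(&a, in[i]);
  for (i = half; i < n; ++i) meanAccum(&b, in[i]);
  meanCombine(&a, &b);
  meanOut(&result, &a);
  return result;
}
```

<p>Returning 0.0 for an empty input is a choice made for this sketch only.</p>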

<h4 id="assume">What can't you assume?</h4>

<p>You must not rely on the number of accumulator data items created by RenderScript for a
  given kernel launch.  There is no guarantee that two launches of the same kernel with the
  same input(s) will create the same number of accumulator data items.</p>

<p>You must not rely on the order in which RenderScript calls the initializer, accumulator, and
  combiner functions; it may even call some of them in parallel.  There is no guarantee that
  two launches of the same kernel with the same input will follow the same order.  The only
  guarantee is that only the initializer function will ever see an uninitialized accumulator
  data item. For example:</p>
<ul>
<li>There is no guarantee that all accumulator data items will be initialized before the
  accumulator function is called, although it will only be called on an initialized accumulator
  data item.</li>
<li>There is no guarantee on the order in which input Elements are passed to the accumulator
  function.</li>
<li>There is no guarantee that the accumulator function has been called for all input Elements
  before the combiner function is called.</li>
</ul>

<p>One consequence of this is that the <a href="#example-findMinAndMax">findMinAndMax</a>
  kernel is not deterministic: If the input contains more than one occurrence of the same
  minimum or maximum value, you have no way of knowing which occurrence the kernel will
  find.</p>
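<p>The nondeterminism can be reproduced on a host machine with a plain C port of the
findMinAndMax functions. In this sketch <code>int64_t</code> stands in for RenderScript's
64-bit <code>long</code>, and the two driver functions <code>min_index_single</code> and
<code>min_index_split</code> are hypothetical: each models one schedule the runtime might
plausibly choose, not the runtime's actual behavior.</p>

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int64_t val; int idx; } IndexedVal;
typedef struct { IndexedVal min, max; } MinAndMax;

static void fMMInit(MinAndMax *accum) {
  accum->min.val = INT64_MAX; accum->min.idx = -1;
  accum->max.val = INT64_MIN; accum->max.idx = -1;
}

static void fMMAccumulator(MinAndMax *accum, int64_t in, int x) {
  IndexedVal me = { in, x };
  if (me.val <= accum->min.val) accum->min = me;
  if (me.val >= accum->max.val) accum->max = me;
}

static void fMMCombiner(MinAndMax *accum, const MinAndMax *val) {
  if ((accum->min.idx < 0) || (val->min.val < accum->min.val)) accum->min = val->min;
  if ((accum->max.idx < 0) || (val->max.val > accum->max.val)) accum->max = val->max;
}

/* Schedule 1: one accumulator data item sees every input, left to right. */
int min_index_single(const int64_t *in, int n) {
  MinAndMax a;
  int x;
  assert(n > 0);
  fMMInit(&a);
  for (x = 0; x < n; ++x) fMMAccumulator(&a, in[x], x);
  return a.min.idx;
}

/* Schedule 2: two accumulator data items each see half the input, then are combined. */
int min_index_split(const int64_t *in, int n) {
  MinAndMax a, b;
  int x;
  assert(n > 0);
  fMMInit(&a);
  fMMInit(&b);
  for (x = 0; x < n / 2; ++x) fMMAccumulator(&a, in[x], x);
  for (x = n / 2; x < n; ++x) fMMAccumulator(&b, in[x], x);
  fMMCombiner(&a, &b);
  return a.min.idx;
}
```

<p>For the input <code>{5, 1, 9, 1}</code>, the first schedule reports location 3 and the
second reports location 1. Both locations hold the minimum value 1, so both answers are
correct for this kernel.</p>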

<h4 id="guarantee">What must you guarantee?</h4>

<p>Because the RenderScript system can choose to execute a kernel <a href="#assume">in many
    different ways</a>, you must follow certain rules to ensure that your kernel behaves the
    way you want. If you do not follow these rules, you may get incorrect results,
    nondeterministic behavior, or runtime errors.</p>

<p>The rules below often say that two accumulator data items must have "<a id="the-same">the
  same value"</a>.  What does this mean?  That depends on what you want the kernel to do.  For
  a mathematical reduction such as <a href="#example-addint">addint</a>, it usually makes sense
  for "the same" to mean mathematical equality.  For a "pick any" search such
  as <a href="#example-findMinAndMax">findMinAndMax</a> ("find the location of minimum and
  maximum input values") where there might be more than one occurrence of identical input
  values, all locations of a given input value must be considered "the same".  You could write
  a similar kernel to "find the location of <em>leftmost</em> minimum and maximum input values"
  where (say) a minimum value at location 100 is preferred over an identical minimum value at location
  200; for this kernel, "the same" would mean identical <em>location</em>, not merely
  identical <em>value</em>, and the accumulator and combiner functions would have to be
  different than those for <a href="#example-findMinAndMax">findMinAndMax</a>.</p>
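<p>One way such a "leftmost" kernel might look is sketched below in plain C. This is an
assumption-laden illustration rather than code from the RenderScript samples: the names
<code>lmInit</code>, <code>lmAccumulator</code>, <code>lmCombiner</code>,
<code>betterMin</code>, and <code>betterMax</code> are hypothetical, <code>int64_t</code>
stands in for RenderScript's <code>long</code>, and <code>leftmost_with_split</code> is a
host-side driver that exists only to exercise the functions under different schedules. The key
difference from findMinAndMax is that ties on value are broken by preferring the smaller
index, in both the accumulator and the combiner.</p>

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int64_t val; int idx; } IndexedVal;
typedef struct { IndexedVal min, max; } MinAndMax;

static void lmInit(MinAndMax *accum) {
  accum->min.val = INT64_MAX; accum->min.idx = -1;
  accum->max.val = INT64_MIN; accum->max.idx = -1;
}

/* Nonzero if cand should replace cur as the minimum: cand must hold a real
 * value, and cur is still the initial value, or cand is smaller, or cand
 * ties on value at a smaller index. */
static int betterMin(const IndexedVal *cand, const IndexedVal *cur) {
  if (cand->idx < 0) return 0;
  if (cur->idx < 0) return 1;
  return cand->val < cur->val ||
         (cand->val == cur->val && cand->idx < cur->idx);
}

/* Same as betterMin, but for the maximum. */
static int betterMax(const IndexedVal *cand, const IndexedVal *cur) {
  if (cand->idx < 0) return 0;
  if (cur->idx < 0) return 1;
  return cand->val > cur->val ||
         (cand->val == cur->val && cand->idx < cur->idx);
}

static void lmAccumulator(MinAndMax *accum, int64_t in, int x) {
  IndexedVal me = { in, x };
  if (betterMin(&me, &accum->min)) accum->min = me;
  if (betterMax(&me, &accum->max)) accum->max = me;
}

static void lmCombiner(MinAndMax *accum, const MinAndMax *val) {
  if (betterMin(&val->min, &accum->min)) accum->min = val->min;
  if (betterMax(&val->max, &accum->max)) accum->max = val->max;
}

/* Host-side driver: accumulate in[0..split) forward into one item and
 * in[split..n) backward into another, then combine the first into the second. */
void leftmost_with_split(const int64_t *in, int n, int split,
                         int *min_idx, int *max_idx) {
  MinAndMax a, b;
  int x;
  assert(0 <= split && split <= n);
  lmInit(&a);
  lmInit(&b);
  for (x = 0; x < split; ++x) lmAccumulator(&a, in[x], x);
  for (x = n - 1; x >= split; --x) lmAccumulator(&b, in[x], x);
  lmCombiner(&b, &a);
  *min_idx = b.min.idx;
  *max_idx = b.max.idx;
}
```

<p>With the input <code>{5, 1, 9, 1, 9}</code>, every choice of <code>split</code> yields
minimum location 1 and maximum location 2, so the result no longer depends on how the work is
divided or ordered.</p>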

<p><strong>The initializer function must create an <i>identity value</i>.</strong>  That is,
  if <code><i>I</i></code> and <code><i>A</i></code> are accumulator data items initialized
  by the initializer function, and <code><i>I</i></code> has never been passed to the
  accumulator function (but <code><i>A</i></code> may have been), then:</p>
<ul>
<li><code><i>combinerName</i>(&<i>A</i>, &<i>I</i>)</code> must
  leave <code><i>A</i></code> <a href="#the-same">the same</a></li>
<li><code><i>combinerName</i>(&<i>I</i>, &<i>A</i>)</code> must
  set <code><i>I</i></code> to <a href="#the-same">the same</a> value as <code><i>A</i></code></li>
</ul>
<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
  kernel, an accumulator data item is initialized to zero. The combiner function for this
  kernel performs addition; zero is the identity value for addition.</p>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a>
  kernel, an accumulator data item is initialized
  to <a href="#INITVAL"><code>INITVAL</code></a>.
<ul>
<li><code>fMMCombiner(&<i>A</i>, &<i>I</i>)</code> leaves <code><i>A</i></code> the same,
  because <code><i>I</i></code> is <code>INITVAL</code>.</li>
<li><code>fMMCombiner(&<i>I</i>, &<i>A</i>)</code> sets <code><i>I</i></code>
  to <code><i>A</i></code>, because <code><i>I</i></code> is <code>INITVAL</code>.</li>
</ul>
Therefore, <code>INITVAL</code> is indeed an identity value.
</p></div>
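<p>To see how the identity rule can be violated, consider a hypothetical "count" kernel whose
accumulator ignores the input value and simply adds one. The plain C sketch below compares the
combiner RenderScript would substitute if none were named (the accumulator itself) with an
explicit combiner that adds partial counts. All names here are assumptions for illustration;
<code>identity_holds</code> is a host-side checker, not part of any RenderScript API.</p>

```c
#include <assert.h>

/* Accumulator for a hypothetical "count" kernel: counts the coordinate,
 * ignoring the input value. */
static void countAccum(int *accum, int in) {
  (void)in;
  *accum += 1;
}

/* What happens when no combiner is named: the accumulator is reused. */
static void countDefaultCombiner(int *accum, const int *other) {
  countAccum(accum, *other);
}

/* An explicit combiner that adds the two partial counts. */
static void countCombiner(int *accum, const int *other) {
  *accum += *other;
}

typedef void (*IntCombiner)(int *accum, const int *other);

/* Returns 1 if the zero-initialized item I behaves as an identity value for
 * `combine` when paired with an item A holding aValue, else 0. */
int identity_holds(IntCombiner combine, int aValue) {
  int a = aValue, i = 0;
  int aOk, iOk;

  combine(&a, &i);  /* must leave A the same */
  aOk = (a == aValue);

  a = aValue;
  i = 0;
  combine(&i, &a);  /* must set I to the same value as A */
  iOk = (i == aValue);

  return aOk && iOk;
}

void check_count_kernel(void) {
  assert(identity_holds(countCombiner, 3));
  assert(!identity_holds(countDefaultCombiner, 3));
}
```

<p>The default combiner turns a partial count of 3 into 4 when combined with an untouched
accumulator data item, so a kernel like this would need the explicit combiner.</p>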

<p><strong>The combiner function must be <i>commutative</i>.</strong>  That is,
  if <code><i>A</i></code> and <code><i>B</i></code> are accumulator data items initialized
  by the initializer function, and that may have been passed to the accumulator function zero
  or more times, then <code><i>combinerName</i>(&<i>A</i>, &<i>B</i>)</code> must
  set <code><i>A</i></code> to <a href="#the-same">the same value</a>
  that <code><i>combinerName</i>(&<i>B</i>, &<i>A</i>)</code>
  sets <code><i>B</i></code>.</p>
<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
  kernel, the combiner function adds the two accumulator data item values; addition is
  commutative.</p>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel,
<pre>
fMMCombiner(&<i>A</i>, &<i>B</i>)
</pre>
is the same as
<pre>
<i>A</i> = minmax(<i>A</i>, <i>B</i>)
</pre>
and <code>minmax</code> is commutative, so <code>fMMCombiner</code> is also.
</p>
</div>

<p><strong>The combiner function must be <i>associative</i>.</strong>  That is,
  if <code><i>A</i></code>, <code><i>B</i></code>, and <code><i>C</i></code> are
  accumulator data items initialized by the initializer function, and that may have been passed
  to the accumulator function zero or more times, then the following two code sequences must
  set <code><i>A</i></code> to <a href="#the-same">the same value</a>:</p>
<ul>
<li><pre>
<i>combinerName</i>(&<i>A</i>, &<i>B</i>);
<i>combinerName</i>(&<i>A</i>, &<i>C</i>);
</pre></li>
<li><pre>
<i>combinerName</i>(&<i>B</i>, &<i>C</i>);
<i>combinerName</i>(&<i>A</i>, &<i>B</i>);
</pre></li>
</ul>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, the
  combiner function adds the two accumulator data item values:
<ul>
<li><pre>
<i>A</i> = <i>A</i> + <i>B</i>
<i>A</i> = <i>A</i> + <i>C</i>
// Same as
//   <i>A</i> = (<i>A</i> + <i>B</i>) + <i>C</i>
</pre></li>
<li><pre>
<i>B</i> = <i>B</i> + <i>C</i>
<i>A</i> = <i>A</i> + <i>B</i>
// Same as
//   <i>A</i> = <i>A</i> + (<i>B</i> + <i>C</i>)
//   <i>B</i> = <i>B</i> + <i>C</i>
</pre></li>
</ul>
Addition is associative, and so the combiner function is also.
</p>
</div>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel,
<pre>
fMMCombiner(&<i>A</i>, &<i>B</i>)
</pre>
is the same as
<pre>
<i>A</i> = minmax(<i>A</i>, <i>B</i>)
</pre>
So the two sequences are
<ul>
<li><pre>
<i>A</i> = minmax(<i>A</i>, <i>B</i>)
<i>A</i> = minmax(<i>A</i>, <i>C</i>)
// Same as
//   <i>A</i> = minmax(minmax(<i>A</i>, <i>B</i>), <i>C</i>)
</pre></li>
<li><pre>
<i>B</i> = minmax(<i>B</i>, <i>C</i>)
<i>A</i> = minmax(<i>A</i>, <i>B</i>)
// Same as
//   <i>A</i> = minmax(<i>A</i>, minmax(<i>B</i>, <i>C</i>))
//   <i>B</i> = minmax(<i>B</i>, <i>C</i>)
</pre></li>
</ul>
<code>minmax</code> is associative, and so <code>fMMCombiner</code> is also.
</p>
</div>
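<p>A counterexample can make the associativity rule concrete. The plain C sketch below runs
the two code sequences from the rule with an addition combiner and with a tempting but
incorrect "average the two items" combiner. The function names are hypothetical and the code
is a host-side check only; integer values are used so that rounding plays no part.</p>

```c
#include <assert.h>

typedef void (*IntCombiner)(int *accum, const int *other);

/* Addition: associative. */
static void sumCombiner(int *accum, const int *other) {
  *accum += *other;
}

/* Pairwise averaging: commutative, but not associative. */
static void avgCombiner(int *accum, const int *other) {
  *accum = (*accum + *other) / 2;
}

/* First code sequence: combine B into A, then C into A. */
int combine_a_first(IntCombiner combine, int a, int b, int c) {
  combine(&a, &b);
  combine(&a, &c);
  return a;
}

/* Second code sequence: combine C into B, then B into A. */
int combine_b_first(IntCombiner combine, int a, int b, int c) {
  combine(&b, &c);
  combine(&a, &b);
  return a;
}

void check_grouping(void) {
  /* Addition gives the same A either way. */
  assert(combine_a_first(sumCombiner, 0, 4, 8) ==
         combine_b_first(sumCombiner, 0, 4, 8));
  /* Averaging does not, so its result would depend on the schedule. */
  assert(combine_a_first(avgCombiner, 0, 4, 8) !=
         combine_b_first(avgCombiner, 0, 4, 8));
}
```

<p>A kernel that needs an average would instead carry a sum and a count in the accumulator
data item and divide only in the outconverter.</p>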

<p><strong>The accumulator function and combiner function together must obey the <i>basic
  folding rule</i>.</strong>  That is, if <code><i>A</i></code>
  and <code><i>B</i></code> are accumulator data items, <code><i>A</i></code> has been
  initialized by the initializer function and may have been passed to the accumulator function
  zero or more times, <code><i>B</i></code> has not been initialized, and <i>args</i> is
  the list of input arguments and special arguments for a particular call to the accumulator
  function, then the following two code sequences must set <code><i>A</i></code>
  to <a href="#the-same">the same value</a>:</p>
<ul>
<li><pre>
<i>accumulatorName</i>(&<i>A</i>, <i>args</i>);  // statement 1
</pre></li>
<li><pre>
<i>initializerName</i>(&<i>B</i>);        // statement 2
<i>accumulatorName</i>(&<i>B</i>, <i>args</i>);  // statement 3
<i>combinerName</i>(&<i>A</i>, &<i>B</i>);       // statement 4
</pre></li>
</ul>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, for an input value <i>V</i>:
<ul>
<li>Statement 1 is the same as <code>A += <i>V</i></code></li>
<li>Statement 2 is the same as <code>B = 0</code></li>
<li>Statement 3 is the same as <code>B += <i>V</i></code>, which is the same as <code>B = <i>V</i></code></li>
<li>Statement 4 is the same as <code>A += B</code>, which is the same as <code>A += <i>V</i></code></li>
</ul>
Statements 1 and 4 set <code><i>A</i></code> to the same value, and so this kernel obeys the
basic folding rule.
</p>
</div>
<div class="note">
<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, for an input
  value <i>V</i> at coordinate <i>X</i>:
<ul>
<li>Statement 1 is the same as <code>A = minmax(A, IndexedVal(<i>V</i>, <i>X</i>))</code></li>
<li>Statement 2 is the same as <code>B = <a href="#INITVAL">INITVAL</a></code></li>
<li>Statement 3 is the same as
<pre>
B = minmax(B, IndexedVal(<i>V</i>, <i>X</i>))
</pre>
which, because <i>B</i> is the initial value, is the same as
<pre>
B = IndexedVal(<i>V</i>, <i>X</i>)
</pre>
</li>
<li>Statement 4 is the same as
<pre>
A = minmax(A, B)
</pre>
which is the same as
<pre>
A = minmax(A, IndexedVal(<i>V</i>, <i>X</i>))
</pre>
</li>
</ul>
Statements 1 and 4 set <code><i>A</i></code> to the same value, and so this kernel obeys the
basic folding rule.
</p>
</div>
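<p>The basic folding rule is easy to break by accident when the default combiner is used. The
plain C sketch below models a hypothetical "sum of squares" kernel and runs the four numbered
statements from the rule, once with an explicit addition combiner and once with the combiner
RenderScript would substitute if none were named. The names are assumptions for illustration,
and <code>folding_rule_holds</code> is a host-side checker rather than a RenderScript
facility.</p>

```c
#include <assert.h>

/* Accumulator for a hypothetical sum-of-squares kernel. */
static void sumsqAccum(int *accum, int in) {
  *accum += in * in;
}

/* What happens when no combiner is named: the accumulator is reused,
 * which squares the partial sum being merged. */
static void sumsqDefaultCombiner(int *accum, const int *other) {
  sumsqAccum(accum, *other);
}

/* An explicit combiner that adds the two partial sums. */
static void sumsqCombiner(int *accum, const int *other) {
  *accum += *other;
}

typedef void (*IntCombiner)(int *accum, const int *other);

/* Returns 1 if both code sequences of the basic folding rule set A to the
 * same value for starting value aValue and input v, else 0. */
int folding_rule_holds(IntCombiner combine, int aValue, int v) {
  int a1 = aValue, a2 = aValue, b;

  sumsqAccum(&a1, v);  /* statement 1 */

  b = 0;               /* statement 2: no initializer, so zero */
  sumsqAccum(&b, v);   /* statement 3 */
  combine(&a2, &b);    /* statement 4 */

  return a1 == a2;
}

void check_sumsq_kernel(void) {
  assert(folding_rule_holds(sumsqCombiner, 10, 3));
  assert(!folding_rule_holds(sumsqDefaultCombiner, 10, 3));
}
```

<p>Starting from 10 with input 3, statement 1 gives 19. The explicit combiner also gives 19,
while the default combiner gives 10 + 9 * 9 = 91.</p>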

<h3 id="calling-reduction-kernel">Calling a reduction kernel from Java code</h3>

<p>For a reduction kernel named <i>kernelName</i> defined in the
file <code><i>filename</i>.rs</code>, there are three methods reflected in the
class <code>ScriptC_<i>filename</i></code>:</p>

<pre>
// Method 1
public <i>javaFutureType</i> reduce_<i>kernelName</i>(Allocation ain1, <i>&hellip;,</i>
                                        Allocation ain<i>N</i>);

// Method 2
public <i>javaFutureType</i> reduce_<i>kernelName</i>(Allocation ain1, <i>&hellip;,</i>
                                        Allocation ain<i>N</i>,
                                        Script.LaunchOptions sc);

// Method 3
public <i>javaFutureType</i> reduce_<i>kernelName</i>(<i><a href="#devec">devecSiIn1Type</a></i>[] in1, &hellip;,
                                        <i><a href="#devec">devecSiInNType</a></i>[] in<i>N</i>);
</pre>

<p>Here are some examples of calling the <a href="#example-addint">addint</a> kernel:</p>
<pre>
ScriptC_example script = new ScriptC_example(RS);

// 1D array
//   and obtain answer immediately
int input1[] = <i>&hellip;</i>;
int sum1 = script.reduce_addint(input1).get();  // Method 3

// 2D allocation
//   and do some additional work before obtaining answer
Type.Builder typeBuilder =
  new Type.Builder(RS, Element.I32(RS));
typeBuilder.setX(<i>&hellip;</i>);
typeBuilder.setY(<i>&hellip;</i>);
Allocation input2 = Allocation.createTyped(RS, typeBuilder.create());
<i>populateSomehow</i>(input2);  // fill in input Allocation with data
ScriptC_example.result_int result2 = script.reduce_addint(input2);  // Method 1
<i>doSomeAdditionalWork</i>(); // might run at same time as reduction
int sum2 = result2.get();
</pre>

<p><strong>Method 1</strong> has one input {@link android.renderscript.Allocation} argument for
  every input argument in the kernel's <a href="#accumulator-function">accumulator
    function</a>. The RenderScript runtime checks to ensure that all of the input Allocations
  have the same dimensions and that the {@link android.renderscript.Element} type of each of
  the input Allocations matches that of the corresponding input argument of the accumulator
  function's prototype. If any of these checks fail, RenderScript throws an exception. The
  kernel executes over every coordinate in those dimensions.</p>

<p><strong>Method 2</strong> is the same as Method 1 except that Method 2 takes an additional
  argument <code>sc</code> that can be used to limit the kernel execution to a subset of the
  coordinates.</p>

<p><strong><a id="reduce-method-3">Method 3</a></strong> is the same as Method 1 except that
  instead of taking Allocation inputs it takes Java array inputs. This is a convenience that
  saves you from having to write code to explicitly create an Allocation and copy data to it
  from a Java array. <em>However, using Method 3 instead of Method 1 does not increase the
  performance of the code</em>. For each input array, Method 3 creates a temporary
  1-dimensional Allocation with the appropriate {@link android.renderscript.Element} type and
  {@link android.renderscript.Allocation#setAutoPadding} enabled, and copies the array to the
  Allocation as if by the appropriate <code>copyFrom()</code> method of {@link
  android.renderscript.Allocation}. It then calls Method 1, passing those temporary
  Allocations.</p>
<p class="note"><strong>NOTE:</strong> If your application will make multiple kernel calls with
  the same array, or with different arrays of the same dimensions and Element type, you may improve
  performance by explicitly creating, populating, and reusing Allocations yourself, instead of
  by using Method 3.</p>
<p><strong><i><a id="javaFutureType">javaFutureType</a></i></strong>,
  the return type of the reflected reduction methods, is a reflected
  static nested class within the <code>ScriptC_<i>filename</i></code>
  class. It represents the future result of a reduction
  kernel run. To obtain the actual result of the run, call
  the <code>get()</code> method of that class, which returns a value
  of type <i>javaResultType</i>. <code>get()</code> is <a href="#asynchronous-model">synchronous</a>.</p>

<pre>
public class ScriptC_<i>filename</i> extends ScriptC {
  public static class <i>javaFutureType</i> {
    public <i>javaResultType</i> get() { &hellip; }
  }
}
</pre>

<p><strong><i>javaResultType</i></strong> is determined from the <i>resultType</i> of the
  <a href="#outconverter-function">outconverter function</a>. Unless <i>resultType</i> is an
  unsigned type (scalar, vector, or array), <i>javaResultType</i> is the directly corresponding
  Java type. If <i>resultType</i> is an unsigned type and there is a larger Java signed type,
  then <i>javaResultType</i> is that larger Java signed type; otherwise, it is the directly
  corresponding Java type. For example:</p>
<ul>
<li>If <i>resultType</i> is <code>int</code>, <code>int2</code>, or <code>int[15]</code>,
  then <i>javaResultType</i> is <code>int</code>, <code>Int2</code>,
  or <code>int[]</code>. All values of <i>resultType</i> can be represented
  by <i>javaResultType</i>.</li>
<li>If <i>resultType</i> is <code>uint</code>, <code>uint2</code>, or <code>uint[15]</code>,
  then <i>javaResultType</i> is <code>long</code>, <code>Long2</code>,
  or <code>long[]</code>. All values of <i>resultType</i> can be represented
  by <i>javaResultType</i>.</li>
<li>If <i>resultType</i> is <code>ulong</code>, <code>ulong2</code>,
  or <code>ulong[15]</code>, then <i>javaResultType</i>
  is <code>long</code>, <code>Long2</code>, or <code>long[]</code>. There are certain values
  of <i>resultType</i> that cannot be represented by <i>javaResultType</i>.</li>
</ul>

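<p>The widening rule above can be illustrated with a small plain-Java sketch. It does not call
  any RenderScript API; the class name <code>UnsignedResultDemo</code> and its helper methods
  are hypothetical and exist only to show why a <code>uint</code> always fits in a Java
  <code>long</code>, while a <code>ulong</code> may not.</p>

```java
// Plain-Java illustration only; nothing here is part of the RenderScript API.
class UnsignedResultDemo {
  // Models widening a 32-bit unsigned script value into a Java long.
  static long widenUint(int bits) {
    return Integer.toUnsignedLong(bits);
  }

  // A 64-bit unsigned script value is representable as a non-negative Java long
  // only when its top bit is clear.
  static boolean fitsInLong(long ulongBits) {
    return ulongBits >= 0;
  }

  public static void main(String[] args) {
    int maxUint = 0xFFFFFFFF;                  // bit pattern of the largest uint
    System.out.println(widenUint(maxUint));    // prints 4294967295

    long maxUlong = 0xFFFFFFFFFFFFFFFFL;       // bit pattern of the largest ulong
    System.out.println(fitsInLong(maxUlong));  // prints false
    System.out.println(Long.toUnsignedString(maxUlong)); // prints 18446744073709551615
  }
}
```

<p>Every <code>uint</code> value lands in the non-negative range of <code>long</code>, whereas
  the upper half of the <code>ulong</code> range has no non-negative <code>long</code>
  counterpart.</p>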
<p><strong><i>javaFutureType</i></strong> is the future result type corresponding
  to the <i>resultType</i> of the <a href="#outconverter-function">outconverter
  function</a>.</p>
<ul>
<li>If <i>resultType</i> is not an array type, then <i>javaFutureType</i>
  is <code>result_<i>resultType</i></code>.</li>
<li>If <i>resultType</i> is an array of length <i>Count</i> with members of type <i>memberType</i>,
  then <i>javaFutureType</i> is <code>resultArray<i>Count</i>_<i>memberType</i></code>.</li>
</ul>

<p>For example:</p>

<pre>
public class ScriptC_<i>filename</i> extends ScriptC {
  // for kernels with int result
  public static class result_int {
    public int get() { &hellip; }
  }

  // for kernels with int[10] result
  public static class resultArray10_int {
    public int[] get() { &hellip; }
  }

  // for kernels with int2 result
  //   note that the Java type name "Int2" is not the same as the script type name "int2"
  public static class result_int2 {
    public Int2 get() { &hellip; }
  }

  // for kernels with int2[10] result
  //   note that the Java type name "Int2" is not the same as the script type name "int2"
  public static class resultArray10_int2 {
    public Int2[] get() { &hellip; }
  }

  // for kernels with uint result
  //   note that the Java type "long" is a wider signed type than the unsigned script type "uint"
  public static class result_uint {
    public long get() { &hellip; }
  }

  // for kernels with uint[10] result
  //   note that the Java type "long" is a wider signed type than the unsigned script type "uint"
  public static class resultArray10_uint {
    public long[] get() { &hellip; }
  }

  // for kernels with uint2 result
  //   note that the Java type "Long2" is a wider signed type than the unsigned script type "uint2"
  public static class result_uint2 {
    public Long2 get() { &hellip; }
  }

  // for kernels with uint2[10] result
  //   note that the Java type "Long2" is a wider signed type than the unsigned script type "uint2"
  public static class resultArray10_uint2 {
    public Long2[] get() { &hellip; }
  }
}
</pre>

<p>If <i>javaResultType</i> is an object type (including an array type), each call
  to <code><i>javaFutureType</i>.get()</code> on the same instance will return the same
  object.</p>

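<p>One practical consequence of returning the same object is that a caller who modifies a
  returned array changes what every later <code>get()</code> call sees. The sketch below models
  that behavior in plain Java. <code>CachedResultDemo</code> and <code>ResultArrayInt</code> are
  hypothetical stand-ins for a reflected class such as <code>resultArray10_int</code>; the real
  class is generated by the RenderScript tools and may be implemented quite differently.</p>

```java
import java.util.function.Supplier;

// Hypothetical model of a reflected future type; not the generated RenderScript code.
class CachedResultDemo {
  static final class ResultArrayInt {
    private final Supplier<int[]> compute; // stands in for the kernel run
    private int[] cached;
    private boolean done;

    ResultArrayInt(Supplier<int[]> compute) {
      this.compute = compute;
    }

    // Computes the result on the first call, then hands back the same array object.
    synchronized int[] get() {
      if (!done) {
        cached = compute.get();
        done = true;
      }
      return cached;
    }
  }

  public static void main(String[] args) {
    ResultArrayInt future = new ResultArrayInt(() -> new int[] {1, 2, 3});
    int[] first = future.get();
    first[0] = 99;                               // mutates the shared result
    System.out.println(future.get() == first);   // prints true
    System.out.println(future.get()[0]);         // prints 99
  }
}
```

<p>If a caller needs a private copy of an array result, cloning the returned array (for
  example with <code>clone()</code>) avoids this kind of aliasing.</p>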
<p>If <i>javaResultType</i> cannot represent all values of type <i>resultType</i>, and a
  reduction kernel produces an unrepresentable value,
  then <code><i>javaFutureType</i>.get()</code> throws an exception.</p>

<h4 id="devec">Method 3 and <i>devecSiInXType</i></h4>

<p><strong><i>devecSiInXType</i></strong> is the Java type corresponding to
  the <i>inXType</i> of the corresponding argument of
  the <a href="#accumulator-function">accumulator function</a>. Unless <i>inXType</i> is an
  unsigned type or a vector type, <i>devecSiInXType</i> is the directly corresponding Java
  type. If <i>inXType</i> is an unsigned scalar type, then <i>devecSiInXType</i> is the
  Java type directly corresponding to the signed scalar type of the same
  size. If <i>inXType</i> is a signed vector type, then <i>devecSiInXType</i> is the Java
  type directly corresponding to the vector component type. If <i>inXType</i> is an unsigned
  vector type, then <i>devecSiInXType</i> is the Java type directly corresponding to the
  signed scalar type of the same size as the vector component type. For example:</p>
<ul>
<li>If <i>inXType</i> is <code>int</code>, then <i>devecSiInXType</i>
  is <code>int</code>.</li>
<li>If <i>inXType</i> is <code>int2</code>, then <i>devecSiInXType</i>
  is <code>int</code>. The array is a <em>flattened</em> representation: it has twice as
  many <em>scalar</em> Elements as the Allocation has 2-component <em>vector</em>
  Elements. This is the same way that the <code>copyFrom()</code> methods of {@link
  android.renderscript.Allocation} work.</li>
<li>If <i>inXType</i> is <code>uint</code>, then <i>devecSiInXType</i>
  is <code>int</code>. A signed value in the Java array is interpreted as an unsigned value of
  the same bit pattern in the Allocation. This is the same way that the <code>copyFrom()</code>
  methods of {@link android.renderscript.Allocation} work.</li>
<li>If <i>inXType</i> is <code>uint2</code>, then <i>devecSiInXType</i>
  is <code>int</code>. This is a combination of the way <code>int2</code> and <code>uint</code>
  are handled: The array is a flattened representation, and Java array signed values are
  interpreted as RenderScript unsigned Element values.</li>
</ul>

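<p>The flattening and signed-for-unsigned conventions described above can be sketched in plain
  Java. This is only an illustration of the array layout and bit-pattern reinterpretation; it
  does not call any RenderScript API, and the class <code>DevecInputDemo</code> and its helper
  methods are hypothetical names.</p>

```java
import java.util.Arrays;

// Plain-Java illustration only; nothing here is part of the RenderScript API.
class DevecInputDemo {
  // Flattens (x, y) pairs into the scalar layout used for an int2 input:
  // two consecutive array entries per 2-component vector Element.
  static int[] flattenPairs(int[][] pairs) {
    int[] flat = new int[pairs.length * 2];
    for (int i = 0; i < pairs.length; i++) {
      flat[2 * i] = pairs[i][0];
      flat[2 * i + 1] = pairs[i][1];
    }
    return flat;
  }

  // Stores an unsigned 32-bit value (held in a long) as the Java int
  // that has the same bit pattern, as needed for a uint input.
  static int toUintBits(long unsignedValue) {
    return (int) unsignedValue;
  }

  public static void main(String[] args) {
    int[] flat = flattenPairs(new int[][] {{1, 2}, {3, 4}, {5, 6}});
    System.out.println(Arrays.toString(flat));         // prints [1, 2, 3, 4, 5, 6]

    int bits = toUintBits(4000000000L);
    System.out.println(bits);                          // prints -294967296
    System.out.println(Integer.toUnsignedLong(bits));  // prints 4000000000
  }
}
```

<p>For a <code>uint2</code> input, both steps apply: each component is stored as the
  <code>int</code> with the same bit pattern, and the components are laid out consecutively.</p>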
<p>Note that for <a href="#reduce-method-3">Method 3</a>, input types are handled differently
than result types:</p>

<ul>
<li>A script's vector input is flattened on the Java side, whereas a script's vector result is not.</li>
<li>A script's unsigned input is represented as a signed input of the same size on the Java
  side, whereas a script's unsigned result is represented as a widened signed type on the Java
  side (except in the case of <code>ulong</code>).</li>
</ul>

<h3 id="more-example">More example reduction kernels</h3>

<pre id="dot-product">
#pragma rs reduce(dotProduct) \
  accumulator(dotProductAccum) combiner(dotProductSum)

// Note: No initializer function -- therefore,
// each accumulator data item is implicitly initialized to 0.0f.

static void dotProductAccum(float *accum, float in1, float in2) {
  *accum += in1*in2;
}

// combiner function
static void dotProductSum(float *accum, const float *val) {
  *accum += *val;
}
</pre>

1375// Find a zero Element in a 2D allocation; return (-1, -1) if none
1376#pragma rs reduce(fz2) \
1377  initializer(fz2Init) \
1378  accumulator(fz2Accum) combiner(fz2Combine)
1379
1380static void fz2Init(int2 *accum) { accum->x = accum->y = -1; }
1381
1382static void fz2Accum(int2 *accum,
1383                     int inVal,
1384                     int x /* special arg */,
1385                     int y /* special arg */) {
1386  if (inVal==0) {
1387    accum->x = x;
1388    accum->y = y;
1389  }
1390}
1391
1392static void fz2Combine(int2 *accum, const int2 *accum2) {
1393  if (accum2->x >= 0) *accum = *accum2;
1394}
1395</pre>
1396
1397<pre>
1398// Note that this kernel returns an array to Java
1399#pragma rs reduce(histogram) \
1400  accumulator(hsgAccum) combiner(hsgCombine)
1401
1402#define BUCKETS 256
1403typedef uint32_t Histogram[BUCKETS];
1404
1405// Note: No initializer function --
1406// therefore, each bucket is implicitly initialized to 0.
1407
1408static void hsgAccum(Histogram *h, uchar in) { ++(*h)[in]; }
1409
1410static void hsgCombine(Histogram *accum,
1411                       const Histogram *addend) {
1412  for (int i = 0; i < BUCKETS; ++i)
1413    (*accum)[i] += (*addend)[i];
1414}
1415
1416// Determines the mode (most frequently occurring value), and returns
1417// the value and the frequency.
1418//
1419// If multiple values have the same highest frequency, returns the lowest
1420// of those values.
1421//
1422// Shares functions with the histogram reduction kernel.
1423#pragma rs reduce(mode) \
1424  accumulator(hsgAccum) combiner(hsgCombine) \
1425  outconverter(modeOutConvert)
1426
1427static void modeOutConvert(int2 *result, const Histogram *h) {
1428  uint32_t mode = 0;
1429  for (int i = 1; i < BUCKETS; ++i)
1430    if ((*h)[i] > (*h)[mode]) mode = i;
1431  result->x = mode;
1432  result->y = (*h)[mode];
1433}
1434</pre>
1435