Requirements:

- automake, autoconf, libtool
	(not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
	(not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
	(only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
	Unless you have some other reason for wanting to use the svn version,
	it is best to install the latest release (3.9).
	For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.


Preparing:

Grab the latest release and extract it, or get the source from
the git repository as follows.  Building from the git repository
requires autoconf, automake, libtool and pkg-config.

	git clone git://repo.or.cz/ppcg.git
	cd ppcg
	./get_submodules.sh
	./autogen.sh


Compilation:

	./configure
	make
	make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".


Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA, insert a line containing

	#pragma scop

before the fragment and add a line containing

	#pragma endscop

after the fragment.  To generate CUDA code run

	ppcg --target=cuda file.c

where file.c is the file containing the fragment.  The generated
code is stored in file_host.cu and file_kernel.cu.

To generate OpenCL code run

	ppcg --target=opencl file.c

where file.c is the file containing the fragment.  The generated code
is stored in file_host.c and file_kernel.cl.


Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option.  The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces.  The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile.  The elements of the single integer tuple
specify the tile sizes in each dimension.
In case of hybrid tiling, the first element is half the size of
the tile in the time (sequential) dimension.  The second element
specifies the number of elements in the base of the hexagon.
The remaining elements specify the tile sizes in the remaining space
dimensions.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid.  The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in a block.  The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

	{ kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel.  If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the effectively used default sizes.


Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU.  Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler.  We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13".  This will not only be slower, but can also cause
correctness issues.
If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.


Compiling the generated OpenCL code with gcc

To compile the host code you need to link against the file
ocl_utilities.c which contains utility functions used by the generated
OpenCL host code.  To compile the host code with gcc, run

	gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some versions of the NVIDIA OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.

By default, the compiled executable will need the _kernel.cl file at
run time.  Alternatively, the option --opencl-embed-kernel-code may be
given to place the kernel code in a string literal.  The kernel code is
then compiled into the host binary, such that the _kernel.cl file is no
longer needed at run time.  Any kernel include files, in particular
those supplied using --opencl-include-file, will still be required at
run time.


Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions that are being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code.  These files may then contain
the definitions of the functions being called from the
program fragment.  If the pathnames of the included files
are relative to the current directory, then you may need
to additionally specify the --opencl-compiler-options=-I. option
to make sure that the files can be found by the OpenCL compiler.
The included files may contain definitions of types used by the
generated kernels.  By default, PPCG generates definitions for
types as needed, but these definitions may collide with those in
the included files, as PPCG does not consider the contents of the
included files.  The --no-opencl-print-kernel-types option prevents
PPCG from generating type definitions.


GNU extensions

By default, PPCG may print out macro definitions that involve
GNU extensions such as __typeof__ and statement expressions.
Some compilers may not support these extensions.
In particular, OpenCL 1.2 beignet 1.1.1 (git-6de6918)
has been reported not to support __typeof__.
The use of these extensions can be turned off with the
--no-allow-gnu-extensions option.


Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line.  Otherwise, the source
files are inconsistent, having fixed size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG generated code using nvcc since CUDA does not support VLAs.


CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C.  Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments.  For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called.  Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.


Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development

Whenever you report a bug, please mention the exact version of PPCG
that you are using (output of "./ppcg --version").  If you are unable
to compile PPCG, then report the git version (output of "git describe")
or the version number included in the name of the tarball.


Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
    author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
		G\'{o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
		Catthoor, Francky},
    title = {Polyhedral parallel code generation for CUDA},
    journal = {ACM Trans. Archit. Code Optim.},
    issue_date = {January 2013},
    volume = {9},
    number = {4},
    month = jan,
    year = {2013},
    issn = {1544-3566},
    pages = {54:1--54:23},
    doi = {10.1145/2400682.2400713},
    acmid = {2400713},
    publisher = {ACM},
    address = {New York, NY, USA},
}