<html>
<head>
<title>pcre2serialize specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2serialize man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
<li><a name="TOC2" href="#SEC2">SECURITY CONCERNS</a>
<li><a name="TOC3" href="#SEC3">SAVING COMPILED PATTERNS</a>
<li><a name="TOC4" href="#SEC4">RE-USING PRECOMPILED PATTERNS</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_encode(const pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b>  PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
<br>
<br>
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run. However,
if you are using the just-in-time optimization feature, it is not possible to
save and reload the JIT data, because it is position-dependent. The host on
which the patterns are reloaded must be running the same version of PCRE2, with
the same code unit width, and must also have the same endianness, pointer width
and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
reloaded using the 8-bit library.
</P>
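<P>
The compatibility conditions above can be recorded next to the saved data, so
that an application can reject a file before handing it to
<b>pcre2_serialize_decode()</b>. The following is a minimal sketch and not part
of PCRE2: the <i>host_signature</i> type and both functions are hypothetical
names, the PCRE2 version is deliberately left out, and the code assumes that
PCRE2_SIZE is defined as <b>size_t</b>.
</P>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application-level record of the host properties that must
   match between the saving and the reloading system. Not a PCRE2 type. */
typedef struct {
  uint8_t little_endian;   /* 1 on a little-endian host, 0 otherwise */
  uint8_t pointer_bytes;   /* sizeof(void *) */
  uint8_t size_bytes;      /* sizeof(size_t), the type behind PCRE2_SIZE */
  uint8_t code_unit_bits;  /* 8, 16 or 32, supplied by the caller */
} host_signature;

/* Describe the current host for the given code unit width. */
host_signature host_signature_get(uint8_t code_unit_bits)
{
  const uint16_t probe = 1;
  uint8_t first_byte;
  host_signature sig;

  memcpy(&first_byte, &probe, 1);  /* lowest-addressed byte of probe */
  sig.little_endian = (uint8_t)(first_byte == 1);
  sig.pointer_bytes = (uint8_t)sizeof(void *);
  sig.size_bytes = (uint8_t)sizeof(size_t);
  sig.code_unit_bits = code_unit_bits;
  return sig;
}

/* Return 1 if both signatures describe compatible hosts, 0 otherwise. */
int host_signature_equal(const host_signature *a, const host_signature *b)
{
  return a->little_endian == b->little_endian &&
         a->pointer_bytes == b->pointer_bytes &&
         a->size_bytes == b->size_bytes &&
         a->code_unit_bits == b->code_unit_bits;
}
```

<P>
An application might store the four fields ahead of the serialized byte stream
and compare them with <i>host_signature_get()</i> on reload. In the example
from the text, the 32-bit and 64-bit systems differ in <i>pointer_bytes</i>,
and the 16-bit and 8-bit libraries differ in <i>code_unit_bits</i>.
</P>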
<br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
<P>
The facility for saving and restoring compiled patterns is intended for use
within individual applications. As such, the data supplied to
<b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
arbitrary external sources. There is only some simple consistency checking, not
complete validation of what is being re-loaded.
</P>
<br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
<P>
Before compiled patterns can be saved they must be serialized, that is,
converted to a stream of bytes. A single byte stream may contain any number of
compiled patterns, but they must all use the same character tables. A single
copy of the tables is included in the byte stream (its size is 1088 bytes). For
more details of character tables, see the
<a href="pcre2api.html#localesupport">section on locale support</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the list,
being a pointer to a vector of pointers to compiled patterns, and the length of
the vector. The third and fourth arguments point to variables which are set to
point to the created byte stream and its length, respectively. The final
argument is a pointer to a general context, which can be used to specify custom
memory management functions. If this argument is NULL, <b>malloc()</b> is used
to obtain memory for the byte stream. The yield of the function is the number
of serialized patterns, or one of the following negative error codes:
<pre>
  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_MEMORY       memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
that a slot in the vector does not point to a compiled pattern.
</P>
<P>
Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and writes
them to a file. It assumes that the variable <i>fd</i> refers to a file that is
open for output. The error checking that should be present in a real
application has been omitted for simplicity.
<pre>
  int errorcode;
  uint8_t *bytes;              /* receives the serialized byte stream */
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;       /* receives its length in bytes */
  pcre2_code *list_of_codes[2];

  /* Compile the two patterns that are to be saved. */
  list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);

  /* Serialize both patterns into one byte stream, then write it out. */
  errorcode = pcre2_serialize_encode((const pcre2_code **)list_of_codes, 2,
    &bytes, &bytescount, NULL);
  errorcode = fwrite(bytes, 1, bytescount, fd);
</pre>
Note that the serialized data is binary data that may contain any of the 256
possible byte values. On systems that make a distinction between binary and
non-binary data, be sure that the file is opened for binary output.
</P>
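<P>
Opening the file for binary output can be sketched in standard C as follows.
The helper name <i>write_serialized</i> and its return convention are
illustrative assumptions, not part of PCRE2; only <b>fopen()</b>,
<b>fwrite()</b> and <b>fclose()</b> are used.
</P>

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write `size` bytes of serialized data to `path`.
   Returns 0 on success and -1 on any failure. */
int write_serialized(const char *path, const uint8_t *bytes, size_t size)
{
  /* The "b" flag requests binary mode, so that bytes such as 0x0A or 0x1A
     are not translated on systems that distinguish text from binary. */
  FILE *fd = fopen(path, "wb");
  size_t written;
  int close_failed;

  if (fd == NULL) return -1;
  written = fwrite(bytes, 1, size, fd);
  /* fclose() flushes buffered data, so its result is checked as well. */
  close_failed = (fclose(fd) != 0);
  return (written == size && !close_failed) ? 0 : -1;
}
```

<P>
With the variables from the sample above, a call might read
<i>write_serialized("patterns.bin", bytes, bytescount)</i>, where the file name
is only an example.
</P>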
<P>
Serializing a set of patterns leaves the original data untouched, so they can
still be used for matching. Their memory must eventually be freed in the usual
way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
stream, it too must be freed by calling <b>pcre2_serialize_free()</b>.
</P>
<br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
<P>
In order to re-use a set of saved patterns you must first make the serialized
byte stream available in main memory (for example, by reading from a file). The
management of this memory block is up to the application. You can use the
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
compiled patterns are in the serialized data without actually decoding the
patterns:
<pre>
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
</pre>
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in a
vector. The first two arguments are a pointer to a suitable vector and its
length, and the third argument points to a byte stream. The final argument is a
pointer to a general context, which can be used to specify custom memory
management functions for the decoded patterns. If this argument is NULL,
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
stream is no longer needed and can be discarded.
<pre>
  pcre2_code *list_of_codes[2];
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
</pre>
If the vector is not large enough for all the patterns in the byte stream, it
is filled with those that fit, and the remainder are ignored. The yield of the
function is the number of decoded patterns, or one of the following negative
error codes:
<pre>
  PCRE2_ERROR_BADDATA            second argument is zero or less
  PCRE2_ERROR_BADMAGIC           mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE            mismatch of code unit size or PCRE2 version
  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
  PCRE2_ERROR_MEMORY             memory allocation failed
  PCRE2_ERROR_NULL               first or third argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
on a system with different endianness.
</P>
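<P>
The first step described in this section, making the serialized byte stream
available in main memory, might look like this when the data lives in a file.
This is a sketch in standard C only; <i>read_serialized</i> is a hypothetical
helper, and the growth strategy and initial buffer size are arbitrary choices.
</P>

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole of `path` into a malloc()ed block.
   On success the block is returned and its length stored in *size.
   On failure NULL is returned and *size is left unchanged. */
uint8_t *read_serialized(const char *path, size_t *size)
{
  FILE *fd = fopen(path, "rb");   /* binary mode, matching the writer */
  uint8_t *bytes = NULL;
  size_t used = 0, capacity = 0;

  if (fd == NULL) return NULL;
  for (;;) {
    size_t got;
    if (used == capacity) {
      /* Buffer is full (or not yet allocated): double it. */
      size_t new_capacity = capacity ? capacity * 2 : 4096;
      uint8_t *grown = realloc(bytes, new_capacity);
      if (grown == NULL) { free(bytes); fclose(fd); return NULL; }
      bytes = grown;
      capacity = new_capacity;
    }
    got = fread(bytes + used, 1, capacity - used, fd);
    used += got;
    if (got == 0) break;          /* end of file or read error */
  }
  if (ferror(fd)) { free(bytes); fclose(fd); return NULL; }
  fclose(fd);
  *size = used;
  return bytes;
}
```

<P>
The returned pointer can serve as the <i>bytes</i> argument in the samples
above. Because this block was obtained with <b>malloc()</b> by the application
itself, it is released with <b>free()</b> once decoding has finished.
</P>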
<P>
Decoded patterns can be used for matching in the usual way, and must be freed
by calling <b>pcre2_code_free()</b>. However, be aware that there is a potential
race issue if you are using multiple patterns that were decoded from a single
byte stream in a multithreaded application. A single copy of the character
tables is used by all the decoded patterns and a reference count is used to
arrange for its memory to be automatically freed when the last pattern is
freed, but there is no locking on this reference count. Therefore, if you want
to call <b>pcre2_code_free()</b> for these patterns in different threads, you
must arrange your own locking, and ensure that <b>pcre2_code_free()</b> cannot
be called by two threads at the same time.
</P>
<P>
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
serialized, the JIT data is discarded and so is no longer available after a
save/restore cycle. You can, however, process a restored pattern with
<b>pcre2_jit_compile()</b> if you wish.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 24 May 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>