<html>
<head>
<title>pcre2serialize specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2serialize man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
<li><a name="TOC2" href="#SEC2">SECURITY CONCERNS</a>
<li><a name="TOC3" href="#SEC3">SAVING COMPILED PATTERNS</a>
<li><a name="TOC4" href="#SEC4">RE-USING PRECOMPILED PATTERNS</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b>  PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
<br>
<br>
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run. However,
if you are using the just-in-time optimization feature, it is not possible to
save and reload the JIT data, because it is position-dependent.
The host on
which the patterns are reloaded must be running the same version of PCRE2, with
the same code unit width, and must also have the same endianness, pointer width
and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
reloaded using the 8-bit library.
</P>
<P>
Note that "serialization" in PCRE2 does not convert compiled patterns to an
abstract format like Java or .NET serialization. The serialized output is
really just a bytecode dump, which is why it can only be reloaded in the same
environment as the one that created it. Hence the restrictions mentioned above.
Applications that are not statically linked with a fixed version of PCRE2 must
be prepared to recompile patterns from their sources, in order to be immune to
PCRE2 upgrades.
</P>
<br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
<P>
The facility for saving and restoring compiled patterns is intended for use
within individual applications. As such, the data supplied to
<b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
arbitrary external sources. There is only some simple consistency checking, not
complete validation of what is being re-loaded. Corrupted data may cause
undefined results. For example, if the length field of a pattern in the
serialized data is corrupted, the deserializing code may read beyond the end of
the byte stream that is passed to it.
</P>
<br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
<P>
Before compiled patterns can be saved they must be serialized, which in PCRE2
means converting the pattern to a stream of bytes. A single byte stream may
contain any number of compiled patterns, but they must all use the same
character tables. A single copy of the tables is included in the byte stream
(its size is 1088 bytes).
For more details of character tables, see the
<a href="pcre2api.html#localesupport">section on locale support</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the list,
being a pointer to a vector of pointers to compiled patterns, and the length of
the vector. The third and fourth arguments point to variables which are set to
point to the created byte stream and its length, respectively. The final
argument is a pointer to a general context, which can be used to specify custom
memory management functions. If this argument is NULL, <b>malloc()</b> is used
to obtain memory for the byte stream. The yield of the function is the number
of serialized patterns, or one of the following negative error codes:
<pre>
  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_MEMORY       memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
that a slot in the vector does not point to a compiled pattern.
</P>
<P>
Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and writes
them to a file. It assumes that the variable <i>fd</i> refers to a file that is
open for output. The error checking that should be present in a real
application has been omitted for simplicity.
<pre>
  int errorcode;
  uint8_t *bytes;
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;
  pcre2_code *list_of_codes[2];
  list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
    PCRE2_ZERO_TERMINATED, 0, &amp;errorcode, &amp;erroroffset, NULL);
  list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
    PCRE2_ZERO_TERMINATED, 0, &amp;errorcode, &amp;erroroffset, NULL);
  errorcode = pcre2_serialize_encode(list_of_codes, 2, &amp;bytes,
    &amp;bytescount, NULL);
  errorcode = fwrite(bytes, 1, bytescount, fd);
</pre>
Note that the serialized data is binary data that may contain any of the 256
possible byte values. On systems that make a distinction between binary and
non-binary data, be sure that the file is opened for binary output.
</P>
<P>
Serializing a set of patterns leaves the original data untouched, so they can
still be used for matching. Their memory must eventually be freed in the usual
way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
stream, it too must be freed by calling <b>pcre2_serialize_free()</b>. If this
function is called with a NULL argument, it returns immediately without doing
anything.
</P>
<br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
<P>
In order to re-use a set of saved patterns you must first make the serialized
byte stream available in main memory (for example, by reading from a file). The
management of this memory block is up to the application. You can use the
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
compiled patterns are in the serialized data without actually decoding the
patterns:
<pre>
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
</pre>
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in a
vector.
The first two arguments are a pointer to a suitable vector and its
length, and the third argument points to a byte stream. The final argument is a
pointer to a general context, which can be used to specify custom memory
management functions for the decoded patterns. If this argument is NULL,
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
stream is no longer needed and can be discarded.
<pre>
  pcre2_code *list_of_codes[2];
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
</pre>
If the vector is not large enough for all the patterns in the byte stream, it
is filled with those that fit, and the remainder are ignored. The yield of the
function is the number of decoded patterns, or one of the following negative
error codes:
<pre>
  PCRE2_ERROR_BADDATA            second argument is zero or less
  PCRE2_ERROR_BADMAGIC           mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE            mismatch of code unit size or PCRE2 version
  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
  PCRE2_ERROR_MEMORY             memory allocation failed
  PCRE2_ERROR_NULL               first or third argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
on a system with different endianness.
</P>
<P>
Decoded patterns can be used for matching in the usual way, and must be freed
by calling <b>pcre2_code_free()</b>. However, be aware that there is a potential
race issue if you are using multiple patterns that were decoded from a single
byte stream in a multithreaded application. A single copy of the character
tables is used by all the decoded patterns and a reference count is used to
arrange for its memory to be automatically freed when the last pattern is
freed, but there is no locking on this reference count.
Therefore, if you want
to call <b>pcre2_code_free()</b> for these patterns in different threads, you
must arrange your own locking, and ensure that <b>pcre2_code_free()</b> cannot
be called by two threads at the same time.
</P>
<P>
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
serialized, the JIT data is discarded and so is no longer available after a
save/restore cycle. You can, however, process a restored pattern with
<b>pcre2_jit_compile()</b> if you wish.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 27 June 2018
<br>
Copyright © 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>