.. highlight:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
.. sectionauthor:: Georg Brandl <georg@python.org>

Unicode Objects
^^^^^^^^^^^^^^^

Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
use a variety of representations, in order to allow handling the complete range
of Unicode characters while staying memory efficient.  There are special cases
for strings where all code points are below 128, 256, or 65536; otherwise, code
points must be below 1114112 (which is the full Unicode range).

:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
in the Unicode object.  The :c:type:`Py_UNICODE*` representation is deprecated
and inefficient; it should be avoided in performance- or memory-sensitive
situations.

Due to the transition between the old APIs and the new APIs, Unicode objects
can internally be in two states depending on how they were created:

* "canonical" Unicode objects are all objects created by a non-deprecated
  Unicode API.  They use the most efficient representation allowed by the
  implementation.

* "legacy" Unicode objects have been created through one of the deprecated
  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
  :c:type:`Py_UNICODE*` representation; you will have to call
  :c:func:`PyUnicode_READY` on them before calling any other API.

.. note::
   "Legacy" Unicode objects will be removed in Python 3.12 together with the
   deprecated APIs; from then on, all Unicode objects will be "canonical".
   See :pep:`623` for more information.

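The size classes listed at the start of this section can be restated as a small
plain C function.  This is only an illustration: ``bytes_per_code_point`` is a
hypothetical helper that is not part of the C API, and it does not model the
separate compact ASCII layout used when every code point is below 128 (such
strings still use one byte per code point).

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical helper, not part of the C API: bytes per code point
      used by a PEP 393 string whose largest code point is maxchar. */
   static int
   bytes_per_code_point(uint32_t maxchar)
   {
       if (maxchar < 256) {
           return 1;   /* ASCII or Latin-1 range: stored as Py_UCS1 */
       }
       if (maxchar < 65536) {
           return 2;   /* rest of the Basic Multilingual Plane: Py_UCS2 */
       }
       return 4;       /* up to U+10FFFF: stored as Py_UCS4 */
   }

With the real API, :c:func:`PyUnicode_New` makes this choice from its *maxchar*
argument, and the result can be inspected with :c:func:`PyUnicode_KIND`.
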


Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:

.. c:type:: Py_UCS4
            Py_UCS2
            Py_UCS1

   These types are typedefs for unsigned integer types wide enough to contain
   characters of 32 bits, 16 bits and 8 bits, respectively.  When dealing with
   single Unicode characters, use :c:type:`Py_UCS4`.

   .. versionadded:: 3.3


.. c:type:: Py_UNICODE

   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
   depending on the platform.

   .. versionchanged:: 3.3
      In previous versions, this was a 16-bit type or a 32-bit type depending on
      whether you selected a "narrow" or "wide" Unicode version of Python at
      build time.


.. c:type:: PyASCIIObject
            PyCompactUnicodeObject
            PyUnicodeObject

   These subtypes of :c:type:`PyObject` represent a Python Unicode object.  In
   almost all cases, they shouldn't be used directly, since all API functions
   that deal with Unicode objects take and return :c:type:`PyObject` pointers.

   .. versionadded:: 3.3


.. c:var:: PyTypeObject PyUnicode_Type

   This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
   is exposed to Python code as ``str``.


The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:

.. c:function:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.


.. c:function:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.


.. c:function:: int PyUnicode_READY(PyObject *o)

   Ensure the string object *o* is in the "canonical" representation.  This is
   required before using any of the access macros described below.

   .. XXX expand on when it is not required

   Returns ``0`` on success and ``-1`` with an exception set on failure, which in
   particular happens if memory allocation fails.

   .. versionadded:: 3.3

   .. deprecated-removed:: 3.10 3.12
      This API will be removed with :c:func:`PyUnicode_FromUnicode`.


.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)

   Return the length of the Unicode string, in code points.  *o* has to be a
   Unicode object in the "canonical" representation (not checked).

   .. versionadded:: 3.3


.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
                Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
                Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)

   Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
   integer types for direct character access.  No checks are performed if the
   canonical representation has the correct character size; use
   :c:func:`PyUnicode_KIND` to select the right macro.  Make sure
   :c:func:`PyUnicode_READY` has been called before accessing this.

   .. versionadded:: 3.3

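   A minimal sketch of dispatching on the kind to use the typed data pointers
   directly, assuming the argument is a :class:`str`; ``max_code_point`` is a
   hypothetical helper.  Unlike :c:func:`PyUnicode_MAX_CHAR_VALUE`, which only
   returns an upper bound, it scans the string for the exact maximum.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: exact largest code point of a str, or
         (Py_UCS4)-1 with an exception set if the string cannot be readied. */
      static Py_UCS4
      max_code_point(PyObject *str)
      {
          if (PyUnicode_READY(str) < 0) {
              return (Py_UCS4)-1;
          }
          Py_ssize_t len = PyUnicode_GET_LENGTH(str);
          Py_UCS4 max = 0;
          switch (PyUnicode_KIND(str)) {
          case PyUnicode_1BYTE_KIND: {
              const Py_UCS1 *p = PyUnicode_1BYTE_DATA(str);
              for (Py_ssize_t i = 0; i < len; i++) {
                  if (p[i] > max) max = p[i];
              }
              break;
          }
          case PyUnicode_2BYTE_KIND: {
              const Py_UCS2 *p = PyUnicode_2BYTE_DATA(str);
              for (Py_ssize_t i = 0; i < len; i++) {
                  if (p[i] > max) max = p[i];
              }
              break;
          }
          default: {  /* PyUnicode_4BYTE_KIND once the string is ready */
              const Py_UCS4 *p = PyUnicode_4BYTE_DATA(str);
              for (Py_ssize_t i = 0; i < len; i++) {
                  if (p[i] > max) max = p[i];
              }
              break;
          }
          }
          return max;
      }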

.. c:macro:: PyUnicode_WCHAR_KIND
             PyUnicode_1BYTE_KIND
             PyUnicode_2BYTE_KIND
             PyUnicode_4BYTE_KIND

   Return values of the :c:func:`PyUnicode_KIND` macro.

   .. versionadded:: 3.3

   .. deprecated-removed:: 3.10 3.12
      ``PyUnicode_WCHAR_KIND`` is deprecated.


.. c:function:: int PyUnicode_KIND(PyObject *o)

   Return one of the PyUnicode kind constants (see above) that indicates how many
   bytes per character this Unicode object uses to store its data.  *o* has to
   be a Unicode object in the "canonical" representation (not checked).

   .. XXX document "0" return value?

   .. versionadded:: 3.3


.. c:function:: void* PyUnicode_DATA(PyObject *o)

   Return a void pointer to the raw Unicode buffer.  *o* has to be a Unicode
   object in the "canonical" representation (not checked).

   .. versionadded:: 3.3


.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
                                     Py_UCS4 value)

   Write into a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`).  This macro does not do any sanity checks and is
   intended for use in loops.  The caller should cache the *kind* value and
   *data* pointer as obtained from other macro calls.  *index* is the index in
   the string (starts at 0) and *value* is the new code point value which should
   be written to that location.

   .. versionadded:: 3.3


.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)

   Read a code point from a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`).  No checks or ready calls are performed.

   .. versionadded:: 3.3

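   A common pattern is to call :c:func:`PyUnicode_READY` once, cache the kind
   and data pointer, and then call :c:func:`PyUnicode_READ` inside the loop.
   The sketch below assumes it runs inside an extension module or an embedded
   interpreter; ``count_non_ascii`` is a hypothetical helper.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: number of code points above U+007F in a str,
         or -1 with an exception set. */
      static Py_ssize_t
      count_non_ascii(PyObject *str)
      {
          if (!PyUnicode_Check(str)) {
              PyErr_SetString(PyExc_TypeError, "expected a str object");
              return -1;
          }
          if (PyUnicode_READY(str) < 0) {
              return -1;
          }
          /* Cache kind and data once, outside the loop. */
          int kind = PyUnicode_KIND(str);
          const void *data = PyUnicode_DATA(str);
          Py_ssize_t len = PyUnicode_GET_LENGTH(str);
          Py_ssize_t count = 0;
          for (Py_ssize_t i = 0; i < len; i++) {
              if (PyUnicode_READ(kind, data, i) > 0x7F) {
                  count++;
              }
          }
          return count;
      }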

.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)

   Read a character from a Unicode object *o*, which must be in the "canonical"
   representation.  This is less efficient than :c:func:`PyUnicode_READ` if you
   do multiple consecutive reads.

   .. versionadded:: 3.3


.. c:macro:: PyUnicode_MAX_CHAR_VALUE(o)

   Return the maximum code point that is suitable for creating another string
   based on *o*, which must be in the "canonical" representation.  This is
   always an approximation but more efficient than iterating over the string.

   .. versionadded:: 3.3


.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
   code units (this includes surrogate pairs as 2 units).  *o* has to be a
   Unicode object (not checked).

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.


.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
   bytes.  *o* has to be a Unicode object (not checked).

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.


.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
                const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to a :c:type:`Py_UNICODE` representation of the object.  The
   returned buffer is always terminated with an extra null code point.  It
   may also contain embedded null code points, which would cause the string
   to be truncated when used in most C functions.  The ``AS_DATA`` form
   casts the pointer to :c:type:`const char *`.  The *o* argument has to be
   a Unicode object (not checked).

   .. versionchanged:: 3.3
      This macro is now inefficient -- because in many cases the
      :c:type:`Py_UNICODE` representation does not exist and needs to be created
      -- and can fail (return ``NULL`` with an exception set).  Try to port the
      code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using the
      :c:func:`PyUnicode_nBYTE_DATA` family of macros.


.. c:function:: int PyUnicode_IsIdentifier(PyObject *o)

   Return ``1`` if the string is a valid identifier according to the language
   definition, section :ref:`identifiers`. Return ``0`` otherwise.

   .. versionchanged:: 3.9
      The function does not call :c:func:`Py_FatalError` anymore if the string
      is not ready.


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.


.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a whitespace character.


.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a lowercase character.


.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an uppercase character.


.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a titlecase character.


.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a linebreak character.


.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a decimal character.


.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a digit character.


.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a numeric character.


.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an alphabetic character.


.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an alphanumeric character.


.. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a printable character.
   Nonprintable characters are those characters defined in the Unicode character
   database as "Other" or "Separator", excepting the ASCII space (0x20) which is
   considered printable.  (Note that printable characters in this context are
   those which should not be escaped when :func:`repr` is invoked on a string.
   It has no bearing on the handling of strings written to :data:`sys.stdout` or
   :data:`sys.stderr`.)

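   A minimal sketch of a whole-string check built on this macro, assuming the
   argument is a :class:`str`; ``all_printable`` is a hypothetical helper.  The
   expected results in the test follow the definition above: the ASCII space
   is printable, while a tab (a control character) and U+00A0 NO-BREAK SPACE
   (a "Separator") are not.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: 1 if every character of str is printable,
         0 if at least one is not, -1 with an exception set on error. */
      static int
      all_printable(PyObject *str)
      {
          if (PyUnicode_READY(str) < 0) {
              return -1;
          }
          int kind = PyUnicode_KIND(str);
          const void *data = PyUnicode_DATA(str);
          Py_ssize_t len = PyUnicode_GET_LENGTH(str);
          for (Py_ssize_t i = 0; i < len; i++) {
              Py_UCS4 ch = PyUnicode_READ(kind, data, i);
              if (!Py_UNICODE_ISPRINTABLE(ch)) {
                  return 0;
              }
          }
          return 1;
      }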

These APIs can be used for fast direct character conversions:


.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a decimal positive integer.  Return
   ``-1`` if this is not possible.  This macro does not raise exceptions.


.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible.  This macro does not raise exceptions.


.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
   possible.  This macro does not raise exceptions.

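Because the conversion macros never raise, callers test for ``-1`` instead of
checking for an exception.  The sketch below, built around a hypothetical
``digit_sum`` helper and assuming its argument is a :class:`str`, adds up the
values of all decimal characters; non-ASCII decimal digits such as U+0663
ARABIC-INDIC DIGIT THREE are counted as well.

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: sum of the decimal values of all decimal
      characters in str, or -1 with an exception set on error. */
   static long
   digit_sum(PyObject *str)
   {
       if (PyUnicode_READY(str) < 0) {
           return -1;
       }
       int kind = PyUnicode_KIND(str);
       const void *data = PyUnicode_DATA(str);
       Py_ssize_t len = PyUnicode_GET_LENGTH(str);
       long total = 0;
       for (Py_ssize_t i = 0; i < len; i++) {
           Py_UCS4 ch = PyUnicode_READ(kind, data, i);
           int value = Py_UNICODE_TODECIMAL(ch);
           if (value >= 0) {      /* -1 means "not a decimal character" */
               total += value;
           }
       }
       return total;
   }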

These APIs can be used to work with surrogates:

.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)

   Check if *ch* is a surrogate (``0xD800 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)

   Check if *ch* is a high surrogate (``0xD800 <= ch <= 0xDBFF``).

.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)

   Check if *ch* is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)

   Join two surrogate characters and return a single :c:type:`Py_UCS4` value.
   *high* and *low* are respectively the leading and trailing surrogates in a
   surrogate pair.

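   The arithmetic behind a surrogate pair can be restated in plain C without
   :file:`Python.h`.  ``join_surrogates`` and ``split_code_point`` are
   hypothetical names used only for this sketch; with :file:`Python.h`
   included, ``Py_UNICODE_JOIN_SURROGATES(0xD83D, 0xDE00)`` is expected to
   yield the same ``0x1F600``.

   .. code-block:: c

      #include <stdint.h>

      /* Hypothetical helper: combine a high (0xD800-0xDBFF) and a low
         (0xDC00-0xDFFF) surrogate into one code point. */
      static uint32_t
      join_surrogates(uint32_t high, uint32_t low)
      {
          /* Each surrogate carries 10 bits of (code point - 0x10000). */
          return 0x10000 + (((high & 0x3FF) << 10) | (low & 0x3FF));
      }

      /* Hypothetical helper: the inverse operation, valid only for
         0x10000 <= cp <= 0x10FFFF. */
      static void
      split_code_point(uint32_t cp, uint32_t *high, uint32_t *low)
      {
          cp -= 0x10000;
          *high = 0xD800 | (cp >> 10);
          *low = 0xDC00 | (cp & 0x3FF);
      }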

Creating and accessing Unicode strings
""""""""""""""""""""""""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:

.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)

   Create a new Unicode object.  *maxchar* should be the true maximum code point
   to be placed in the string.  As an approximation, it can be rounded up to the
   nearest value in the sequence 127, 255, 65535, 1114111.

   This is the recommended way to allocate a new Unicode object.  Objects
   created using this function are not resizable.

   .. versionadded:: 3.3

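   A sketch of the allocate-then-fill pattern, assuming the code points are
   already in a C array: compute the true maximum, allocate with
   :c:func:`PyUnicode_New`, then store each character with
   :c:func:`PyUnicode_WRITE` while the new object is still private to the
   caller.  ``str_from_code_points`` is a hypothetical helper.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: build a str from n code points.
         Returns a new reference, or NULL with an exception set. */
      static PyObject *
      str_from_code_points(const Py_UCS4 *cps, Py_ssize_t n)
      {
          Py_UCS4 maxchar = 0;
          for (Py_ssize_t i = 0; i < n; i++) {
              if (cps[i] > maxchar) {
                  maxchar = cps[i];
              }
          }
          /* PyUnicode_New() fails with an exception if maxchar > 0x10FFFF. */
          PyObject *result = PyUnicode_New(n, maxchar);
          if (result == NULL) {
              return NULL;
          }
          /* The object is not shared yet, so writing into it is allowed. */
          int kind = PyUnicode_KIND(result);
          void *data = PyUnicode_DATA(result);
          for (Py_ssize_t i = 0; i < n; i++) {
              PyUnicode_WRITE(kind, data, i, cps[i]);
          }
          return result;
      }

   For input that is already a :c:type:`Py_UCS4` array, as here,
   :c:func:`PyUnicode_FromKindAndData` with :c:macro:`PyUnicode_4BYTE_KIND`
   should produce an equal string in one call; the manual loop is more useful
   when the characters are computed on the fly.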

.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
                                                    Py_ssize_t size)

   Create a new Unicode object with the given *kind* (possible values are
   :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
   :c:func:`PyUnicode_KIND`).  The *buffer* must point to an array of *size*
   units of 1, 2 or 4 bytes per character, as given by the kind.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the char buffer *u*.  The bytes will be
   interpreted as being UTF-8 encoded.  The buffer is copied into the new
   object. If the buffer is not ``NULL``, the return value might be a shared
   object, i.e. modification of the data is not allowed.

   If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
   with the buffer set to ``NULL``.  This usage is deprecated in favor of
   :c:func:`PyUnicode_New`.


.. c:function:: PyObject *PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.

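   The difference between the two constructors shows up with embedded null
   bytes: :c:func:`PyUnicode_FromString` stops at the first one, while
   :c:func:`PyUnicode_FromStringAndSize` decodes exactly *size* bytes.  The
   hypothetical helper below also shows that invalid UTF-8 makes the call fail;
   the test expects a :exc:`UnicodeDecodeError`, since decoding is strict.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: number of code points in the str decoded from
         the first len bytes of buf, or -1 with an exception set. */
      static Py_ssize_t
      utf8_length(const char *buf, Py_ssize_t len)
      {
          PyObject *s = PyUnicode_FromStringAndSize(buf, len);
          if (s == NULL) {
              return -1;      /* e.g. the bytes are not valid UTF-8 */
          }
          Py_ssize_t n = PyUnicode_GetLength(s);
          Py_DECREF(s);
          return n;
      }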

.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)

   Take a C :c:func:`printf`\ -style *format* string and a variable number of
   arguments, calculate the size of the resulting Python Unicode string and return
   a string with the values formatted into it.  The variable arguments must be C
   types and must correspond exactly to the format characters in the *format*
   ASCII-encoded string. The following format characters are allowed:

   .. % This should be exactly the same as the table in PyErr_Format.
   .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
   .. % because not all compilers support the %z width modifier -- we fake it
   .. % when necessary via interpolating PY_FORMAT_SIZE_T.
   .. % Similar comments apply to the %ll width modifier and

   .. tabularcolumns:: |l|l|L|

   +-------------------+---------------------+----------------------------------+
   | Format Characters | Type                | Comment                          |
   +===================+=====================+==================================+
   | :attr:`%%`        | *n/a*               | The literal % character.         |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%c`        | int                 | A single character,              |
   |                   |                     | represented as a C int.          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%d`        | int                 | Equivalent to                    |
   |                   |                     | ``printf("%d")``. [1]_           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%u`        | unsigned int        | Equivalent to                    |
   |                   |                     | ``printf("%u")``. [1]_           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%ld`       | long                | Equivalent to                    |
   |                   |                     | ``printf("%ld")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%li`       | long                | Equivalent to                    |
   |                   |                     | ``printf("%li")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%lu`       | unsigned long       | Equivalent to                    |
   |                   |                     | ``printf("%lu")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%lld`      | long long           | Equivalent to                    |
   |                   |                     | ``printf("%lld")``. [1]_         |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%lli`      | long long           | Equivalent to                    |
   |                   |                     | ``printf("%lli")``. [1]_         |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%llu`      | unsigned long long  | Equivalent to                    |
   |                   |                     | ``printf("%llu")``. [1]_         |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Equivalent to                    |
   |                   |                     | ``printf("%zd")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%zi`       | Py_ssize_t          | Equivalent to                    |
   |                   |                     | ``printf("%zi")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%zu`       | size_t              | Equivalent to                    |
   |                   |                     | ``printf("%zu")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%i`        | int                 | Equivalent to                    |
   |                   |                     | ``printf("%i")``. [1]_           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%x`        | int                 | Equivalent to                    |
   |                   |                     | ``printf("%x")``. [1]_           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%s`        | const char\*        | A null-terminated C character    |
   |                   |                     | array.                           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%p`        | const void\*        | The hex representation of a C    |
   |                   |                     | pointer. Mostly equivalent to    |
   |                   |                     | ``printf("%p")`` except that     |
   |                   |                     | it is guaranteed to start with   |
   |                   |                     | the literal ``0x`` regardless    |
   |                   |                     | of what the platform's           |
   |                   |                     | ``printf`` yields.               |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%A`        | PyObject\*          | The result of calling            |
   |                   |                     | :func:`ascii`.                   |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%U`        | PyObject\*          | A Unicode object.                |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%V`        | PyObject\*,         | A Unicode object (which may be   |
   |                   | const char\*        | ``NULL``) and a null-terminated  |
   |                   |                     | C character array as a second    |
   |                   |                     | parameter (which will be used,   |
   |                   |                     | if the first parameter is        |
   |                   |                     | ``NULL``).                       |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling            |
   |                   |                     | :c:func:`PyObject_Str`.          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling            |
   |                   |                     | :c:func:`PyObject_Repr`.         |
   +-------------------+---------------------+----------------------------------+

   An unrecognized format character causes all the rest of the format string to be
   copied as-is to the result string, and any extra arguments to be discarded.

   .. note::
      The width formatter unit is the number of characters rather than bytes.
      The precision formatter unit is the number of bytes for ``"%s"`` and
      ``"%V"`` (if the ``PyObject*`` argument is ``NULL``), and the number of
      characters for ``"%A"``, ``"%U"``, ``"%S"``, ``"%R"`` and ``"%V"``
      (if the ``PyObject*`` argument is not ``NULL``).

   .. [1] For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi,
      zu, i, x): the 0-conversion flag takes effect even when a precision is given.

   .. versionchanged:: 3.2
      Support for ``"%lld"`` and ``"%llu"`` added.

   .. versionchanged:: 3.3
      Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.

   .. versionchanged:: 3.4
      Support width and precision formatter for ``"%s"``, ``"%A"``, ``"%U"``,
      ``"%V"``, ``"%S"``, ``"%R"`` added.


.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)

   Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
   arguments: the *format* string and a :c:type:`va_list` holding the values
   to format.

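   A typical reason to use the ``V`` variant is writing your own variadic
   wrapper.  The sketch below, built around a hypothetical ``tagged_format``
   helper, formats the caller's arguments with
   :c:func:`PyUnicode_FromFormatV` and then embeds the result in a second
   string with ``%U``.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <stdarg.h>

      /* Hypothetical helper: "[tag] " followed by the formatted message.
         Returns a new reference, or NULL with an exception set. */
      static PyObject *
      tagged_format(const char *tag, const char *format, ...)
      {
          va_list vargs;
          va_start(vargs, format);
          PyObject *body = PyUnicode_FromFormatV(format, vargs);
          va_end(vargs);
          if (body == NULL) {
              return NULL;
          }
          /* %s takes a C string, %U takes a str object. */
          PyObject *result = PyUnicode_FromFormat("[%s] %U", tag, body);
          Py_DECREF(body);
          return result;
      }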

.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
                               const char *encoding, const char *errors)

   Decode an encoded object *obj* to a Unicode object.

   :class:`bytes`, :class:`bytearray` and other
   :term:`bytes-like objects <bytes-like object>`
   are decoded according to the given *encoding* and using the error handling
   defined by *errors*. Both can be ``NULL`` to have the interface use the default
   values (see :ref:`builtincodecs` for details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns ``NULL`` if there was an error.  The caller is responsible for
   decref'ing the returned objects.

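   A minimal sketch that wraps raw bytes in a :class:`bytes` object and decodes
   them as Latin-1; ``decode_latin1`` is a hypothetical helper.  The test also
   checks the rule above that a :class:`str` argument is rejected with
   :exc:`TypeError`.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: decode raw Latin-1 bytes into a new str.
         Returns a new reference, or NULL with an exception set. */
      static PyObject *
      decode_latin1(const char *raw, Py_ssize_t len)
      {
          PyObject *bytes = PyBytes_FromStringAndSize(raw, len);
          if (bytes == NULL) {
              return NULL;
          }
          PyObject *text = PyUnicode_FromEncodedObject(bytes, "latin-1",
                                                       "strict");
          Py_DECREF(bytes);   /* the decoded str holds its own copy */
          return text;
      }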

.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)

   Return the length of the Unicode object, in code points.

   .. versionadded:: 3.3


.. c:function:: Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, \
                                                    Py_ssize_t to_start, \
                                                    PyObject *from, \
                                                    Py_ssize_t from_start, \
                                                    Py_ssize_t how_many)

   Copy characters from one Unicode object into another.  This function performs
   character conversion when necessary and falls back to :c:func:`memcpy` if
   possible.  Returns ``-1`` and sets an exception on error, otherwise returns
   the number of copied characters.

   .. versionadded:: 3.3


.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
                        Py_ssize_t length, Py_UCS4 fill_char)

   Fill a string with a character: write *fill_char* into
   ``unicode[start:start+length]``.

   Fail if *fill_char* is bigger than the string maximum character, or if the
   string has more than 1 reference.

   Return the number of written characters, or return ``-1`` and raise an
   exception on error.

   .. versionadded:: 3.3

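   :c:func:`PyUnicode_CopyCharacters` and :c:func:`PyUnicode_Fill` combine
   naturally on a freshly created string.  The sketch below is a hypothetical
   ``pad_right`` helper, similar in spirit to :meth:`str.ljust`; it sizes the
   new string's maximum character so that both the copied characters and the
   fill character fit.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: str padded on the right with fill_char up to
         width characters. Returns a new reference, or NULL on error. */
      static PyObject *
      pad_right(PyObject *str, Py_ssize_t width, Py_UCS4 fill_char)
      {
          Py_ssize_t len = PyUnicode_GetLength(str);  /* also readies str */
          if (len < 0) {
              return NULL;
          }
          if (len >= width) {
              Py_INCREF(str);          /* nothing to pad */
              return str;
          }
          Py_UCS4 maxchar = PyUnicode_MAX_CHAR_VALUE(str);
          if (fill_char > maxchar) {
              maxchar = fill_char;
          }
          PyObject *result = PyUnicode_New(width, maxchar);
          if (result == NULL) {
              return NULL;
          }
          /* result has a single reference and no hash yet, so both calls
             are allowed to modify it. */
          if (PyUnicode_CopyCharacters(result, 0, str, 0, len) < 0
              || PyUnicode_Fill(result, len, width - len, fill_char) < 0) {
              Py_DECREF(result);
              return NULL;
          }
          return result;
      }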

.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
                                        Py_UCS4 character)

   Write a character to a string.  The string must have been created through
   :c:func:`PyUnicode_New`.  Since Unicode strings are supposed to be immutable,
   the string must not be shared and must not have been hashed yet.

   This function checks that *unicode* is a Unicode object, that the index is
   not out of bounds, and that the object can be modified safely (i.e. that
   its reference count is one).

   .. versionadded:: 3.3


.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)

   Read a character from a string.  This function checks that *unicode* is a
   Unicode object and the index is not out of bounds, in contrast to the macro
   version :c:func:`PyUnicode_READ_CHAR`.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
                                              Py_ssize_t end)

   Return a substring of *str*, from character index *start* (included) to
   character index *end* (excluded).  Negative indices are not supported.

   .. versionadded:: 3.3

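   A short sketch that combines the checked :c:func:`PyUnicode_ReadChar` with
   :c:func:`PyUnicode_Substring`; ``first_word`` is a hypothetical helper.  Its
   error check assumes that :c:func:`PyUnicode_ReadChar` signals failure by
   returning ``(Py_UCS4)-1`` with an exception set, which is what CPython does.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: the part of str before its first whitespace
         character (all of str if there is none). Returns a new reference,
         or NULL with an exception set. */
      static PyObject *
      first_word(PyObject *str)
      {
          Py_ssize_t len = PyUnicode_GetLength(str);
          if (len < 0) {
              return NULL;
          }
          Py_ssize_t end = 0;
          while (end < len) {
              Py_UCS4 ch = PyUnicode_ReadChar(str, end);  /* bounds-checked */
              if (ch == (Py_UCS4)-1 && PyErr_Occurred()) {
                  return NULL;
              }
              if (Py_UNICODE_ISSPACE(ch)) {
                  break;
              }
              end++;
          }
          return PyUnicode_Substring(str, 0, end);
      }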

.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
                                         Py_ssize_t buflen, int copy_null)

   Copy the string *u* into a UCS4 buffer, including a null character, if
   *copy_null* is set.  Returns ``NULL`` and sets an exception on error (in
   particular, a :exc:`SystemError` if *buflen* is smaller than the length of
   *u*).  *buffer* is returned on success.

   .. versionadded:: 3.3


.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)

   Copy the string *u* into a new UCS4 buffer that is allocated using
   :c:func:`PyMem_Malloc`.  If this fails, ``NULL`` is returned with a
   :exc:`MemoryError` set.  The returned buffer always has an extra
   null code point appended.

   .. versionadded:: 3.3

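   The copy belongs to the caller and must be released with
   :c:func:`PyMem_Free`.  In the sketch below, built around a hypothetical
   ``count_code_point`` helper, the loop is bounded by
   :c:func:`PyUnicode_GetLength` rather than by the trailing null code point,
   because the string itself may contain U+0000.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: how many times target occurs in str,
         or -1 with an exception set. */
      static Py_ssize_t
      count_code_point(PyObject *str, Py_UCS4 target)
      {
          Py_ssize_t len = PyUnicode_GetLength(str);
          if (len < 0) {
              return -1;
          }
          Py_UCS4 *buf = PyUnicode_AsUCS4Copy(str);
          if (buf == NULL) {
              return -1;          /* MemoryError is set */
          }
          Py_ssize_t count = 0;
          for (Py_ssize_t i = 0; i < len; i++) {
              if (buf[i] == target) {
                  count++;
              }
          }
          PyMem_Free(buf);        /* buffer came from PyMem_Malloc */
          return count;
      }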

Deprecated Py_UNICODE APIs
""""""""""""""""""""""""""

.. deprecated-removed:: 3.3 4.0

These API functions are deprecated with the implementation of :pep:`393`.
Extension modules can continue using them, as they will not be removed in Python
3.x, but need to be aware that their use can now cause performance and memory hits.


.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the :c:type:`Py_UNICODE` buffer *u* of the given
   size.  *u* may be ``NULL``, in which case the contents are undefined and it is
   the user's responsibility to fill in the needed data.  The buffer is copied
   into the new object.

   If the buffer is not ``NULL``, the return value might be a shared object.
   Therefore, modification of the resulting Unicode object is only allowed when
   *u* is ``NULL``.

   If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the
   string content has been filled before using any of the access macros such as
   :c:func:`PyUnicode_KIND`.

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_FromKindAndData`, :c:func:`PyUnicode_FromWideChar`, or
      :c:func:`PyUnicode_New`.


.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal
   :c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the
   :c:type:`Py_UNICODE*` representation of the object if it is not yet
   available. The buffer is always terminated with an extra null code point.
   Note that the resulting :c:type:`Py_UNICODE` string may also contain
   embedded null code points, which would cause the string to be truncated when
   used in most C functions.

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
      :c:func:`PyUnicode_ReadChar` or similar new APIs.


.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)

   Create a Unicode object by replacing all decimal digits in the
   :c:type:`Py_UNICODE` buffer of the given *size* with ASCII digits 0--9
   according to their decimal value.  Return ``NULL`` if an exception occurs.


.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)

   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
   array length (excluding the extra null terminator) in *size*.
   Note that the resulting :c:type:`Py_UNICODE*` string
   may contain embedded null code points, which would cause the string to be
   truncated when used in most C functions.

   .. versionadded:: 3.3

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
      :c:func:`PyUnicode_ReadChar` or similar new APIs.

747
748.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
749
750   Create a copy of a Unicode string ending with a null code point. Return ``NULL``
751   and raise a :exc:`MemoryError` exception on memory allocation failure,
   otherwise return a newly allocated buffer (use :c:func:`PyMem_Free` to free
753   the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
754   contain embedded null code points, which would cause the string to be
755   truncated when used in most C functions.
756
757   .. versionadded:: 3.2
758
759   Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.
760
761
762.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
763
764   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
765   code units (this includes surrogate pairs as 2 units).
766
767   .. deprecated-removed:: 3.3 3.12
768      Part of the old-style Unicode API, please migrate to using
769      :c:func:`PyUnicode_GET_LENGTH`.
770
771
772.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
773
774   Copy an instance of a Unicode subtype to a new true Unicode object if
775   necessary. If *obj* is already a true Unicode object (not a subtype),
776   return the reference with incremented refcount.
777
778   Objects other than Unicode or its subtypes will cause a :exc:`TypeError`.
779
780
781Locale Encoding
782"""""""""""""""
783
784The current locale encoding can be used to decode text from the operating
785system.
786
787.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, \
788                                                        Py_ssize_t len, \
789                                                        const char *errors)
790
791   Decode a string from UTF-8 on Android and VxWorks, or from the current
792   locale encoding on other platforms. The supported
793   error handlers are ``"strict"`` and ``"surrogateescape"``
   (:pep:`383`). The decoder uses the ``"strict"`` error handler if
795   *errors* is ``NULL``.  *str* must end with a null character but
796   cannot contain embedded null characters.
797
798   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
799   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
800   Python startup).
801
802   This function ignores the Python UTF-8 mode.
803
804   .. seealso::
805
806      The :c:func:`Py_DecodeLocale` function.
807
808   .. versionadded:: 3.3
809
810   .. versionchanged:: 3.7
      The function now also uses the current locale encoding for the
      ``surrogateescape`` error handler, except on Android. Previously,
      :c:func:`Py_DecodeLocale` was used for ``surrogateescape``, and the
      current locale encoding was used for ``strict``.
815
816
817.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)
818
819   Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string
820   length using :c:func:`strlen`.
821
822   .. versionadded:: 3.3
823
824
825.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
826
827   Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current
828   locale encoding on other platforms. The
829   supported error handlers are ``"strict"`` and ``"surrogateescape"``
   (:pep:`383`). The encoder uses the ``"strict"`` error handler if
831   *errors* is ``NULL``. Return a :class:`bytes` object. *unicode* cannot
832   contain embedded null characters.
833
834   Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
835   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
836   Python startup).
837
838   This function ignores the Python UTF-8 mode.
839
840   .. seealso::
841
842      The :c:func:`Py_EncodeLocale` function.
843
844   .. versionadded:: 3.3
845
846   .. versionchanged:: 3.7
      The function now also uses the current locale encoding for the
      ``surrogateescape`` error handler, except on Android. Previously,
      :c:func:`Py_EncodeLocale` was used for ``surrogateescape``, and the
      current locale encoding was used for ``strict``.
852
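Both locale functions above accept the ``"surrogateescape"`` error handler of :pep:`383`. As a rough illustration of what that handler does, the following standalone C sketch models it for a hypothetical ASCII-only locale: a byte that cannot be decoded (0x80 to 0xFF here) becomes the lone surrogate U+DC00 plus the byte value, and encoding maps U+DC80..U+DCFF back to the original byte, so arbitrary bytes survive a decode/encode round trip. The ``model_*`` names are hypothetical; this is not CPython's implementation, and a real locale codec first decodes whatever multi-byte sequences it can and only escapes the remaining bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "surrogateescape" for an ASCII-only locale.
   Decode: bytes below 0x80 map to themselves; any other byte b, which
   ASCII cannot decode, becomes the lone surrogate U+DC00 + b. */
size_t model_decode_surrogateescape(const unsigned char *s, size_t size,
                                    uint32_t *out)
{
    for (size_t i = 0; i < size; i++) {
        out[i] = s[i] < 0x80 ? (uint32_t)s[i] : 0xDC00u + s[i];
    }
    return size;
}

/* Encode: ASCII code points map to themselves and U+DC80..U+DCFF map back
   to the escaped byte.  Anything else is unencodable in this model:
   return -1.  Otherwise return the number of bytes written. */
long model_encode_surrogateescape(const uint32_t *cp, size_t n,
                                  unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        if (cp[i] < 0x80)
            out[i] = (unsigned char)cp[i];
        else if (cp[i] >= 0xDC80 && cp[i] <= 0xDCFF)
            out[i] = (unsigned char)(cp[i] - 0xDC00);
        else
            return -1;
    }
    return (long)n;
}
```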
853
854File System Encoding
855""""""""""""""""""""
856
857To encode and decode file names and other environment strings,
858:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
859:c:data:`Py_FileSystemDefaultEncodeErrors` should be used as the error handler
860(:pep:`383` and :pep:`529`). To encode file names to :class:`bytes` during
861argument parsing, the ``"O&"`` converter should be used, passing
862:c:func:`PyUnicode_FSConverter` as the conversion function:
863
864.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)
865
866   ParseTuple converter: encode :class:`str` objects -- obtained directly or
867   through the :class:`os.PathLike` interface -- to :class:`bytes` using
868   :c:func:`PyUnicode_EncodeFSDefault`; :class:`bytes` objects are output as-is.
869   *result* must be a :c:type:`PyBytesObject*` which must be released when it is
870   no longer used.
871
872   .. versionadded:: 3.1
873
874   .. versionchanged:: 3.6
875      Accepts a :term:`path-like object`.
876
877To decode file names to :class:`str` during argument parsing, the ``"O&"``
878converter should be used, passing :c:func:`PyUnicode_FSDecoder` as the
879conversion function:
880
881.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)
882
883   ParseTuple converter: decode :class:`bytes` objects -- obtained either
884   directly or indirectly through the :class:`os.PathLike` interface -- to
885   :class:`str` using :c:func:`PyUnicode_DecodeFSDefaultAndSize`; :class:`str`
886   objects are output as-is. *result* must be a :c:type:`PyUnicodeObject*` which
887   must be released when it is no longer used.
888
889   .. versionadded:: 3.2
890
891   .. versionchanged:: 3.6
892      Accepts a :term:`path-like object`.
893
894
895.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
896
897   Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
898   :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
899
900   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
901   locale encoding.
902
903   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
904   locale encoding and cannot be modified later. If you need to decode a string
905   from the current locale encoding, use
906   :c:func:`PyUnicode_DecodeLocaleAndSize`.
907
908   .. seealso::
909
910      The :c:func:`Py_DecodeLocale` function.
911
912   .. versionchanged:: 3.6
913      Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
914
915
916.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
917
918   Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
919   and the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
920
921   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
922   locale encoding.
923
924   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
925
926   .. versionchanged:: 3.6
927      Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
928
929
930.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
931
932   Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
933   :c:data:`Py_FileSystemDefaultEncodeErrors` error handler, and return
934   :class:`bytes`. Note that the resulting :class:`bytes` object may contain
935   null bytes.
936
937   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
938   locale encoding.
939
940   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
941   locale encoding and cannot be modified later. If you need to encode a string
942   to the current locale encoding, use :c:func:`PyUnicode_EncodeLocale`.
943
944   .. seealso::
945
946      The :c:func:`Py_EncodeLocale` function.
947
948   .. versionadded:: 3.2
949
950   .. versionchanged:: 3.6
951      Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
952
953wchar_t Support
954"""""""""""""""
955
956:c:type:`wchar_t` support for platforms which support it:
957
958.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
959
960   Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given *size*.
   Passing ``-1`` as the *size* indicates that the function must itself compute
   the length, using :c:func:`wcslen`.
963   Return ``NULL`` on failure.
964
965
966.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)
967
968   Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*.  At most
969   *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
970   null termination character).  Return the number of :c:type:`wchar_t` characters
971   copied or ``-1`` in case of an error.  Note that the resulting :c:type:`wchar_t*`
972   string may or may not be null-terminated.  It is the responsibility of the caller
973   to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
974   required by the application. Also, note that the :c:type:`wchar_t*` string
975   might contain null characters, which would cause the string to be truncated
976   when used with most C functions.
977
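The copy rule of :c:func:`PyUnicode_AsWideChar` can be modelled without Python. The sketch below is a standalone model with a hypothetical name, written from the contract above and assuming, as its wording suggests, that the terminator is written only when *size* leaves room for it; error handling (the ``-1`` return) is not modelled. The second helper shows a caller-side pattern: pass one less than the buffer capacity and terminate the result unconditionally.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical model of the copy rule: copy at most `size` characters of
   the `len`-character string `src` into `w`, and append the terminator
   only if it fits.  Returns the number of characters copied, terminator
   excluded. */
size_t model_aswidechar(const wchar_t *src, size_t len, wchar_t *w, size_t size)
{
    size_t n = len < size ? len : size;
    wmemcpy(w, src, n);
    if (n < size)
        w[n] = L'\0';   /* room left: the result is null-terminated */
    return n;
}

/* Caller-side pattern: reserve one slot of the `cap`-element buffer
   (cap must be at least 1) and terminate unconditionally, so the result
   is always a valid C wide string, possibly truncated. */
size_t copy_terminated(const wchar_t *src, size_t len, wchar_t *buf, size_t cap)
{
    size_t n = model_aswidechar(src, len, buf, cap - 1);
    buf[n] = L'\0';
    return n;
}
```

With the real function the same pattern is ``n = PyUnicode_AsWideChar(u, buf, cap - 1)``, a check for ``n < 0``, and then ``buf[n] = L'\0'``.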
978
979.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
980
981   Convert the Unicode object to a wide character string. The output string
982   always ends with a null character. If *size* is not ``NULL``, write the number
983   of wide characters (excluding the trailing null termination character) into
984   *\*size*. Note that the resulting :c:type:`wchar_t` string might contain
985   null characters, which would cause the string to be truncated when used with
986   most C functions. If *size* is ``NULL`` and the :c:type:`wchar_t*` string
987   contains null characters a :exc:`ValueError` is raised.
988
   Returns a buffer allocated by :c:func:`PyMem_Malloc` (use
   :c:func:`PyMem_Free` to free it) on success. On error, returns ``NULL``
   and *\*size* is undefined. Raises a :exc:`MemoryError` if memory allocation
   fails.
993
994   .. versionadded:: 3.2
995
996   .. versionchanged:: 3.7
997      Raises a :exc:`ValueError` if *size* is ``NULL`` and the :c:type:`wchar_t*`
998      string contains null characters.
999
1000
1001.. _builtincodecs:
1002
1003Built-in Codecs
1004^^^^^^^^^^^^^^^
1005
1006Python provides a set of built-in codecs which are written in C for speed. All of
1007these codecs are directly usable via the following functions.
1008
Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as those of the built-in :func:`str` string object
constructor.
1012
Setting *encoding* to ``NULL`` causes the default encoding, UTF-8, to be used.
File system calls should use :c:func:`PyUnicode_FSConverter` for encoding file
names. This uses the variable :c:data:`Py_FileSystemDefaultEncoding`
internally. This variable should be treated as read-only: on some systems, it
will be a pointer to a static string, on others, it will change at run-time
(such as when the application invokes :c:func:`setlocale`).
1020
Error handling is set by *errors*, which may also be set to ``NULL`` to use
the default handling defined for the codec.  Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).
1024
The codecs all use a similar interface.  For simplicity, only deviations from
the following generic ones are documented.
1027
1028
1029Generic Codecs
1030""""""""""""""
1031
1032These are the generic codec APIs:
1033
1034
1035.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
1036                              const char *encoding, const char *errors)
1037
1038   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
1039   *encoding* and *errors* have the same meaning as the parameters of the same name
1040   in the :func:`str` built-in function.  The codec to be used is looked up
1041   using the Python codec registry.  Return ``NULL`` if an exception was raised by
1042   the codec.
1043
1044
1045.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
1046                              const char *encoding, const char *errors)
1047
1048   Encode a Unicode object and return the result as Python bytes object.
1049   *encoding* and *errors* have the same meaning as the parameters of the same
1050   name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
1051   using the Python codec registry. Return ``NULL`` if an exception was raised by
1052   the codec.
1053
1054
1055.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
1056                              const char *encoding, const char *errors)
1057
1058   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
1059   bytes object.  *encoding* and *errors* have the same meaning as the
1060   parameters of the same name in the Unicode :meth:`~str.encode` method.  The codec
1061   to be used is looked up using the Python codec registry.  Return ``NULL`` if an
1062   exception was raised by the codec.
1063
1064   .. deprecated-removed:: 3.3 4.0
1065      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1066      :c:func:`PyUnicode_AsEncodedString`.
1067
1068
1069UTF-8 Codecs
1070""""""""""""
1071
1072These are the UTF-8 codec APIs:
1073
1074
1075.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
1076
1077   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
1078   *s*. Return ``NULL`` if an exception was raised by the codec.
1079
1080
1081.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
1082                              const char *errors, Py_ssize_t *consumed)
1083
1084   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF8`. If
1085   *consumed* is not ``NULL``, trailing incomplete UTF-8 byte sequences will not be
1086   treated as an error. Those bytes will not be decoded and the number of bytes
1087   that have been decoded will be stored in *consumed*.
1088
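To make the *consumed* contract concrete, here is a standalone C model (hypothetical names, not CPython's decoder) that computes how many leading bytes of a UTF-8 buffer form complete characters. It assumes the input is valid UTF-8 that may have been cut off in the middle of its last character; validation of continuation bytes, which the real codec performs, is deliberately left out.

```c
#include <stddef.h>

/* Length of the UTF-8 sequence introduced by lead byte b, or 0 if b is
   a continuation byte or cannot start a sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Hypothetical model of the *consumed* value: return how many leading
   bytes of s form complete characters.  The remaining bytes are the
   trailing incomplete sequence left undecoded for the next call. */
size_t model_utf8_consumed(const unsigned char *s, size_t size)
{
    size_t i = 0;
    while (i < size) {
        size_t n = utf8_seq_len(s[i]);
        if (n == 0 || i + n > size)   /* invalid lead byte or truncated tail */
            break;
        i += n;
    }
    return i;
}
```

A caller feeding data in chunks would decode the first *consumed* bytes and carry the remaining bytes over to the front of the next chunk.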
1089
1090.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
1091
1092   Encode a Unicode object using UTF-8 and return the result as Python bytes
1093   object.  Error handling is "strict".  Return ``NULL`` if an exception was
1094   raised by the codec.
1095
1096
1097.. c:function:: const char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
1098
1099   Return a pointer to the UTF-8 encoding of the Unicode object, and
1100   store the size of the encoded representation (in bytes) in *size*.  The
1101   *size* argument can be ``NULL``; in this case no size will be stored.  The
1102   returned buffer always has an extra null byte appended (not included in
1103   *size*), regardless of whether there are any other null code points.
1104
1105   In the case of an error, ``NULL`` is returned with an exception set and no
1106   *size* is stored.
1107
1108   This caches the UTF-8 representation of the string in the Unicode object, and
1109   subsequent calls will return a pointer to the same buffer.  The caller is not
1110   responsible for deallocating the buffer.
1111
1112   .. versionadded:: 3.3
1113
1114   .. versionchanged:: 3.7
      The return type is now ``const char *`` rather than ``char *``.
1116
1117
1118.. c:function:: const char* PyUnicode_AsUTF8(PyObject *unicode)
1119
1120   As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
1121
1122   .. versionadded:: 3.3
1123
1124   .. versionchanged:: 3.7
      The return type is now ``const char *`` rather than ``char *``.
1126
1127
1128.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
1129
1130   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and
1131   return a Python bytes object.  Return ``NULL`` if an exception was raised by
1132   the codec.
1133
1134   .. deprecated-removed:: 3.3 4.0
1135      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1136      :c:func:`PyUnicode_AsUTF8String`, :c:func:`PyUnicode_AsUTF8AndSize` or
1137      :c:func:`PyUnicode_AsEncodedString`.
1138
1139
1140UTF-32 Codecs
1141"""""""""""""
1142
1143These are the UTF-32 codec APIs:
1144
1145
1146.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
1147                              const char *errors, int *byteorder)
1148
1149   Decode *size* bytes from a UTF-32 encoded buffer string and return the
1150   corresponding Unicode object.  *errors* (if non-``NULL``) defines the error
1151   handling. It defaults to "strict".
1152
1153   If *byteorder* is non-``NULL``, the decoder starts decoding using the given byte
1154   order::
1155
1156      *byteorder == -1: little endian
1157      *byteorder == 0:  native order
1158      *byteorder == 1:  big endian
1159
1160   If ``*byteorder`` is zero, and the first four bytes of the input data are a
1161   byte order mark (BOM), the decoder switches to this byte order and the BOM is
1162   not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
1163   ``1``, any byte order mark is copied to the output.
1164
1165   After completion, *\*byteorder* is set to the current byte order at the end
1166   of input data.
1167
1168   If *byteorder* is ``NULL``, the codec starts in native order mode.
1169
1170   Return ``NULL`` if an exception was raised by the codec.
1171
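The byte-order rules above can be summarised as a small decision function. The following standalone C sketch (hypothetical names, not CPython's code) uses the same -1/0/1 convention as *byteorder*, takes the platform's native order as a parameter so that it can be tested on any machine, and reports how many leading bytes are skipped as a BOM. It does not model writing the final byte order back through *byteorder*.

```c
#include <stddef.h>

/* Hypothetical model of the byte-order rules.  `byteorder`: -1 little
   endian, 0 native order, 1 big endian.  `native` is the platform order
   (-1 or 1).  Returns the order actually used and stores in *skip the
   number of leading bytes dropped as a BOM. */
int model_utf32_order(const unsigned char *s, size_t size,
                      int byteorder, int native, size_t *skip)
{
    *skip = 0;
    if (byteorder != 0)
        return byteorder;        /* explicit order: any BOM is kept as data */
    if (size >= 4) {
        if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
            *skip = 4;
            return -1;           /* FF FE 00 00: little-endian BOM */
        }
        if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
            *skip = 4;
            return 1;            /* 00 00 FE FF: big-endian BOM */
        }
    }
    return native;               /* no BOM: stay in native order */
}

/* Read the 32-bit code unit starting at byte offset i in the given order. */
unsigned long model_utf32_unit(const unsigned char *s, size_t i, int order)
{
    if (order < 0)               /* little endian: least significant first */
        return (unsigned long)s[i] | ((unsigned long)s[i + 1] << 8) |
               ((unsigned long)s[i + 2] << 16) | ((unsigned long)s[i + 3] << 24);
    return ((unsigned long)s[i] << 24) | ((unsigned long)s[i + 1] << 16) |
           ((unsigned long)s[i + 2] << 8) | (unsigned long)s[i + 3];
}
```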
1172
1173.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
1174                              const char *errors, int *byteorder, Py_ssize_t *consumed)
1175
1176   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF32`. If
1177   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
1178   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
1179   by four) as an error. Those bytes will not be decoded and the number of bytes
1180   that have been decoded will be stored in *consumed*.
1181
1182
1183.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
1184
1185   Return a Python byte string using the UTF-32 encoding in native byte
1186   order. The string always starts with a BOM mark.  Error handling is "strict".
1187   Return ``NULL`` if an exception was raised by the codec.
1188
1189
1190.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
1191                              const char *errors, int byteorder)
1192
1193   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
1194   data in *s*.  Output is written according to the following byte order::
1195
1196      byteorder == -1: little endian
1197      byteorder == 0:  native byte order (writes a BOM mark)
1198      byteorder == 1:  big endian
1199
   If *byteorder* is ``0``, the output string will always start with the Unicode BOM
1201   mark (U+FEFF). In the other two modes, no BOM mark is prepended.
1202
1203   If ``Py_UNICODE_WIDE`` is not defined, surrogate pairs will be output
1204   as a single code point.
1205
1206   Return ``NULL`` if an exception was raised by the codec.
1207
1208   .. deprecated-removed:: 3.3 4.0
1209      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1210      :c:func:`PyUnicode_AsUTF32String` or :c:func:`PyUnicode_AsEncodedString`.
1211
1212
1213UTF-16 Codecs
1214"""""""""""""
1215
1216These are the UTF-16 codec APIs:
1217
1218
1219.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
1220                              const char *errors, int *byteorder)
1221
1222   Decode *size* bytes from a UTF-16 encoded buffer string and return the
1223   corresponding Unicode object.  *errors* (if non-``NULL``) defines the error
1224   handling. It defaults to "strict".
1225
1226   If *byteorder* is non-``NULL``, the decoder starts decoding using the given byte
1227   order::
1228
1229      *byteorder == -1: little endian
1230      *byteorder == 0:  native order
1231      *byteorder == 1:  big endian
1232
1233   If ``*byteorder`` is zero, and the first two bytes of the input data are a
1234   byte order mark (BOM), the decoder switches to this byte order and the BOM is
1235   not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
1236   ``1``, any byte order mark is copied to the output (where it will result in
1237   either a ``\ufeff`` or a ``\ufffe`` character).
1238
1239   After completion, *\*byteorder* is set to the current byte order at the end
1240   of input data.
1241
1242   If *byteorder* is ``NULL``, the codec starts in native order mode.
1243
1244   Return ``NULL`` if an exception was raised by the codec.
1245
1246
1247.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
1248                              const char *errors, int *byteorder, Py_ssize_t *consumed)
1249
1250   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF16`. If
1251   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
1252   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
1253   split surrogate pair) as an error. Those bytes will not be decoded and the
1254   number of bytes that have been decoded will be stored in *consumed*.
1255
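For UTF-16 the trailing incomplete data can be either a single odd byte or the first half of a surrogate pair. This standalone C sketch (hypothetical name, little-endian input only, not CPython's decoder) models the value the stateful decoder stores in *consumed* for input that is otherwise well formed.

```c
#include <stddef.h>

/* Hypothetical model of *consumed* for little-endian UTF-16 input.
   Returns the number of leading bytes that form complete characters; an
   odd trailing byte, or a high surrogate whose low half has not arrived
   yet, is left undecoded.  Lone or mismatched surrogates (an error in the
   real codec) are not handled here. */
size_t model_utf16le_consumed(const unsigned char *s, size_t size)
{
    size_t i = 0;
    while (i + 2 <= size) {
        unsigned unit = s[i] | ((unsigned)s[i + 1] << 8);
        if (unit >= 0xD800 && unit <= 0xDBFF) {   /* high surrogate */
            if (i + 4 > size)
                break;                            /* low half missing */
            i += 4;
        } else {
            i += 2;
        }
    }
    return i;
}
```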
1256
1257.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
1258
1259   Return a Python byte string using the UTF-16 encoding in native byte
1260   order. The string always starts with a BOM mark.  Error handling is "strict".
1261   Return ``NULL`` if an exception was raised by the codec.
1262
1263
1264.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
1265                              const char *errors, int byteorder)
1266
1267   Return a Python bytes object holding the UTF-16 encoded value of the Unicode
1268   data in *s*.  Output is written according to the following byte order::
1269
1270      byteorder == -1: little endian
1271      byteorder == 0:  native byte order (writes a BOM mark)
1272      byteorder == 1:  big endian
1273
   If *byteorder* is ``0``, the output string will always start with the Unicode BOM
1275   mark (U+FEFF). In the other two modes, no BOM mark is prepended.
1276
1277   If ``Py_UNICODE_WIDE`` is defined, a single :c:type:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
   value is interpreted as a UCS-2 character.
1280
1281   Return ``NULL`` if an exception was raised by the codec.
1282
1283   .. deprecated-removed:: 3.3 4.0
1284      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1285      :c:func:`PyUnicode_AsUTF16String` or :c:func:`PyUnicode_AsEncodedString`.
1286
1287
1288UTF-7 Codecs
1289""""""""""""
1290
1291These are the UTF-7 codec APIs:
1292
1293
1294.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
1295
1296   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
1297   *s*.  Return ``NULL`` if an exception was raised by the codec.
1298
1299
1300.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
1301                              const char *errors, Py_ssize_t *consumed)
1302
1303   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF7`.  If
1304   *consumed* is not ``NULL``, trailing incomplete UTF-7 base-64 sections will not
1305   be treated as an error.  Those bytes will not be decoded and the number of
1306   bytes that have been decoded will be stored in *consumed*.
1307
1308
1309.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
1310                              int base64SetO, int base64WhiteSpace, const char *errors)
1311
   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-7 and
1313   return a Python bytes object.  Return ``NULL`` if an exception was raised by
1314   the codec.
1315
1316   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
1317   special meaning) will be encoded in base-64.  If *base64WhiteSpace* is
1318   nonzero, whitespace will be encoded in base-64.  Both are set to zero for the
1319   Python "utf-7" codec.
1320
1321   .. deprecated-removed:: 3.3 4.0
1322      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1323      :c:func:`PyUnicode_AsEncodedString`.
1324
1325
1326Unicode-Escape Codecs
1327"""""""""""""""""""""
1328
1329These are the "Unicode Escape" codec APIs:
1330
1331
1332.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
1333                              Py_ssize_t size, const char *errors)
1334
1335   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
1336   string *s*.  Return ``NULL`` if an exception was raised by the codec.
1337
1338
1339.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
1340
1341   Encode a Unicode object using Unicode-Escape and return the result as a
1342   bytes object.  Error handling is "strict".  Return ``NULL`` if an exception was
1343   raised by the codec.
1344
1345
1346.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
1347
1348   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
1349   return a bytes object.  Return ``NULL`` if an exception was raised by the codec.
1350
1351   .. deprecated-removed:: 3.3 4.0
1352      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1353      :c:func:`PyUnicode_AsUnicodeEscapeString`.
1354
1355
1356Raw-Unicode-Escape Codecs
1357"""""""""""""""""""""""""
1358
1359These are the "Raw Unicode Escape" codec APIs:
1360
1361
1362.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
1363                              Py_ssize_t size, const char *errors)
1364
1365   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
1366   encoded string *s*.  Return ``NULL`` if an exception was raised by the codec.
1367
1368
1369.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
1370
1371   Encode a Unicode object using Raw-Unicode-Escape and return the result as
1372   a bytes object.  Error handling is "strict".  Return ``NULL`` if an exception
1373   was raised by the codec.
1374
1375
1376.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
1377                              Py_ssize_t size)
1378
1379   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
1380   and return a bytes object.  Return ``NULL`` if an exception was raised by the codec.
1381
1382   .. deprecated-removed:: 3.3 4.0
1383      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1384      :c:func:`PyUnicode_AsRawUnicodeEscapeString` or
1385      :c:func:`PyUnicode_AsEncodedString`.
1386
1387
1388Latin-1 Codecs
1389""""""""""""""
1390
1391These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
1392ordinals and only these are accepted by the codecs during encoding.
1393
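Because Latin-1 is the first 256 Unicode ordinals, decoding is a plain widening of each byte and strict encoding only has to reject code points above U+00FF. The following standalone C sketch models that with hypothetical names; instead of raising an exception it reports the index of the first code point that cannot be encoded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of strict Latin-1 encoding: every code point below
   256 becomes the byte with the same value.  Returns the number of bytes
   written, or -1 with *errpos set to the first unencodable code point. */
long model_latin1_encode(const uint32_t *cp, size_t n, unsigned char *out,
                         size_t *errpos)
{
    for (size_t i = 0; i < n; i++) {
        if (cp[i] > 0xFF) {
            *errpos = i;      /* Latin-1 cannot represent this code point */
            return -1;
        }
        out[i] = (unsigned char)cp[i];
    }
    return (long)n;
}

/* Decoding never fails: each byte decodes to the code point with the
   same value. */
void model_latin1_decode(const unsigned char *s, size_t size, uint32_t *out)
{
    for (size_t i = 0; i < size; i++)
        out[i] = s[i];
}
```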
1394
1395.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
1396
1397   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
1398   *s*.  Return ``NULL`` if an exception was raised by the codec.
1399
1400
1401.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
1402
1403   Encode a Unicode object using Latin-1 and return the result as Python bytes
1404   object.  Error handling is "strict".  Return ``NULL`` if an exception was
1405   raised by the codec.
1406
1407
1408.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
1409
1410   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and
1411   return a Python bytes object.  Return ``NULL`` if an exception was raised by
1412   the codec.
1413
1414   .. deprecated-removed:: 3.3 4.0
1415      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1416      :c:func:`PyUnicode_AsLatin1String` or
1417      :c:func:`PyUnicode_AsEncodedString`.
1418
1419
1420ASCII Codecs
1421""""""""""""
1422
1423These are the ASCII codec APIs.  Only 7-bit ASCII data is accepted. All other
1424codes generate errors.
1425
1426
1427.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
1428
1429   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
1430   *s*.  Return ``NULL`` if an exception was raised by the codec.
1431
1432
1433.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
1434
1435   Encode a Unicode object using ASCII and return the result as Python bytes
1436   object.  Error handling is "strict".  Return ``NULL`` if an exception was
1437   raised by the codec.
1438
1439
1440.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
1441
1442   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and
1443   return a Python bytes object.  Return ``NULL`` if an exception was raised by
1444   the codec.
1445
1446   .. deprecated-removed:: 3.3 4.0
1447      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1448      :c:func:`PyUnicode_AsASCIIString` or
1449      :c:func:`PyUnicode_AsEncodedString`.
1450
1451
1452Character Map Codecs
1453""""""""""""""""""""
1454
1455This codec is special in that it can be used to implement many different codecs
1456(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses a mapping to encode and
1458decode characters.  The mapping objects provided must support the
1459:meth:`__getitem__` mapping interface; dictionaries and sequences work well.
1460
1461These are the mapping codec APIs:
1462
1463.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
1464                              PyObject *mapping, const char *errors)
1465
   Create a Unicode object by decoding *size* bytes of the encoded string *data*
   using the given *mapping* object.  Return ``NULL`` if an exception was raised
   by the codec.
1469
   If *mapping* is ``NULL``, Latin-1 decoding will be applied.  Otherwise,
   *mapping* must map bytes ordinals (integers in the range from 0 to 255)
   to Unicode strings, integers (which are then interpreted as Unicode
   ordinals) or ``None``.  Unmapped data bytes (ones which cause a
   :exc:`LookupError`, as well as ones which get mapped to ``None``,
   ``0xFFFE`` or ``'\ufffe'``) are treated as undefined mappings and cause
   an error.
1477
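The decoding rules above can be modelled in plain C with a 256-entry table standing in for the Python *mapping* object. In this sketch (hypothetical names, not CPython's implementation) the value 0xFFFE marks an undefined entry, one of the cases the text says causes an error; mappings to ``None``, missing keys and multi-character string values have no direct counterpart in the table and are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_UNDEFINED 0xFFFEu  /* marks an undefined mapping */

/* Hypothetical model of strict charmap decoding with a 256-entry table
   mapping byte ordinals to code points.  Returns the number of code
   points written, or -1 with *errpos set to the first byte whose mapping
   is undefined. */
long model_charmap_decode(const unsigned char *data, size_t size,
                          const uint32_t table[256], uint32_t *out,
                          size_t *errpos)
{
    for (size_t i = 0; i < size; i++) {
        uint32_t cp = table[data[i]];
        if (cp == MODEL_UNDEFINED) {
            *errpos = i;
            return -1;
        }
        out[i] = cp;
    }
    return (long)size;
}
```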
1478
1479.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
1480
1481   Encode a Unicode object using the given *mapping* object and return the
1482   result as a bytes object.  Error handling is "strict".  Return ``NULL`` if an
1483   exception was raised by the codec.
1484
1485   The *mapping* object must map Unicode ordinal integers to bytes objects,
1486   integers in the range from 0 to 255 or ``None``.  Unmapped character
1487   ordinals (ones which cause a :exc:`LookupError`) as well as mapped to
1488   ``None`` are treated as "undefined mapping" and cause an error.
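
   The following is a minimal sketch that assumes an initialized interpreter.
   ``encode_with_table`` is a hypothetical helper whose mapping sends ``'a']``
   to a one-byte bytes object and ``'b'`` to an integer. Every other character
   is left undefined.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: encode `unicode` with {ord('a'): b'\x01',
         ord('b'): 0x02}. Returns a new bytes object, or NULL with an
         exception set (e.g. for any character other than 'a' or 'b'). */
      static PyObject *
      encode_with_table(PyObject *unicode)
      {
          PyObject *mapping = Py_BuildValue("{i:y,i:i}",
                                            'a', "\x01", 'b', 0x02);
          if (mapping == NULL) {
              return NULL;
          }
          PyObject *result = PyUnicode_AsCharmapString(unicode, mapping);
          Py_DECREF(mapping);
          return result;
      }

   Encoding ``'abba'`` should give ``b'\x01\x02\x02\x01'``. Encoding ``'c'``
   raises :exc:`UnicodeEncodeError`, since this function always uses "strict"
   error handling.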


.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return the result as a bytes object.  Return ``NULL`` if
   an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsCharmapString` or
      :c:func:`PyUnicode_AsEncodedString`.


The following codec API is special in that it maps Unicode to Unicode.

.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object. Return ``NULL`` if an exception was raised by the
   codec.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or ``None`` (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well.  Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be ``NULL``, which indicates
   that the default error handling should be used.
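
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``translate_demo`` passes a dictionary table that
   upper-cases ``'a'`` and deletes ``'-'``.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: upper-case 'a' and delete '-', leaving all
         other characters untouched. Returns a new reference or NULL. */
      static PyObject *
      translate_demo(PyObject *str)
      {
          /* {ord('a'): ord('A'), ord('-'): None} */
          PyObject *table = Py_BuildValue("{i:i,i:O}", 'a', 'A', '-', Py_None);
          if (table == NULL) {
              return NULL;
          }
          /* errors == NULL selects the default error handling. */
          PyObject *result = PyUnicode_Translate(str, table, NULL);
          Py_DECREF(table);
          return result;
      }

   Translating ``'a-b-a'`` should produce ``'AbA'``. Looking up ``'b'`` in the
   dictionary raises :exc:`KeyError`, a :exc:`LookupError`, so ``'b'`` is copied
   unchanged.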


.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)

   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
   character *mapping* table to it and return the resulting Unicode object.
   Return ``NULL`` when an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_Translate` or the :ref:`generic codec based API
      <codec-registry>`.


MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions.  Note that MBCS (or
DBCS) is a class of encodings, not just one.  The target encoding is defined by
the user settings on the machine running the codec.

.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return ``NULL`` if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)

   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeMBCS`. If
   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.


.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python bytes
   object.  Error handling is "strict".  Return ``NULL`` if an exception was
   raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)

   Encode the Unicode object using the specified code page and return a Python
   bytes object.  Return ``NULL`` if an exception was raised by the codec. Use
   the :c:data:`CP_ACP` code page to get the MBCS encoder.

   .. versionadded:: 3.3
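
   The following is a Windows-only sketch that assumes an initialized
   interpreter. It uses code page 65001, the Windows identifier for UTF-8,
   instead of :c:data:`CP_ACP`, so the result does not depend on the machine's
   settings. The helper name is hypothetical. The ``MS_WINDOWS`` guard relies
   on the macro that CPython's Windows ``pyconfig.h`` defines.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      #ifdef MS_WINDOWS
      /* Hypothetical helper: encode `unicode` with the UTF-8 code page
         (65001, i.e. CP_UTF8). Returns a new bytes object or NULL. */
      static PyObject *
      encode_utf8_code_page(PyObject *unicode)
      {
          return PyUnicode_EncodeCodePage(65001, unicode, "strict");
      }
      #endif /* MS_WINDOWS */

   Encoding ``'é'`` (U+00E9) this way should produce ``b'\xc3\xa9'``.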


.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return
   a Python bytes object.  Return ``NULL`` if an exception was raised by the
   codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsMBCSString`, :c:func:`PyUnicode_EncodeCodePage` or
      :c:func:`PyUnicode_AsEncodedString`.


Methods & Slots
"""""""""""""""


.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

Unless noted otherwise, they return ``NULL`` or ``-1`` if an exception occurs.


.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string giving a list of Unicode strings.  If *sep* is ``NULL``, splitting
   will be done at all whitespace substrings.  Otherwise, splits occur at the given
   separator.  At most *maxsplit* splits will be done.  If *maxsplit* is negative,
   no limit is set.  Separators are not included in the resulting list.
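
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``count_pieces`` reports how many strings
   :c:func:`PyUnicode_Split` produces for a given *sep* and *maxsplit*.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: split `s` (at whitespace when `sep` is NULL)
         and return the number of pieces, or -1 with an exception set. */
      static Py_ssize_t
      count_pieces(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
      {
          PyObject *parts = PyUnicode_Split(s, sep, maxsplit);
          if (parts == NULL) {
              return -1;
          }
          Py_ssize_t n = PyList_GET_SIZE(parts);  /* the result is a list */
          Py_DECREF(parts);
          return n;
      }

   For ``'a  b c'`` with a ``NULL`` *sep*, a negative *maxsplit* gives three
   pieces, because runs of whitespace act as one separator. A *maxsplit* of
   ``1`` gives two pieces, ``'a'`` and ``'b c'``. Splitting ``'x,,y'`` on
   ``','`` gives three pieces, one of them empty.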


.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break.  If *keepend* is ``0``, the line break
   characters are not included in the resulting strings; otherwise they are kept.
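
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``first_line`` returns the first line of a string
   and shows the effect of *keepend*.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return a new reference to the first line of
         `s`, keeping its line break only when `keepend` is nonzero. */
      static PyObject *
      first_line(PyObject *s, int keepend)
      {
          PyObject *lines = PyUnicode_Splitlines(s, keepend);
          if (lines == NULL) {
              return NULL;
          }
          PyObject *line;
          if (PyList_GET_SIZE(lines) == 0) {
              line = PyUnicode_FromString("");   /* empty input: no lines */
          }
          else {
              line = PyList_GET_ITEM(lines, 0);  /* borrowed reference */
              Py_INCREF(line);
          }
          Py_DECREF(lines);
          return line;
      }

   For ``'one\r\ntwo'`` the helper should return ``'one'`` when *keepend* is
   ``0`` and ``'one\r\n'`` otherwise, since CRLF counts as a single break.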


.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given *separator* and return the resulting
   Unicode string.
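
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``join_with_comma`` joins any sequence of strings
   with ``", "``.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: ", ".join(seq). Returns a new reference, or
         NULL with an exception set (e.g. TypeError for a non-str item). */
      static PyObject *
      join_with_comma(PyObject *seq)
      {
          PyObject *sep = PyUnicode_FromString(", ");
          if (sep == NULL) {
              return NULL;
          }
          PyObject *result = PyUnicode_Join(sep, seq);
          Py_DECREF(sep);
          return result;
      }

   Joining the list ``['a', 'b', 'c']`` should give ``'a, b, c'``. A sequence
   containing a non-string item, such as ``('a', 1)``, makes the call fail with
   :exc:`TypeError`.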


.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
                        Py_ssize_t start, Py_ssize_t end, int direction)

   Return ``1`` if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == ``-1`` means to do a prefix match, *direction* == ``1`` a suffix match),
   ``0`` otherwise. Return ``-1`` if an error occurred.
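
   The following is a minimal sketch that assumes an initialized interpreter.
   ``starts_with`` and ``ends_with`` are hypothetical wrappers that mimic
   :meth:`str.startswith` and :meth:`str.endswith` over the whole string. Each
   passes ``PY_SSIZE_T_MAX`` as *end*, which the slice semantics clamp to the
   string length.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helpers: prefix and suffix tests over the whole
         string. Each returns 1, 0, or -1 with an exception set. */
      static int
      starts_with(PyObject *str, PyObject *prefix)
      {
          return (int)PyUnicode_Tailmatch(str, prefix, 0, PY_SSIZE_T_MAX, -1);
      }

      static int
      ends_with(PyObject *str, PyObject *suffix)
      {
          return (int)PyUnicode_Tailmatch(str, suffix, 0, PY_SSIZE_T_MAX, 1);
      }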


.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
                               Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == ``1`` means to do a forward search, *direction* == ``-1`` a
   backward search).  The return value is the index of the first match; a value of
   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   occurred and an exception has been set.
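
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``find_in`` searches the whole string in either
   direction. Callers must treat ``-2`` (error) separately from ``-1`` (not
   found).

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: search all of `str` for `sub`, forward when
         `direction` is 1 and backward when it is -1. Returns the match
         index, -1 if there is none, or -2 with an exception set. */
      static Py_ssize_t
      find_in(PyObject *str, PyObject *sub, int direction)
      {
          return PyUnicode_Find(str, sub, 0, PY_SSIZE_T_MAX, direction);
      }

   For ``'abcabc'`` and ``'bc'``, a forward search should return ``1`` and a
   backward search ``4``.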


.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
                               Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of the character *ch* in ``str[start:end]`` using
   the given *direction* (*direction* == ``1`` means to do a forward search,
   *direction* == ``-1`` a backward search).  The return value is the index of the
   first match; a value of ``-1`` indicates that no match was found, and ``-2``
   indicates that an error occurred and an exception has been set.

   .. versionadded:: 3.3

   .. versionchanged:: 3.7
      *start* and *end* are now adjusted to behave like ``str[start:end]``.
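
   The following is a minimal sketch that assumes Python 3.7 or later, so that
   an over-large *end* is clamped, and an initialized interpreter. The
   hypothetical helper ``find_char_forward`` takes a :c:type:`Py_UCS4` code
   point, so characters outside ASCII need no temporary string object.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: forward search for one code point over the
         whole string; -1 if absent, -2 with an exception set on error. */
      static Py_ssize_t
      find_char_forward(PyObject *str, Py_UCS4 ch)
      {
          return PyUnicode_FindChar(str, ch, 0, PY_SSIZE_T_MAX, 1);
      }

   In ``'10 €'``, the space should be found at index ``2`` and U+20AC at
   index ``3``.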


.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
                               Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``.  Return ``-1`` if an error occurred.


.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
                              PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == ``-1`` means replace all
   occurrences.
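
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``replace_in`` works on C strings and stops creating
   objects at the first failed conversion, so no API is called while an
   exception is pending.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: text.replace(old, replacement, maxcount) for
         UTF-8 C strings; maxcount == -1 replaces every occurrence. */
      static PyObject *
      replace_in(const char *text, const char *old, const char *replacement,
                 Py_ssize_t maxcount)
      {
          PyObject *str = NULL, *sub = NULL, *repl = NULL, *result = NULL;
          if ((str = PyUnicode_FromString(text)) != NULL
              && (sub = PyUnicode_FromString(old)) != NULL
              && (repl = PyUnicode_FromString(replacement)) != NULL) {
              result = PyUnicode_Replace(str, sub, repl, maxcount);
          }
          /* Py_XDECREF tolerates objects that were never created. */
          Py_XDECREF(repl);
          Py_XDECREF(sub);
          Py_XDECREF(str);
          return result;
      }

   ``replace_in("a-b-c", "-", "+", -1)`` should give ``'a+b+c'``, while a
   *maxcount* of ``1`` gives ``'a+b-c'``.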


.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return ``-1``, ``0``, ``1`` for less than, equal, and greater than,
   respectively.

   This function returns ``-1`` upon failure.  Since ``-1`` also means "less than",
   call :c:func:`PyErr_Occurred` to tell the two cases apart.
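
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``compare_checked`` separates the error result from
   a genuine "less than" by checking :c:func:`PyErr_Occurred`.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: store -1/0/1 in *out and return 0, or return
         -1 if PyUnicode_Compare() failed (e.g. a non-str argument). */
      static int
      compare_checked(PyObject *left, PyObject *right, int *out)
      {
          int r = PyUnicode_Compare(left, right);
          if (r == -1 && PyErr_Occurred()) {
              return -1;
          }
          *out = r;
          return 0;
      }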


.. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, const char *string)

   Compare a Unicode object, *uni*, with *string* and return ``-1``, ``0``, ``1`` for less
   than, equal, and greater than, respectively. It is best to pass only
   ASCII-encoded strings, but the function interprets the input string as
   ISO-8859-1 if it contains non-ASCII characters.

   This function does not raise exceptions.
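
   The following is a minimal sketch that assumes an initialized interpreter.
   Comparing against a C string literal avoids creating a temporary string
   object, which is convenient for keyword or method-name checks. The helper
   ``is_get_method`` is hypothetical.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: 1 if `uni` equals the ASCII text "GET",
         else 0. `uni` is assumed to be a str object. */
      static int
      is_get_method(PyObject *uni)
      {
          return PyUnicode_CompareWithASCIIString(uni, "GET") == 0;
      }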


.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
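
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``unicode_less_than`` handles all three kinds of
   result. Turning :const:`Py_NotImplemented` into a :exc:`TypeError` is a
   choice made for this example, not something the API does itself.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: evaluate left < right. Returns 1 or 0, or -1
         with an exception set (including our own TypeError when the type
         combination yields NotImplemented). */
      static int
      unicode_less_than(PyObject *left, PyObject *right)
      {
          PyObject *res = PyUnicode_RichCompare(left, right, Py_LT);
          if (res == NULL) {
              return -1;
          }
          if (res == Py_NotImplemented) {
              Py_DECREF(res);
              PyErr_SetString(PyExc_TypeError, "unsupported comparison");
              return -1;
          }
          int truth = (res == Py_True);
          Py_DECREF(res);  /* the result is a new reference */
          return truth;
      }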


.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``.
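
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``describe`` is the C counterpart of
   ``"%s has %d items" % (name, n)``, with the argument tuple built by
   :c:func:`Py_BuildValue`.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: "%s has %d items" % (name, n).
         Returns a new reference or NULL with an exception set. */
      static PyObject *
      describe(const char *name, long n)
      {
          PyObject *fmt = PyUnicode_FromString("%s has %d items");
          if (fmt == NULL) {
              return NULL;
          }
          PyObject *args = Py_BuildValue("(sl)", name, n);
          if (args == NULL) {
              Py_DECREF(fmt);
              return NULL;
          }
          PyObject *result = PyUnicode_Format(fmt, args);
          Py_DECREF(args);
          Py_DECREF(fmt);
          return result;
      }

   ``describe("cart", 3)`` should return ``'cart has 3 items'``.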


.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return ``1`` or ``0``
   accordingly.

   *element* must be a Unicode string; it may be of any length and is searched
   for as a substring of *container*.  ``-1`` is returned if there was an error.
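
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``contains_substring`` evaluates
   ``needle in haystack`` for two UTF-8 C strings.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: `needle in haystack` for UTF-8 C strings.
         Returns 1, 0, or -1 with an exception set. */
      static int
      contains_substring(const char *haystack, const char *needle)
      {
          PyObject *h = PyUnicode_FromString(haystack);
          if (h == NULL) {
              return -1;
          }
          PyObject *n = PyUnicode_FromString(needle);
          if (n == NULL) {
              Py_DECREF(h);
              return -1;
          }
          int r = PyUnicode_Contains(h, n);
          Py_DECREF(n);
          Py_DECREF(h);
          return r;
      }

   ``contains_substring("hello world", "o w")`` should return ``1``. Passing a
   non-string *element* directly to :c:func:`PyUnicode_Contains` returns ``-1``
   with :exc:`TypeError` set.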


.. c:function:: void PyUnicode_InternInPlace(PyObject **string)

   Intern the argument *\*string* in place.  The argument must be the address of a
   pointer variable pointing to a Python Unicode string object.  If there is an
   existing interned string that is the same as *\*string*, it sets *\*string* to
   it (decrementing the reference count of the old string object and incrementing
   the reference count of the interned string object), otherwise it leaves
   *\*string* alone and interns it (incrementing its reference count).
   (Clarification: even though there is a lot of talk about reference counts, think
   of this function as reference-count-neutral; you own the object after the call
   if and only if you owned it before the call.)
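
   The following is a minimal sketch that assumes an initialized interpreter.
   The hypothetical helper ``make_interned`` shows the reference-neutral
   pattern: it owns one reference before the call and exactly one after, even
   if the pointer was swapped for an earlier interned object.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: create a str from `text` and intern it.
         Returns a new (owned) reference, or NULL with an exception set. */
      static PyObject *
      make_interned(const char *text)
      {
          PyObject *s = PyUnicode_FromString(text);
          if (s == NULL) {
              return NULL;
          }
          /* May replace `s` with an already interned equal string; either
             way we still own exactly one reference afterwards. */
          PyUnicode_InternInPlace(&s);
          return s;
      }

   Two calls with equal text should return pointers to the same object.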


.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)

   A combination of :c:func:`PyUnicode_FromString` and
   :c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
   object that has been interned, or a new ("owned") reference to an earlier
   interned string object with the same value.
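
   The following is a minimal sketch that assumes an interpreter that is
   initialized once and not re-initialized, since the cached key below
   outlives :c:func:`Py_Finalize`. The hypothetical helper ``get_name_attr``
   creates an interned attribute name once and reuses it for every lookup.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return a new reference to obj.__name__,
         using a lazily created interned key that is kept for the life
         of the process. */
      static PyObject *
      get_name_attr(PyObject *obj)
      {
          static PyObject *key = NULL;
          if (key == NULL) {
              key = PyUnicode_InternFromString("__name__");
              if (key == NULL) {
                  return NULL;
              }
          }
          return PyObject_GetAttr(obj, key);
      }

   Calling the helper on ``int``, that is ``(PyObject *)&PyLong_Type``, should
   return ``'int'``. Repeated :c:func:`PyUnicode_InternFromString` calls with
   the same text return the same object.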