:mod:`struct` --- Interpret bytes as packed binary data
=======================================================

.. module:: struct
   :synopsis: Interpret bytes as packed binary data.

**Source code:** :source:`Lib/struct.py`

.. index::
   pair: C; structures
   triple: packing; binary; data

--------------

This module performs conversions between Python values and C structs represented
as Python :class:`bytes` objects.  This can be used in handling binary data
stored in files or from network connections, among other sources.  It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking.  This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct.  To handle platform-independent data formats
   or omit implicit pad bytes, use ``standard`` size and alignment instead of
   ``native`` size and alignment: see :ref:`struct-alignment` for details.

Several :mod:`struct` functions (and methods of :class:`Struct`) take a *buffer*
argument.  This refers to objects that implement the :ref:`bufferobjects` and
provide either a readable or read-writable buffer.  The most common types used
for that purpose are :class:`bytes` and :class:`bytearray`, but many other types
that can be viewed as an array of bytes implement the buffer protocol, so that
they can be read/filled without additional copying from a :class:`bytes` object.


Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.
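As a minimal sketch of when :exc:`error` is raised, the snippet below wraps
:func:`pack` in a small helper. The helper name ``try_pack`` and the sample
values are hypothetical and chosen only for illustration; the exact wording of
the error messages may differ between Python versions, so the sketch does not
rely on it.

```python
import struct


def try_pack(fmt, *values):
    """Return the packed bytes, or a short message if packing fails.

    ``try_pack`` is a hypothetical helper, not part of the struct module.
    """
    try:
        return struct.pack(fmt, *values)
    except struct.error as exc:
        return f"struct.error: {exc}"


# 513 == 0x0201 fits in an unsigned 16-bit field (little-endian here).
ok = try_pack("<H", 513)

# The format asks for two values but only one is supplied.
too_few = try_pack("<HH", 1)

# 256 is outside the valid range of an unsigned byte (0..255).
out_of_range = try_pack("<B", 256)
```

Catching :exc:`error` close to the call keeps malformed input from surfacing
later as a confusing failure elsewhere in the program.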


.. function:: pack(format, v1, v2, ...)

   Return a bytes object containing the values *v1*, *v2*, ... packed according
   to the format string *format*.  The arguments must match the values required by
   the format exactly.


.. function:: pack_into(format, buffer, offset, v1, v2, ...)

   Pack the values *v1*, *v2*, ... according to the format string *format* and
   write the packed bytes into the writable buffer *buffer* starting at
   position *offset*.  Note that *offset* is a required argument.


.. function:: unpack(format, buffer)

   Unpack from the buffer *buffer* (presumably packed by ``pack(format, ...)``)
   according to the format string *format*.  The result is a tuple even if it
   contains exactly one item.  The buffer's size in bytes must match the
   size required by the format, as reflected by :func:`calcsize`.


.. function:: unpack_from(format, /, buffer, offset=0)

   Unpack from *buffer* starting at position *offset*, according to the format
   string *format*.  The result is a tuple even if it contains exactly one
   item.  The buffer's size in bytes, starting at position *offset*, must be at
   least the size required by the format, as reflected by :func:`calcsize`.


.. function:: iter_unpack(format, buffer)

   Iteratively unpack from the buffer *buffer* according to the format
   string *format*.  This function returns an iterator which will read
   equally-sized chunks from the buffer until all its contents have been
   consumed.  The buffer's size in bytes must be a multiple of the size
   required by the format, as reflected by :func:`calcsize`.

   Each iteration yields a tuple as specified by the format string.

   .. versionadded:: 3.4


.. function:: calcsize(format)

   Return the size of the struct (and hence of the bytes object produced by
   ``pack(format, ...)``) corresponding to the format string *format*.


.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data.  They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked.  In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

.. index::
   single: @ (at); in struct format strings
   single: = (equals); in struct format strings
   single: < (less); in struct format strings
   single: > (greater); in struct format strings
   single: ! (exclamation); in struct format strings

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.
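As a rough illustration of the table above, the following sketch packs the
same 32-bit value with each explicit prefix character. The value and the
variable names are arbitrary choices made for this example. The native
``'@'`` mode is left out of the size comparison because its result depends on
the platform.

```python
import struct
import sys

value = 0x01020304

little = struct.pack("<I", value)      # least significant byte first
big = struct.pack(">I", value)         # most significant byte first
network = struct.pack("!I", value)     # network order, same as big-endian
native_std = struct.pack("=I", value)  # native order, standard 4-byte size

# '=' follows the byte order of the host, which sys.byteorder reports.
expected_native = little if sys.byteorder == "little" else big

# With a standard-size prefix no alignment padding is inserted, so a
# 1-byte char followed by a 4-byte int occupies exactly 5 bytes.
unpadded_size = struct.calcsize("<ci")
```

For data exchanged between machines, an explicit ``'<'``, ``'>'`` or ``'!'``
prefix is usually the safer choice, since the layout then does not depend on
the host.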

Native byte order is big-endian or little-endian, depending on the host
system.  For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian).  Use ``sys.byteorder`` to check the
endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression.  This is always combined with native byte order.

Standard size depends only on the format character; see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter is standardized.

The form ``'!'`` represents the network byte order which is always big-endian
as defined in `IETF RFC 1700 <IETF RFC 1700_>`_.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero.  See :ref:`struct-examples`.


.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types.  The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``.  When using native size, the size of the packed value is
platform-dependent.

+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type                   | Python type        | Standard size  | Notes      |
+========+==========================+====================+================+============+
| ``x``  | pad byte                 | no value           |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``c``  | :c:type:`char`           | bytes of length 1  | 1              |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``b``  | :c:type:`signed char`    | integer            | 1              | \(1), \(2) |
+--------+--------------------------+--------------------+----------------+------------+
| ``B``  | :c:type:`unsigned char`  | integer            | 1              | \(2)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``?``  | :c:type:`_Bool`          | bool               | 1              | \(1)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``h``  | :c:type:`short`          | integer            | 2              | \(2)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``H``  | :c:type:`unsigned short` | integer            | 2              | \(2)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``i``  | :c:type:`int`            | integer            | 4              | \(2)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``I``  | :c:type:`unsigned int`   | integer            | 4              | \(2)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``l``  | :c:type:`long`           | integer            | 4              | \(2)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``L``  | :c:type:`unsigned long`  | integer            | 4              | \(2)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``q``  | :c:type:`long long`      | integer            | 8              | \(2)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``Q``  | :c:type:`unsigned long   | integer            | 8              | \(2)       |
|        | long`                    |                    |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``n``  | :c:type:`ssize_t`        | integer            |                | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``N``  | :c:type:`size_t`         | integer            |                | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``e``  | \(6)                     | float              | 2              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``f``  | :c:type:`float`          | float              | 4              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:type:`double`         | float              | 8              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``s``  | :c:type:`char[]`         | bytes              |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``p``  | :c:type:`char[]`         | bytes              |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:type:`void \*`        | integer            |                | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+

.. versionchanged:: 3.3
   Added support for the ``'n'`` and ``'N'`` formats.

.. versionchanged:: 3.6
   Added support for the ``'e'`` format.


Notes:

(1)
   .. index:: single: ? (question mark); in struct format strings

   The ``'?'`` conversion code corresponds to the :c:type:`_Bool` type defined by
   C99.  If this type is not available, it is simulated using a :c:type:`char`.  In
   standard mode, it is always represented by one byte.

(2)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing.

   .. versionchanged:: 3.2
      Added use of the :meth:`__index__` method for non-integers.

(3)
   The ``'n'`` and ``'N'`` conversion codes are only available for the native
   size (selected as the default or with the ``'@'`` byte order character).
   For the standard size, you can use whichever of the other integer formats
   fits your application.

(4)
   For the ``'f'``, ``'d'`` and ``'e'`` conversion codes, the packed
   representation uses the IEEE 754 binary32, binary64 or binary16 format (for
   ``'f'``, ``'d'`` or ``'e'`` respectively), regardless of the floating-point
   format used by the platform.

(5)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character).  The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system.  The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.

(6)
   The IEEE 754 binary16 "half precision" type was introduced in the 2008
   revision of the `IEEE 754 standard <ieee 754 standard_>`_.  It has a sign
   bit, a 5-bit exponent and 11-bit precision (with 10 bits explicitly stored),
   and can represent numbers between approximately ``6.1e-05`` and ``6.5e+04``
   at full precision.  This type is not widely supported by C compilers: on a
   typical machine, an unsigned short can be used for storage, but not for math
   operations.  See the Wikipedia page on the `half-precision floating-point
   format <half precision format_>`_ for more information.


A format character may be preceded by an integral repeat count.
For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.

For the ``'s'`` format character, the count is interpreted as the length of the
bytes, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
If a count is not given, it defaults to 1.  For packing, the string is
truncated or padded with null bytes as appropriate to make it fit.  For
unpacking, the resulting bytes object always has exactly the specified number
of bytes.  As a special case, ``'0s'`` means a single, empty string (while
``'0c'`` means 0 characters).

When packing a value ``x`` using one of the integer formats (``'b'``,
``'B'``, ``'h'``, ``'H'``, ``'i'``, ``'I'``, ``'l'``, ``'L'``,
``'q'``, ``'Q'``), if ``x`` is outside the valid range for that format
then :exc:`struct.error` is raised.

.. versionchanged:: 3.1
   Previously, some of the integer formats wrapped out-of-range values and
   raised :exc:`DeprecationWarning` instead of :exc:`struct.error`.

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is
smaller.  The bytes of the string follow.  If the string passed in to
:func:`pack` is too long (longer than the count minus 1), only the leading
``count-1`` bytes of the string are stored.  If the string is shorter than
``count-1``, it is padded with null bytes so that exactly count bytes in all
are used.  Note that for :func:`unpack`, the ``'p'`` format character consumes
``count`` bytes, but that the string returned can never contain more than 255
bytes.

.. index:: single: ? (question mark); in struct format strings

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`.  When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be ``True`` when unpacking.



.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   b'\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

   >>> record = b'raymond   \x32\x12\x08\x01\x08'
   >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

   >>> from collections import namedtuple
   >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
   >>> Student._make(unpack('<10sHHb', record))
   Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

   >>> pack('ci', b'*', 0x12131415)
   b'*\x00\x00\x00\x12\x13\x14\x15'
   >>> pack('ic', 0x12131415, b'*')
   b'\x12\x13\x14\x15*'
   >>> calcsize('ci')
   8
   >>> calcsize('ic')
   5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

   >>> pack('llh0l', 1, 2, 3)
   b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment does not enforce any alignment.


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*.  Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. note::

      The compiled versions of the most recent format strings passed to
      :class:`Struct` and the module-level functions are cached, so programs
      that use only a few format strings needn't worry about reusing a single
      :class:`Struct` instance.

   Compiled Struct objects support the following methods and attributes:

   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(buffer)

      Identical to the :func:`unpack` function, using the compiled format.
      The buffer's size in bytes must equal :attr:`size`.


   .. method:: unpack_from(buffer, offset=0)

      Identical to the :func:`unpack_from` function, using the compiled format.
      The buffer's size in bytes, starting at position *offset*, must be at least
      :attr:`size`.


   .. method:: iter_unpack(buffer)

      Identical to the :func:`iter_unpack` function, using the compiled format.
      The buffer's size in bytes must be a multiple of :attr:`size`.

      .. versionadded:: 3.4

   .. attribute:: format

      The format string used to construct this Struct object.

      .. versionchanged:: 3.7
         The format string type is now :class:`str` instead of :class:`bytes`.

   .. attribute:: size

      The calculated size of the struct (and hence of the bytes object produced
      by the :meth:`pack` method) corresponding to :attr:`format`.


.. _half precision format: https://en.wikipedia.org/wiki/Half-precision_floating-point_format

.. _ieee 754 standard: https://en.wikipedia.org/wiki/IEEE_floating_point#IEEE_754-2008

.. _IETF RFC 1700: https://tools.ietf.org/html/rfc1700
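The :class:`Struct` methods described above can be combined as in the
following sketch, which writes several fixed-size records into one
preallocated :class:`bytearray` and reads them back. The record layout
(``'<4sHb'``), the name ``RECORD`` and the sample rows are assumptions made
for illustration only; they do not come from the module itself.

```python
import struct

# Hypothetical layout: little-endian, 4-byte name, unsigned 16-bit id,
# signed 8-bit level.  Standard size, so no padding: 4 + 2 + 1 = 7 bytes.
RECORD = struct.Struct("<4sHb")

rows = [(b"ann", 1, 3), (b"bob", 2, -1)]

# Preallocate one buffer and pack each row at its own offset.
buffer = bytearray(RECORD.size * len(rows))
for index, row in enumerate(rows):
    RECORD.pack_into(buffer, index * RECORD.size, *row)

# Read a single record from the middle of the buffer.
second = RECORD.unpack_from(buffer, RECORD.size)

# Or walk over every record in turn.
all_rows = list(RECORD.iter_unpack(buffer))
```

Note that the 3-byte names come back as 4 bytes, because the ``'s'`` format
pads short values with null bytes to the declared length.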