:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings.  This can be used in handling binary data stored in files or
from network connections, among other sources.  It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking.  This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct.  To handle platform-independent data formats
   or omit implicit pad bytes, use ``standard`` size and alignment instead of
   ``native`` size and alignment: see :ref:`struct-alignment` for details.

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions, for example when a format string is
   invalid or when the supplied values or data do not match the format; the
   argument is a string describing what is wrong.
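
   A minimal sketch of catching the exception, here triggered by passing more
   values than the format requires (the exact message text is not assumed)::

      import struct

      try:
          # 'h' describes a single short, but two values are supplied.
          struct.pack('h', 1, 2)
      except struct.error as exc:
          message = str(exc)  # human-readable description of the problem
      else:
          message = None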


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format.  The arguments must match the values required by the format
   exactly.
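
   A minimal sketch of a call, with arbitrary field values.  The ``'<'`` prefix
   is an illustrative choice that makes the result identical on every platform,
   and the ``b`` prefix on the literal only marks it as a byte string::

      import struct

      # Two 2-byte shorts and one 4-byte long, little-endian, no padding.
      packed = struct.pack('<hhl', 1, 2, 3)
      expected = b'\x01\x00\x02\x00\x03\x00\x00\x00'
      matches = (packed == expected)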


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*.  Note that
   *offset* is a required argument.
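
   For illustration, a sketch that writes two unsigned shorts into the middle
   of a preallocated buffer.  ``ctypes.create_string_buffer`` is used here only
   as one convenient source of a writable buffer; other writable buffers
   should work the same way::

      import ctypes
      import struct

      buf = ctypes.create_string_buffer(8)    # 8 zero bytes, writable
      struct.pack_into('<HH', buf, 2, 1, 2)   # write 4 bytes at offset 2
      raw = buf.raw                           # bytes 0-1 and 6-7 stay zero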

   .. versionadded:: 2.5


.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format.  The result is a tuple even if it contains exactly one item.
   The string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).
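
   A short sketch of both points, using standard sizes so that the lengths are
   the same on every platform::

      import struct

      # A single-item format still yields a 1-tuple.
      (value,) = struct.unpack('<H', b'\x01\x02')   # value is 0x0201

      # Data of the wrong length is rejected with struct.error.
      try:
          struct.unpack('<H', b'\x01\x02\x03')
      except struct.error:
          rejected = True
      else:
          rejected = False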


.. function:: unpack_from(fmt, buffer[,offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the
   amount of data required by the format (``len(buffer[offset:])`` must be at
   least ``calcsize(fmt)``).
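
   As a sketch, the following reads two shorts that sit after a 4-byte header
   in a larger buffer; the header and trailer contents are made up for the
   example::

      import struct

      data = b'HEAD' + struct.pack('<HH', 7, 9) + b'trailing bytes'

      # Skip the 4-byte header; bytes after the two shorts are ignored.
      fields = struct.unpack_from('<HH', data, 4)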

   .. versionadded:: 2.5


.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.
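
   A brief sketch comparing a standard-size format with its native
   counterpart; only the standard result is the same everywhere::

      import struct

      standard = struct.calcsize('<hhl')   # 2 + 2 + 4 = 8 on every platform
      native = struct.calcsize('hhl')      # platform-dependent

      # The size always matches the length of the packed string.
      consistent = (len(struct.pack('hhl', 1, 2, 3)) == native)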

.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data.  They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked.  In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.
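
The effect of the byte order characters can be sketched with a single unsigned
short; the variable names are illustrative only::

   import struct
   import sys

   little = struct.pack('<H', 1)    # low byte first
   big = struct.pack('>H', 1)       # high byte first
   network = struct.pack('!H', 1)   # same bytes as '>'

   # '=' follows the host byte order reported by sys.byteorder.
   native = struct.pack('=H', 1)
   expected = little if sys.byteorder == 'little' else big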

Native size and alignment are determined using the C compiler's
``sizeof`` expression.  This is always combined with native byte order.

Standard size depends only on the format character; see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter are standardized.
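
A small sketch of that difference, using a ``char`` followed by an ``int``.
The native result depends on the platform's C compiler, so no particular value
is assumed for it::

   import struct

   # Standard mode: 1-byte char + 4-byte int, no padding.
   standard_size = struct.calcsize('=ci')   # always 5

   # Native mode: pad bytes may be inserted so that the int is aligned.
   native_size = struct.calcsize('@ci')

   # Omitting the first character is the same as writing '@'.
   default_size = struct.calcsize('ci')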

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero.  See :ref:`struct-examples`.


.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types.  The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``.  When using native size, the size of the packed value is
platform-dependent.

+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type                   | Python type        | Standard size  | Notes      |
+========+==========================+====================+================+============+
| ``x``  | pad byte                 | no value           |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``c``  | :c:type:`char`           | string of length 1 | 1              |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``b``  | :c:type:`signed char`    | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``B``  | :c:type:`unsigned char`  | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``?``  | :c:type:`_Bool`          | bool               | 1              | \(1)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``h``  | :c:type:`short`          | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``H``  | :c:type:`unsigned short` | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``i``  | :c:type:`int`            | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``I``  | :c:type:`unsigned int`   | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``l``  | :c:type:`long`           | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``L``  | :c:type:`unsigned long`  | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``q``  | :c:type:`long long`      | integer            | 8              | \(2), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``Q``  | :c:type:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                    |                    |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``f``  | :c:type:`float`          | float              | 4              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:type:`double`         | float              | 8              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``s``  | :c:type:`char[]`         | string             |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``p``  | :c:type:`char[]`         | string             |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:type:`void \*`        | integer            |                | \(5), \(3) |
+--------+--------------------------+--------------------+----------------+------------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :c:type:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :c:type:`char`. In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :c:type:`long long`, or, on Windows,
   :c:type:`__int64`.  They are always available in standard modes.

   .. versionadded:: 2.2

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing.  If no
   :meth:`__index__` method exists, or the call to :meth:`__index__` raises
   :exc:`TypeError`, then the :meth:`__int__` method is tried.  However, the use
   of :meth:`__int__` is deprecated, and will raise :exc:`DeprecationWarning`.

   .. versionchanged:: 2.7
      Use of the :meth:`__index__` method for non-integers is new in 2.7.

   .. versionchanged:: 2.7
      Prior to version 2.7, not all integer conversion codes would use the
      :meth:`__int__` method to convert, and :exc:`DeprecationWarning` was
      raised only for float arguments.

(4)
   For the ``'f'`` and ``'d'`` conversion codes, the packed representation uses
   the IEEE 754 binary32 (for ``'f'``) or binary64 (for ``'d'``) format,
   regardless of the floating-point format used by the platform.

(5)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character). The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system. The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.
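
Notes (3), (4) and (5) can be sketched as follows.  The ``Index`` class is a
hypothetical stand-in for any non-integer type that defines
:meth:`__index__`::

   import struct

   class Index(object):
       """Hypothetical non-integer type that defines __index__ (note 3)."""
       def __index__(self):
           return 5

   from_index = struct.pack('<h', Index())   # packed as the integer 5

   # Note 4: 1.0 in IEEE 754 binary32, written big-endian.
   one = struct.pack('>f', 1.0)

   # Note 5: 'P' is accepted in native mode but rejected with '='.
   pointer_size = struct.calcsize('P')
   try:
       struct.calcsize('=P')
   except struct.error:
       p_rejected = True
   else:
       p_rejected = False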


A format character may be preceded by an integral repeat count.  For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; however, a count and its
format must not be separated by whitespace.
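
A short sketch of the repeat count and whitespace rules; standard sizes are
used only to keep the results platform-independent::

   import struct

   # A repeat count is shorthand for repeating the format character.
   same = struct.pack('<4h', 1, 2, 3, 4) == struct.pack('<hhhh', 1, 2, 3, 4)

   # Whitespace between format characters is ignored ...
   spaced = struct.calcsize('<h h l') == struct.calcsize('<hhl')

   # ... but a count may not be split from its format character.
   try:
       struct.calcsize('<4 h')
   except struct.error:
       count_split_rejected = True
   else:
       count_split_rejected = False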

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count as for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
If a count is not given, it defaults to 1.  For packing, the string is
truncated or padded with null bytes as appropriate to make it fit. For
unpacking, the resulting string always has exactly the specified number of
bytes.  As a special case, ``'0s'`` means a single, empty string (while
``'0c'`` means 0 characters).
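
These rules can be sketched with a few short byte strings (the values are
arbitrary)::

   import struct

   padded = struct.pack('5s', b'ab')          # two bytes plus three nulls
   truncated = struct.pack('2s', b'abcdef')   # only the first two bytes
   single = struct.pack('s', b'abc')          # count defaults to 1

   # Unpacking returns exactly the requested number of bytes,
   # including any null padding.
   (text,) = struct.unpack('5s', padded)

   # '0s' yields one empty string; '0c' yields no items at all.
   empty_s = struct.unpack('0s', b'')
   empty_c = struct.unpack('0c', b'')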

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is smaller.
The bytes of the string follow.  If the string passed in to :func:`pack` is too
long (longer than the count minus 1), only the leading ``count-1`` bytes of the
string are stored.  If the string is shorter than ``count-1``, it is padded with
null bytes so that exactly count bytes in all are used.  Note that for
:func:`unpack`, the ``'p'`` format character consumes count bytes, but that the
string returned can never contain more than 255 characters.
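
A minimal sketch of the Pascal string layout, with arbitrary example strings::

   import struct

   # Length byte, two data bytes, then two null pad bytes.
   short = struct.pack('5p', b'ab')

   # Only count-1 = 2 data bytes are kept; the length byte says 2.
   long_ = struct.pack('3p', b'abcdef')

   # Unpacking consumes all 5 bytes but returns only the 2-byte string.
   (text,) = struct.unpack('5p', short)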

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type.  A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used.  For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.
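
As a sketch, the pointer size can be queried rather than assumed.  The value
``0x1000`` below is an arbitrary number chosen to fit in any pointer size, not
a real address::

   import struct

   pointer_size = struct.calcsize('P')   # platform-dependent

   # An all-zero (NULL) pointer unpacks to 0.
   (null,) = struct.unpack('P', b'\x00' * pointer_size)

   # Round trip of an arbitrary pointer-sized value.
   (address,) = struct.unpack('P', struct.pack('P', 0x1000))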

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be ``True`` when unpacking.
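
A brief sketch in standard mode, where a bool always occupies one byte; the
packed objects are arbitrary examples of a false and a true value::

   import struct

   # Packing uses the truth value of the argument.
   falsy = struct.pack('<?', [])        # an empty list is false
   truthy = struct.pack('<?', 'text')   # a non-empty string is true

   # Any non-zero byte unpacks as True.
   (flag,) = struct.unpack('<?', b'\x05')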



.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

    >>> record = 'raymond   \x32\x12\x08\x01\x08'
    >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

    >>> from collections import namedtuple
    >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
    >>> Student._make(unpack('<10sHHb', record))
    Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

    >>> pack('ci', '*', 0x12131415)
    '*\x00\x00\x00\x12\x13\x14\x15'
    >>> pack('ic', 0x12131415, '*')
    '\x12\x13\x14\x15*'
    >>> calcsize('ci')
    8
    >>> calcsize('ic')
    5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

    >>> pack('llh0l', 1, 2, 3)
    '\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment do not enforce any alignment.
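
The last point can be checked with :func:`calcsize`.  In this sketch only the
standard-mode size is fixed; the native sizes depend on the platform, so they
are compared with each other rather than with a constant::

   import struct

   # Standard mode: '0l' adds nothing, so the size is 4 + 4 + 2.
   standard = struct.calcsize('<llh0l')

   # Native mode: '0l' rounds the total up to the alignment of a long.
   unaligned = struct.calcsize('@llh')
   aligned = struct.calcsize('@llh0l')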


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*.  Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer, offset=0)

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.
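
   A minimal sketch of reusing one compiled format, with the same record
   layout as in :ref:`struct-examples`; the name ``record`` is illustrative
   only::

      import struct

      record = struct.Struct('<10sHHb')   # format compiled once
      size = record.size                  # 10 + 2 + 2 + 1 = 15

      # The 7-byte name is padded with nulls to fill the 10-byte field.
      packed = record.pack(b'raymond', 4658, 264, 8)
      fields = record.unpack(packed)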