# Universal Types with BER/DER Decoder and DER Encoder

The *asn1crypto* library is a combination of universal type classes that
implement BER/DER decoding and DER encoding, a PEM encoder and decoder, and a
number of pre-built cryptographic type classes. This document covers the
universal type classes.

For a general overview of ASN.1 as used in cryptography, please see
[A Layman's Guide to a Subset of ASN.1, BER, and DER](http://luca.ntop.org/Teaching/Appunti/asn1.html).

This page contains the following sections:

 - [Universal Types](#universal-types)
 - [Basic Usage](#basic-usage)
 - [Sequence](#sequence)
 - [Set](#set)
 - [SequenceOf](#sequenceof)
 - [SetOf](#setof)
 - [Integer](#integer)
 - [Enumerated](#enumerated)
 - [ObjectIdentifier](#objectidentifier)
 - [BitString](#bitstring)
 - [Strings](#strings)
 - [UTCTime](#utctime)
 - [GeneralizedTime](#generalizedtime)
 - [Choice](#choice)
 - [Any](#any)
 - [Specification via OID](#specification-via-oid)
 - [Explicit and Implicit Tagging](#explicit-and-implicit-tagging)

## Universal Types

For general purpose ASN.1 parsing, the `asn1crypto.core` module is used. It
contains the following classes, which parse, represent and serialize all of the
ASN.1 universal types:

| Class              | Native Type                            | Implementation Notes                 |
| ------------------ | -------------------------------------- | ------------------------------------ |
| `Boolean`          | `bool`                                 |                                      |
| `Integer`          | `int`                                  | may be `long` on Python 2            |
| `BitString`        | `tuple` of `int` or `set` of `unicode` | `set` used if `_map` present         |
| `OctetString`      | `bytes` (`str`)                        |                                      |
| `Null`             | `None`                                 |                                      |
| `ObjectIdentifier` | `str` (`unicode`)                      | string is dotted integer format      |
| `ObjectDescriptor` |                                        | no native conversion                 |
| `InstanceOf`       |                                        | no native conversion                 |
| `Real`             |                                        | no native conversion                 |
| `Enumerated`       | `str` (`unicode`)                      | `_map` must be set                   |
| `UTF8String`       | `str` (`unicode`)                      |                                      |
| `RelativeOid`      | `str` (`unicode`)                      | string is dotted integer format      |
| `Sequence`         | `OrderedDict`                          |                                      |
| `SequenceOf`       | `list`                                 |                                      |
| `Set`              | `OrderedDict`                          |                                      |
| `SetOf`            | `list`                                 |                                      |
| `EmbeddedPdv`      | `OrderedDict`                          | no named field parsing               |
| `NumericString`    | `str` (`unicode`)                      | no charset limitations               |
| `PrintableString`  | `str` (`unicode`)                      | no charset limitations               |
| `TeletexString`    | `str` (`unicode`)                      |                                      |
| `VideotexString`   | `bytes` (`str`)                        | no unicode conversion                |
| `IA5String`        | `str` (`unicode`)                      |                                      |
| `UTCTime`          | `datetime.datetime`                    |                                      |
| `GeneralizedTime`  | `datetime.datetime`                    | treated as UTC when no timezone      |
| `GraphicString`    | `str` (`unicode`)                      | unicode conversion as latin1         |
| `VisibleString`    | `str` (`unicode`)                      | no charset limitations               |
| `GeneralString`    | `str` (`unicode`)                      | unicode conversion as latin1         |
| `UniversalString`  | `str` (`unicode`)                      |                                      |
| `CharacterString`  | `str` (`unicode`)                      | unicode conversion as latin1         |
| `BMPString`        | `str` (`unicode`)                      |                                      |

For *Native Type*, the Python 3 type is listed first, with the Python 2 type
in parentheses.

As mentioned next to some of the types, value parsing may not be implemented
for types not currently used in cryptography (such as `ObjectDescriptor`,
`InstanceOf` and `Real`). Additionally, some of the string classes don't
enforce character set limitations, and for some string types that have no
fixed encoding, the default encoding is set to latin1.

In addition, there are a few overridden types where various specifications use
a `BitString` or `OctetString` type to represent a different type. These
include:

| Class                | Native Type         | Implementation Notes            |
| -------------------- | ------------------- | ------------------------------- |
| `OctetBitString`     | `bytes` (`str`)     |                                 |
| `IntegerBitString`   | `int`               | may be `long` on Python 2       |
| `IntegerOctetString` | `int`               | may be `long` on Python 2       |

For situations where the DER encoded bytes from one type are embedded in another,
the `ParsableOctetString` and `ParsableOctetBitString` classes exist. These
function the same as `OctetString` and `OctetBitString`, however they also
have an attribute `.parsed` and a method `.parse()` that allow for
parsing the content as ASN.1 structures.
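
As a minimal sketch of this sub-parsing, the snippet below wraps the DER encoding of an `Integer` inside a `ParsableOctetString` and reads it back through `.parsed`. The wrapped value is an arbitrary illustration, not taken from a real structure.

```python
from asn1crypto.core import Integer, ParsableOctetString

# DER bytes of an inner value, used as the contents of the octet string
inner_der = Integer(5).dump()
wrapper = ParsableOctetString(inner_der)

# The raw contents are still available as a byte string
raw = bytes(wrapper)

# .parsed interprets those bytes as an ASN.1 structure
inner_value = wrapper.parsed.native

print(raw, inner_value)
```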

All of these overrides can be used with the `cast()` method to convert between
them. The only requirement is that the class being cast to has the same tag
as the original class. No re-encoding is done, rather the contents are simply
re-interpreted.

```python
from asn1crypto.core import BitString, OctetBitString, IntegerBitString

# Without a _map, a BitString is constructed from a tuple of bits
bit = BitString((
    0, 0, 0, 0, 0, 0, 0, 1,
    0, 0, 0, 0, 0, 0, 1, 0,
))

# Will print (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0)
print(bit.native)

octet = bit.cast(OctetBitString)

# Will print b'\x01\x02'
print(octet.native)

i = bit.cast(IntegerBitString)

# Will print 258
print(i.native)
```

## Basic Usage

All of the universal types implement four methods, a class method `.load()` and
the instance methods `.dump()`, `.copy()` and `.debug()`.

`.load()` accepts a byte string of DER or BER encoded data and returns an
object of the class it was called on. `.dump()` returns the serialization of
an object into DER encoding.

```python
from asn1crypto.core import Sequence

parsed = Sequence.load(der_byte_string)
serialized = parsed.dump()
```

By default, *asn1crypto* tries to be efficient and caches serialized data for
better performance. If the input data is possibly BER encoded, but the output
must be DER encoded, the `force` parameter may be used with `.dump()`.

```python
from asn1crypto.core import Sequence

parsed = Sequence.load(der_byte_string)
der_serialized = parsed.dump(force=True)
```
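
To make the effect of the cache concrete, here is a small sketch using a hand-assembled BER input. The byte string is an assumption chosen for illustration: it encodes the integer 5 with a long-form length, which BER permits and DER forbids.

```python
from asn1crypto.core import Integer

# BER allows the long-form length 0x81 0x01; DER requires the short form 0x01
ber_bytes = b'\x02\x81\x01\x05'
parsed = Integer.load(ber_bytes)

# Without force, the cached original encoding is returned
cached = parsed.dump()

# With force, the value is re-encoded as DER
strict = parsed.dump(force=True)

print(cached, strict)
```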

The `.copy()` method creates a deep copy of an object, allowing child fields to
be modified without affecting the original.

```python
from asn1crypto.core import Sequence

seq1 = Sequence.load(der_byte_string)
seq2 = seq1.copy()
# Work with native values; the type objects themselves do not support arithmetic
seq2[0] = seq1[0].native + 1
if seq1[0].native != seq2[0].native:
    print('Copies have distinct contents')
```

The `.debug()` method is available to help in situations where interaction with
another ASN.1 serializer or parser is not functioning as expected. Calling
this method will print a tree structure with information about the header bytes,
class, method, tag, special tagging, content bytes, native Python value, child
fields and any sub-parsed values.

```python
from asn1crypto.core import Sequence

parsed = Sequence.load(der_byte_string)
parsed.debug()
```

In addition to the available methods, every instance has a `.native` property
that converts the data into a native Python data type.

```python
from pprint import pprint
from asn1crypto.core import Sequence

parsed = Sequence.load(der_byte_string)
pprint(parsed.native)
```

## Sequence

One of the core structures when dealing with ASN.1 is the Sequence type. The
`Sequence` class can handle fields with universal data types, however in most
situations the `_fields` property will need to be set with the expected
definition of each field in the Sequence.

### Configuration

The `_fields` property must be set to a `list` of 2-3 element `tuple`s. The
first element in the tuple must be a unicode string of the field name. The
second must be a type class - either a universal type, or a custom type. The
third, and optional, element is a `dict` with parameters to pass to the type
class for things like default values, marking the field as optional, or
implicit/explicit tagging.

```python
from asn1crypto.core import Sequence, Integer, OctetString, IA5String

class MySequence(Sequence):
    _fields = [
        ('field_one', Integer),
        ('field_two', OctetString),
        ('field_three', IA5String, {'optional': True}),
    ]
```

Implicit and explicit tagging will be covered in more detail later, however
the following are options that can be set for each field type class:

 - `{'default': 1}` sets the field's default value to `1`, allowing it to be
   omitted from the serialized form
 - `{'optional': True}` sets the field to be optional, allowing it to be
   omitted
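
The following sketch illustrates the `default` option. The `Message` class and its field names are hypothetical, and the byte strings are hand-assembled for the example. A field whose value equals its default is left out of the encoding, and the default is supplied again when the field is absent on load.

```python
from asn1crypto.core import Sequence, Integer, OctetString

class Message(Sequence):
    _fields = [
        ('payload', OctetString),
        ('version', Integer, {'default': 1}),
    ]

# A value equal to the default is omitted from the DER output
encoded = Message({'payload': b'hi', 'version': 1}).dump()

# When the field is absent from the input, the default value is used
decoded = Message.load(b'\x30\x04\x04\x02hi')
version = decoded['version'].native

print(encoded, version)
```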

### Usage

To access values of the sequence, use dict-like access via `[]` and use the
name of the field:

```python
seq = MySequence.load(der_byte_string)
print(seq['field_two'].native)
```

The values of fields can be set by assigning via `[]`. If the value assigned is
of the correct type class, it will be used as-is. If the value is not of the
correct type class, a new instance of that type class will be created and the
value will be passed to the constructor.

```python
seq = MySequence.load(der_byte_string)
# These statements will result in the same state
seq['field_one'] = Integer(5)
seq['field_one'] = 5
```
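
For a runnable variant that does not depend on an existing byte string, the sketch below builds a `MySequence` from scratch. Passing a dict of field names and native values to the constructor is a convenience not described above; the values are arbitrary.

```python
from asn1crypto.core import Sequence, Integer, OctetString, IA5String

class MySequence(Sequence):
    _fields = [
        ('field_one', Integer),
        ('field_two', OctetString),
        ('field_three', IA5String, {'optional': True}),
    ]

# Build a value from native Python values, keyed by field name
seq = MySequence({'field_one': 5, 'field_two': b'ab'})

# Assigning a native value creates a new Integer for the field
seq['field_one'] = 6

der = seq.dump()
round_trip = MySequence.load(der)
print(round_trip['field_one'].native)
```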

When fields are complex types such as `Sequence` or `SequenceOf`, there is no
way to construct the value out of a native Python data type.

### Optional Fields

When a field configured with the `optional` parameter is not present in the
`Sequence` but is accessed, the `VOID` object will be returned. This is an
object that is serialized to an empty byte string and returns `None` when
`.native` is accessed.
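
A short sketch of this behavior, reusing the `MySequence` definition from above with a hand-assembled encoding that contains only the two required fields:

```python
from asn1crypto.core import Sequence, Integer, OctetString, IA5String

class MySequence(Sequence):
    _fields = [
        ('field_one', Integer),
        ('field_two', OctetString),
        ('field_three', IA5String, {'optional': True}),
    ]

# Only field_one and field_two are present in this encoding
seq = MySequence.load(b'\x30\x07\x02\x01\x05\x04\x02ab')
missing = seq['field_three']

missing_native = missing.native
missing_bytes = missing.dump()
print(missing_native, missing_bytes)
```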

## Set

The `Set` class is configured in the same way as `Sequence`, however it allows
serialized fields to be in any order, per the ASN.1 standard.

```python
from asn1crypto.core import Set, Integer, OctetString, IA5String

class MySet(Set):
    _fields = [
        ('field_one', Integer),
        ('field_two', OctetString),
        ('field_three', IA5String, {'optional': True}),
    ]
```
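
As an illustration of the ordering rule, the sketch below loads two hand-assembled encodings of the same `MySet` value with the fields in opposite orders. The byte strings are assumptions made for the example.

```python
from asn1crypto.core import Set, Integer, OctetString, IA5String

class MySet(Set):
    _fields = [
        ('field_one', Integer),
        ('field_two', OctetString),
        ('field_three', IA5String, {'optional': True}),
    ]

# The same two fields, serialized in opposite orders
integer_first = MySet.load(b'\x31\x07\x02\x01\x05\x04\x02ab')
octets_first = MySet.load(b'\x31\x07\x04\x02ab\x02\x01\x05')

# Fields are matched by tag, so both parse to the same values
same_values = integer_first.native == octets_first.native
print(same_values)
```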

## SequenceOf

The `SequenceOf` class is used to allow for zero or more instances of a type.
The class uses the `_child_spec` property to define the instance class type.

```python
from asn1crypto.core import SequenceOf, Integer

class Integers(SequenceOf):
    _child_spec = Integer
```

Values in the `SequenceOf` can be accessed via `[]` with an integer key. The
length of the `SequenceOf` is determined via `len()`.

```python
values = Integers.load(der_byte_string)
for i in range(0, len(values)):
    print(values[i].native)
```
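
A self-contained sketch of the same class, built from a Python `list` rather than loaded from bytes. The values are arbitrary.

```python
from asn1crypto.core import SequenceOf, Integer

class Integers(SequenceOf):
    _child_spec = Integer

# Each list element is converted to the child type
values = Integers([1, 2, 3])
values.append(4)

der = values.dump()

# Iterating yields Integer objects; .native gives the Python ints
natives = [item.native for item in Integers.load(der)]
print(natives)
```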

## SetOf

The `SetOf` class is an exact duplicate of `SequenceOf`. According to the ASN.1
standard, the difference is that a `SequenceOf` is explicitly ordered, however
`SetOf` may be in any order. This is comparable to the difference between a
Python `list` and `set`.

```python
from asn1crypto.core import SetOf, Integer

class Integers(SetOf):
    _child_spec = Integer
```

## Integer

The `Integer` class allows values to be *named*. An `Integer` with named values
may contain any integer, however values that have names will be represented
as those names when `.native` is accessed.

Named values are configured via the `_map` property, which must be a `dict`
with the keys being integers and the values being unicode strings.

```python
from asn1crypto.core import Integer

class Version(Integer):
    _map = {
        1: 'v1',
        2: 'v2',
    }

# Will print: "v1"
print(Version(1).native)

# Will print: 4
print(Version(4).native)
```
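
The mapping also works in the other direction. As a brief sketch, a named `Integer` can be constructed from one of its names, and the encoding still carries the underlying number:

```python
from asn1crypto.core import Integer

class Version(Integer):
    _map = {
        1: 'v1',
        2: 'v2',
    }

by_name = Version('v2')
by_number = Version(2)

# Both produce the DER encoding of the integer 2
print(by_name.dump() == by_number.dump())
```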

## Enumerated

The `Enumerated` class is almost identical to `Integer`, however only values in
the `_map` property are valid.

```python
from asn1crypto.core import Enumerated

class Version(Enumerated):
    _map = {
        1: 'v1',
        2: 'v2',
    }

# Will print: "v1"
print(Version(1).native)

# Will raise a ValueError exception
print(Version(4).native)
```

## ObjectIdentifier

The `ObjectIdentifier` class represents values of the ASN.1 type of the same
name. `ObjectIdentifier` instances are converted to a unicode string in a
dotted-integer format when `.native` is accessed.

While this standard conversion is a reasonable baseline, in most situations
it will be more maintainable to map the OID strings to a unicode string
containing a description of what the OID represents.

The mapping of OID strings to name strings is configured via the `_map`
property, which is a `dict` object with keys being unicode OID strings and the
values being unicode strings.

The `.dotted` attribute will always return a unicode string of the dotted
integer form of the OID.

The class methods `.map()` and `.unmap()` will convert a dotted integer unicode
string to the user-friendly name, and vice-versa.

```python
from asn1crypto.core import ObjectIdentifier

class MyType(ObjectIdentifier):
    _map = {
        '1.8.2.1.23': 'value_name',
        '1.8.2.1.24': 'other_value',
    }

# Will print: "value_name"
print(MyType('1.8.2.1.23').native)

# Will print: "1.8.2.1.23"
print(MyType('1.8.2.1.23').dotted)

# Will print: "1.8.2.1.25"
print(MyType('1.8.2.1.25').native)

# Will print "value_name"
print(MyType.map('1.8.2.1.23'))

# Will print "1.8.2.1.23"
print(MyType.unmap('value_name'))
```

## BitString

When no `_map` is set for a `BitString` class, the native representation is a
`tuple` of `int`s (being either `1` or `0`).

```python
from asn1crypto.core import BitString

b1 = BitString((1, 0, 1))
```

Additionally, it is possible to set the `_map` property to a dict where the
keys are bit indexes and the values are unicode string names. This allows
checking the value of a given bit by item access, and the native representation
becomes a `set` of unicode strings.

```python
from asn1crypto.core import BitString

class MyFlags(BitString):
    _map = {
        0: 'edit',
        1: 'delete',
        2: 'manage_users',
    }

permissions = MyFlags({'edit', 'delete'})

# This will be printed
if permissions['edit'] and permissions['delete']:
    print('Can edit and delete')

# This will not
if 'manage_users' in permissions.native:
    print('Is admin')
```

## Strings

ASN.1 contains quite a number of string types:

| Type              | Standard Encoding                 | Implementation Encoding | Notes                                                                     |
| ----------------- | --------------------------------- | ----------------------- | ------------------------------------------------------------------------- |
| `UTF8String`      | UTF-8                             | UTF-8                   |                                                                           |
| `NumericString`   | ASCII `[0-9 ]`                    | ISO 8859-1              | The implementation is a superset of supported characters                  |
| `PrintableString` | ASCII `[a-zA-Z0-9 '()+,\\-./:=?]` | ISO 8859-1              | The implementation is a superset of supported characters                  |
| `TeletexString`   | ITU T.61                          | Custom                  | The implementation is based off of https://en.wikipedia.org/wiki/ITU_T.61 |
| `VideotexString`  | *?*                               | *None*                  | This has no set encoding, and is not used in cryptography                 |
| `IA5String`       | ITU T.50 (very similar to ASCII)  | ISO 8859-1              | The implementation is a superset of supported characters                  |
| `GraphicString`   | *                                 | ISO 8859-1              | This has no set encoding, but seems to often contain ISO 8859-1           |
| `VisibleString`   | ASCII (printable)                 | ISO 8859-1              | The implementation is a superset of supported characters                  |
| `GeneralString`   | *                                 | ISO 8859-1              | This has no set encoding, but seems to often contain ISO 8859-1           |
| `UniversalString` | UTF-32                            | UTF-32                  |                                                                           |
| `CharacterString` | *                                 | ISO 8859-1              | This has no set encoding, but seems to often contain ISO 8859-1           |
| `BMPString`       | UTF-16                            | UTF-16                  |                                                                           |

As noted in the table above, many of the implementations are supersets of the
supported characters. This simplifies parsing, but puts the onus of using valid
characters on the developer. However, in general `UTF8String`, `BMPString` or
`UniversalString` should be preferred when a choice is given.

All string types other than `VideotexString` are created from unicode strings.

```python
from asn1crypto.core import IA5String

print(IA5String('Testing!').native)
```
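
To show how the choice of string type changes the encoded bytes, the sketch below serializes the same one-character string with the three Unicode-capable types from the table. The sample character is arbitrary.

```python
from asn1crypto.core import UTF8String, BMPString, UniversalString

text = '\u00e9'  # LATIN SMALL LETTER E WITH ACUTE

utf8_der = UTF8String(text).dump()            # contents encoded as UTF-8
bmp_der = BMPString(text).dump()              # contents encoded as UTF-16
universal_der = UniversalString(text).dump()  # contents encoded as UTF-32

for der in (utf8_der, bmp_der, universal_der):
    print(der)
```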

## UTCTime

The class `UTCTime` accepts a unicode string in one of the formats:

 - `%y%m%d%H%MZ`
 - `%y%m%d%H%M%SZ`
 - `%y%m%d%H%M%z`
 - `%y%m%d%H%M%S%z`

or a `datetime.datetime` instance. See the
[Python datetime strptime() reference](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
for details of the formats.

When `.native` is accessed, it returns a `datetime.datetime` object with a
`tzinfo` of `asn1crypto.util.timezone.utc`.
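
A minimal sketch of both construction paths, assuming Python 3 and a timezone-aware `datetime`. The timestamp is arbitrary.

```python
from datetime import datetime, timezone
from asn1crypto.core import UTCTime

moment = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)

# Construct from a timezone-aware datetime...
from_datetime = UTCTime(moment)

# ...or from a string in the %y%m%d%H%M%SZ format
from_string = UTCTime('240102030405Z')

print(from_datetime.dump())
print(from_string.native)
```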

## GeneralizedTime

The class `GeneralizedTime` accepts a unicode string in one of the formats:

 - `%Y%m%d%H`
 - `%Y%m%d%H%M`
 - `%Y%m%d%H%M%S`
 - `%Y%m%d%H%M%S.%f`
 - `%Y%m%d%HZ`
 - `%Y%m%d%H%MZ`
 - `%Y%m%d%H%M%SZ`
 - `%Y%m%d%H%M%S.%fZ`
 - `%Y%m%d%H%z`
 - `%Y%m%d%H%M%z`
 - `%Y%m%d%H%M%S%z`
 - `%Y%m%d%H%M%S.%f%z`

or a `datetime.datetime` instance. See the
[Python datetime strptime() reference](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
for details of the formats.

When `.native` is accessed, it returns a `datetime.datetime` object with a
`tzinfo` of `asn1crypto.util.timezone.utc`. For formats where a timezone
offset is specified (`[+-]\d{4}`), the time is converted to UTC. For
times without a timezone, the time is assumed to be in UTC.
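
As a sketch of the offset handling, the two strings below describe the same instant, once with a `+0100` offset and once in UTC. The comparison relies only on both values being timezone-aware, so it holds regardless of which `tzinfo` object is attached.

```python
from datetime import datetime, timezone
from asn1crypto.core import GeneralizedTime

# 03:04:05 at UTC+01:00 is the same instant as 02:04:05 UTC
with_offset = GeneralizedTime('20240102030405+0100')
as_utc = GeneralizedTime('20240102020405Z')

same_instant = with_offset.native == as_utc.native
print(same_instant)
```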

## Choice

The `Choice` class allows handling ASN.1 Choice structures. The `_alternatives`
property must be set to a `list` containing 2-3 element `tuple`s. The first
element in the tuple is the alternative name. The second element is the type
class for the alternative. The optional third element is a `dict` of
parameters to pass to the type class constructor. This is used primarily for
implicit and explicit tagging.

```python
from asn1crypto.core import Choice, Integer, OctetString, IA5String

class MyChoice(Choice):
    _alternatives = [
        ('option_one', Integer),
        ('option_two', OctetString),
        ('option_three', IA5String),
    ]
```

`Choice` objects have two extra properties, `.name` and `.chosen`. The `.name`
property contains the name of the chosen alternative. The `.chosen` property
contains the instance of the chosen type class.

```python
parsed = MyChoice.load(der_bytes)
print(parsed.name)
print(type(parsed.chosen))
```

The `.native` property and `.dump()` method work as with the universal type
classes. Under the hood they just proxy the calls to the `.chosen` object.
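
The sketch below exercises `MyChoice` in both directions: constructing a value by naming the alternative, and loading a hand-assembled encoding where the tag selects the alternative. The sample values are arbitrary.

```python
from asn1crypto.core import Choice, Integer, OctetString, IA5String

class MyChoice(Choice):
    _alternatives = [
        ('option_one', Integer),
        ('option_two', OctetString),
        ('option_three', IA5String),
    ]

# Construct by naming the alternative and supplying its value
built = MyChoice(name='option_two', value=b'\x01\x02')

# On load, the tag of the encoded value selects the alternative
loaded = MyChoice.load(b'\x02\x01\x05')

print(built.name, built.native)
print(loaded.name, loaded.native)
```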

## Any

The `Any` class implements the ASN.1 Any type, which allows any data type. By
default objects of this class do not perform any parsing. However, the
`.parse()` instance method allows parsing the contents of the `Any` object,
either into a universal type, or to a specification passed in via the `spec`
parameter.

This type is not used as a top-level structure, but instead allows `Sequence`
and `Set` objects to accept varying contents, usually based on some sort of
`ObjectIdentifier`.

```python
from asn1crypto.core import Sequence, ObjectIdentifier, Any, Integer, OctetString

class MySequence(Sequence):
    _fields = [
        ('type', ObjectIdentifier),
        ('value', Any),
    ]
```
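
As a sketch of `.parse()`, the snippet below loads a hand-assembled `MySequence` and parses its `Any` field with an explicit specification. The OID and value are made up for the example.

```python
from asn1crypto.core import Sequence, ObjectIdentifier, Any, Integer

class MySequence(Sequence):
    _fields = [
        ('type', ObjectIdentifier),
        ('value', Any),
    ]

# SEQUENCE { OBJECT IDENTIFIER 1.2.3.5, INTEGER 2048 }
der = b'\x30\x09\x06\x03\x2a\x03\x05\x02\x02\x08\x00'
parsed = MySequence.load(der)

oid = parsed['type'].native

# Parse the contents of the Any field using the Integer specification
value = parsed['value'].parse(Integer)
print(oid, value.native)
```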

## Specification via OID

Throughout the usage of ASN.1 in cryptography, a pattern is present where an
`ObjectIdentifier` is used to determine what specification should be used to
interpret another field in a `Sequence`. Usually the other field is an instance
of `Any`, however occasionally it is an `OctetString` or `OctetBitString`.

*asn1crypto* provides the `_oid_pair` and `_oid_specs` properties of the
`Sequence` class to allow handling these situations.

The `_oid_pair` is a tuple with two unicode string elements. The first is the
name of the field that is an `ObjectIdentifier` and the second is the name of
the field that has a variable specification based on the first field. *In
situations where the value field should be an `OctetString` or `OctetBitString`,
`ParsableOctetString` and `ParsableOctetBitString` will need to be used instead
to allow for the sub-parsing of the contents.*

The `_oid_specs` property is a `dict` object with `ObjectIdentifier` values as
the keys (either dotted or mapped notation) and a type class as the value. When
the first field in `_oid_pair` has a value equal to one of the keys in
`_oid_specs`, then the corresponding type class will be used as the
specification for the second field of `_oid_pair`.

```python
from asn1crypto.core import Sequence, ObjectIdentifier, Any, OctetString, Integer

class MyId(ObjectIdentifier):
    _map = {
        '1.2.3.4': 'initialization_vector',
        '1.2.3.5': 'iterations',
    }

class MySequence(Sequence):
    _fields = [
        ('type', MyId),
        ('value', Any),
    ]

    _oid_pair = ('type', 'value')
    _oid_specs = {
        'initialization_vector': OctetString,
        'iterations': Integer,
    }
```
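
To see the lookup in action, the sketch below loads two hand-assembled encodings of `MySequence`, one per OID. The byte strings are assumptions built to match the definitions above.

```python
from asn1crypto.core import Sequence, ObjectIdentifier, Any, OctetString, Integer

class MyId(ObjectIdentifier):
    _map = {
        '1.2.3.4': 'initialization_vector',
        '1.2.3.5': 'iterations',
    }

class MySequence(Sequence):
    _fields = [
        ('type', MyId),
        ('value', Any),
    ]

    _oid_pair = ('type', 'value')
    _oid_specs = {
        'initialization_vector': OctetString,
        'iterations': Integer,
    }

# SEQUENCE { OBJECT IDENTIFIER 1.2.3.5, INTEGER 2048 }
iterations = MySequence.load(b'\x30\x09\x06\x03\x2a\x03\x05\x02\x02\x08\x00')

# SEQUENCE { OBJECT IDENTIFIER 1.2.3.4, OCTET STRING 00 01 }
iv = MySequence.load(b'\x30\x09\x06\x03\x2a\x03\x04\x04\x02\x00\x01')

print(iterations['type'].native, iterations['value'].native)
print(iv['type'].native, iv['value'].native)
```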

## Explicit and Implicit Tagging

When working with `Sequence`, `Set` and `Choice` it is often necessary to
disambiguate between fields because of a number of factors:

 - In `Sequence` the presence of an optional field must be determined by tag number
 - In `Set`, each field must have a different tag number since they can be in any order
 - In `Choice`, each alternative must have a different tag number to determine which is present

The universal types all have unique tag numbers. However, if a `Sequence`, `Set`
or `Choice` has more than one field with the same universal type, tagging allows
a way to keep the semantics of the original type, but with a different tag
number.

Implicit tagging simply changes the tag number of a type to a different value.
Explicit tagging, by contrast, wraps the existing type in another tag with the
specified tag number.
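
The difference is easiest to see in the encoded bytes. As a minimal sketch, the same `Integer` value is serialized untagged, implicitly tagged and explicitly tagged, using the `implicit` and `explicit` constructor parameters with tag number `0`:

```python
from asn1crypto.core import Integer

plain = Integer(5).dump()
implicit = Integer(5, implicit=0).dump()
explicit = Integer(5, explicit=0).dump()

# Implicit tagging replaces the identifier byte of the value;
# explicit tagging wraps the complete untagged encoding in a new header
print(plain)
print(implicit)
print(explicit)
```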

In general, most situations allow for implicit tagging, with the notable
exception that a field that is a `Choice` type must always be explicitly tagged.
Otherwise, using implicit tagging would modify the tag of the chosen
alternative, breaking the mechanism by which `Choice` works.

Here is an example of implicit and explicit tagging where explicit tagging on
the `Sequence` allows a `Choice` type field to be optional, and where implicit
tagging in the `Choice` structure allows disambiguating between two strings of
the same type.

```python
from asn1crypto.core import Sequence, Choice, IA5String, UTCTime, ObjectIdentifier

class Person(Choice):
    _alternatives = [
        ('name', IA5String),
        ('email', IA5String, {'implicit': 0}),
    ]

class Record(Sequence):
    _fields = [
        ('id', ObjectIdentifier),
        ('created', UTCTime),
        ('creator', Person, {'explicit': 0, 'optional': True}),
    ]
```

As is shown above, the keys `implicit` and `explicit` are used for tagging,
and are passed to a type class constructor via the optional third element of
a field or alternative tuple. Both parameters may be an integer tag number, or
a 2-element tuple of string class name and integer tag.
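
As a brief sketch of the two accepted forms, the snippet below tags an `Integer` with a bare tag number and with a class name and tag number pair. The class name `'application'` and the tag numbers are chosen only for illustration.

```python
from asn1crypto.core import Integer

# A bare integer selects the context-specific class
context_tagged = Integer(5, implicit=0).dump()

# A 2-element tuple names the class explicitly
application_tagged = Integer(5, implicit=('application', 1)).dump()

# The same tagging parameters are needed to load the value again
reloaded = Integer.load(application_tagged, implicit=('application', 1))
print(context_tagged, application_tagged, reloaded.native)
```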

If a tagged value needs its tagging changed, the `.untag()` method can be used
to create a copy of the object without explicit/implicit tagging. The `.retag()`
method can be used to change the tagging. This method accepts one parameter, a
dict with either or both of the keys `implicit` and `explicit`.

```python
person = Person(name='email', value='will@wbond.net')

# Will display True
print(person.implicit)

# Will display False
print(person.untag().implicit)

# Will display 0
print(person.tag)

# Will display 1
print(person.retag({'implicit': 1}).tag)
```