# Universal Types with BER/DER Decoder and DER Encoder

The *asn1crypto* library is a combination of universal type classes that
implement BER/DER decoding and DER encoding, a PEM encoder and decoder, and a
number of pre-built cryptographic type classes. This document covers the
universal type classes.

For a general overview of ASN.1 as used in cryptography, please see
[A Layman's Guide to a Subset of ASN.1, BER, and DER](http://luca.ntop.org/Teaching/Appunti/asn1.html).

This page contains the following sections:

 - [Universal Types](#universal-types)
 - [Basic Usage](#basic-usage)
 - [Sequence](#sequence)
 - [Set](#set)
 - [SequenceOf](#sequenceof)
 - [SetOf](#setof)
 - [Integer](#integer)
 - [Enumerated](#enumerated)
 - [ObjectIdentifier](#objectidentifier)
 - [BitString](#bitstring)
 - [Strings](#strings)
 - [UTCTime](#utctime)
 - [GeneralizedTime](#generalizedtime)
 - [Choice](#choice)
 - [Any](#any)
 - [Specification via OID](#specification-via-oid)
 - [Explicit and Implicit Tagging](#explicit-and-implicit-tagging)

## Universal Types

For general purpose ASN.1 parsing, the `asn1crypto.core` module is used.
It contains the following classes that parse, represent and serialize all of the
ASN.1 universal types:

| Class              | Native Type                            | Implementation Notes                 |
| ------------------ | -------------------------------------- | ------------------------------------ |
| `Boolean`          | `bool`                                 |                                      |
| `Integer`          | `int`                                  | may be `long` on Python 2            |
| `BitString`        | `tuple` of `int` or `set` of `unicode` | `set` used if `_map` present         |
| `OctetString`      | `bytes` (`str`)                        |                                      |
| `Null`             | `None`                                 |                                      |
| `ObjectIdentifier` | `str` (`unicode`)                      | string is dotted integer format      |
| `ObjectDescriptor` |                                        | no native conversion                 |
| `InstanceOf`       |                                        | no native conversion                 |
| `Real`             |                                        | no native conversion                 |
| `Enumerated`       | `str` (`unicode`)                      | `_map` must be set                   |
| `UTF8String`       | `str` (`unicode`)                      |                                      |
| `RelativeOid`      | `str` (`unicode`)                      | string is dotted integer format      |
| `Sequence`         | `OrderedDict`                          |                                      |
| `SequenceOf`       | `list`                                 |                                      |
| `Set`              | `OrderedDict`                          |                                      |
| `SetOf`            | `list`                                 |                                      |
| `EmbeddedPdv`      | `OrderedDict`                          | no named field parsing               |
| `NumericString`    | `str` (`unicode`)                      | no charset limitations               |
| `PrintableString`  | `str` (`unicode`)                      | no charset limitations               |
| `TeletexString`    | `str` (`unicode`)                      |                                      |
| `VideotexString`   | `bytes` (`str`)                        | no unicode conversion                |
| `IA5String`        | `str` (`unicode`)                      |                                      |
| `UTCTime`          | `datetime.datetime`                    |                                      |
| `GeneralizedTime`  | `datetime.datetime`                    | treated as UTC when no timezone      |
| `GraphicString`    | `str` (`unicode`)                      | unicode conversion as latin1         |
| `VisibleString`    | `str` (`unicode`)                      | no charset limitations               |
| `GeneralString`    | `str` (`unicode`)                      | unicode conversion as latin1         |
| `UniversalString`  | `str` (`unicode`)                      |                                      |
| `CharacterString`  | `str` (`unicode`)                      | unicode conversion as latin1         |
| `BMPString`        | `str` (`unicode`)                      |                                      |

For *Native Type*, the Python 3 type is listed first, with the Python 2 type
in parentheses.
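
As a rough illustration of how the classes in the table relate to encoded
data, the sketch below reads the first two bytes of a BER/DER value and looks
up the matching class name. It uses only the Python 3 standard library and is
not how *asn1crypto* itself is implemented. The names `UNIVERSAL_TAGS` and
`describe_header()` are hypothetical, only a subset of the universal tags is
listed, and only single-byte tags with short-form lengths are handled.

```python
# A subset of the universal tag numbers, paired with the class names
# used in the table above. Illustrative only, not the library's lookup.
UNIVERSAL_TAGS = {
    1: 'Boolean',
    2: 'Integer',
    3: 'BitString',
    4: 'OctetString',
    5: 'Null',
    6: 'ObjectIdentifier',
    10: 'Enumerated',
    12: 'UTF8String',
    16: 'Sequence',
    17: 'Set',
    19: 'PrintableString',
    22: 'IA5String',
    23: 'UTCTime',
    24: 'GeneralizedTime',
    30: 'BMPString',
}


def describe_header(encoded):
    """Return (class_name, constructed, length) for a simple BER/DER header."""
    first, length = encoded[0], encoded[1]
    if first & 0xC0:
        raise ValueError('not a universal-class tag')
    if first & 0x1F == 0x1F or length & 0x80:
        raise ValueError('long-form tags and lengths are out of scope here')
    # Bit 6 of the first byte marks a constructed (nested) encoding
    return UNIVERSAL_TAGS[first & 0x1F], bool(first & 0x20), length


# An Integer with the value 5
print(describe_header(b'\x02\x01\x05'))          # ('Integer', False, 1)

# A Sequence containing that Integer
print(describe_header(b'\x30\x03\x02\x01\x05'))  # ('Sequence', True, 3)
```

In practice the classes in `asn1crypto.core` perform this work, including the
long-form cases the sketch skips.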

As mentioned next to some of the types, value parsing may not be implemented
for types not currently used in cryptography (such as `ObjectDescriptor`,
`InstanceOf` and `Real`). Additionally, some of the string classes don't
enforce character set limitations, and for some string types that accept many
different encodings, the default encoding is set to latin1.

In addition, there are a few overridden types where various specifications use
a `BitString` or `OctetString` type to represent a different type. These
include:

| Class                | Native Type         | Implementation Notes            |
| -------------------- | ------------------- | ------------------------------- |
| `OctetBitString`     | `bytes` (`str`)     |                                 |
| `IntegerBitString`   | `int`               | may be `long` on Python 2       |
| `IntegerOctetString` | `int`               | may be `long` on Python 2       |

For situations where the DER encoded bytes from one type are embedded in
another, the `ParsableOctetString` and `ParsableOctetBitString` classes exist.
These function the same as `OctetString` and `OctetBitString`, however they
also have an attribute `.parsed` and a method `.parse()` that allow for
parsing the content as ASN.1 structures.

All of these overrides can be used with the `cast()` method to convert between
them. The only requirement is that the class being cast to has the same tag
as the original class. No re-encoding is done, rather the contents are simply
re-interpreted.

```python
from asn1crypto.core import BitString, OctetBitString, IntegerBitString

# A tuple keeps the order and repetition of the bits
bit = BitString((
    0, 0, 0, 0, 0, 0, 0, 1,
    0, 0, 0, 0, 0, 0, 1, 0,
))

# Will print (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0)
print(bit.native)

octet = bit.cast(OctetBitString)

# Will print b'\x01\x02'
print(octet.native)

i = bit.cast(IntegerBitString)

# Will print 258
print(i.native)
```

## Basic Usage

All of the universal types implement four methods, a class method `.load()` and
the instance methods `.dump()`, `.copy()` and `.debug()`.

`.load()` accepts a byte string of DER or BER encoded data and returns an
object of the class it was called on. `.dump()` returns the serialization of
an object into DER encoding.

```python
from asn1crypto.core import Sequence

# der_byte_string holds DER or BER encoded bytes obtained elsewhere
parsed = Sequence.load(der_byte_string)
serialized = parsed.dump()
```

By default, *asn1crypto* tries to be efficient and caches serialized data for
better performance. If the input data is possibly BER encoded, but the output
must be DER encoded, the `force` parameter may be used with `.dump()`.

```python
from asn1crypto.core import Sequence

parsed = Sequence.load(der_byte_string)
der_serialized = parsed.dump(force=True)
```

The `.copy()` method creates a deep copy of an object, allowing child fields to
be modified without affecting the original.

```python
from asn1crypto.core import Sequence

seq1 = Sequence.load(der_byte_string)
seq2 = seq1.copy()
# Compare and modify the native values, not the wrapper objects
seq2[0] = seq1[0].native + 1
if seq1[0].native != seq2[0].native:
    print('Copies have distinct contents')
```

The `.debug()` method is available to help in situations where interaction with
another ASN.1 serializer or parser is not functioning as expected.
Calling
this method will print a tree structure with information about the header bytes,
class, method, tag, special tagging, content bytes, native Python value, child
fields and any sub-parsed values.

```python
from asn1crypto.core import Sequence

parsed = Sequence.load(der_byte_string)
parsed.debug()
```

In addition to the available methods, every instance has a `.native` property
that converts the data into a native Python data type.

```python
from pprint import pprint
from asn1crypto.core import Sequence

parsed = Sequence.load(der_byte_string)
pprint(parsed.native)
```

## Sequence

One of the core structures when dealing with ASN.1 is the Sequence type. The
`Sequence` class can handle fields with universal data types, however in most
situations the `_fields` property will need to be set with the expected
definition of each field in the Sequence.

### Configuration

The `_fields` property must be set to a `list` of 2-3 element `tuple`s. The
first element in the tuple must be a unicode string of the field name. The
second must be a type class - either a universal type, or a custom type. The
third, and optional, element is a `dict` with parameters to pass to the type
class for things like default values, marking the field as optional, or
implicit/explicit tagging.

```python
from asn1crypto.core import Sequence, Integer, OctetString, IA5String

class MySequence(Sequence):
    _fields = [
        ('field_one', Integer),
        ('field_two', OctetString),
        ('field_three', IA5String, {'optional': True}),
    ]
```

Implicit and explicit tagging will be covered in more detail later, however
the following are options that can be set for each field type class:

 - `{'default': 1}` sets the field's default value to `1`, allowing it to be
   omitted from the serialized form
 - `{'optional': True}` sets the field to be optional, allowing it to be
   omitted

### Usage

To access values of the sequence, use dict-like access via `[]` with the
name of the field:

```python
seq = MySequence.load(der_byte_string)
print(seq['field_two'].native)
```

The values of fields can be set by assigning via `[]`. If the value assigned is
of the correct type class, it will be used as-is. If the value is not of the
correct type class, a new instance of that type class will be created and the
value will be passed to the constructor.

```python
seq = MySequence.load(der_byte_string)
# These statements will result in the same state
seq['field_one'] = Integer(5)
seq['field_one'] = 5
```

When fields are complex types such as `Sequence` or `SequenceOf`, there is no
way to construct the value out of a native Python data type.

### Optional Fields

When a field that is configured via the `optional` parameter is not present in
the `Sequence` but is accessed, the `VOID` object will be returned. This is an
object that is serialized to an empty byte string and returns `None` when
`.native` is accessed.

## Set

The `Set` class is configured in the same way as `Sequence`, however it allows
serialized fields to be in any order, per the ASN.1 standard.

```python
from asn1crypto.core import Set, Integer, OctetString, IA5String

class MySet(Set):
    _fields = [
        ('field_one', Integer),
        ('field_two', OctetString),
        ('field_three', IA5String, {'optional': True}),
    ]
```

## SequenceOf

The `SequenceOf` class is used to allow for zero or more instances of a type.
The class uses the `_child_spec` property to define the instance class type.

```python
from asn1crypto.core import SequenceOf, Integer

class Integers(SequenceOf):
    _child_spec = Integer
```

Values in the `SequenceOf` can be accessed via `[]` with an integer key. The
length of the `SequenceOf` is determined via `len()`.

```python
values = Integers.load(der_byte_string)
for i in range(0, len(values)):
    print(values[i].native)
```

## SetOf

The `SetOf` class is an exact duplicate of `SequenceOf`. According to the ASN.1
standard, the difference is that a `SequenceOf` is explicitly ordered, however
`SetOf` may be in any order. This is comparable to the difference between a
Python `list` and `set`.

```python
from asn1crypto.core import SetOf, Integer

class Integers(SetOf):
    _child_spec = Integer
```

## Integer

The `Integer` class allows values to be *named*. An `Integer` with named values
may contain any integer, however values that have a name will be represented
as that name when `.native` is accessed.

Named values are configured via the `_map` property, which must be a `dict`
with the keys being integers and the values being unicode strings.

```python
from asn1crypto.core import Integer

class Version(Integer):
    _map = {
        1: 'v1',
        2: 'v2',
    }

# Will print: "v1"
print(Version(1).native)

# Will print: 4
print(Version(4).native)
```

## Enumerated

The `Enumerated` class is almost identical to `Integer`, however only values in
the `_map` property are valid.

```python
from asn1crypto.core import Enumerated

class Version(Enumerated):
    _map = {
        1: 'v1',
        2: 'v2',
    }

# Will print: "v1"
print(Version(1).native)

# Will raise a ValueError exception
print(Version(4).native)
```

## ObjectIdentifier

The `ObjectIdentifier` class represents values of the ASN.1 type of the same
name. `ObjectIdentifier` instances are converted to a unicode string in a
dotted-integer format when `.native` is accessed.

While this standard conversion is a reasonable baseline, in most situations
it will be more maintainable to map the OID strings to a unicode string
containing a description of what the OID represents.

The mapping of OID strings to name strings is configured via the `_map`
property, which is a `dict` object with the keys being unicode OID strings and
the values being unicode strings.

The `.dotted` attribute will always return a unicode string of the dotted
integer form of the OID.

The class methods `.map()` and `.unmap()` will convert a dotted integer unicode
string to the user-friendly name, and vice-versa.

```python
from asn1crypto.core import ObjectIdentifier

class MyType(ObjectIdentifier):
    _map = {
        '1.8.2.1.23': 'value_name',
        '1.8.2.1.24': 'other_value',
    }

# Will print: "value_name"
print(MyType('1.8.2.1.23').native)

# Will print: "1.8.2.1.23"
print(MyType('1.8.2.1.23').dotted)

# Will print: "1.8.2.1.25"
print(MyType('1.8.2.1.25').native)

# Will print "value_name"
print(MyType.map('1.8.2.1.23'))

# Will print "1.8.2.1.23"
print(MyType.unmap('value_name'))
```

## BitString

When no `_map` is set for a `BitString` class, the native representation is a
`tuple` of `int`s (being either `1` or `0`).

```python
from asn1crypto.core import BitString

b1 = BitString((1, 0, 1))
```

Additionally, it is possible to set the `_map` property to a dict where the
keys are bit indexes and the values are unicode string names. This allows
checking the value of a given bit by item access, and the native representation
becomes a `set` of unicode strings.

```python
from asn1crypto.core import BitString

class MyFlags(BitString):
    _map = {
        0: 'edit',
        1: 'delete',
        2: 'manage_users',
    }

permissions = MyFlags({'edit', 'delete'})

# This will be printed
if permissions['edit'] and permissions['delete']:
    print('Can edit and delete')

# This will not
if 'manage_users' in permissions.native:
    print('Is admin')
```

## Strings

ASN.1 contains quite a number of string types:

| Type              | Standard Encoding                 | Implementation Encoding | Notes                                                                     |
| ----------------- | --------------------------------- | ----------------------- | ------------------------------------------------------------------------- |
| `UTF8String`      | UTF-8                             | UTF-8                   |                                                                           |
| `NumericString`   | ASCII `[0-9 ]`                    | ISO 8859-1              | The implementation is a superset of supported characters                  |
| `PrintableString` | ASCII `[a-zA-Z0-9 '()+,\\-./:=?]` | ISO 8859-1              | The implementation is a superset of supported characters                  |
| `TeletexString`   | ITU T.61                          | Custom                  | The implementation is based off of https://en.wikipedia.org/wiki/ITU_T.61 |
| `VideotexString`  | *?*                               | *None*                  | This has no set encoding, and is not used in cryptography                 |
| `IA5String`       | ITU T.50 (very similar to ASCII)  | ISO 8859-1              | The implementation is a superset of supported characters                  |
| `GraphicString`   | *                                 | ISO 8859-1              | This has no set encoding, but seems to often contain ISO 8859-1           |
| `VisibleString`   | ASCII (printable)                 | ISO 8859-1              | The implementation is a superset of supported characters                  |
| `GeneralString`   | *                                 | ISO 8859-1              | This has no set encoding, but seems to often contain ISO 8859-1           |
| `UniversalString` | UTF-32                            | UTF-32                  |                                                                           |
| `CharacterString` | *                                 | ISO 8859-1              | This has no set encoding, but seems to often contain ISO 8859-1           |
| `BMPString`       | UTF-16                            | UTF-16                  |                                                                           |

As noted in the table above, many of the implementations are supersets of the
supported characters.
This simplifies parsing, but puts the onus of using valid
characters on the developer. However, in general `UTF8String`, `BMPString` or
`UniversalString` should be preferred when a choice is given.

All string types other than `VideotexString` are created from unicode strings.

```python
from asn1crypto.core import IA5String

print(IA5String('Testing!').native)
```

## UTCTime

The class `UTCTime` accepts a unicode string in one of the formats:

 - `%y%m%d%H%MZ`
 - `%y%m%d%H%M%SZ`
 - `%y%m%d%H%M%z`
 - `%y%m%d%H%M%S%z`

or a `datetime.datetime` instance. See the
[Python datetime strptime() reference](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
for details of the formats.

When `.native` is accessed, it returns a `datetime.datetime` object with a
`tzinfo` of `asn1crypto.util.timezone.utc`.

## GeneralizedTime

The class `GeneralizedTime` accepts a unicode string in one of the formats:

 - `%Y%m%d%H`
 - `%Y%m%d%H%M`
 - `%Y%m%d%H%M%S`
 - `%Y%m%d%H%M%S.%f`
 - `%Y%m%d%HZ`
 - `%Y%m%d%H%MZ`
 - `%Y%m%d%H%M%SZ`
 - `%Y%m%d%H%M%S.%fZ`
 - `%Y%m%d%H%z`
 - `%Y%m%d%H%M%z`
 - `%Y%m%d%H%M%S%z`
 - `%Y%m%d%H%M%S.%f%z`

or a `datetime.datetime` instance. See the
[Python datetime strptime() reference](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
for details of the formats.

When `.native` is accessed, it returns a `datetime.datetime` object with a
`tzinfo` of `asn1crypto.util.timezone.utc`. For formats where a timezone
offset is specified (`[+-]\d{4}`), the time is converted to UTC. For
times without a timezone, the time is assumed to be in UTC.

## Choice

The `Choice` class allows handling ASN.1 Choice structures. The `_alternatives`
property must be set to a `list` containing 2-3 element `tuple`s.
The first
element in the tuple is the alternative name. The second element is the type
class for the alternative. The optional third element is a `dict` of
parameters to pass to the type class constructor. This is used primarily for
implicit and explicit tagging.

```python
from asn1crypto.core import Choice, Integer, OctetString, IA5String

class MyChoice(Choice):
    _alternatives = [
        ('option_one', Integer),
        ('option_two', OctetString),
        ('option_three', IA5String),
    ]
```

`Choice` objects have two extra properties, `.name` and `.chosen`. The `.name`
property contains the name of the chosen alternative. The `.chosen` property
contains the instance of the chosen type class.

```python
parsed = MyChoice.load(der_bytes)
print(parsed.name)
print(type(parsed.chosen))
```

The `.native` property and `.dump()` method work as with the universal type
classes. Under the hood they just proxy the calls to the `.chosen` object.

## Any

The `Any` class implements the ASN.1 Any type, which allows any data type. By
default objects of this class do not perform any parsing. However, the
`.parse()` instance method allows parsing the contents of the `Any` object,
either into a universal type, or to a specification passed in via the `spec`
parameter.

This type is not used as a top-level structure, but instead allows `Sequence`
and `Set` objects to accept varying contents, usually based on some sort of
`ObjectIdentifier`.

```python
from asn1crypto.core import Sequence, ObjectIdentifier, Any, Integer, OctetString

class MySequence(Sequence):
    _fields = [
        ('type', ObjectIdentifier),
        ('value', Any),
    ]
```

## Specification via OID

Throughout the usage of ASN.1 in cryptography, a pattern is present where an
`ObjectIdentifier` is used to determine what specification should be used to
interpret another field in a `Sequence`. Usually the other field is an instance
of `Any`, however occasionally it is an `OctetString` or `OctetBitString`.

*asn1crypto* provides the `_oid_pair` and `_oid_specs` properties of the
`Sequence` class to allow handling these situations.

The `_oid_pair` is a tuple with two unicode string elements. The first is the
name of the field that is an `ObjectIdentifier` and the second is the name of
the field that has a variable specification based on the first field. *In
situations where the value field should be an `OctetString` or `OctetBitString`,
`ParsableOctetString` and `ParsableOctetBitString` will need to be used instead
to allow for the sub-parsing of the contents.*

The `_oid_specs` property is a `dict` object with `ObjectIdentifier` values as
the keys (either dotted or mapped notation) and a type class as the value. When
the first field in `_oid_pair` has a value equal to one of the keys in
`_oid_specs`, then the corresponding type class will be used as the
specification for the second field of `_oid_pair`.

```python
from asn1crypto.core import Sequence, ObjectIdentifier, Any, OctetString, Integer

class MyId(ObjectIdentifier):
    _map = {
        '1.2.3.4': 'initialization_vector',
        '1.2.3.5': 'iterations',
    }

class MySequence(Sequence):
    _fields = [
        ('type', MyId),
        ('value', Any),
    ]

    _oid_pair = ('type', 'value')
    _oid_specs = {
        'initialization_vector': OctetString,
        'iterations': Integer,
    }
```

## Explicit and Implicit Tagging

When working with `Sequence`, `Set` and `Choice` it is often necessary to
disambiguate between fields because of a number of factors:

 - In `Sequence` the presence of an optional field must be determined by tag number
 - In `Set`, each field must have a different tag number since they can be in any order
 - In `Choice`, each alternative must have a different tag number to determine which is present

The universal types all have unique tag numbers. However, if a `Sequence`, `Set`
or `Choice` has more than one field with the same universal type, tagging allows
a way to keep the semantics of the original type, but with a different tag
number.

Implicit tagging simply changes the tag number of a type to a different value.
Explicit tagging, by contrast, wraps the existing type in another tag with the
specified tag number.

In general, most situations allow for implicit tagging, with the notable
exception that a field that is a `Choice` type must always be explicitly tagged.
Otherwise, using implicit tagging would modify the tag of the chosen
alternative, breaking the mechanism by which `Choice` works.

Here is an example of implicit and explicit tagging where explicit tagging on
the `Sequence` allows a `Choice` type field to be optional, and where implicit
tagging in the `Choice` structure allows disambiguating between two strings of
the same type.

```python
from asn1crypto.core import Sequence, Choice, IA5String, UTCTime, ObjectIdentifier

class Person(Choice):
    _alternatives = [
        ('name', IA5String),
        ('email', IA5String, {'implicit': 0}),
    ]

class Record(Sequence):
    _fields = [
        ('id', ObjectIdentifier),
        ('created', UTCTime),
        ('creator', Person, {'explicit': 0, 'optional': True}),
    ]
```

As is shown above, the keys `implicit` and `explicit` are used for tagging,
and are passed to a type class constructor via the optional third element of
a field or alternative tuple. Both parameters may be an integer tag number, or
a 2-element tuple of string class name and integer tag.

If a tagged value needs its tagging changed, the `.untag()` method can be used
to create a copy of the object without explicit/implicit tagging. The `.retag()`
method can be used to change the tagging. This method accepts one parameter, a
dict with either or both of the keys `implicit` and `explicit`.

```python
person = Person(name='email', value='will@wbond.net')

# Will display True
print(person.implicit)

# Will display False
print(person.untag().implicit)

# Will display 0
print(person.tag)

# Will display 1
print(person.retag({'implicit': 1}).tag)
```
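
To make the difference between implicit and explicit tagging described earlier
in this section concrete, the following sketch builds the DER bytes by hand
using only the Python 3 standard library. It is an illustration of the encoding
rules, not *asn1crypto* code. The helper names `tlv()`, `implicit()` and
`explicit()` are hypothetical, the context-specific class is assumed for the
new tag, and only primitive content, tag numbers below 31 and lengths below 128
are handled.

```python
IA5STRING = 0x16    # universal class, primitive, tag number 22
CONTEXT = 0x80      # class bits for a context-specific tag (assumed here)
CONSTRUCTED = 0x20  # set when the content is itself an encoded value


def tlv(tag_byte, content):
    """Encode one value with a single-byte tag and a short-form length."""
    if len(content) > 127:
        raise ValueError('long-form lengths are out of scope for this sketch')
    return bytes([tag_byte, len(content)]) + content


def implicit(tag_number, content):
    # Implicit tagging: the universal tag is replaced, the content is unchanged
    return tlv(CONTEXT | tag_number, content)


def explicit(tag_number, inner):
    # Explicit tagging: the complete inner encoding is wrapped in a new header
    return tlv(CONTEXT | CONSTRUCTED | tag_number, inner)


plain = tlv(IA5STRING, b'hi')

print(plain.hex())               # 16026869
print(implicit(0, b'hi').hex())  # 80026869
print(explicit(0, plain).hex())  # a00416026869
```

The implicitly tagged form no longer carries the universal tag `0x16`, which is
why a decoder needs the field definition to know the underlying type, while the
explicitly tagged form keeps the original encoding intact inside the wrapper.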