# Assembly File Format Specification

## Introduction

This document describes the assembly file format for the Panda platform. Assembly files are human-readable and human-writable plain text files. They are meant to be fed to the Panda assembler, a dedicated tool that translates them into binary files that can be executed by the Panda virtual machine. This document does not describe the bytecode instructions supported by the Panda virtual machine; refer to the [Bytecode ISA Specification](isa/isa.yaml) instead. Nor does it specify the binary format of executables supported by the Panda virtual machine; refer to the [Binary Format Specification](file_format.md) instead.

### Requirements

Panda as a platform is multilingual and flexible by design:

* Panda assembly should not "favor" in any way any existing programming language that is (or is intended to be) supported by the platform. Instead, Panda assembly can be thought of as a separate close-to-bytecode language with a minimal feature set. All language-specific "traits" that must be supported to generate valid executable binaries with respect to the higher-level semantics should be implemented via metadata annotations (see below).
* Panda assembly should not focus on a certain programming paradigm. For example, we should not enforce the concepts of "class", "object" or "method" at the assembly language level, because we might support a language which does not implement classic OOP at all.
* When the Panda assembler generates a binary executable file, it is not expected to check language semantics. This responsibility is delegated to "source to binary" compilers and the runtime.
* The Panda assembler should not impose any limitation on the quantity or internal structure of source code files written in the Panda assembly language. The assembler should process as many input source code files as the developer specifies.
* The Panda assembler should not follow any implicit conventions about the name of the entry point.

## Comments

Comments are marked with the `#` character. The `#` character and all characters that follow it up to the end of the line are ignored.

## Literals

### Numeric Literals

The following numeric literals are supported:

* Signed/unsigned decimal/hexadecimal/binary integers not larger than 64 bits. Hexadecimal literals are prefixed with `0x`. Binary literals are prefixed with `0b`.
* Floating-point decimal/hexadecimal literals that can be represented with IEEE 754. Hexadecimal floating-point literals are prefixed with `0x`. The hexadecimal digits are first converted to the bit representation they denote, and that bit pattern is then converted to a double using a `bit_cast` in accordance with the IEEE 754 standard.

### String Literals

A string literal is a sequence of any characters enclosed in `"` characters. Non-printable characters and characters outside the Latin-1 character set must be encoded with the `mutf8` encoding. For example, the string literal `"文字范例"` should be encoded as `"\xe6\x96\x87\xe5\xad\x97\xe8\x8c\x83\xe4\xbe\x8b"`.

The following escape sequences can be used in string literals:

- `\"` double quote, `\x22`
- `\a` alert, `\x07`
- `\b` backspace, `\x08`
- `\f` form feed, `\x0c`
- `\n` newline, `\x0a`
- `\r` carriage return, `\x0d`
- `\t` horizontal tab, `\x09`

## Identifiers

### Simple Identifiers

A simple identifier is a sequence of ASCII characters. Allowed characters in the sequence are:

* Letters from `a` to `z`.
* Letters from `A` to `Z`.
* Digits from `0` to `9`.
* The following characters: `_`, `$`.

The following constraints apply:

* A valid identifier starts with any letter or with `_`.
* All identifiers are case sensitive.

Simple identifiers can be used for naming metadata annotations, primitive data types, aggregate data types, members of aggregate data types, functions and labels.
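The simple-identifier rules above can be sketched as a small check. This is a minimal illustration in Python, not code from the Panda assembler: the helper name `is_simple_identifier` is hypothetical, and the sketch deliberately ignores reserved names such as primitive type names.

```python
import re

# Hypothetical helper for illustration only; not part of the Panda assembler.
# First character: a letter or "_". Remaining characters: letters, digits, "_" or "$".
_SIMPLE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def is_simple_identifier(text: str) -> bool:
    """Return True if `text` satisfies the simple-identifier rules listed above."""
    return _SIMPLE_ID.fullmatch(text) is not None
```

Under these assumptions, `foo` and `_bar$1` are accepted, while `1abc` and `$x` are rejected because of their first character. Matching is case sensitive, so `Foo` and `foo` are distinct identifiers.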
### Prefixed Identifiers

A prefixed identifier is a sequence of simple identifiers delimited by the `.` character, without whitespace.

Prefixed identifiers can be used for naming metadata annotations, aggregate data types and functions.

## Metadata Annotations

As stated above, the current version of Panda assembly does not favor any language, as the platform is designed to support many of them. To deal with language-specific metadata, annotations are used, defined as follows:

```
<key1=value1, key2=value2, ...>
```

Values are optional. When a value is omitted, only the `key` is needed.

The following constraints apply:

* Each key is a valid identifier.
* All keys are unique within a single annotation list.
* If present, a value is a valid identifier, with the following exception: values can start with a digit.

In all cases where annotations can be optionally used, the `optional_annotation` marker is used in this document.

There are keys that indicate that a function must not have an implementation. The absence of these keys suggests otherwise. Metadata containing such keys is called `lonely metadata`.

### Function metadata annotations

#### Standard metadata

A definition of a function is assumed.

| Key | Description |
| ------ | ------ |

#### Lonely metadata

A declaration of a function is assumed.

| Key | Description |
| ------ | ------ |
| `external` | Marks an externally defined function. Does not require a value. |
| `native` | Marks an externally defined function. Does not require a value. |
| `noimpl` | Marks a function without implementation. Does not require a value. |
| `static` | Marks a function as static. Does not require a value. |
| `ctor` | Marks a function as an object constructor. It will be renamed in the binary file according to the rules of the particular language (`.ctor` for Panda Assembly). |
| `cctor` | Marks a function as a static constructor. It will be renamed in the binary file according to the rules of the particular language (`.cctor` for Panda Assembly). |

### Record metadata annotations

#### Standard metadata

A definition of a record is assumed.

| Key | Description |
| ------ | ------ |

#### Lonely metadata

A declaration of a record is assumed.

| Key | Description |
| ------ | ------ |
| `external` | Marks an externally defined record. Does not require a value. |

### Field metadata annotations

| Key | Description |
| ------ | ------ |
| `external` | Marks an externally defined field. Does not require a value. |
| `static` | Marks a statically defined field. Does not require a value. |

### Language specific annotations

Currently Panda Assembly supports annotations for the following languages:

- Java
- PandaAssembly

To specify the language, the `.language` directive is used. It must be declared before any other declarations:

```
.language Java

.function void f() {}
```

By default, the PandaAssembly language is assumed.

#### Java annotations

Currently Panda Assembly supports the following Java annotations:

| Key | Description |
| --- | --- |
| `java.access` | Used to specify the access level of a record, field or function. Possible values: `private`, `protected`, `public`. |
| `java.extends` | Used to specify inheritance between records. The value is the name of the base record. |
| `java.implements` | Used to specify interface inheritance between records. The value is the name of the interface record. Multiple definitions are allowed. |
| `java.interface` | Used to specify that the record represents a Java interface. |
| `java.enum` | Used to specify that the record and its fields represent a Java enum. |
| `java.annotation` | Used to specify that the record represents a Java annotation. |
| `java.annotation.type` | Used to specify the type of the annotation. Possible values: `class`, `runtime`. |
| `java.annotation.class` | Used to specify the annotation class. Multiple definitions are allowed. The value is the name of the record that represents the Java annotation. |
| `java.annotation.id` | Used to specify the annotation id. Annotations with an id are used as values of other annotation elements. `java.annotation.class` must be defined first. Multiple definitions are allowed (but only one definition for each annotation). |
| `java.annotation.element.name` | Used to specify the name of the annotation element. `java.annotation.class` must be defined first. Multiple definitions are allowed (but only one definition for each annotation element). |
| `java.annotation.element.type` | Used to specify the type of the annotation element. `java.annotation.element.name` must be defined first. Multiple definitions are allowed (but only one definition for each annotation element). Possible values: `u1`, `i8`, `u8`, `i16`, `u16`, `i32`, `u32`, `i64`, `u64`, `f32`, `f64`, `string`, `class`, `enum`, `annotation`, `array`. |
| `java.annotation.element.array.component.type` | Used to specify the component type of an array annotation element. `java.annotation.element.type` must be defined first and have the `array` value. Multiple definitions are allowed (but only one definition for each annotation element). Possible values: `u1`, `i8`, `u8`, `i16`, `u16`, `i32`, `u32`, `i64`, `u64`, `f32`, `f64`, `string`, `class`, `enum`, `annotation`. |
| `java.annotation.element.value` | Used to specify the value of the annotation element. Multiple definitions are allowed (including multiple definitions for one annotation element if it has the `array` type). |

Example:

```
.language Java

.record A <java.access=public> {}
.record B <java.access=public, java.extends=A> {}

.record Iface1 <java.interface>
.record Iface2 <java.interface>

.record C <java.implements=Iface1, java.implements=Iface2> {}

.record A1 <java.annotation, java.annotation.type=runtime> {}
.record A2 <java.annotation, java.annotation.type=runtime> {}

# Annotation elements are represented using abstract methods

.function i32[] A1.NameArr() <noimpl>
.function A1 A2.Name() <noimpl>

# @A2(Name=@A1(NameArr={1,2}))
.record R <java.annotation.class=A1, java.annotation.id=id1, java.annotation.element.name=NameArr, java.annotation.element.type=array, java.annotation.element.array.component.type=i32, java.annotation.element.value=1, java.annotation.element.value=2, java.annotation.class=A2, java.annotation.element.name=Name, java.annotation.element.type=annotation, java.annotation.element.value=id1>
```

## Data Types

The semantics of operations on all data types defined below follow the semantics defined in the [Bytecode ISA Specification](isa/isa.yaml).
### Primitive Data Types

The following primitive types are supported:

| Panda Assembler Type | Description |
| ------ | ------ |
| `void` | Type for the result of a function that returns normally, but does not provide a result value to its caller |
| `u1` | Unsigned 1-bit integer number |
| `u8` | Unsigned 8-bit integer number |
| `i8` | Signed 8-bit integer number |
| `u16` | Unsigned 16-bit integer number |
| `i16` | Signed 16-bit integer number |
| `u32` | Unsigned 32-bit integer number |
| `i32` | Signed 32-bit integer number |
| `u64` | Unsigned 64-bit integer number |
| `i64` | Signed 64-bit integer number |
| `f32` | 32-bit single precision floating point number, compliant with the IEEE 754 standard |
| `f64` | 64-bit double precision floating point number, compliant with the IEEE 754 standard |

Identifiers that are used for naming primitive data types cannot be used for any other purpose.

### Reference Data Types

The following reference types are supported:

| Panda Assembler Type | Description |
| ------ | ------ |
| `cref` | Code reference, represents references to the bytecode executable by the Panda virtual machine |
| `dref` | Data reference, represents references to aggregate data types (see below) |

Identifiers that are used for naming reference data types cannot be used for any other purpose.

### Aggregate Data Types

Aggregate data types are defined as follows:

```
.record RecordName optional_annotation {
    type1 member1 optional_annotation1
    type2 member2 optional_annotation2
    # ...
    typeN memberN optional_annotationN
}
```

The following constraints apply:

* `RecordName`, `type1`, ... `typeN`, `member1`, ... `memberN` are valid identifiers.
* `member1`, ... `memberN` are unique identifiers within a record.
* `RecordName` is unique across all source code files.
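The two uniqueness constraints above can be illustrated with a short check over an in-memory model of record definitions. This is a sketch in Python under stated assumptions: the function `check_records` and the tuple-based representation are hypothetical and do not reflect how the Panda assembler stores or validates records.

```python
from collections import Counter


def check_records(records):
    """Check the uniqueness constraints for a list of record definitions.

    `records` is a hypothetical model: a list of (record_name, members) pairs,
    where `members` is a list of (type_name, member_name) pairs.
    Returns a list of human-readable error strings (empty if no violation).
    """
    errors = []

    # Record names must be unique across all source code files.
    name_counts = Counter(name for name, _members in records)
    for name, count in name_counts.items():
        if count > 1:
            errors.append(f"duplicate record name: {name}")

    # Member names must be unique within each record.
    for name, members in records:
        seen = set()
        for _type_name, member_name in members:
            if member_name in seen:
                errors.append(f"duplicate member {member_name} in record {name}")
            seen.add(member_name)

    return errors
```

The sketch only covers uniqueness. Checking that each name is a valid identifier would be a separate step.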
Whenever a record should incorporate another record, the name of the nested record must be specified. However, in this context the name implicitly denotes a `dref` type, which implements a reference to the data represented by that record. Example:

```
.record Foo {
    i32 member1
    f32 member2
}

.record Bar {
    Foo foo
    f64 member1
    f64 member2
}
```

#### Informal Notice

`.record`s are like `struct`s in C, but without support for "by instance" nesting. This is because the result of a field load should be valid for any member, hence a record member should fit into a virtual register. Constraints on registers are defined in the [Bytecode ISA Specification](isa/isa.yaml).

### Builtin Aggregate Data Types

The platform has the following builtin aggregate types:

| Panda Assembler Type | Description |
| ------ | ------ |
| `panda.String` | UTF-16 string |

### Arrays

The platform supports arrays of primitive and aggregate data types. An array of type `T` has the type name `T[]`. Example:

```
.function void f() {
    ...
    newarr v1, v0, i32[]
    ...
    newarr v1, v0, panda.String[]
    ...
    newarr v1, v0, f32[][][]
    ...
}
```

## Functions

Functions are defined as follows:

```
.function ReturnType FunctionName(ArgumentType0 a0, ... ArgumentTypeN aN) optional_annotation
{
    # code
}
```

The following constraints apply:

* `FunctionName`, `ArgumentType0`, ... `ArgumentTypeN`, `a0`, ... `aN` are valid identifiers.
* `ReturnType` is any type accepted by the `func_ret` rule in the Pseudo-BNF section.
* All `a0`, ... `aN` are unique within the argument list of the function.
* `FunctionName` is unique across all source code files.

### Function Arguments and Local Variables

By convention, all arguments are named `a0`, ... `aN` and all local variables are named `v0`, ... `vM`. The Panda assembler guarantees that all these entities are unambiguously mapped to the underlying virtual registers.
### Function Body

If a function has a body, it consists of an optionally labeled sequence of bytecode instructions, one instruction per line. Instruction opcodes and formats follow the [Bytecode ISA Specification](isa/isa.yaml).

### Static and virtual functions

By default, all functions are static, except those that are bound to a record and accept a reference to it as the first parameter:

```
.record R {}

.function void R.foo(R a0) {} # virtual function

.function void R.foo(R a0) <static> {} # static function

.function void R.foo(i32 a0) {} # static function
```

#### Call instructions

The assembler relaxes constraints for call instructions:

- If fewer arguments are given than the [Bytecode ISA Specification](isa/isa.yaml) requires, the assembler passes `v0` in place of each unspecified argument.

- For non-range call instructions, the assembler chooses the optimal encoding according to the number of specified arguments.

Example:

The following instruction in assembly

```
call.static f, v1
```

will be emitted as

```
call.short.static f, v1, v0
```

### Program Entry Point

Any function which accepts an array of strings as its single argument may serve as a program entry point. The name of the entry point must be specified as a part of the input to the assembler program. An example of a possible entry point is:

```
.record _panda_array_string <external>

.function foo(_panda_array_string a0)
{
    # code
}
```

### Exception handlers

Try, catch and finally blocks can be declared using the `.catch` and `.catchall` directives:

```
.catch <exception_record>, <try_begin_label>, <try_end_label>, <catch_begin_label>
.catchall <try_begin_label>, <try_end_label>, <catch_begin_label>
```

Example:

```
.record Exception1 {}
.record Exception2 {}

.function void foo()
{
    ...
try_begin:
    ...
try_end:
    ...
catch_begin1:
    ...
catch_begin2:
    ...
catchall_begin1:
    ...

    .catch Exception1, try_begin, try_end, catch_begin1
    .catch Exception2, try_begin, try_end, catch_begin2
    .catchall try_begin, try_end, catchall_begin1
}
```

There are also safer forms of these directives, which specify the exact bounds of an exception handler and so allow more precise verification of control flow in the bytecode verifier:

```
.catch <exception_record>, <try_begin_label>, <try_end_label>, <catch_begin_label>, <catch_end_label>
.catchall <try_begin_label>, <try_end_label>, <catch_begin_label>, <catch_end_label>
```

They are almost identical to the forms of `.catch` and `.catchall` shown earlier and differ only in that they also specify the end label of the exception handler. The end label is the label that immediately follows the last instruction of the exception handler.

## Pseudo-BNF

Instruction flow is omitted for simplicity:

```
# Literals are represented in double-quotes as "literal value".
# Free-form descriptions are represented as "<description here>"
# Empty symbol is represented as E.

defs := defs def | E
def := rec_def | func_def

# Identifiers:

letter_lower := "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
letter_upper := "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit := "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
char_misc := "_"

char_non_dig := letter_lower | letter_upper | char_misc
char_simple := char_non_dig | digit

id_simple := char_non_dig id_simple_tail
id_simple_tail := id_simple_tail char_simple | E

id_prefixed := id_simple | id_simple "." id_prefixed

# Records and types:
rec_def := ".record" rec_name rec_add
rec_add := def_pair_rec_meta rec_body | def_lonely_rec_meta
rec_name := id_prefixed
rec_body := "{" fields "}"
type_def := "u1" | "u8" | "i8" | "u16" | "i16" | "u32" | "i32" | "u64" | "i64" | "f32" | "f64" | "any" | rec_name | type_def "[]"

# Fields of records:
fields := fields field_def | E
field_def := field_type field_name def_field_meta
field_type := type_def
field_name := id_simple

# Functions:
func_def := ".function" func_sig func_add
func_add := def_pair_func_meta func_body | def_lonely_func_meta
func_sig := func_ret func_name func_args
func_ret := type_def
func_name := id_prefixed
func_args := "(" arg_list ")"
arg_list := <","-separated list of argument names and their respective types>
func_body := "{" func_code "}"
func_code := <newline-separated sequence of bytecode instructions and their operands>

# Function metadata annotations:
def_pair_func_meta := "<" func_meta_list ">" | E
def_lonely_func_meta := "<" func_lonely_meta_list ">"
func_meta_list := func_meta_list func_meta_item "," | E
func_meta_item := func_kv_pair | func_id
func_kv_pair := <an element of the function standard metadata list that assumes the assignment of a value, and the value that is assigned to it, separated by the sign "=">
func_id := <an element of the function standard metadata list>
func_lonely_meta_list := func_lonely_meta_list func_lonely_meta_item "," | E
func_lonely_meta_item := func_kv_lonely_pair | func_lonely_id
func_kv_lonely_pair := <an element of the function lonely metadata list that assumes the assignment of a value, and the value that is assigned to it, separated by the sign "=">
func_lonely_id := <an element of the function lonely metadata list>

# Record metadata annotations:
def_pair_rec_meta := "<" rec_meta_list ">" | E
def_lonely_rec_meta := "<" rec_lonely_meta_list ">"
rec_meta_list := rec_meta_list rec_meta_item "," | E
rec_meta_item := rec_kv_pair | rec_id
rec_kv_pair := <an element of the record standard metadata list that assumes the assignment of a value, and the value that is assigned to it, separated by the sign "=">
rec_id := <an element of the record standard metadata list>
rec_lonely_meta_list := rec_lonely_meta_list rec_lonely_meta_item "," | E
rec_lonely_meta_item := rec_kv_lonely_pair | rec_lonely_id
rec_kv_lonely_pair := <an element of the record lonely metadata list that assumes the assignment of a value, and the value that is assigned to it, separated by the sign "=">
rec_lonely_id := <an element of the record lonely metadata list>

# Field metadata annotations:
def_field_meta := "<" field_meta_list ">" | E
field_meta_list := field_meta_list field_meta_item "," | E
field_meta_item := field_kv_pair | field_id
field_kv_pair := <an element of the field metadata list that assumes the assignment of a value, and the value that is assigned to it, separated by the sign "=">
field_id := <an element of the field metadata list>
```

## Important notes

- The assembler does not guarantee that functions, records and their fields will be located in the binary file in the same order as they appear in the assembly file.

## Appendix A, Informative: Code Layout Sample

```
# External records and functions:
.record Record1 <external>
.function Record1.function1(Record1 a0, f64 a1) <external>

.record Foo <java.extends=SomeRecord> {
    i32 member1 <java.access=private>
    i32 member2 <java.access=public>
    i32 member3 <java.access=static, java.instantiation=static>
}

.function Foo.constructor1(Foo a0) <java.ctor>
{
    # code for an overloaded "constructor" (whatever you mean by it)
}

.function Foo.constructor2(Foo a0, i32 a1) <java.ctor>
{
    # code for an overloaded "constructor" (whatever you mean by it)
}

.function Foo.func1(Foo a0, i32 a1) <java.access=public>
{
    # code
}

# "Interface" function:
.function Foo.func2(Foo a0, i32 a1) <noimpl>

.function entry_point(_panda_array_string a0)
{
    # After loading the binary, control will be transferred here
}
```

Apart from metadata annotations, `Foo.` prefixes (which remain a pure naming convention for the assembler!) can additionally be processed during linkage to "bind" functions to records, making them "true" methods from the OOP world.

**Strings** and **arrays** can be thought of as `external` records with some manipulating functions. There is no support for generics due to the low-level nature of the assembler, hence arrays of different types are implemented with different external records.

## Appendix B, Informative: Mapping Panda Assembler Types to JVM Types

This section serves purely illustrative purposes.

| Panda Assembler Type | Corresponding JVM Type |
| ------ | ------ |
| `u1` | `boolean` |
| `u8` | N/A |
| `i8` | `byte` |
| `u16` | `char` |
| `i16` | `short` |
| `u32` | N/A |
| `i32` | `int` |
| `u64` | N/A |
| `i64` | `long` |
| `f32` | `float` |
| `f64` | `double` |
| `cref` | N/A |
| `dref` | `reference` |

## Appendix C, TODO List

* Specify `cref` and indirect calls to functions.
* Elaborate on bytecode definition.
* Compose formal definitions for literals.