# Assembly File Format Specification

## Introduction

This document describes the assembly file format for the Panda platform. Assembly files are human-readable and human-writable plain text files. They are fed to the Panda assembler, a dedicated tool that translates them into binary files that can be executed by the Panda virtual machine. This document does not describe the bytecode instructions supported by the Panda virtual machine; refer to the [Bytecode ISA Specification](isa/isa.yaml) instead. Nor does it specify the binary format of executables supported by the Panda virtual machine; refer to the [Binary Format Specification](file_format.md) instead.

### Requirements

Panda as a platform is multilingual and flexible by design:

* Panda assembly should not "favor" any existing programming language that is (or is intended to be) supported by the platform. Instead, Panda assembly can be thought of as a separate close-to-bytecode language with a minimal feature set. All language-specific "traits" that are needed to generate valid executable binaries with respect to higher-level semantics should be implemented via metadata annotations (see below).
* Panda assembly should not focus on a particular programming paradigm. For example, concepts such as "class", "object", and "method" should not be enforced at the assembly language level, because the platform might support a language that does not implement classic OOP at all.
* When the Panda assembler generates a binary executable file, it is not expected to check language semantics. This responsibility is delegated to "source-to-binary" compilers and the runtime.
* The Panda assembler should not impose any limitation on the number or internal structure of source code files written in Panda assembly language. The assembler should process as many input source code files as the developer specifies.
* The Panda assembler should not follow any implicit convention about the name of the entry point.

## Comments

Comments are marked with the `#` character. The `#` character and all characters that follow it up to the end of the line are ignored.

## Literals

### Numeric Literals

The following numeric literals are supported:

* Signed and unsigned decimal, hexadecimal, and binary integers not larger than 64 bits. Hexadecimal literals are prefixed with `0x`; binary literals are prefixed with `0b`.
* Floating-point decimal and hexadecimal literals that can be represented in IEEE 754 format. Hexadecimal floating-point literals are prefixed with `0x`. Such a literal is first converted to the bit pattern denoted by its hexadecimal digits, and that bit pattern is then reinterpreted as a double (a `bit_cast`), in accordance with the IEEE 754 standard.
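
The literal rules above can be illustrated with a small Python sketch. It is not the assembler's implementation: the helper names `parse_int_literal` and `hex_float_to_f64` are hypothetical, and the floating-point helper covers only the 64-bit case, assuming that a hexadecimal floating-point literal spells out exactly the bits of an IEEE 754 double.

```python
import struct

INT64_MIN = -(1 << 63)
UINT64_MAX = (1 << 64) - 1


def parse_int_literal(literal: str) -> int:
    """Parse a decimal, 0x-hexadecimal, or 0b-binary integer literal with an optional sign."""
    body, sign = literal, 1
    if body[:1] in ("+", "-"):
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if body.startswith("0x"):
        value = int(body[2:], 16)
    elif body.startswith("0b"):
        value = int(body[2:], 2)
    else:
        value = int(body, 10)
    value *= sign
    # The literal must fit into 64 bits, either as a signed or as an unsigned value.
    if not INT64_MIN <= value <= UINT64_MAX:
        raise ValueError(f"integer literal {literal!r} does not fit into 64 bits")
    return value


def hex_float_to_f64(literal: str) -> float:
    """Reinterpret the bits spelled by a 0x-prefixed literal as an IEEE 754 double (bit_cast)."""
    if not literal.startswith("0x"):
        raise ValueError("hexadecimal floating-point literals must start with 0x")
    bits = int(literal[2:], 16)
    if bits > UINT64_MAX:
        raise ValueError(f"{literal!r} has more than 64 bits")
    # Pack the integer as 8 little-endian bytes and read them back as a little-endian double.
    return struct.unpack("<d", bits.to_bytes(8, "little"))[0]
```

For example, `0x3ff0000000000000` is the bit pattern of the double `1.0`, so it is read as `1.0` rather than as a very large integer.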

### String Literals

A string literal is a sequence of any characters enclosed in `"` characters. Non-printable characters and characters outside the Latin-1 character set must be encoded with the `mutf8` encoding. For example, the `"文字范例"` string literal should be encoded as `"\xe6\x96\x87\xe5\xad\x97\xe8\x8c\x83\xe4\xbe\x8b"`.

The following escape sequences can be used in string literals:

- `\"` double quote, `\x22`
- `\a` alert, `\x07`
- `\b` backspace, `\x08`
- `\f` form feed, `\x0c`
- `\n` newline, `\x0a`
- `\r` carriage return, `\x0d`
- `\t` horizontal tab, `\x09`
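
The following Python sketch shows one way a tool could produce such literals from arbitrary text. The function names are hypothetical. It assumes that `\xHH` may be used for any byte (as the `\x22` equivalent of `\"` suggests), and it conservatively escapes every character outside printable ASCII, including the backslash, for which no named escape is listed above. MUTF-8 differs from standard UTF-8 in two places: U+0000 is written as two bytes, and characters outside the Basic Multilingual Plane are written as a pair of three-byte surrogates.

```python
NAMED_ESCAPES = {'"': '\\"', "\a": "\\a", "\b": "\\b", "\f": "\\f",
                 "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def _three_bytes(unit: int) -> bytes:
    # Three-byte form used for U+0800..U+FFFF, including surrogate code units.
    return bytes([0xE0 | (unit >> 12), 0x80 | ((unit >> 6) & 0x3F), 0x80 | (unit & 0x3F)])


def mutf8_encode(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            out += b"\xc0\x80"  # MUTF-8 never emits a raw zero byte
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += _three_bytes(cp)
        else:
            # Supplementary characters are first split into a UTF-16 surrogate pair.
            cp -= 0x10000
            out += _three_bytes(0xD800 | (cp >> 10)) + _three_bytes(0xDC00 | (cp & 0x3FF))
    return bytes(out)


def to_string_literal(text: str) -> str:
    parts = []
    for ch in text:
        if ch in NAMED_ESCAPES:
            parts.append(NAMED_ESCAPES[ch])
        elif " " <= ch <= "~" and ch != "\\":
            parts.append(ch)  # printable ASCII is kept as-is
        else:
            parts.extend(f"\\x{b:02x}" for b in mutf8_encode(ch))
    return '"' + "".join(parts) + '"'
```

Applied to `文字范例`, this produces the escaped literal from the example above.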

## Identifiers

### Simple Identifiers

A simple identifier is a sequence of ASCII characters. The allowed characters are:

* Letters from `a` to `z`.
* Letters from `A` to `Z`.
* Digits from `0` to `9`.
* The following characters: `_`, `$`.

The following constraints apply:

* A valid identifier starts with a letter or with `_`.
* All identifiers are case sensitive.

Simple identifiers can be used for naming metadata annotations, primitive data types, aggregate data types, members of aggregate data types, functions, and labels.

### Prefixed Identifiers

A prefixed identifier is a sequence of simple identifiers delimited by the `.` character, without whitespace.

Prefixed identifiers can be used for naming metadata annotations, aggregate data types, and functions.
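
Both identifier forms translate directly into regular expressions. The Python sketch below is illustrative rather than taken from the assembler, and the pattern and function names are hypothetical. It follows the prose above, which allows `$` anywhere except in the first position.

```python
import re

SIMPLE_ID = r"[A-Za-z_][A-Za-z0-9_$]*"
PREFIXED_ID = rf"{SIMPLE_ID}(?:\.{SIMPLE_ID})*"


def is_simple_identifier(text: str) -> bool:
    return re.fullmatch(SIMPLE_ID, text) is not None


def is_prefixed_identifier(text: str) -> bool:
    # A single simple identifier is also a valid prefixed identifier.
    return re.fullmatch(PREFIXED_ID, text) is not None
```

Case sensitivity needs no extra code: `Foo` and `foo` are simply different strings.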

## Metadata Annotations

As stated above, the current version of Panda assembly does not favor any language, as the platform is designed to support many of them. Language-specific metadata is expressed with annotations, defined as follows:

```
<key1=value1, key2=value2, ...>
```

Values are optional; a key without a value is written as `key` alone.

The following constraints apply:

* Each key is a valid identifier.
* All keys are unique within a single annotation list.
* If present, a value is a valid identifier, with one exception: values can start with a digit.

In all cases where annotations can optionally be used, this document uses the `optional_annotation` marker.
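
A minimal Python sketch of how an annotation list could be split into key/value pairs and checked against these constraints. The function name and the exact value pattern are assumptions. Because the Java annotations described below explicitly allow some keys (such as `java.implements`) to repeat, the uniqueness check is optional here rather than hard-wired.

```python
import re

SIMPLE_ID = r"[A-Za-z_][A-Za-z0-9_$]*"
KEY_RE = re.compile(rf"{SIMPLE_ID}(?:\.{SIMPLE_ID})*")
# Values follow the identifier rules, except that they may also start with a digit.
VALUE_RE = re.compile(r"[A-Za-z0-9_$]+(?:\.[A-Za-z0-9_$]+)*")


def parse_annotation(text, check_unique=True):
    """Parse '<k1=v1, k2, ...>' into an ordered list of (key, value-or-None) pairs."""
    text = text.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("an annotation list must be enclosed in '<' and '>'")
    pairs = []
    for item in text[1:-1].split(","):
        key, sep, value = (part.strip() for part in item.partition("="))
        if not KEY_RE.fullmatch(key):
            raise ValueError(f"invalid key: {key!r}")
        if sep and not VALUE_RE.fullmatch(value):
            raise ValueError(f"invalid value for {key}: {value!r}")
        pairs.append((key, value if sep else None))
    if check_unique:
        keys = [key for key, _ in pairs]
        if len(keys) != len(set(keys)):
            raise ValueError("duplicate key in annotation list")
    return pairs
```

Keeping the pairs as an ordered list rather than a dictionary matters for the Java annotations below, whose meaning depends on key order.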

Some keys indicate that a function must not have an implementation; without such keys, a function is expected to have one. Metadata containing such keys is called *lonely metadata*.

### Function metadata annotations

#### Standard metadata

A function definition is assumed.

| Key | Description |
| ------ | ------ |

#### Lonely metadata

A function declaration is assumed.

| Key | Description |
| ------ | ------ |
| `external` | Marks an externally defined function. Does not require a value. |
| `native` | Marks an externally defined function. Does not require a value. |
| `noimpl` | Marks a function without an implementation. Does not require a value. |
| `static` | Marks a function as static. Does not require a value. |
| `ctor`   | Marks a function as an object constructor. It is renamed in the binary file according to the rules of the particular language (`.ctor` for Panda Assembly). |
| `cctor`  | Marks a function as a static constructor. It is renamed in the binary file according to the rules of the particular language (`.cctor` for Panda Assembly). |

### Record metadata annotations

#### Standard metadata

A record definition is assumed.

| Key | Description |
| ------ | ------ |

#### Lonely metadata

A record declaration is assumed.

| Key | Description |
| ------ | ------ |
| `external` | Marks an externally defined record. Does not require a value. |

### Field metadata annotations

| Key | Description |
| ------ | ------ |
| `external` | Marks an externally defined field. Does not require a value. |
| `static` | Marks a field as static. Does not require a value. |

### Language specific annotations

Currently, Panda Assembly supports annotations for the following languages:

- Java
- PandaAssembly

The language is specified with the `.language` directive, which must precede all other declarations:

```
.language Java

.function void f() {}
```

By default, the PandaAssembly language is assumed.

#### Java annotations

Currently, Panda Assembly supports the following Java annotations:

| Key | Description |
| --- | --- |
| `java.access` | Specifies the access level of a record, field, or function. Possible values: `private`, `protected`, `public`. |
| `java.extends` | Specifies inheritance between records. The value is the name of the base record. |
| `java.implements` | Specifies interface inheritance between records. The value is the name of the interface record. Multiple definitions are allowed. |
| `java.interface` | Specifies that the record represents a Java interface. |
| `java.enum` | Specifies that the record and its fields represent a Java enum. |
| `java.annotation` | Specifies that the record represents a Java annotation. |
| `java.annotation.type` | Specifies the type of the annotation. Possible values: `class`, `runtime`. |
| `java.annotation.class` | Specifies the annotation class. The value is the name of the record that represents the Java annotation. Multiple definitions are allowed. |
| `java.annotation.id` | Specifies the annotation ID. Annotations with an ID are used as values of other annotation elements. `java.annotation.class` must be defined first. Multiple definitions are allowed, but only one per annotation. |
| `java.annotation.element.name` | Specifies the name of an annotation element. `java.annotation.class` must be defined first. Multiple definitions are allowed, but only one per annotation element. |
| `java.annotation.element.type` | Specifies the type of an annotation element. `java.annotation.element.name` must be defined first. Multiple definitions are allowed, but only one per annotation element. Possible values: `u1`, `i8`, `u8`, `i16`, `u16`, `i32`, `u32`, `i64`, `u64`, `f32`, `f64`, `string`, `class`, `enum`, `annotation`, `array`. |
| `java.annotation.element.array.component.type` | Specifies the component type of an array annotation element. `java.annotation.element.type` must be defined first and have the value `array`. Multiple definitions are allowed, but only one per annotation element. Possible values: `u1`, `i8`, `u8`, `i16`, `u16`, `i32`, `u32`, `i64`, `u64`, `f32`, `f64`, `string`, `class`, `enum`, `annotation`. |
| `java.annotation.element.value` | Specifies the value of an annotation element. Multiple definitions are allowed, including multiple definitions for one annotation element if it has the `array` type. |

Example:

```
.language Java

.record A <java.access=public> {}
.record B <java.access=public, java.extends=A> {}

.record Iface1 <java.interface> {}
.record Iface2 <java.interface> {}

.record C <java.implements=Iface1, java.implements=Iface2> {}

.record A1 <java.annotation, java.annotation.type=runtime> {}
.record A2 <java.annotation, java.annotation.type=runtime> {}

# Annotation elements are represented using abstract methods

.function i32[] A1.NameArr() <noimpl>
.function A1 A2.Name() <noimpl>

# @A2(Name=@A1(NameArr={1,2}))
.record R <java.annotation.class=A1, java.annotation.id=id1, java.annotation.element.name=NameArr, java.annotation.element.type=array, java.annotation.element.array.component.type=i32, java.annotation.element.value=1, java.annotation.element.value=2, java.annotation.class=A2, java.annotation.element.name=Name, java.annotation.element.type=annotation, java.annotation.element.value=id1> {}
```
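
Because an annotation instance is flattened into a single key list, its structure is recovered purely from key order. The Python sketch below regroups such a list following the ordering rules from the table; the function name and the output dictionary layout are hypothetical. Record-level keys such as `java.access` or `java.annotation.type` are skipped, since they do not belong to an annotation instance.

```python
RECORD_LEVEL_KEYS = {"java.annotation", "java.annotation.type"}


def group_java_annotations(pairs):
    """Regroup flat (key, value) pairs into a list of annotation instances."""
    annotations = []
    for key, value in pairs:
        if key == "java.annotation.class":
            annotations.append({"class": value, "id": None, "elements": []})
            continue
        if not key.startswith("java.annotation.") or key in RECORD_LEVEL_KEYS:
            continue  # e.g. java.access, java.extends, java.annotation.type
        if not annotations:
            raise ValueError(f"{key} appears before java.annotation.class")
        current = annotations[-1]
        if key == "java.annotation.id":
            current["id"] = value
            continue
        if key == "java.annotation.element.name":
            current["elements"].append(
                {"name": value, "type": None, "component_type": None, "values": []})
            continue
        if not current["elements"]:
            raise ValueError(f"{key} appears before java.annotation.element.name")
        element = current["elements"][-1]
        if key == "java.annotation.element.type":
            element["type"] = value
        elif key == "java.annotation.element.array.component.type":
            if element["type"] != "array":
                raise ValueError("component type given for a non-array element")
            element["component_type"] = value
        elif key == "java.annotation.element.value":
            element["values"].append(value)
        else:
            raise ValueError(f"unknown key: {key}")
    return annotations
```

Applied to the pairs of record `R` above, it yields two instances: `A1` with ID `id1` and an `i32` array element `NameArr` holding `1` and `2`, and `A2` whose element `Name` refers to `id1`.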

## Data Types

The semantics of operations on all data types defined below follow the semantics defined in the [Bytecode ISA Specification](isa/isa.yaml).

### Primitive Data Types

The following primitive types are supported:

| Panda Assembler Type | Description |
| ------ | ------ |
| `void` | Type for the result of a function that returns normally but does not provide a result value to its caller |
| `u1` | Unsigned 1-bit integer number |
| `u8` | Unsigned 8-bit integer number |
| `i8` | Signed 8-bit integer number |
| `u16` | Unsigned 16-bit integer number |
| `i16` | Signed 16-bit integer number |
| `u32` | Unsigned 32-bit integer number |
| `i32` | Signed 32-bit integer number |
| `u64` | Unsigned 64-bit integer number |
| `i64` | Signed 64-bit integer number |
| `f32` | 32-bit single-precision floating-point number, compliant with the IEEE 754 standard |
| `f64` | 64-bit double-precision floating-point number, compliant with the IEEE 754 standard |

Identifiers that name primitive data types cannot be used for any other purpose.
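
The integer types above differ only in width and signedness, so their value ranges follow mechanically. The Python sketch below (with hypothetical names, and assuming two's-complement signed integers) computes those ranges, for example to check whether an integer literal fits a given type.

```python
# (width in bits, signed?) for each integer primitive type listed above.
INTEGER_TYPES = {
    "u1": (1, False),
    "u8": (8, False), "i8": (8, True),
    "u16": (16, False), "i16": (16, True),
    "u32": (32, False), "i32": (32, True),
    "u64": (64, False), "i64": (64, True),
}


def value_range(type_name):
    """Return the inclusive (min, max) range of an integer primitive type."""
    bits, signed = INTEGER_TYPES[type_name]
    if signed:
        # Two's complement: one more negative value than positive values.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(value, type_name):
    low, high = value_range(type_name)
    return low <= value <= high
```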

### Reference Data Types

The following reference types are supported:

| Panda Assembler Type | Description |
| ------ | ------ |
| `cref` | Code reference; represents references to bytecode executable by the Panda virtual machine |
| `dref` | Data reference; represents references to aggregate data types (see below) |

Identifiers that name reference data types cannot be used for any other purpose.

### Aggregate Data Types

Aggregate data types are defined as follows:

```
.record RecordName optional_annotation {
    type1 member1 optional_annotation1
    type2 member2 optional_annotation2
    # ...
    typeN memberN optional_annotationN
}
```

The following constraints apply:

* `RecordName`, `type1`, ... `typeN`, `member1`, ... `memberN` are valid identifiers.
* `member1`, ... `memberN` are unique within a record.
* `RecordName` is unique across all source code files.

When a record incorporates another record, the name of the nested record is specified as the member type. In this context, however, the name implicitly denotes a `dref` type, that is, a reference to the data represented by that record. Example:

```
.record Foo {
    i32 member1
    f32 member2
}

.record Bar {
    Foo foo
    f64 member1
    f64 member2
}
```
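
The record constraints and the implicit `dref` rule can be captured in a small Python model. This is an illustrative sketch only: `Record`, `check_records`, and `member_kind` are hypothetical names, and each inner list stands for the records of one source file.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

PRIMITIVE_TYPES = {"u1", "u8", "i8", "u16", "i16", "u32", "i32", "u64", "i64", "f32", "f64"}


@dataclass
class Record:
    name: str
    members: List[Tuple[str, str]] = field(default_factory=list)  # (type, member name)


def check_records(files: List[List[Record]]) -> None:
    """Enforce unique record names across all files and unique member names per record."""
    seen: Set[str] = set()
    for records in files:
        for record in records:
            if record.name in seen:
                raise ValueError(f"record {record.name} is defined more than once")
            seen.add(record.name)
            names = [member for _, member in record.members]
            if len(names) != len(set(names)):
                raise ValueError(f"duplicate member name in record {record.name}")


def member_kind(type_name: str, record_names: Set[str]) -> str:
    """A member whose type names a record is stored as a dref, never by instance."""
    if type_name in PRIMITIVE_TYPES:
        return "primitive"
    if type_name in record_names:
        return "dref"
    raise ValueError(f"unknown member type {type_name}")
```

With `Foo` and `Bar` from the example placed in two different files, the check passes (`member1` may repeat across records), and `Bar.foo` is classified as a `dref`.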

#### Informal Notice

`.record`s are like `struct`s in C, but without support for "by instance" nesting. This is because the result of a field load must be valid for any member, so every record member must fit into a virtual register. Constraints on registers are defined in the [Bytecode ISA Specification](isa/isa.yaml).

### Builtin Aggregate Data Types

The platform has the following builtin aggregate types:

| Panda Assembler Type | Description |
| ------ | ------ |
| `panda.String` | UTF-16 string |

### Arrays

The platform supports arrays of primitive and aggregate data types. An array of type `T` has the type name `T[]`. Example:

```
.function void f() {
    # ...
    newarr v1, v0, i32[]
    # ...
    newarr v1, v0, panda.String[]
    # ...
    newarr v1, v0, f32[][][]
    # ...
}
```
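
Since every `[]` suffix adds one dimension, an array type name can be decomposed into its element type and its rank. A hypothetical Python helper illustrating this:

```python
def parse_array_type(type_name: str):
    """Split a type name such as 'f32[][][]' into ('f32', 3); non-array types have rank 0."""
    rank = 0
    while type_name.endswith("[]"):
        type_name = type_name[:-2]
        rank += 1
    if not type_name:
        raise ValueError("missing element type")
    return type_name, rank
```

Read one level at a time, `f32[][][]` is an array whose elements are of type `f32[][]`.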

## Functions

Functions are defined as follows:

```
.function ReturnType FunctionName(ArgumentType0 a0, ... ArgumentTypeN aN) optional_annotation
{
    # code
}
```

The following constraints apply:

* `ReturnType`, `FunctionName`, `ArgumentType0`, ... `ArgumentTypeN`, `a0`, ... `aN` are valid identifiers.
* All of `a0`, ... `aN` are unique within the argument list of the function.
* `FunctionName` is unique across all source code files.

### Function Arguments and Local Variables

By convention, all arguments are named `a0`, ... `aN` and all local variables are named `v0`, ... `vM`. The Panda assembler guarantees that all these entities are unambiguously mapped to the underlying virtual registers.
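
The text only guarantees that the mapping is unambiguous; it does not fix the register layout. The Python sketch below assumes one plausible layout, in which the M + 1 locals occupy the first virtual registers and the arguments follow them. Treat both the layout and the `vreg_index` name as hypothetical.

```python
import re

REGISTER_RE = re.compile(r"([av])(\d+)")


def vreg_index(name: str, num_locals: int, num_args: int) -> int:
    """Map 'vK' or 'aK' to a virtual register index under an ASSUMED locals-then-arguments layout."""
    match = REGISTER_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a register name: {name!r}")
    kind, index = match.group(1), int(match.group(2))
    if kind == "v":
        if index >= num_locals:
            raise ValueError(f"{name} exceeds the {num_locals} declared locals")
        return index
    if index >= num_args:
        raise ValueError(f"{name} exceeds the {num_args} arguments")
    return num_locals + index  # arguments are placed after all locals (assumption)
```

Under this layout, `a0` and `v0` can never collide, which is the property the text guarantees.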

### Function Body

If a function has a body, the body consists of a sequence of optionally labeled bytecode instructions, one instruction per line. Instruction opcodes and formats follow the [Bytecode ISA Specification](isa/isa.yaml).

### Static and virtual functions

By default, all functions are static, except those that are bound to a record (that is, named with the record prefix, as in `R.foo`) and accept a reference to that record as their first parameter. Such functions are virtual unless they are explicitly marked `static`:

```
.record R {}

.function void R.foo(R a0) {} # virtual function

.function void R.foo(R a0) <static> {} # static function

.function void R.foo(i32 a0) {} # static function
```
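
The rule can be phrased as a small predicate. This Python sketch assumes that "bound to a record" means that the function name is the record name followed by `.`, as in the example; the function name `is_virtual` is hypothetical.

```python
def is_virtual(func_name, param_types, annotation_keys):
    """Return True if a function is virtual under the default rule described above."""
    if "static" in annotation_keys:
        return False  # an explicit <static> always wins
    record, dot, _ = func_name.rpartition(".")
    if not dot:
        return False  # not bound to any record
    # Bound to a record: virtual only if the first parameter is a reference to that record.
    return bool(param_types) and param_types[0] == record
```

The three declarations of `R.foo` above map to `True`, `False`, and `False`, respectively.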

#### Call instructions

The assembler relaxes constraints for call instructions:

- If fewer arguments are given than the instruction format in the [Bytecode ISA Specification](isa/isa.yaml) specifies, the assembler passes `v0` in place of the missing ones.

- For non-range call instructions, the assembler chooses the optimal encoding according to the number of specified arguments.

Example:

The following instruction in assembly
```
call.static f, v1
```
is emitted as
```
call.short.static f, v1, v0
```
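
The two relaxations can be sketched in Python as follows. The operand capacities used here (2 registers for the `.short` form, 4 for the regular form) are assumptions, since the example above only shows the short form, and should be checked against the [Bytecode ISA Specification](isa/isa.yaml). `relax_call` is a hypothetical name, and range calls are passed through unchanged.

```python
# ASSUMED non-range call encodings, from shortest to longest: (mnemonic infix, register slots).
ENCODINGS = [("call.short", 2), ("call", 4)]


def relax_call(mnemonic, callee, regs):
    """Pick the shortest non-range encoding that fits and pad missing registers with v0."""
    if ".range" in mnemonic:
        return f"{mnemonic} {', '.join([callee, *regs])}"
    if not mnemonic.startswith("call"):
        raise ValueError(f"not a call instruction: {mnemonic}")
    # Keep the suffix (e.g. '.static') but let the assembler choose the size.
    suffix = mnemonic[len("call"):].replace(".short", "", 1)
    for infix, slots in ENCODINGS:
        if len(regs) <= slots:
            padded = list(regs) + ["v0"] * (slots - len(regs))
            return f"{infix}{suffix} {', '.join([callee, *padded])}"
    raise ValueError("too many arguments for a non-range call; use a range call")
```

With these assumptions, `relax_call("call.static", "f", ["v1"])` reproduces the emitted instruction from the example.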

### Program Entry Point

Any function that accepts an array of strings as its single argument may serve as a program entry point. The name of the entry point must be specified as part of the input to the assembler. An example of a possible entry point is:

```
.record _panda_array_string <external>

.function void foo(_panda_array_string a0)
{
    # code
}
```

### Exception handlers

Try, catch, and finally blocks can be declared using the `.catch` and `.catchall` directives:

```
.catch <exception_record>, <try_begin_label>, <try_end_label>, <catch_begin_label>
.catchall <try_begin_label>, <try_end_label>, <catch_begin_label>
```

Example:

```
.record Exception1 {}
.record Exception2 {}

.function void foo()
{
    # ...
try_begin:
    # ...
try_end:
    # ...
catch_begin1:
    # ...
catch_begin2:
    # ...
catchall_begin1:
    # ...

    .catch Exception1, try_begin, try_end, catch_begin1
    .catch Exception2, try_begin, try_end, catch_begin2
    .catchall try_begin, try_end, catchall_begin1
}
```

There are also safer variants of these directives that specify the exact bounds of an exception handler, which allows the bytecode verifier to check control flow more precisely:

```
.catch <exception_record>, <try_begin_label>, <try_end_label>, <catch_begin_label>, <catch_end_label>
.catchall <try_begin_label>, <try_end_label>, <catch_begin_label>, <catch_end_label>
```

They are identical to `.catch` and `.catchall`, except that they also specify the end label of the exception handler. The end label is the label that immediately follows the last instruction of the exception handler.
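
A Python sketch of how both directive forms could be parsed and their bounds checked. `CatchBlock`, `parse_catch`, and `check_bounds` are hypothetical names, and the sketch assumes that a label's position is the index of the instruction that follows it, so an end label marks an exclusive bound.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CatchBlock:
    exception_record: Optional[str]  # None for .catchall
    try_begin: str
    try_end: str
    catch_begin: str
    catch_end: Optional[str] = None  # only present in the safer forms


def parse_catch(line: str) -> CatchBlock:
    directive, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")]
    if directive == ".catch" and len(operands) in (4, 5):
        return CatchBlock(*operands)
    if directive == ".catchall" and len(operands) in (3, 4):
        return CatchBlock(None, *operands)
    raise ValueError(f"malformed exception handler directive: {line!r}")


def check_bounds(block: CatchBlock, labels: Dict[str, int]) -> None:
    """labels maps each label to the index of the instruction it precedes.

    An undefined label raises KeyError.
    """
    if labels[block.try_begin] >= labels[block.try_end]:
        raise ValueError("empty or inverted try block")
    if block.catch_end is not None and labels[block.catch_begin] >= labels[block.catch_end]:
        raise ValueError("empty or inverted handler")
```

Only the safer forms give the handler an end, which is why the second check runs only when `catch_end` is present.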

## Pseudo-BNF

The instruction flow is omitted for simplicity:

```
# Literals are represented in double quotes as "literal value".
# Free-form descriptions are represented as "<description here>".
# The empty symbol is represented as E.

defs          := defs def | E
def           := rec_def | func_def

# Identifiers:

letter_lower    := "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
letter_upper    := "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit           := "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
char_misc       := "_" | "$"

char_first     := letter_lower | letter_upper | "_"
char_simple    := letter_lower | letter_upper | digit | char_misc

id_simple      := char_first id_simple_tail
id_simple_tail := id_simple_tail char_simple | E

id_prefixed    := id_simple | id_simple "." id_prefixed

# Records and types:
rec_def       := ".record" rec_name rec_add
rec_add       := def_pair_rec_meta rec_body | def_lonely_rec_meta
rec_name      := id_prefixed
rec_body      := "{" fields "}"
type_def      := "u1" | "u8" | "i8" | "u16" | "i16" | "u32" | "i32" | "u64" | "i64" | "f32" | "f64" | "any" | rec_name | type_def "[]"

# Fields of records:
fields        := fields field_def | E
field_def     := field_type field_name def_field_meta
field_type    := type_def
field_name    := id_simple

# Functions:
func_def      := ".function" func_sig func_add
func_add      := def_pair_func_meta func_body | def_lonely_func_meta
func_sig      := func_ret func_name func_args
func_ret      := type_def | "void"
func_name     := id_prefixed
func_args     := "(" arg_list ")"
arg_list      := <","-separated list of arguments, each written as a type followed by a name>
func_body     := "{" func_code "}"
func_code     := <newline-separated sequence of bytecode instructions and their operands>

# Function metadata annotations:
def_pair_func_meta     := "<" func_meta_list ">" | E
def_lonely_func_meta   := "<" func_lonely_meta_list ">"
func_meta_list         := func_meta_item | func_meta_list "," func_meta_item
func_meta_item         := func_kv_pair | func_id
func_kv_pair           := <a key from the function standard metadata list that takes a value, and the value assigned to it, separated by the "=" sign>
func_id                := <a key from the function standard metadata list>
func_lonely_meta_list  := func_lonely_meta_item | func_lonely_meta_list "," func_lonely_meta_item
func_lonely_meta_item  := func_kv_lonely_pair | func_lonely_id
func_kv_lonely_pair    := <a key from the function lonely metadata list that takes a value, and the value assigned to it, separated by the "=" sign>
func_lonely_id         := <a key from the function lonely metadata list>

# Record metadata annotations:
def_pair_rec_meta     := "<" rec_meta_list ">" | E
def_lonely_rec_meta   := "<" rec_lonely_meta_list ">"
rec_meta_list         := rec_meta_item | rec_meta_list "," rec_meta_item
rec_meta_item         := rec_kv_pair | rec_id
rec_kv_pair           := <a key from the record standard metadata list that takes a value, and the value assigned to it, separated by the "=" sign>
rec_id                := <a key from the record standard metadata list>
rec_lonely_meta_list  := rec_lonely_meta_item | rec_lonely_meta_list "," rec_lonely_meta_item
rec_lonely_meta_item  := rec_kv_lonely_pair | rec_lonely_id
rec_kv_lonely_pair    := <a key from the record lonely metadata list that takes a value, and the value assigned to it, separated by the "=" sign>
rec_lonely_id         := <a key from the record lonely metadata list>

# Field metadata annotations:
def_field_meta       := "<" field_meta_list ">" | E
field_meta_list      := field_meta_item | field_meta_list "," field_meta_item
field_meta_item      := field_kv_pair | field_id
field_kv_pair        := <a key from the field metadata list that takes a value, and the value assigned to it, separated by the "=" sign>
field_id             := <a key from the field metadata list>
```
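
As an illustration of the `func_sig` rule, the Python sketch below splits a function header into its return type, name, arguments, and optional annotation list. It handles only the simple shapes used in this document, and `parse_func_sig` is a hypothetical name.

```python
import re

ID = r"[A-Za-z_][A-Za-z0-9_$]*"
PREFIXED = rf"{ID}(?:\.{ID})*"
TYPE = rf"{PREFIXED}(?:\[\])*"
FUNC_HEADER = re.compile(
    rf"\.function\s+(?P<ret>{TYPE})\s+(?P<name>{PREFIXED})\s*"
    r"\((?P<args>[^)]*)\)\s*(?P<meta><[^>]*>)?"
)


def parse_func_sig(line):
    """Return (return type, name, [(arg type, arg name)], annotation list or None)."""
    match = FUNC_HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a function header: {line!r}")
    args = []
    if match["args"].strip():
        for arg in match["args"].split(","):
            arg_type, arg_name = arg.split()  # each argument is "type name"
            args.append((arg_type, arg_name))
    return match["ret"], match["name"], args, match["meta"]
```

Anything after the annotation list (such as a `{}` body) is ignored, since `re.match` only anchors at the start of the line.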

## Important notes

- The assembler does not guarantee that functions, records, and their fields are placed in the binary file in the same order as in the assembly source.

## Appendix A, Informative: Code Layout Sample

```
.language Java

# External records and functions:
.record Record1 <external>
.function void Record1.function1(Record1 a0, f64 a1) <external>
.record _panda_array_string <external>

.record Foo <java.extends=SomeRecord> {
    i32 member1 <java.access=private>
    i32 member2 <java.access=public>
    i32 member3 <static>
}

.function void Foo.constructor1(Foo a0) <ctor>
{
    # code for an overloaded "constructor" (whatever you mean by it)
}

.function void Foo.constructor2(Foo a0, i32 a1) <ctor>
{
    # code for an overloaded "constructor" (whatever you mean by it)
}

.function void Foo.func1(Foo a0, i32 a1) <java.access=public>
{
    # code
}

# "Interface" function:
.function void Foo.func2(Foo a0, i32 a1) <noimpl>

.function void entry_point(_panda_array_string a0)
{
    # After loading the binary, control is transferred here
}
```

Apart from metadata annotations, `Foo.` prefixes (which remain a pure naming convention for the assembler) can additionally be processed during linkage to "bind" functions to records, making them "true" methods in the OOP sense.

**Strings** and **arrays** can be thought of as `external` records with some functions that manipulate them. Because of the low-level nature of the assembler, there is no support for generics; hence, arrays of different types are implemented with different external records.

## Appendix B, Informative: Mapping Panda Assembler Types to JVM Types

This section is purely illustrative.

| Panda Assembler Type | Corresponding JVM Type |
| ------ | ------ |
| `u1` | `boolean` |
| `u8` | N/A |
| `i8` | `byte` |
| `u16` | `char` |
| `i16` | `short` |
| `u32` | N/A |
| `i32` | `int` |
| `u64` | N/A |
| `i64` | `long` |
| `f32` | `float` |
| `f64` | `double` |
| `cref` | N/A |
| `dref` | `reference` |

## Appendix C, TODO List

* Specify `cref` and indirect calls to functions.
* Elaborate on bytecode definition.
* Compose formal definitions for literals.