# Ark Bytecode File Format
This topic describes the Ark bytecode file format in detail. It is intended to help you understand the structure of bytecode files so that you can analyze and modify them.


## Constraints
This topic is based on Ark bytecode version 12.0.6.0. The version number is an internal ArkCompiler field, and you do not need to track it.


## Bytecode File Data Types
Ark bytecode files use a number of basic and composite data types. The common ones are defined below.

### Integer

| **Name**       | **Description**                          |
| -------------- | ---------------------------------- |
| `uint8_t`      | 8-bit unsigned integer.                 |
| `uint16_t`     | 16-bit unsigned integer, little-endian.  |
| `uint32_t`     | 32-bit unsigned integer, little-endian.  |
| `uleb128`      | Unsigned integer in the variable-length LEB128 encoding.            |
| `sleb128`      | Signed integer in the variable-length LEB128 encoding.            |

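LEB128 stores an integer in 7-bit groups, least significant group first. The high bit (0x80) of each byte is set when more bytes follow, and for `sleb128` bit 0x40 of the last byte carries the sign. The Python sketch below decodes both forms from a byte buffer; the function names `read_uleb128` and `read_sleb128` are hypothetical and not part of any Ark tooling API.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # append the low 7 bits
        shift += 7
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos


def read_sleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a signed LEB128 value at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:               # sign bit of the final group is set
                result -= 1 << shift      # sign-extend
            return result, pos
```

Each function returns the decoded value together with the position of the next unread byte, which makes it easy to chain reads through consecutive variable-length fields.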

### String

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `utf16_length`   | `uleb128`  | Encoded as **len << 1 \| is_ascii**, where **len** is the length of the string in UTF-16 code units and **is_ascii** indicates whether the string contains only ASCII characters.|
| `data`           | `uint8_t[]` | Null-terminated character sequence in MUTF-8 encoding. |

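As a sketch of how the two fields fit together, the following Python function reads a String at a given file offset, splits **utf16_length** into **len** and **is_ascii**, and returns the raw MUTF-8 bytes. `read_string` is a hypothetical helper name. Full MUTF-8 decoding is not shown; for strings flagged as ASCII-only, the bytes can be decoded as plain ASCII.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def read_string(buf: bytes, off: int) -> tuple[int, bool, bytes]:
    """Parse a String at file offset `off`.

    Returns (len, is_ascii, data), where data excludes the null terminator.
    """
    packed, pos = read_uleb128(buf, off)
    length = packed >> 1            # string length in UTF-16 code units
    is_ascii = bool(packed & 1)     # low bit: ASCII-only flag
    end = buf.index(b"\x00", pos)   # data is null-terminated
    return length, is_ascii, buf[pos:end]
```

For example, the byte `0x0B` followed by `hello\0` describes a 5-character ASCII-only string, because 0x0B = (5 << 1) | 1.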

### TaggedValue

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                               |
| -------------- | -------------- | -------------------------------------------- |
| `tag`          | `uint8_t`      | Tag that identifies the kind of data that follows.                          |
| `data`         | `uint8_t[]`    | Data content. Its type is determined by the tag, and it may be empty.|


## TypeDescriptor
Describes a [class](#class) name in the form **L_ClassName;**, where **ClassName** is the fully qualified class name with each **'.'** replaced by **'/'**.


## Bytecode File Layout
The bytecode file begins with the [Header](#header) structure, from which every other structure can be reached directly or indirectly. References within the file are either offsets (32-bit) or indexes (16-bit). An offset is a byte position measured from the start of the file, which is offset 0. An index selects an entry in an index region; see [IndexSection](#indexsection) for details.

All multi-byte values in bytecode files are stored in little-endian byte order.


### Header

- Alignment: single-byte aligned
- Format

| **Name**   | **Format**| **Description**                                              |
| ----------------- | -------------- | ------------------------------------------------------------ |
| `magic`             | `uint8_t[8]`     | Magic number of the file header. Its value must be **'P' 'A' 'N' 'D' 'A' '\0' '\0' '\0'**.   |
| `checksum`          | `uint32_t`       | **Adler32** checksum of the file content, excluding the magic number and this checksum field.|
| `version`           | `uint8_t[4]`     | [Version](#version) of the bytecode file.|
| `file_size`         | `uint32_t`       | Size of the bytecode file, in bytes.                            |
| `foreign_off`       | `uint32_t`       | Offset of the foreign region, which contains only [ForeignClass](#foreignclass) and [ForeignMethod](#foreignmethod) elements. **foreign_off** points to the first element in the region.|
| `foreign_size`      | `uint32_t`       | Size of the foreign region, in bytes.                              |
| `num_classes`       | `uint32_t`       | Number of elements in the [ClassIndex](#classindex) structure, that is, the number of [classes](#class) defined in the file.|
| `class_idx_off`     | `uint32_t`       | An offset that points to [ClassIndex](#classindex).|
| `num_lnps`          | `uint32_t`       | Number of elements in the [LineNumberProgramIndex](#linenumberprogramindex) structure, that is, the number of [line number programs](#line-number-program) defined in the file.|
| `lnp_idx_off`       | `uint32_t`       | An offset that points to [LineNumberProgramIndex](#linenumberprogramindex).|
| `reserved`          | `uint32_t`       | Reserved field for internal use by the Ark bytecode file.                          |
| `reserved`          | `uint32_t`       | Reserved field for internal use by the Ark bytecode file.                          |
| `num_index_regions` | `uint32_t`       | Number of elements in the [IndexSection](#indexsection) structure, that is, the number of [IndexHeaders](#indexheader) in the file.|
| `index_section_off` | `uint32_t`       | An offset that points to [IndexSection](#indexsection).|

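Because every Header field has a fixed size, the whole structure (60 bytes) can be unpacked in one step. The Python sketch below uses the standard `struct` and `zlib` modules; `parse_header`, `version_string`, `Header`, and the field names `reserved0`/`reserved1` are illustrative names chosen here. The checksum check follows the table: the Adler32 sum is taken over every byte after the first 12 (the magic number and the checksum field).

```python
import struct
import zlib
from typing import NamedTuple

HEADER_FMT = "<8sI4s11I"  # little-endian, no padding: magic, checksum, version, 11 x uint32_t
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 60 bytes
MAGIC = b"PANDA\x00\x00\x00"


class Header(NamedTuple):
    magic: bytes
    checksum: int
    version: bytes
    file_size: int
    foreign_off: int
    foreign_size: int
    num_classes: int
    class_idx_off: int
    num_lnps: int
    lnp_idx_off: int
    reserved0: int
    reserved1: int
    num_index_regions: int
    index_section_off: int


def parse_header(buf: bytes) -> Header:
    """Unpack and sanity-check the Header at offset 0 of `buf`."""
    hdr = Header(*struct.unpack_from(HEADER_FMT, buf, 0))
    if hdr.magic != MAGIC:
        raise ValueError("bad magic: not an Ark bytecode file")
    # Adler32 over everything after magic (8 bytes) and checksum (4 bytes).
    if zlib.adler32(buf[12:]) != hdr.checksum:
        raise ValueError("checksum mismatch")
    return hdr


def version_string(hdr: Header) -> str:
    """Format the four version bytes as major.minor.feature.build."""
    return ".".join(str(b) for b in hdr.version)
```

For a real file, read its entire contents into a `bytes` object and pass it to `parse_header`; the offsets in the result, such as **class_idx_off**, are the entry points to the structures described below.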

### Version
The bytecode version number consists of four parts in the format major.minor.feature.build.

| **Name**| **Format**| **Description**                                            |
| -------------- | -------------- | ---------------------------------------------------------- |
| Major      | `uint8_t`        | Indicates significant architectural changes.                |
| Minor      | `uint8_t`        | Indicates local architectural changes or major feature changes.|
| Feature    | `uint8_t`        | Indicates minor feature changes.                    |
| Build    | `uint8_t`        | Indicates bug fixes.                    |


### ForeignClass
Represents a class that is referenced in the bytecode file but declared in another file.

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `name`           | `String`         | Foreign class name, which follows the [TypeDescriptor](#typedescriptor) syntax.|


### ForeignMethod
Represents a method that is referenced in the bytecode file but declared in another file.

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `class_idx`      | `uint16_t`       | Index of the class to which the method belongs. It points to a position in [ClassRegionIndex](#classregionindex) whose value is an offset to a [Class](#class) or [ForeignClass](#foreignclass).|
| `reserved`       | `uint16_t`       | Reserved field for internal use by the Ark bytecode file.              |
| `name_off`       | `uint32_t`       | An offset to a [string](#string) representing the method name.|
| `index_data`     | `uleb128`        | [MethodIndexData](#methodindexdata) of the method.|

> **NOTE**
>
> Use the offset of the **ForeignMethod** to locate the **IndexHeader** that applies at that offset; **class_idx** is then resolved through that **IndexHeader**.


### ClassIndex
Enables fast lookup of class definitions by name.

- Alignment: 4-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `offsets`        | `uint32_t[]`     | An array of offsets pointing to [classes](#class). The elements are sorted by class name, which follows the [TypeDescriptor](#typedescriptor) syntax. The array length is specified by **num_classes** in [Header](#header).|

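Because the offsets are sorted by class name, a class can be found with a binary search instead of a linear scan. The Python sketch below assumes that names compare as raw MUTF-8 byte strings (the exact collation is an assumption here) and relies on each [Class](#class) beginning with its name [String](#string). `find_class` and `read_name` are hypothetical helper names.

```python
import struct
from typing import Optional


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def read_name(buf: bytes, off: int) -> bytes:
    """Return the MUTF-8 bytes of the String at `off` (terminator excluded)."""
    _, pos = read_uleb128(buf, off)  # skip utf16_length
    return buf[pos:buf.index(b"\x00", pos)]


def find_class(buf: bytes, class_idx_off: int, num_classes: int,
               name: bytes) -> Optional[int]:
    """Binary-search ClassIndex; return the Class offset or None."""
    offsets = struct.unpack_from(f"<{num_classes}I", buf, class_idx_off)
    lo, hi = 0, num_classes
    while lo < hi:
        mid = (lo + hi) // 2
        current = read_name(buf, offsets[mid])
        if current == name:
            return offsets[mid]
        if current < name:
            lo = mid + 1
        else:
            hi = mid
    return None
```

In a real file, **class_idx_off** and **num_classes** come from the [Header](#header).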

### Class
Represents either a source code file or an internal [Annotation](#annotation). For a source code file, **methods** correspond to the functions in the file, and **fields** hold internal information about the file. An **Annotation** contains no **fields** or **methods**. A class in the source code is represented as a constructor in the bytecode file.

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `name`           | `String`         | Class name, which follows the [TypeDescriptor](#typedescriptor) syntax.|
| `reserved`       | `uint32_t`       | Reserved field for internal use by the Ark bytecode file.                          |
| `access_flags`   | `uleb128`        | Access flags of the class, a bitwise combination of [ClassAccessFlag](#classaccessflag) values.|
| `num_fields`     | `uleb128`        | Number of fields in the class.                                         |
| `num_methods`    | `uleb128`        | Number of methods in the class.                                         |
| `class_data`     | `TaggedValue[]`  | Variable-length array of [TaggedValue](#taggedvalue) elements whose tags are of the [ClassTag](#classtag) type. The elements are sorted in ascending order of tag, except for the element with the **0x00** tag, which comes last.|
| `fields`         | `Field[]`        | Array of the fields in the class, each of the [Field](#field) type. The array length is specified by **num_fields**.|
| `methods`        | `Method[]`       | Array of the methods in the class, each of the [Method](#method) type. The array length is specified by **num_methods**.|


### ClassAccessFlag

| **Name**| **Value**| **Description**                                              |
| -------------- | ------------ | ------------------------------------------------------------ |
| `ACC_PUBLIC`     | `0x0001`       | Default attribute. All [classes](#class) in the Ark bytecode file have this flag.|
| `ACC_ANNOTATION` | `0x2000`       | Declares the class as an [Annotation](#annotation).|


### ClassTag

- Alignment: single-byte aligned
- Format

| **Name**| **Value**| **Quantity**| **Format**| **Description**                                              |
| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| `NOTHING`        | `0x00`  | `1`  | `none`    | Marks a [TaggedValue](#taggedvalue) as the final item in **class_data**.|
| `SOURCE_LANG`    | `0x02`  | `0-1` | `uint8_t` | **data** of a [TaggedValue](#taggedvalue) with this tag is **0**, indicating that the source language is ArkTS, TS, or JS.|
| `SOURCE_FILE`    | `0x07`  | `0-1`  | `uint32_t`| **data** of a [TaggedValue](#taggedvalue) with this tag is an offset that points to a [string](#string) representing the source file name.|

> **NOTE**
>
> **ClassTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **class_data**. The **Quantity** column gives the number of times an element with this tag can occur in the **class_data** of a [class](#class).

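Putting the [Class](#class) table together with the tag values above, the following Python sketch parses a class up to the start of its **fields** array. It reads **class_data** until the **NOTHING** tag, using the data sizes from the **Format** column (one byte for **SOURCE_LANG**, four for **SOURCE_FILE**). `parse_class_header` and the keys of the returned dictionary are illustrative names, and rejecting unknown tags is a choice made for this sketch.

```python
import struct

ACC_ANNOTATION = 0x2000
TAG_NOTHING, TAG_SOURCE_LANG, TAG_SOURCE_FILE = 0x00, 0x02, 0x07


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def parse_class_header(buf: bytes, off: int) -> dict:
    """Parse a Class at `off` up to (not including) its fields array."""
    _, pos = read_uleb128(buf, off)        # name: utf16_length
    end = buf.index(b"\x00", pos)          # name: null-terminated data
    name = buf[pos:end]
    pos = end + 1 + 4                      # skip terminator and reserved uint32_t
    access_flags, pos = read_uleb128(buf, pos)
    num_fields, pos = read_uleb128(buf, pos)
    num_methods, pos = read_uleb128(buf, pos)
    class_data = {}
    while True:
        tag = buf[pos]
        pos += 1
        if tag == TAG_NOTHING:             # terminates class_data
            break
        if tag == TAG_SOURCE_LANG:         # uint8_t
            class_data["source_lang"] = buf[pos]
            pos += 1
        elif tag == TAG_SOURCE_FILE:       # uint32_t offset to a String
            (class_data["source_file_off"],) = struct.unpack_from("<I", buf, pos)
            pos += 4
        else:
            raise ValueError(f"unknown ClassTag 0x{tag:02x}")
    return {
        "name": name,
        "is_annotation": bool(access_flags & ACC_ANNOTATION),
        "num_fields": num_fields,
        "num_methods": num_methods,
        "class_data": class_data,
        "fields_off": pos,                 # Field[] starts right after class_data
    }
```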

### Field
Represents a field in the bytecode file.

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `class_idx`      | `uint16_t`       | Index of the class to which the field belongs. It points to a position in [ClassRegionIndex](#classregionindex) whose value is of the [Type](#type) type and is an offset to a [Class](#class).|
| `type_idx`       | `uint16_t`       | Index of the field's type. It points to a position in [ClassRegionIndex](#classregionindex) whose value is of the [Type](#type) type.|
| `name_off`       | `uint32_t`       | An offset to a [string](#string) representing the field name.|
| `reserved`       | `uleb128`        | Reserved field for internal use by the Ark bytecode file.                          |
| `field_data`     | `TaggedValue[]`  | Variable-length array of [TaggedValue](#taggedvalue) elements whose tags are of the [FieldTag](#fieldtag) type. The elements are sorted in ascending order of tag, except for the element with the **0x00** tag, which comes last.|

> **NOTE**
>
> Use the offset of the **Field** to locate the **IndexHeader** that applies at that offset; **class_idx** and **type_idx** are then resolved through that **IndexHeader**.


### FieldTag

- Alignment: single-byte aligned
- Format

| **Name**| **Value**| **Quantity**| **Format**| **Description** |
| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| `NOTHING`        | `0x00`   | `1`   | `none`     | Marks a [TaggedValue](#taggedvalue) as the final item in **field_data**.|
| `INT_VALUE`      | `0x01`   | `0-1` | `sleb128`  | **data** of a [TaggedValue](#taggedvalue) with this tag holds a value of type **boolean**, **byte**, **char**, **short**, or **int**.|
| `VALUE`          | `0x02`   | `0-1` | `uint32_t` | **data** of a [TaggedValue](#taggedvalue) with this tag holds a value in the **FLOAT** or **ID** format described in [Value formats](#value-formats).|

> **NOTE**
>
> **FieldTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **field_data**. The **Quantity** column gives the number of times an element with this tag can occur in the **field_data** of a [field](#field).

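The following Python sketch reads **field_data** and shows how a raw **VALUE** can be reinterpreted as an IEEE 754 single-precision float. Whether a **VALUE** holds a **FLOAT** or an **ID** depends on the field's type, which this sketch does not resolve; the caller is assumed to know it. `parse_field_data` and `as_float32` are hypothetical helper names.

```python
import struct


def read_sleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a signed LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:          # sign bit of the final group
                result -= 1 << shift
            return result, pos


def parse_field_data(buf: bytes, pos: int) -> tuple[dict, int]:
    """Read TaggedValues until NOTHING (0x00); return (values, next_pos)."""
    values = {}
    while True:
        tag = buf[pos]
        pos += 1
        if tag == 0x00:              # NOTHING
            return values, pos
        if tag == 0x01:              # INT_VALUE: sleb128
            values["int_value"], pos = read_sleb128(buf, pos)
        elif tag == 0x02:            # VALUE: uint32_t (FLOAT bits or ID offset)
            (values["value"],) = struct.unpack_from("<I", buf, pos)
            pos += 4
        else:
            raise ValueError(f"unknown FieldTag 0x{tag:02x}")


def as_float32(raw: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<I", raw))[0]
```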

### Method
Represents a method in the bytecode file.

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `class_idx`      | `uint16_t`       | Index of the class to which the method belongs. It points to a position in [ClassRegionIndex](#classregionindex) whose value is of the [Type](#type) type and is an offset to a [Class](#class).|
| `reserved`       | `uint16_t`       | Reserved field for internal use by the Ark bytecode file.                          |
| `name_off`       | `uint32_t`       | An offset to a [string](#string) representing the method name.|
| `index_data`     | `uleb128`        | [MethodIndexData](#methodindexdata) of the method.|
| `method_data`    | `TaggedValue[]`  | Variable-length array of [TaggedValue](#taggedvalue) elements whose tags are of the [MethodTag](#methodtag) type. The elements are sorted in ascending order of tag, except for the element with the **0x00** tag, which comes last.|

> **NOTE**
>
> Use the offset of the **Method** to locate the **IndexHeader** that applies at that offset; **class_idx** is then resolved through that **IndexHeader**.


### MethodIndexData
A 32-bit unsigned integer divided into three bit fields.

| **Bit**| **Name**| **Format**| **Description**                                              |
| ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| 0 - 15       | `header_index`   | `uint16_t`       | Points to a position in [IndexSection](#indexsection) whose value is an [IndexHeader](#indexheader). The **IndexHeader** gives the offsets of all methods ([Method](#method)), [strings](#string), and literal arrays ([LiteralArray](#literalarray)) referenced by this method.|
| 16 - 23      | `function_kind`  | `uint8_t`        | Function type ([FunctionKind](#functionkind)) of the method.|
| 24 - 31      | `reserved`       | `uint8_t`        | Reserved field for internal use by the Ark bytecode file.                          |


#### FunctionKind

| **Name**          | **Value**| **Description**  |
| ------------------------ | ------------ | ---------------- |
| `FUNCTION`                 | `0x1`          | Ordinary function.      |
| `NC_FUNCTION`              | `0x2`          | Ordinary arrow function.  |
| `GENERATOR_FUNCTION`       | `0x3`          | Generator function.    |
| `ASYNC_FUNCTION`           | `0x4`          | Asynchronous function.      |
| `ASYNC_GENERATOR_FUNCTION` | `0x5`          | Asynchronous generator function.|
| `ASYNC_NC_FUNCTION`        | `0x6`          | Asynchronous arrow function.  |
| `CONCURRENT_FUNCTION`      | `0x7`          | Concurrent function.      |

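Once **index_data** has been decoded from its `uleb128` encoding in a [Method](#method) or [ForeignMethod](#foreignmethod), the three parts can be extracted with shifts and masks. The Python sketch below mirrors the bit layout in the MethodIndexData table and maps **function_kind** onto an `IntEnum` built from the FunctionKind table; `decode_method_index_data` is a hypothetical name, and `FunctionKind(...)` raises `ValueError` for a value outside the table.

```python
from enum import IntEnum


class FunctionKind(IntEnum):
    FUNCTION = 0x1
    NC_FUNCTION = 0x2
    GENERATOR_FUNCTION = 0x3
    ASYNC_FUNCTION = 0x4
    ASYNC_GENERATOR_FUNCTION = 0x5
    ASYNC_NC_FUNCTION = 0x6
    CONCURRENT_FUNCTION = 0x7


def decode_method_index_data(value: int) -> tuple[int, FunctionKind, int]:
    """Split a 32-bit MethodIndexData into (header_index, function_kind, reserved)."""
    header_index = value & 0xFFFF         # bits 0-15
    function_kind = (value >> 16) & 0xFF  # bits 16-23
    reserved = (value >> 24) & 0xFF       # bits 24-31
    return header_index, FunctionKind(function_kind), reserved
```

For example, the value `0x00040003` decodes to **header_index** 3 and **function_kind** `ASYNC_FUNCTION`.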

### MethodTag

| **Name**| **Value**| **Quantity**| **Format**| **Description**                                              |
| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| `NOTHING`        | `0x00`         | `1`             | `none`           | Marks a [TaggedValue](#taggedvalue) as the final item in **method_data**.|
| `CODE`           | `0x01`         | `0-1`            | `uint32_t`       | **data** of a [TaggedValue](#taggedvalue) with this tag is an offset pointing to [Code](#code), the code segment of the method.|
| `SOURCE_LANG`    | `0x02`         | `0-1`            | `uint8_t`        | **data** of a [TaggedValue](#taggedvalue) with this tag is **0**, indicating that the source language is ArkTS, TS, or JS.|
| `DEBUG_INFO`     | `0x05`         | `0-1`            | `uint32_t`       | **data** of a [TaggedValue](#taggedvalue) with this tag is an offset pointing to [DebugInfo](#debuginfo), the debugging information of the method.|
| `ANNOTATION`     | `0x06`         | `>=0`            | `uint32_t`       | **data** of a [TaggedValue](#taggedvalue) with this tag is an offset pointing to an [Annotation](#annotation) of the method.|

> **NOTE**
>
> **MethodTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **method_data**. The **Quantity** column gives the number of times an element with this tag can occur in the **method_data** of a [method](#method).

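Unlike the other tags, **ANNOTATION** may occur any number of times, so a reader should collect its values in a list. The Python sketch below reads **method_data** using the data sizes from the table; `parse_method_data`, `OFFSET_TAGS`, and the dictionary keys are illustrative names.

```python
import struct

# Single-occurrence tags whose data is a uint32_t offset.
OFFSET_TAGS = {0x01: "code_off", 0x05: "debug_info_off"}


def parse_method_data(buf: bytes, pos: int) -> tuple[dict, int]:
    """Read method_data TaggedValues until NOTHING; return (result, next_pos)."""
    result = {"annotations": []}
    while True:
        tag = buf[pos]
        pos += 1
        if tag == 0x00:                      # NOTHING terminates method_data
            return result, pos
        if tag == 0x02:                      # SOURCE_LANG: uint8_t
            result["source_lang"] = buf[pos]
            pos += 1
        elif tag in OFFSET_TAGS or tag == 0x06:
            (value,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            if tag == 0x06:                  # ANNOTATION may occur any number of times
                result["annotations"].append(value)
            else:                            # CODE or DEBUG_INFO
                result[OFFSET_TAGS[tag]] = value
        else:
            raise ValueError(f"unknown MethodTag 0x{tag:02x}")
```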

### Code

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `num_vregs`      | `uleb128`        | Number of registers, not counting the registers that hold input and default parameters.        |
| `num_args`       | `uleb128`        | Total number of input and default parameters.                                    |
| `code_size`      | `uleb128`        | Total size of all instructions, in bytes.                            |
| `tries_size`     | `uleb128`        | Length of the **try_blocks** array, that is, the number of [TryBlocks](#tryblock).   |
| `instructions`   | `uint8_t[]`      | Array of all instructions.                                          |
| `try_blocks`     | `TryBlock[]`     | An array of [TryBlock](#tryblock) elements.|

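The following Python sketch reads the fixed part of a Code structure and slices out the instruction bytes, which also tells a reader where **try_blocks** begins. Since parameter registers are excluded from **num_vregs**, it derives the total register count as **num_vregs** + **num_args**. `parse_code` and the returned keys are hypothetical names.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def parse_code(buf: bytes, off: int) -> dict:
    """Parse the Code header at `off` and locate instructions and try_blocks."""
    num_vregs, pos = read_uleb128(buf, off)
    num_args, pos = read_uleb128(buf, pos)
    code_size, pos = read_uleb128(buf, pos)
    tries_size, pos = read_uleb128(buf, pos)
    return {
        "num_vregs": num_vregs,
        "num_args": num_args,
        "total_regs": num_vregs + num_args,  # parameter registers are on top of num_vregs
        "instructions": buf[pos:pos + code_size],
        "tries_size": tries_size,
        "try_blocks_off": pos + code_size,   # TryBlock[] follows the instructions
    }
```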

### TryBlock

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `start_pc`       | `uleb128`        | Offset of the first instruction covered by the **TryBlock**, relative to the start of **instructions** in [Code](#code).|
| `length`         | `uleb128`        | Size of the instruction range covered by the **TryBlock**, in bytes.                              |
| `num_catches`    | `uleb128`        | Number of [CatchBlocks](#catchblock) associated with the **TryBlock**. The value is always **1**.|
| `catch_blocks`   | `CatchBlock[]`   | Array of **CatchBlocks** associated with the **TryBlock**. It contains one **CatchBlock** that catches all types of exceptions.|


### CatchBlock

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                 |
| -------------- | -------------- | ----------------------------------------------- |
| `type_idx`       | `uleb128`        | If the value is **0**, the **CatchBlock** catches all types of exceptions.|
| `handler_pc`     | `uleb128`        | Program counter of the first instruction of the exception handler.         |
| `code_size`      | `uleb128`        | Size of the **CatchBlock**, in bytes.             |

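Since **try_blocks** directly follows **instructions** in [Code](#code), a reader can walk it once the instruction bytes have been skipped. The Python sketch below parses **tries_size** TryBlocks with their CatchBlocks and then looks up the handler for a program counter. It assumes that the covered range is half-open, `[start_pc, start_pc + length)`, and that the first matching TryBlock wins when several overlap; both are assumptions of this sketch. `parse_try_blocks` and `find_handler` are hypothetical names.

```python
from typing import Optional


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def parse_try_blocks(buf: bytes, pos: int, tries_size: int) -> tuple[list, int]:
    """Parse `tries_size` TryBlocks; return (blocks, next_pos).

    Each block is (start_pc, length, [(type_idx, handler_pc, code_size), ...]).
    """
    blocks = []
    for _ in range(tries_size):
        start_pc, pos = read_uleb128(buf, pos)
        length, pos = read_uleb128(buf, pos)
        num_catches, pos = read_uleb128(buf, pos)
        catches = []
        for _ in range(num_catches):
            type_idx, pos = read_uleb128(buf, pos)
            handler_pc, pos = read_uleb128(buf, pos)
            code_size, pos = read_uleb128(buf, pos)
            catches.append((type_idx, handler_pc, code_size))
        blocks.append((start_pc, length, catches))
    return blocks, pos


def find_handler(blocks: list, pc: int) -> Optional[int]:
    """Return the catch-all handler_pc covering `pc`, or None."""
    for start_pc, length, catches in blocks:
        if start_pc <= pc < start_pc + length:  # assumed half-open range
            for type_idx, handler_pc, _ in catches:
                if type_idx == 0:               # 0 = catches all exception types
                    return handler_pc
    return None
```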

### Annotation
Represents an annotation in the bytecode file.

- Alignment: single-byte aligned
- Format

| **Name**| **Format**     | **Description**                                              |
| -------------- | ------------------- | ------------------------------------------------------------ |
| `class_idx`      | `uint16_t`   | Index of the class to which the **Annotation** belongs. It points to a position in [ClassRegionIndex](#classregionindex) whose value is of the [Type](#type) type and is an offset to a [Class](#class).|
| `count`          | `uint16_t`   | Length of the **elements** array.                                        |
| `elements`       | `AnnotationElement[]` | An array of [AnnotationElement](#annotationelement) elements.|
| `element_types`  | `uint8_t[]`  | An array in which each element is an [AnnotationElementTag](#annotationelementtag) describing one **AnnotationElement**. Each entry in **element_types** is at the same position as the corresponding **AnnotationElement** in **elements**.|

> **NOTE**
>
> Use the offset of the **Annotation** to locate the **IndexHeader** that applies at that offset; **class_idx** is then resolved through that **IndexHeader**.


### AnnotationElementTag

| **Name**| **Tag**|
| -------------- | --------- |
| `u1`             | `'1'`   |
| `i8`             | `'2'`   |
| `u8`             | `'3'`   |
| `i16`            | `'4'`   |
| `u16`            | `'5'`   |
| `i32`            | `'6'`   |
| `u32`            | `'7'`   |
| `i64`            | `'8'`   |
| `u64`            | `'9'`   |
| `f32`            | `'A'`   |
| `f64`            | `'B'`   |
| `string`         | `'C'`   |
| `method`         | `'E'`   |
| `annotation`     | `'G'`   |
| `literalarray`   | `'#'`   |
| `unknown`        | `'0'`   |


### AnnotationElement

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `name_off`       | `uint32_t`       | An offset to a [string](#string) representing the annotation element name.|
| `value`          | `uint32_t`       | Value of the annotation element. If the value is narrower than 32 bits, the value itself is stored here. Otherwise, this field holds an offset to the value, encoded as described in [Value formats](#value-formats).|

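Because **elements** has a fixed 8-byte stride and **element_types** follows it, an Annotation can be decoded with a few `struct` calls, assuming no padding between fields (the structure is single-byte aligned). The Python sketch below pairs each element with its type name from the AnnotationElementTag table. It returns the raw 32-bit **value** without following offsets for wide types; `parse_annotation` and `ELEMENT_TAGS` are hypothetical names.

```python
import struct

# AnnotationElementTag characters mapped to type names.
ELEMENT_TAGS = {
    "1": "u1", "2": "i8", "3": "u8", "4": "i16", "5": "u16",
    "6": "i32", "7": "u32", "8": "i64", "9": "u64", "A": "f32",
    "B": "f64", "C": "string", "E": "method", "G": "annotation",
    "#": "literalarray", "0": "unknown",
}


def parse_annotation(buf: bytes, off: int) -> tuple[int, list]:
    """Return (class_idx, [(name_off, type_name, raw_value), ...])."""
    class_idx, count = struct.unpack_from("<HH", buf, off)
    elems_off = off + 4
    types_off = elems_off + 8 * count  # element_types follows elements
    elements = []
    for i in range(count):
        name_off, value = struct.unpack_from("<II", buf, elems_off + 8 * i)
        tag = chr(buf[types_off + i])  # each tag is a single ASCII character
        elements.append((name_off, ELEMENT_TAGS.get(tag, "?"), value))
    return class_idx, elements
```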

### Value formats
Different value types use different encodings: **INTEGER**, **LONG**, **FLOAT**, **DOUBLE**, and **ID**.

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `INTEGER`        | `uint32_t`       | 4-byte signed integer.                                      |
| `LONG`           | `uint64_t`       | 8-byte signed integer.                                      |
| `FLOAT`          | `uint32_t`       | 4-byte pattern, zero-extended to the right, interpreted as a 32-bit IEEE 754 floating-point value.|
| `DOUBLE`         | `uint64_t`       | 8-byte pattern, zero-extended to the right, interpreted as a 64-bit IEEE 754 floating-point value.|
| `ID`             | `uint32_t`       | 4-byte pattern that gives the offset of another structure in the file.                  |

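In little-endian byte order, each value format corresponds to a standard `struct` format code: the stored bits of **INTEGER** and **LONG** are read as two's-complement signed integers, and those of **FLOAT** and **DOUBLE** as IEEE 754 values. The Python sketch below assumes the value is stored in full (4 or 8 bytes) at the given offset; `read_value` and `VALUE_FORMATS` are hypothetical names.

```python
import struct

# struct format codes for each value format (all little-endian).
VALUE_FORMATS = {
    "INTEGER": "<i",  # 4-byte signed integer
    "LONG": "<q",     # 8-byte signed integer
    "FLOAT": "<f",    # IEEE 754 binary32
    "DOUBLE": "<d",   # IEEE 754 binary64
    "ID": "<I",       # 4-byte unsigned offset
}


def read_value(buf: bytes, off: int, kind: str):
    """Decode one value of the given format at `off`."""
    (value,) = struct.unpack_from(VALUE_FORMATS[kind], buf, off)
    return value
```

For example, the bytes `00 00 C0 3F` read as **FLOAT** give 1.5, while the same four bytes read as **ID** give the offset `0x3FC00000`.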

### LineNumberProgramIndex
An array that allows [line number programs](#line-number-program) to be referenced through a compact index.

- Alignment: 4-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `offsets`        | `uint32_t[]`     | An array of offsets pointing to line number programs. The array length is specified by **num_lnps** in [Header](#header).|


### DebugInfo
Contains the mappings between program counters in the method and line/column numbers in the source code, as well as information about local variables. The format of the debugging information is derived from section 6.2 of the [DWARF 3.0 Standard](https://dwarfstd.org/dwarf3std.html). The [state machine](#state-machine) interprets the [line number program](#line-number-program) to obtain the mappings and the local variable information. So that methods with identical line number programs can share a single program, all constants referenced in the programs are moved to the [constant pool](#constant-pool).

- Alignment: single-byte aligned
- Format

| **Name**         | **Format**| **Description**                                              |
| ----------------------- | -------------- | ------------------------------------------------------------ |
| `line_start`              | `uleb128`        | Initial value of the **line** register of the state machine.                                |
| `num_parameters`          | `uleb128`        | Total number of input and default parameters.                                    |
| `parameters`              | `uleb128[]`      | Names of the input parameters. The array length is specified by **num_parameters**. Each element is either an offset to a [string](#string) or **0**, which means that the corresponding parameter has no name.|
| `constant_pool_size`      | `uleb128`        | Size of the constant pool, in bytes.                                |
| `constant_pool`           | `uleb128[]`      | Constant pool data, occupying **constant_pool_size** bytes.        |
| `line_number_program_idx` | `uleb128`        | Index of a position in [LineNumberProgramIndex](#linenumberprogramindex) whose value is an offset to a [line number program](#line-number-program). A line number program has variable length and ends with the **END_SEQUENCE** operation code.|

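The following Python sketch reads the DebugInfo fields in order and resolves **line_number_program_idx** through [LineNumberProgramIndex](#linenumberprogramindex). It treats **constant_pool_size** as a byte count, as the table states, and returns the constant pool as raw bytes for later interpretation. `parse_debug_info` and `resolve_lnp_offset` are hypothetical names; **lnp_idx_off** would come from the [Header](#header).

```python
import struct


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def parse_debug_info(buf: bytes, off: int) -> dict:
    """Parse a DebugInfo structure starting at `off`."""
    line_start, pos = read_uleb128(buf, off)
    num_parameters, pos = read_uleb128(buf, pos)
    parameters = []
    for _ in range(num_parameters):
        name_off, pos = read_uleb128(buf, pos)  # 0 means the parameter is unnamed
        parameters.append(name_off)
    cp_size, pos = read_uleb128(buf, pos)
    constant_pool = buf[pos:pos + cp_size]      # raw bytes; entries are LEB128-encoded
    pos += cp_size
    lnp_idx, pos = read_uleb128(buf, pos)
    return {
        "line_start": line_start,
        "parameters": parameters,
        "constant_pool": constant_pool,
        "line_number_program_idx": lnp_idx,
    }


def resolve_lnp_offset(buf: bytes, lnp_idx_off: int, idx: int) -> int:
    """Look up entry `idx` of LineNumberProgramIndex (an array of uint32_t)."""
    (offset,) = struct.unpack_from("<I", buf, lnp_idx_off + 4 * idx)
    return offset
```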

#### Constant Pool
A structure within **DebugInfo** that stores constants. Many methods have similar line number programs that differ only in variable names, variable types, and file names. To eliminate this redundancy, all constants referenced by the programs are stored in the constant pool. While interpreting a program, the state machine maintains a pointer into the constant pool. When it interprets an instruction that takes a constant parameter, it reads the value at the position indicated by the pointer and then advances the pointer past that value.


#### State Machine
Interprets [line number programs](#line-number-program) to produce the [DebugInfo](#debuginfo) mappings. It contains the following registers.

| **Name**   | **Initial Value**                                            | **Description**                                              |
| ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| `address`           | 0                                                            | Program counter (points to an instruction in the method). Its value can only increase.            |
| `line`              | Value of **line_start** in [DebugInfo](#debuginfo)| Unsigned integer corresponding to a line number in the source code. Lines are numbered from 1, so the register value cannot be less than **1**.|
| `column`            | 0                                                            | Unsigned integer corresponding to a column number in the source code.                              |
| `file`              | Value of **SOURCE_FILE** in **class_data** (see [Class](#class)), or 0| An offset to a [string](#string) representing the source file name. If there is no file name (no **SOURCE_FILE** tag in [Class](#class)), the register value is **0**.|
| `source_code`       | 0                                                            | An offset to a [string](#string) representing the source code of the source file. If there is no source code information, the register value is **0**.|
| `constant_pool_ptr` | Address of the first byte of the constant pool in [DebugInfo](#debuginfo)| Pointer to the current constant value.                                      |


410#### Line Number Program
411Consists of instructions, each containing a single-byte operation code and optional parameters. Depending on the operation code, the parameter values may be encoded within the instruction (called instruction parameters) or retrieved from the constant pool (called constant pool parameters).
412
413| **Operation Code** | **Value**| **Instruction Parameter**  | **Constant Pool Parameters**   | **Parameter Description**| **Description** |
414| ----- | ----- | ------- | ---- | ------- | ------ |
| `END_SEQUENCE`         | `0x00`  |       |          |        | Marks the end of the line number program.   |
| `ADVANCE_PC`           | `0x01`  |    | `uleb128 addr_diff`   | **addr_diff**: value by which to increment the **address** register.   | Increments the **address** register by **addr_diff** to point to the next address, without generating a location entry.|
| `ADVANCE_LINE`         | `0x02` |     | `sleb128 line_diff`  | **line_diff**: value by which to increment the **line** register.   | Increments the **line** register by **line_diff** to point to the next line position, without generating a location entry.|
| `START_LOCAL`          | `0x03` | `sleb128 register_num` | `uleb128 name_idx`<br>`uleb128 type_idx`   | **register_num**: register containing the local variable.<br>**name_idx**: an offset to a [string](#string) representing the variable name.<br>**type_idx**: an offset to a [string](#string) representing the variable type.| Introduces a local variable with a name and type at the current address. The number of the register that will contain this variable is encoded in the instruction. A **register_num** of **-1** denotes the accumulator. **name_idx** and **type_idx** may be **0**, indicating that the corresponding information is absent.|
| `START_LOCAL_EXTENDED` | `0x04` | `sleb128 register_num` | `uleb128 name_idx`<br>`uleb128 type_idx`<br>`uleb128 sig_idx` | **register_num**: register containing the local variable.<br>**name_idx**: an offset to a [string](#string) representing the variable name.<br>**type_idx**: an offset to a [string](#string) representing the variable type.<br>**sig_idx**: an offset to a [string](#string) representing the variable signature.| Introduces a local variable with a name, type, and signature at the current address. The number of the register that will contain this variable is encoded in the instruction. A **register_num** of **-1** denotes the accumulator. **name_idx**, **type_idx**, and **sig_idx** may be **0**, indicating that the corresponding information is absent.|
| `END_LOCAL`            | `0x05` | `sleb128 register_num` |    | **register_num**: register containing the local variable. | Marks the local variable in the specified register as out of scope at the current address. A **register_num** of **-1** denotes the accumulator.|
| `SET_FILE`             | `0x09`  |    | `uleb128 name_idx`  | **name_idx**: an offset to a [string](#string) representing the file name.| Sets the value of the **file** register. **name_idx** may be **0**, indicating that the file name is absent.|
| `SET_SOURCE_CODE`      | `0x0a`  |    | `uleb128 source_idx` | **source_idx**: an offset to a [string](#string) representing the source code of the file.| Sets the value of the **source_code** register. **source_idx** may be **0**, indicating that the source code is absent.|
| `SET_COLUMN`           | `0x0b` |    | `uleb128 column_num`   | **column_num**: column number to be set.  | Sets the value of the **column** register and generates a location entry. |
| Special operation code          | `0x0c..0xff`   |   |  |   | Adjusts the **line** and **address** registers and generates a location entry. Details are described below.|
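The operands in the table above are encoded as `uleb128` or `sleb128`. The following Python sketch decodes both, assuming the standard LEB128 layout (seven payload bits per byte, least significant group first, with the high bit set on every byte except the last). The helper names `read_uleb128` and `read_sleb128` are hypothetical and not part of any ArkCompiler API.

```python
from typing import Tuple


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # append 7 payload bits
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos


def read_sleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a signed LEB128 value at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the last group set: negative value
                result -= 1 << shift
            return result, pos
```

For example, a **register_num** operand stored as the single byte `0x7f` decodes to -1 with `read_sleb128`, which denotes the accumulator.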


For special operation codes in the range **0x0c** to **0xff** (inclusive), the state machine performs the following steps to adjust the **line** and **address** registers and then generates a new location entry. For details, see section 6.2.5.1 "Special Opcodes" in [DWARF 3.0 Standard](https://dwarfstd.org/dwarf3std.html).

| **Step**| **Operation**                                    | **Description**                                              |
| ----- | -------------------------------------------------- | ------------------------------------------------------------ |
| 1     | `adjusted_opcode = opcode - OPCODE_BASE`            | Calculates the adjusted operation code. The value of **OPCODE_BASE** is **0x0c**, which is the first special operation code.|
| 2     | `address += adjusted_opcode / LINE_RANGE`            | Increments the **address** register (`/` denotes integer division). The value of **LINE_RANGE** is **15**, which is the number of distinct line increments that a special operation code can encode.|
| 3     | `line += LINE_BASE + (adjusted_opcode % LINE_RANGE)` | Adjusts the **line** register. The value of **LINE_BASE** is **-4**, which is the minimum line number increment. The maximum increment is **LINE_BASE + LINE_RANGE - 1**, that is, **10**.|
| 4     |                                                    | Generates a new location entry.                                      |
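Steps 1 to 3 can be sketched in Python as follows. This is an illustration rather than the ArkCompiler implementation; the function name `apply_special_opcode` is hypothetical, and the constants are the values given in the table.

```python
from typing import Tuple

OPCODE_BASE = 0x0C  # first special operation code
LINE_BASE = -4      # minimum line increment
LINE_RANGE = 15     # number of distinct line increments


def apply_special_opcode(opcode: int, address: int, line: int) -> Tuple[int, int]:
    """Apply steps 1-3 for a special opcode and return the new (address, line).

    Emitting the location entry (step 4) is left to the caller.
    """
    if not OPCODE_BASE <= opcode <= 0xFF:
        raise ValueError(f"not a special opcode: {opcode:#x}")
    adjusted_opcode = opcode - OPCODE_BASE              # step 1
    address += adjusted_opcode // LINE_RANGE            # step 2: integer division
    line += LINE_BASE + adjusted_opcode % LINE_RANGE    # step 3
    return address, line
```

For example, `apply_special_opcode(0x1f, 0, 10)` returns `(1, 10)`: the adjusted opcode is 19, so the address advances by 1 and the line increment is `-4 + 4 = 0`.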
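
> **NOTE**
>
> Special operation codes are calculated by using the following formula: `(line_increment - LINE_BASE) + (address_increment * LINE_RANGE) + OPCODE_BASE`. For the result to be a valid special operation code, **line_increment** must lie between **LINE_BASE** and **LINE_BASE + LINE_RANGE - 1**, and the result must not exceed **0xff**.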
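Inverting the formula gives a hypothetical encoder helper, sketched below in Python. The range checks follow from the special-opcode steps: one special operation code can express line increments from **LINE_BASE** to **LINE_BASE + LINE_RANGE - 1** and cannot exceed **0xff**. Falling back to `ADVANCE_LINE` and `ADVANCE_PC` when a check fails is an assumption based on common DWARF encoder practice, not something this document specifies.

```python
OPCODE_BASE = 0x0C
LINE_BASE = -4
LINE_RANGE = 15


def encode_special_opcode(line_increment: int, address_increment: int) -> int:
    """Compute the special opcode for the given increments.

    Raises ValueError when one special opcode cannot express the increments;
    a hypothetical encoder would then emit ADVANCE_LINE / ADVANCE_PC instead.
    """
    if not LINE_BASE <= line_increment <= LINE_BASE + LINE_RANGE - 1:
        raise ValueError("line increment out of range")
    if address_increment < 0:
        raise ValueError("address increment must be non-negative")
    opcode = (line_increment - LINE_BASE) + address_increment * LINE_RANGE + OPCODE_BASE
    if opcode > 0xFF:
        raise ValueError("address increment too large for a special opcode")
    return opcode
```

For instance, `encode_special_opcode(0, 1)` returns `0x1f`, and the largest address increment any special operation code can express is 16.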


### IndexSection
Bytecode files generally use 32-bit offsets to reference structures: when one structure references another, it records the 32-bit offset of the referenced structure. To reduce file size, the bytecode file is divided into index regions, within which structures use 16-bit indexes instead of 32-bit offsets. The **IndexSection** structure provides an overview of these regions.

- Alignment: 4-byte aligned
- Format

| **Name**| **Format**| **Description**      |
| -------------- | -------------- | --------- |
| `headers`        | `IndexHeader[]`  | An array of [IndexHeader](#indexheader) elements, sorted by the start offset of each region. The array length is specified by **num_index_regions** in [Header](#header).|
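Because the headers are sorted by start offset, a reader can find the index region covering a given structure with a binary search. The Python sketch below assumes the headers have already been parsed into `(start_off, end_off)` pairs and treats `end_off` as exclusive; both the exclusivity and the names `Region` and `find_region` are assumptions for illustration.

```python
import bisect
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    start_off: int
    end_off: int  # assumed exclusive


def find_region(regions: List[Region], offset: int) -> Optional[int]:
    """Return the position of the region that contains `offset`, or None.

    `regions` must be sorted by start_off, as the IndexSection array is.
    """
    # Start offsets in ascending order; rebuilt on each call for simplicity.
    starts = [r.start_off for r in regions]
    # Index of the last region whose start_off <= offset.
    i = bisect.bisect_right(starts, offset) - 1
    if i >= 0 and offset < regions[i].end_off:
        return i
    return None
```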


### IndexHeader
Represents an index region. Each index region contains two kinds of indexes: indexes pointing to [Type](#type) entries and indexes pointing to methods, strings, or literal arrays.

- Alignment: 4-byte aligned
- Format

| **Name**       | **Format**| **Description**   |
| -------------- | -------------- | ---------- |
| `start_off`                             | `uint32_t`       | Start offset of the region.                                        |
| `end_off`                               | `uint32_t`       | End offset of the region.                                        |
| `class_region_idx_size`                 | `uint32_t`       | Number of elements in the region's [ClassRegionIndex](#classregionindex). The maximum value is **65536**, the number of values a 16-bit index can address.|
| `class_region_idx_off`                  | `uint32_t`       | Offset to the region's [ClassRegionIndex](#classregionindex).|
| `method_string_literal_region_idx_size` | `uint32_t`       | Number of elements in the region's [MethodStringLiteralRegionIndex](#methodstringliteralregionindex). The maximum value is **65536**, the number of values a 16-bit index can address.|
| `method_string_literal_region_idx_off`  | `uint32_t`       | Offset to the region's [MethodStringLiteralRegionIndex](#methodstringliteralregionindex).|
| `reserved`                              | `uint32_t`       | Reserved field for internal use by the Ark bytecode file.                          |
| `reserved`                              | `uint32_t`       | Reserved field for internal use by the Ark bytecode file.                          |
| `reserved`                              | `uint32_t`       | Reserved field for internal use by the Ark bytecode file.                          |
| `reserved`                              | `uint32_t`       | Reserved field for internal use by the Ark bytecode file.                          |
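An **IndexHeader** consists of ten consecutive little-endian `uint32_t` fields (40 bytes), so it can be unpacked with Python's `struct` module. The sketch below is illustrative; the function name `parse_index_header` is hypothetical, and the reserved fields are read but not interpreted.

```python
import struct
from typing import Dict

# Ten little-endian uint32_t fields: six named fields, then four reserved ones.
INDEX_HEADER = struct.Struct("<10I")
FIELD_NAMES = (
    "start_off",
    "end_off",
    "class_region_idx_size",
    "class_region_idx_off",
    "method_string_literal_region_idx_size",
    "method_string_literal_region_idx_off",
)


def parse_index_header(data: bytes, offset: int) -> Dict[str, int]:
    """Parse the IndexHeader that starts at `offset` in the file bytes `data`."""
    if offset % 4:
        raise ValueError("IndexHeader must be 4-byte aligned")
    values = INDEX_HEADER.unpack_from(data, offset)
    # values[6:] are the reserved fields; they are not interpreted here.
    return dict(zip(FIELD_NAMES, values[:6]))
```

Since the `headers` array in [IndexSection](#indexsection) is a contiguous array of 40-byte entries, header *k* would start 40 × *k* bytes after the start of that array.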


### ClassRegionIndex
Provides compact indexing for locating [Type](#type) entries.

- Alignment: 4-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `types`          | `Type[]`         | An array of [Type](#type) elements. The array length is specified by **class_region_idx_size** in [IndexHeader](#indexheader).|
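The following Python sketch reads the `types` array of a ClassRegionIndex from the file bytes, given `class_region_idx_off` and `class_region_idx_size` from the region's IndexHeader. The function name is hypothetical, and the sketch assumes each entry is a little-endian `uint32_t`, consistent with [Type](#type) being a 32-bit value.

```python
import struct
from typing import List


def read_class_region_index(data: bytes, off: int, size: int) -> List[int]:
    """Read the `types` array of a ClassRegionIndex.

    `off` and `size` are class_region_idx_off and class_region_idx_size from
    the region's IndexHeader; each element is a little-endian uint32_t Type.
    """
    if off % 4:
        raise ValueError("ClassRegionIndex must be 4-byte aligned")
    if size > 65536:
        raise ValueError("a region holds at most 65536 Type entries")
    return list(struct.unpack_from(f"<{size}I", data, off))
```

A 16-bit type index *i* used inside the region would then resolve to `types[i]`.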


### Type
A 32-bit value representing either a basic type encoding or an offset to a [class](#class).

Basic type encodings are listed below.

| **Type**      | **Encoding**       |
| -------------- | -------------- |
| `u1`           | `0x00`         |
| `i8`           | `0x01`         |
| `u8`           | `0x02`         |
| `i16`          | `0x03`         |
| `u16`          | `0x04`         |
| `i32`          | `0x05`         |
| `u32`          | `0x06`         |
| `f32`          | `0x07`         |
| `f64`          | `0x08`         |
| `i64`          | `0x09`         |
| `u64`          | `0x0a`         |
| `any`          | `0x0c`         |
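The encodings above can be turned into a lookup table. The Python sketch below distinguishes basic types from class offsets by value; it assumes that no class can start at an offset as small as `0x0c`, because the file header occupies the beginning of the file. That assumption and the name `describe_type` are illustrative, not taken from the ArkCompiler sources.

```python
BASIC_TYPES = {
    0x00: "u1", 0x01: "i8", 0x02: "u8", 0x03: "i16", 0x04: "u16",
    0x05: "i32", 0x06: "u32", 0x07: "f32", 0x08: "f64", 0x09: "i64",
    0x0A: "u64", 0x0C: "any",
}


def describe_type(value: int) -> str:
    """Return a basic type name, or "class@<offset>" for a class offset.

    Assumes class offsets are always larger than the biggest basic encoding.
    """
    if value in BASIC_TYPES:
        return BASIC_TYPES[value]
    if value <= max(BASIC_TYPES):
        # Small values not in the table (such as 0x0b) are not listed above.
        raise ValueError(f"unlisted basic type encoding {value:#x}")
    return f"class@{value:#x}"
```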


### MethodStringLiteralRegionIndex
Provides compact indexing for methods, strings, or literal arrays.

- Alignment: 4-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `offsets`      | `uint32_t[]`   | An array of offsets to methods, strings, or literal arrays. The array length is specified by **method_string_literal_region_idx_size** in [IndexHeader](#indexheader).|
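Resolving a single entry of this index can be sketched in Python as below. The sketch assumes the array starts at `method_string_literal_region_idx_off`, holds `method_string_literal_region_idx_size` little-endian `uint32_t` entries, and is addressed by 16-bit indexes; `resolve_msl_index` is a hypothetical name.

```python
import struct


def resolve_msl_index(data: bytes, region_idx_off: int,
                      region_idx_size: int, idx: int) -> int:
    """Map a 16-bit method/string/literal-array index to a 32-bit file offset.

    region_idx_off and region_idx_size are method_string_literal_region_idx_off
    and method_string_literal_region_idx_size from the region's IndexHeader.
    """
    if not 0 <= idx <= 0xFFFF:
        raise ValueError("index must fit in 16 bits")
    if idx >= region_idx_size:
        raise IndexError(f"index {idx} is outside this region's table")
    # offsets[idx] is a little-endian uint32_t located 4 * idx bytes into the array.
    (offset,) = struct.unpack_from("<I", data, region_idx_off + 4 * idx)
    return offset
```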


### LiteralArray
Describes a literal array in the bytecode file.

- Alignment: single-byte aligned
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `num_literals`   | `uint32_t`       | Length of the **literals** array.                                        |
| `literals`       | `Literal[]`      | An array of [Literal](#literal) elements.|


### Literal
Describes a literal in the bytecode file. Depending on the number of bytes the literal value occupies, one of four encoding formats is used: single-byte, double-byte, four-byte, or eight-byte. Matching the encoding to the size of the value keeps the bytecode file small.

- Alignment: depends on the encoding format (see the table below).
- Format

| **Name**| **Format**| **Description**|
| -------------- | ------------ | -------------- |
| Single-byte encoding     | `uint8_t`    | A single-byte value, aligned to one byte, used for simple literals such as **BOOL** literals.  |
| Double-byte encoding     | `uint16_t`   | A two-byte value, aligned to two bytes, used for 16-bit integer literals.  |
| Four-byte encoding     | `uint32_t`   | A four-byte value, aligned to four bytes, used for 32-bit numeric literals such as **INTEGER** and **FLOAT** literals.  |
| Eight-byte encoding     | `uint64_t`   | An eight-byte value, aligned to eight bytes, used for 64-bit numeric literals such as **DOUBLE** (double-precision floating-point) literals.  |

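Reading one literal value of a known width can be sketched in Python as follows. The sketch assumes that alignment is measured from the start of the file, that the caller already knows the width from context, and that multi-byte values are little-endian like the other integer types in this format; `read_literal_value` and `align_up` are hypothetical names. It returns the raw bits; reinterpreting them as a **FLOAT** or **DOUBLE** is left to the caller.

```python
import struct
from typing import Tuple

# struct format for each encoding width (little-endian, unsigned).
LITERAL_FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def align_up(offset: int, alignment: int) -> int:
    """Round `offset` up to a multiple of `alignment` (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def read_literal_value(data: bytes, offset: int, width: int) -> Tuple[int, int]:
    """Read a `width`-byte literal value aligned to `width` bytes.

    Returns (raw_value, next_offset); the raw bits are not reinterpreted.
    """
    fmt = LITERAL_FORMATS[width]  # KeyError for unsupported widths
    start = align_up(offset, width)
    (value,) = struct.unpack_from(fmt, data, start)
    return value, start + width
```

For example, the raw bits of a **DOUBLE** literal can be reinterpreted with `struct.unpack("<d", struct.pack("<Q", raw))[0]`.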