# Analyzing CPP Crash

A cpp crash refers to a process crash in a C/C++ application. The FaultLogger module provides capabilities such as process crash detection, log collection, log storage, and log reporting, helping you locate faults more effectively.

The following introduces cpp crash detection, crash fault locating and analysis, and typical cases. To use this guideline, you need basic knowledge of the stack and heap in C/C++.

## Cpp Crash Detection

Process crash detection is based on the POSIX signal mechanism. Currently, the exception signals that can be processed are as follows:

| Signo| Signal| Description| Trigger Cause|
| -------- | -------- | -------- | -------- |
| 4 | SIGILL | Invalid instruction| An invalid, incorrectly formatted, unknown, or privileged instruction is executed.|
| 5 | SIGTRAP | Breakpoint or trap| An exception occurs or a trap instruction is executed.|
| 6 | SIGABRT | Process abort| The process is aborted abnormally. Generally, this exception occurs when the process calls **abort()** in the Standard Function Library.|
| 7 | SIGBUS | Illegal memory access| The process accesses an unaligned or nonexistent physical address.|
| 8 | SIGFPE | Floating-point exception| An incorrect arithmetic operation is executed, for example, division by zero, floating-point overflow, or integer overflow.|
| 11 | SIGSEGV | Invalid memory access| The process accesses an invalid memory region.|
| 16 | SIGSTKFLT | Stack error| The processor performs an incorrect stack operation, such as a pop when the stack is empty or a push when the stack is full.|
| 31 | SIGSYS | Incorrect system call| An incorrect or invalid parameter is used in a system call.|

Some of the preceding fault signals are classified into codes based on specific scenarios.

**SIGILL** occurs in Unix and Unix-like operating systems. It indicates an invalid instruction exception.
The **SIGILL** signal is usually triggered by the following causes:

| Code| Signal| Description| Trigger Cause|
| -------- | -------- | -------- | -------- |
| 1 | ILL_ILLOPC | Illegal operation code.| A privileged instruction or an instruction that is unsupported by the CPU is executed.|
| 2 | ILL_ILLOPN | Illegal operand.| An incorrect operand or improper operand type is used.|
| 3 | ILL_ILLADR | Illegal addressing mode.| An instruction uses an invalid addressing mode.|
| 4 | ILL_ILLTRP | Illegal trap.| A program executes an illegal trap instruction or an undefined operation.|
| 5 | ILL_PRVOPC | Illegal privileged operation code.| An unprivileged process executes a privileged instruction.|
| 6 | ILL_PRVREG | Illegal privileged register.| An unprivileged process accesses a privileged register.|
| 7 | ILL_COPROC | Illegal coprocessor.| A program executes an undefined coprocessor instruction.|
| 8 | ILL_BADSTK | Illegal stack.| A program performs an operation at an invalid stack address, or the stack overflows.|

**SIGTRAP** usually occurs in debugging and tracing. The four scenarios of the **SIGTRAP** signal are described as follows:

| Code| Signal| Description| Trigger Cause|
| -------- | -------- | -------- | -------- |
| 1 | TRAP_BRKPT | Software breakpoint.| A software breakpoint is reached in a program. When debugging a program, you can set a software breakpoint at a key position to pause the program execution and check information such as variable values.|
| 2 | TRAP_TRACE | Single-step debugging.| A single instruction is executed in single-step mode. Single-stepping can be used to check the execution result of each instruction.|
| 3 | TRAP_BRANCH | Branch tracing.| A branch instruction is executed in a program. Branch instructions control the execution flow of a program, for example, in if statements and loop statements.|
| 4 | TRAP_HWBKPT | Hardware breakpoint.| A hardware breakpoint is reached in a program. When debugging a program, you can set a hardware breakpoint at a key position to pause the program execution and check information such as variable values. Different from a software breakpoint, a hardware breakpoint is implemented in CPU hardware. Therefore, whether a hardware breakpoint is triggered can be detected in real time during program execution.|

The **SIGBUS** signal is sent by the operating system to a process. It usually indicates a memory access error. The codes of the **SIGBUS** signal are described as follows:

| Code| Signal| Description| Trigger Cause|
| -------- | -------- | -------- | -------- |
| 1 | BUS_ADRALN | Unaligned memory address.| A program accesses an unaligned memory address, for example, reads a 4-byte integer from an address that is not a multiple of 4.|
| 2 | BUS_ADRERR | Invalid memory address.| A program accesses a physical address that does not exist.|
| 3 | BUS_OBJERR | Object-specific hardware error.| The hardware reports an error that is specific to the accessed object.|
| 4 | BUS_MCEERR_AR | Hardware memory error (action required).| A hardware memory error is consumed on a machine check, and action is required.|
| 5 | BUS_MCEERR_AO | Hardware memory error (action optional).| A hardware memory error is detected in the process but not consumed, and action is optional.|

The **SIGFPE** signal indicates a floating-point exception or an arithmetic exception. The codes of the **SIGFPE** signal are described as follows:

| Code| Signal| Description| Trigger Cause|
| -------- | -------- | -------- | -------- |
| 1 | FPE_INTDIV | Invalid integer division.| The divisor in an integer division is zero. |
| 2 | FPE_INTOVF | Integer overflow.| The result of an integer operation exceeds the range that can be represented. |
| 3 | FPE_FLTDIV | Invalid floating-point division.| The divisor in a floating-point division is zero. |
| 4 | FPE_FLTOVF | Floating-point overflow.| The result of a floating-point operation is too large to be represented. |
| 5 | FPE_FLTUND | Floating-point underflow.| The result of a floating-point operation is too small to be represented. |
| 6 | FPE_FLTRES | Inexact floating-point result.| The result of a floating-point operation cannot be represented exactly. |
| 7 | FPE_FLTINV | Invalid floating-point operation.| An invalid floating-point operation is performed. |
| 8 | FPE_FLTSUB | Subscript out of range.| A subscript is out of range. |

The **SIGSEGV** signal occurs when a process accesses a non-existent memory address or an inaccessible address. The codes of the **SIGSEGV** signal are described as follows:

| Code| Signal| Description| Trigger Cause|
| -------- | -------- | -------- | -------- |
| 1 | SEGV_MAPERR | Non-existent memory address.| A process accesses a memory address that does not exist or that is not mapped to the process address space. This exception is usually caused by pointer errors, such as dereferencing a null or dangling pointer.|
| 2 | SEGV_ACCERR | Inaccessible memory address.| A process accesses a mapped memory address without the required permission, such as writing to a read-only memory address or executing code at a memory address without execution permission. This exception is usually caused by buffer overflow or modifying read-only memory.|

Codes can be classified not only by **signo** but also by the cause of the signal. The preceding tables describe the codes classified based on the **signo** of each signal, while the following table describes the codes classified based on the causes, which apply to all signals:

| Code| Signal| Description| Trigger Cause|
| -------- | -------- | -------- | -------- |
| 0 | SI_USER | User space.|This signal is sent by a process in user space to another process, usually by calling **kill()**. For example, when a user runs the **kill** command in a shell, a **SIGTERM** signal is sent to the target process by default.|
| 0x80 | SI_KERNEL | Kernel.|This signal is sent by the kernel to the process. It is usually sent when the kernel detects some errors or exceptions. For example, when a process accesses an invalid memory address, the kernel sends a **SIGSEGV** signal to the process.|
| -1 | SI_QUEUE | The **sigqueue()** function.|This signal is sent by **sigqueue()**, and an additional integer value or a pointer can be carried. It is usually used for advanced communication between processes, such as transferring data or notifying a process that an event occurs.|
| -2 | SI_TIMER | Timer.|This signal is sent by a timer and is usually used to execute a scheduled task or a periodic task. For example, when a timer expires, the kernel sends a **SIGALRM** signal to the process.|
| -3 | SI_MESGQ | Message queue.|This signal is sent by a message queue and is usually used for communication across processes. For example, when a message arrives in an empty message queue, the kernel sends the notification signal registered by the receiving process.|
| -4 | SI_ASYNCIO | Asynchronous I/O.|This signal is sent when an asynchronous I/O operation is complete and is usually used for non-blocking I/O. For example, when an I/O operation on a file descriptor is complete, the kernel sends a **SIGIO** signal to the process.|
| -5 | SI_SIGIO | Queued **SIGIO**.|This signal is a queued **SIGIO** signal, which notifies the process that I/O is possible on a file descriptor.|
| -6 | SI_TKILL | The **tkill()** function.|This signal is sent by the function **tkill()**, which is similar to the function **kill()**. In addition, you can specify the ID of the thread that receives the signal. It is usually used to send a signal to a specified thread in a multithreaded program.|

## Fault Analysis

### Crash Log Collection

The FaultLogger module manages process crash logs together with app freeze and JS crash logs. You can obtain process crash logs using any of the following methods:

- Method 1: DevEco Studio

  DevEco Studio collects process crash logs from **/data/log/faultlog/faultlogger/** to FaultLog, where logs are displayed by process name, fault, and time. For details about how to obtain logs, see <!--RP1-->[DevEco Studio User Guide-FaultLog](https://developer.huawei.com/consumer/cn/doc/harmonyos-guides-V5/ide-fault-log-V5)<!--RP1End-->.

- Method 2: hiAppEvent APIs

  hiAppEvent provides APIs to subscribe to various fault logs. For details, see [Introduction to HiAppEvent](hiappevent-intro.md).

<!--Del-->
- Method 3: Shell

    - When a process crashes, you can find fault logs in **/data/log/faultlog/temp/** on the device. The log files are named in the format of **cppcrash-process PID-timestamp (millisecond)**. They contain information such as the process crash call stack, process crash register, stack memory, maps, and process file handle list.

      

      The fault logs obtained using Shell in **/data/log/faultlog/temp** are as follows:

      ```text
      Timestamp:2024-05-06 20:10:51.000 <- Timestamp when the fault occurs
      Pid:9623 <- Process ID
      Uid:0 <- User ID
      Process name:./crasher_cpp <- Process name
      Process life time:1s <- Process life time
      Reason:Signal:SIGSEGV(SEGV_MAPERR)@0x00000004 probably caused by NULL pointer dereference <- Fault cause and null pointer prompt
      Fault thread info:
      Tid:9623, Name:crasher_cpp <- Thread ID, thread name
      #00 pc 00008d22 /system/bin/crasher_cpp(TestNullPointerDereferenceCrash0()+22)(adfc673300571d2da1e47d1d12f48b44) <- Call stack
      #01 pc 000064d1 /system/bin/crasher_cpp(DfxCrasher::ParseAndDoCrash(char const*) const+160)(adfc673300571d2da1e47d1d12f48b44)
      #02 pc 00006569 /system/bin/crasher_cpp(main+92)(adfc673300571d2da1e47d1d12f48b44)
      #03 pc 00072b98 /system/lib/ld-musl-arm.so.1(libc_start_main_stage2+56)(d820b1827e57855d4f9ed03ba5dfea83)
      #04 pc 00004e28 /system/bin/crasher_cpp(_start_c+84)(adfc673300571d2da1e47d1d12f48b44)
      #05 pc 00004dcc /system/bin/crasher_cpp(adfc673300571d2da1e47d1d12f48b44)
      Registers: <- Fault registers
      r0:ffffafd2 r1:00000004 r2:00000001 r3:00000000
      r4:ffd27e39 r5:0096e000 r6:00000a40 r7:0096fdfc
      r8:f7ba58d5 r9:f7baea86 r10:f7cadd38
      fp:ffd27308 ip:f7cb2078 sp:ffd272a0 lr:f7c7ab98 pc:0096ad22
      Memory near registers: <- Memory near fault registers
      r4([stack]):
          ffd27e30 72656873
          ffd27e34 7070635f
          ...
          ffd27eac 3d73746f
      r5(/system/bin/crasher_cpp):
          0096dff8 00000000
          0096dffc 0096717d
          ...
          0096e074 00000000
      r7(/system/lib/ld-musl-arm.so.1):
          f7cabb58 00000000
          f7cabb5c 0034ba00
          ...
          f7cabbd4 00000000
      r8(/system/lib/ld-musl-arm.so.1):
          f7ba58cc 63637573
          f7ba58d0 2e737365
          ...
          f7ba5948 70206269
      r9(/system/lib/ld-musl-arm.so.1):
          f7baea7c 20746f6e
          f7baea80 6e756f66
          ...
          f7baeaf8 25206e69
      r10([anon:ld-musl-arm.so.1.bss]):
          f7cadd30 00000000
          f7cadd34 00000000
          ...
          f7caddac 00000000
      r12([anon:ld-musl-arm.so.1.bss]):
          f7cb2070 56726562
          f7cb2074 65756c61
          ...
          f7cb20ec 00000000
      sp([stack]):
          ffd27328 00000000
          ffd2732c 00966dd0
          ...
          ffd273a4 00000004
      pc(/system/bin/crasher_cpp):
          00966dc8 e1a0d00c
          00966dcc eb000000
          ...
          00966e44 e5907008
      pc(/system/bin/crasher_cpp):
          00966dc8 e1a0d00c
          00966dcc eb000000
          ...
          00966e44 e5907008
      FaultStack: <- Stack of the crashed thread
          ffd27260 00000000
          ffd27264 f7cac628
          ...
          ffd2729c 0096ad1f
      sp0:ffd272a0 0096fdfc <- Stack top of frame #00
          ffd272a4 009684d3
      sp1:ffd272a8 00000001
          ffd272ac 73657408
          ffd272b0 f7590074
          ...
          ffd272dc 0096856d
      sp2:ffd272e0 ffd27334
          ffd272e4 ffd27334
          ffd272e8 00000002
          ....
          ffd272f4 f7bfbb9c
      sp3:ffd272f8 00000000
          ffd272fc ffd27334

      Maps: <- Process maps files when the fault occurs
      962000-966000 r--p 00000000 /system/bin/crasher_cpp
      966000-96c000 r-xp 00003000 /system/bin/crasher_cpp
      96c000-96f000 r--p 00008000 /system/bin/crasher_cpp
      96f000-970000 rw-p 0000a000 /system/bin/crasher_cpp
      149f000-14a0000 ---p 00000000 [heap]
      14a0000-14a2000 rw-p 00000000 [heap]
      ...
      f7b89000-f7be1000 r--p 00000000 /system/lib/ld-musl-arm.so.1
      f7be1000-f7ca9000 r-xp 00057000 /system/lib/ld-musl-arm.so.1
      f7ca9000-f7cab000 r--p 0011e000 /system/lib/ld-musl-arm.so.1
      f7cab000-f7cad000 rw-p 0011f000 /system/lib/ld-musl-arm.so.1
      f7cad000-f7cbc000 rw-p 00000000 [anon:ld-musl-arm.so.1.bss]
      ffd07000-ffd28000 rw-p 00000000 [stack]
      ffff0000-ffff1000 r-xp 00000000 [vectors]
      OpenFiles: <- FD information of the file opened by the process when the fault occurs
      0->/dev/pts/1 native object of unknown type 0
      1->/dev/pts/1 native object of unknown type 0
      2->/dev/pts/1 native object of unknown type 0
      3->socket:[67214] native object of unknown type 0
      ...
      11->pipe:[67219] native object of unknown type 0
      12->socket:[29074] native object of unknown type 0
      25->/dev/ptmx native object of unknown type 0
      26->/dev/ptmx native object of unknown type 0
      ```

    - You can find more comprehensive fault logs in **/data/log/faultlog/faultlogger/**, which include information such as device name, system version, and process logs. The log files are named in the format of **cppcrash-process name-process UID-time (millisecond).log**.

      
<!--DelEnd-->

**Fault Logs of Null Pointer**

In this scenario, a message is printed in the log, indicating that the fault may be caused by a null pointer dereference.
The following is an example process crash log archived by DevEco Studio in FaultLog:

```text
Generated by HiviewDFX@OpenHarmony
================================================================
Device info:OpenHarmony 3.2 <- Device information
Build info:OpenHarmony 5.0.0.23 <- Build information
Fingerprint:cdf52fd0cc328fc432459928f3ed8edfe8a72a92ee7316445143bed179138073 <- Fingerprint
Module name:crasher_cpp <- Module name
Timestamp:2024-05-06 20:10:51.000 <- Timestamp when the fault occurs
Pid:9623 <- Process ID
Uid:0 <- User ID
Process name:./crasher_cpp <- Process name
Process life time:1s <- Process life time
Reason:Signal:SIGSEGV(SEGV_MAPERR)@0x00000004 probably caused by NULL pointer dereference <- Fault cause and null pointer prompt
Fault thread info:
Tid:9623, Name:crasher_cpp <- Thread ID, thread name
#00 pc 00008d22 /system/bin/crasher_cpp(TestNullPointerDereferenceCrash0()+22)(adfc673300571d2da1e47d1d12f48b44) <- Call stack
#01 pc 000064d1 /system/bin/crasher_cpp(DfxCrasher::ParseAndDoCrash(char const*) const+160)(adfc673300571d2da1e47d1d12f48b44)
#02 pc 00006569 /system/bin/crasher_cpp(main+92)(adfc673300571d2da1e47d1d12f48b44)
#03 pc 00072b98 /system/lib/ld-musl-arm.so.1(libc_start_main_stage2+56)(d820b1827e57855d4f9ed03ba5dfea83)
#04 pc 00004e28 /system/bin/crasher_cpp(_start_c+84)(adfc673300571d2da1e47d1d12f48b44)
#05 pc 00004dcc /system/bin/crasher_cpp(adfc673300571d2da1e47d1d12f48b44)
Registers: <- Fault registers
r0:ffffafd2 r1:00000004 r2:00000001 r3:00000000
r4:ffd27e39 r5:0096e000 r6:00000a40 r7:0096fdfc
r8:f7ba58d5 r9:f7baea86 r10:f7cadd38
fp:ffd27308 ip:f7cb2078 sp:ffd272a0 lr:f7c7ab98 pc:0096ad22
Memory near registers: <- Memory near fault registers
r4([stack]):
    ffd27e30 72656873
    ffd27e34 7070635f
    ...
    ffd27eac 3d73746f
r5(/system/bin/crasher_cpp):
    0096dff8 00000000
    0096dffc 0096717d
    ...
    0096e074 00000000
r7(/system/lib/ld-musl-arm.so.1):
    f7cabb58 00000000
    f7cabb5c 0034ba00
    ...
    f7cabbd4 00000000
r8(/system/lib/ld-musl-arm.so.1):
    f7ba58cc 63637573
    f7ba58d0 2e737365
    ...
    f7ba5948 70206269
r9(/system/lib/ld-musl-arm.so.1):
    f7baea7c 20746f6e
    f7baea80 6e756f66
    ...
    f7baeaf8 25206e69
r10([anon:ld-musl-arm.so.1.bss]):
    f7cadd30 00000000
    f7cadd34 00000000
    ...
    f7caddac 00000000
r12([anon:ld-musl-arm.so.1.bss]):
    f7cb2070 56726562
    f7cb2074 65756c61
    ...
    f7cb20ec 00000000
sp([stack]):
    ffd27328 00000000
    ffd2732c 00966dd0
    ...
    ffd273a4 00000004
pc(/system/bin/crasher_cpp):
    00966dc8 e1a0d00c
    00966dcc eb000000
    ...
    00966e44 e5907008
pc(/system/bin/crasher_cpp):
    00966dc8 e1a0d00c
    00966dcc eb000000
    ...
    00966e44 e5907008
FaultStack: <- Stack of the crashed thread
    ffd27260 00000000
    ffd27264 f7cac628
    ...
    ffd2729c 0096ad1f
sp0:ffd272a0 0096fdfc <- Stack top of frame #00
    ffd272a4 009684d3
sp1:ffd272a8 00000001
    ffd272ac 73657408
    ffd272b0 f7590074
    ...
    ffd272dc 0096856d
sp2:ffd272e0 ffd27334
    ffd272e4 ffd27334
    ffd272e8 00000002
    ....
    ffd272f4 f7bfbb9c
sp3:ffd272f8 00000000
    ffd272fc ffd27334

Maps: <- Process maps files when the fault occurs
962000-966000 r--p 00000000 /system/bin/crasher_cpp
966000-96c000 r-xp 00003000 /system/bin/crasher_cpp
96c000-96f000 r--p 00008000 /system/bin/crasher_cpp
96f000-970000 rw-p 0000a000 /system/bin/crasher_cpp
149f000-14a0000 ---p 00000000 [heap]
14a0000-14a2000 rw-p 00000000 [heap]
...
f7b89000-f7be1000 r--p 00000000 /system/lib/ld-musl-arm.so.1
f7be1000-f7ca9000 r-xp 00057000 /system/lib/ld-musl-arm.so.1
f7ca9000-f7cab000 r--p 0011e000 /system/lib/ld-musl-arm.so.1
f7cab000-f7cad000 rw-p 0011f000 /system/lib/ld-musl-arm.so.1
f7cad000-f7cbc000 rw-p 00000000 [anon:ld-musl-arm.so.1.bss]
ffd07000-ffd28000 rw-p 00000000 [stack]
ffff0000-ffff1000 r-xp 00000000 [vectors]
OpenFiles: <- FD information of the file opened by the process when the fault occurs
0->/dev/pts/1 native object of unknown type 0
1->/dev/pts/1 native object of unknown type 0
2->/dev/pts/1 native object of unknown type 0
3->socket:[67214] native object of unknown type 0
...
11->pipe:[67219] native object of unknown type 0
12->socket:[29074] native object of unknown type 0
25->/dev/ptmx native object of unknown type 0
26->/dev/ptmx native object of unknown type 0

HiLog: <- HiLog logs when the fault occurs
05-06 20:10:51.301 9623 9623 E C03f00/MUSL-SIGCHAIN: signal_chain_handler call 2 rd sigchain action for signal: 11
05-06 20:10:51.306 9623 9623 I C02d11/DfxSignalHandler: DFX_SigchainHandler :: sig(11), pid(9623), tid(9623).
05-06 20:10:51.307 9623 9623 I C02d11/DfxSignalHandler: DFX_SigchainHandler :: sig(11), pid(9623), processName(./crasher_cpp), threadName(crasher_cpp).
05-06 20:10:51.389 9623 9623 I C02d11/DfxSignalHandler: processdump have get all resgs

```

**Fault Logs of Stack Overflow**

If the log contains the prompt shown in the following example, the fault may be caused by a stack overflow.
The following is an example process crash log archived by DevEco Studio in FaultLog:

```text
Generated by HiviewDFX@OpenHarmony
================================================================
Device info:OpenHarmony 3.2 <- Device information
Build info:OpenHarmony 5.0.0.23 <- Build information
Fingerprint:8bc3343f50024204e258b8dce86f41f8fcc50c4d25d56b24e71fe26c0a23e321 <- Fingerprint
Module name:crasher_cpp <- Module name
Timestamp:2024-05-06 20:18:24.000 <- Timestamp when the fault occurs
Pid:9838 <- Process ID
Uid:0 <- User ID
Process name:./crasher_cpp <- Process name
Process life time:2s <- Process life time
Reason:Signal:SIGSEGV(SEGV_ACCERR)@0xf76b7ffc current thread stack low address = 0xf76b8000, probably caused by stack-buffer-overflow <- Fault cause and stack overflow prompt
...
```

**Fault Logs of Stack Coverage**

In the stack coverage scenario, the stack frames cannot be traced because the stack memory has been illegally overwritten. A message is printed in the log, indicating that the stack fails to be unwound and that the system attempts to parse the thread stack to obtain an unreliable call stack. The information is provided for problem analysis.
The following is an example process crash log archived by DevEco Studio in FaultLog:

```text
Generated by HiviewDFX@OpenHarmony
================================================================
Device info:OpenHarmony 3.2 <- Device information
Build info:OpenHarmony 5.0.0.23 <- Build information
Fingerprint:79b6d47b87495edf27135a83dda8b1b4f9b13d37bda2560d43f2cf65358cd528 <- Fingerprint
Module name:crasher_cpp <- Module name
Timestamp:2024-05-06 20:27:23.2035266415 <- Timestamp when the fault occurs
Pid:10026 <- Process ID
Uid:0 <- User ID
Process name:./crasher_cpp <- Process name
Process life time:1s <- Process life time
Reason:Signal:SIGSEGV(SEGV_MAPERR)@0000000000 probably caused by NULL pointer dereference <- Fault cause
Fault thread info:
Tid:10026, Name:crasher_cpp <- Thread ID, thread name
#00 pc 00000000 Not mapped
#01 pc 00008d22 /system/bin/crasher_cpp(TestNullPointerDereferenceCrash0()+22)(adfc673300571d2da1e47d1d12f48b44) <- Call stack
#02 pc 000064d1 /system/bin/crasher_cpp(DfxCrasher::ParseAndDoCrash(char const*) const+160)(adfc673300571d2da1e47d1d12f48b44)
#03 pc 00006569 /system/bin/crasher_cpp(main+92)(adfc673300571d2da1e47d1d12f48b44)
#04 pc 00072b98 /system/lib/ld-musl-arm.so.1(libc_start_main_stage2+56)(d820b1827e57855d4f9ed03ba5dfea83)
Registers: <- Fault registers
r0:ffffafd2 r1:00000004 r2:00000001 r3:00000000
r4:ffd27e39 r5:0096e000 r6:00000a40 r7:0096fdfc
r8:f7ba58d5 r9:f7baea86 r10:f7cadd38
fp:ffd27308 ip:f7cb2078 sp:ffd272a0 lr:f7c7ab98 pc:0096ad22
ExtraCrashInfo(Unwindstack): <- Print the custom stack information about the system framework service.
Failed to unwind stack, try to get unreliable call stack from #02 by reparsing thread stack <- Attempt to obtain an unreliable stack from the thread stack
...
```

**Fault Logs of Asynchronous Thread**

When an asynchronous thread crashes, the stack of the thread that submits the asynchronous task is also printed to help locate the crash. Currently, this feature is available for debuggable applications (**HAP_DEBUGGABLE**) on the ARM64 architecture. The **SubmitterStacktrace** separator distinguishes the call stack of the crashed thread from that of the submitting thread. The following is an example process crash log archived by DevEco Studio in FaultLog:

```text
Generated by HiviewDFX@OpenHarmony
================================================================
Device info:OpenHarmony 3.2 <- Device information
Build info:OpenHarmony 5.0.0.23 <- Build information
Fingerprint:8bc3343f50024204e258b8dce86f41f8fcc50c4d25d56b24e71fe26c0a23e321 <- Fingerprint
Module name:crasher_cpp <- Module name
Timestamp:2024-05-06 20:28:24.000 <- Timestamp when the fault occurs
Pid:9838 <- Process ID
Uid:0 <- User ID
Process name:./crasher_cpp <- Process name
Process life time:2s <- Process life time
Reason:Signal:SIGSEGV(SI_TKILL)@0x000000000004750 from:18256:0 <- Fault cause
Fault thread info:
Tid:18257, Name:crasher_cpp <- Thread ID, thread name
#00 pc 000054e6 /system/bin/ld-musl-aarch64.so.l(raise+228)(adfc673300571d2da1e47d1d12f48b44) <- Call stack
#01 pc 000054f9 /system/bin/crasher_cpp(CrashInSubThread(void*)+56)(adfc673300571d2da1e47d1d12f48b50)
#02 pc 000054f9 /system/bin/ld-musl-aarch64.so.l(start+236)(adfc673300571d2da1e47d1d12f48b44)
========SubmitterStacktrace======== <- Call stack of the submitting thread
#00 pc 000094dc /system/bin/crasher_cpp(DfxCrasher::AsyncStacktrace()+36)(adfc673300571d2da1e47d1d12f48b50)
#01 pc 00009a58 /system/bin/crasher_cpp(DfxCrasher::ParseAndDoCrash(char const*) const+232)(adfc673300571d2da1e47d1d12f48b50)
#02 pc 00009b40 /system/bin/crasher_cpp(main+140)(adfc673300571d2da1e47d1d12f48b50)
#03 pc 0000a4e1c /system/bin/ld-musl-aarch64.so.l(libc_start_main_stage2+68)(adfc673300571d2da1e47d1d12f48b44)
...
```

**Logs of Custom Information About System Framework Services**

When a process crashes, the custom maintenance and test information of the system framework service can be printed to help you locate faults. The information can be of the string, memory, callback, or stack type. Currently, the ARM64 architecture is supported. Since API 18, the **LastFatalMessage** field carries only the last fatal-level log printed by using HiLog or the last message set by using the **set_fatal_message** API of libc before the process crashes. The callback type information and stack type information are moved from the **LastFatalMessage** field to the **ExtraCrashInfo(Callback)** and **ExtraCrashInfo(Unwindstack)** fields, respectively. The following is the core content of the process crash logs archived by DevEco Studio in FaultLog, which contains the four types of custom information about system framework services.

- String information:

  ```text
  Generated by HiviewDFX@OpenHarmony
  ================================================================
  Device info:OpenHarmony 3.2 <- Device information
  Build info:OpenHarmony 5.0.0.23 <- Build information
  Fingerprint:cdf52fd0cc328fc432459928f3ed8edfe8a72a92ee7316445143bed179138073 <- Fingerprint
  Module name:crasher_cpp <- Module name
  Timestamp:2024-05-06 20:10:51.000 <- Timestamp when the fault occurs
  Pid:9623 <- Process ID
  Uid:0 <- User ID
  Process name:./crasher_cpp <- Process name
  Process life time:1s <- Process life time
  Reason:Signal:SIGSEGV(SEGV_MAPERR)@0x00000004 probably caused by NULL pointer dereference <- Fault cause and null pointer prompt
  Fault thread info:
  Tid:9623, Name:crasher_cpp <- Thread ID, thread name
  #00 pc 00008d22 /system/bin/crasher_cpp(TestNullPointerDereferenceCrash0()+22)(adfc673300571d2da1e47d1d12f48b44) <- Call stack
  #01 pc 000064d1 /system/bin/crasher_cpp(DfxCrasher::ParseAndDoCrash(char const*) const+160)(adfc673300571d2da1e47d1d12f48b44)
  #02 pc 00006569 /system/bin/crasher_cpp(main+92)(adfc673300571d2da1e47d1d12f48b44)
  #03 pc 00072b98 /system/lib/ld-musl-arm.so.1(libc_start_main_stage2+56)(d820b1827e57855d4f9ed03ba5dfea83)
  #04 pc 00004e28 /system/bin/crasher_cpp(_start_c+84)(adfc673300571d2da1e47d1d12f48b44)
  #05 pc 00004dcc /system/bin/crasher_cpp(adfc673300571d2da1e47d1d12f48b44)
  Registers: <- Fault registers
  r0:ffffafd2 r1:00000004 r2:00000001 r3:00000000
  r4:ffd27e39 r5:0096e000 r6:00000a40 r7:0096fdfc
  r8:f7ba58d5 r9:f7baea86 r10:f7cadd38
  fp:ffd27308 ip:f7cb2078 sp:ffd272a0 lr:f7c7ab98 pc:0096ad22
  ExtraCrashInfo(String): <- Print custom string information about the system framework service
  test get CrashObject.
  ...
  ```

- Memory information:

  ```text
  ...
  ExtraCrashInfo(Memory start address 0000xxxx): <- Print custom memory information about the system framework service
  +0x000: xxxxx xxxxx xxxxx xxxxx <- Print the memory value from 0x000 to 0x018.
  +0x020: xxxxx xxxxx xxxxx xxxxx <- Print the memory value from 0x020 to 0x038.
  ...
  ```

- Callback information:

  Since API 18, the callback information is moved from the **LastFatalMessage** field to the **ExtraCrashInfo(Callback)** field.

  ```text
  ...
  ExtraCrashInfo(Callback): <- Print custom callback information about the system framework service.
  test get callback information.
  ...
  ```

- Stack information:

  Since API 18, the stack information is moved from the **LastFatalMessage** field to the **ExtraCrashInfo(Unwindstack)** field.

  ```text
  ...
  ExtraCrashInfo(Unwindstack): <- Print the custom stack information about the system framework service.
  Failed to unwind stack, try to get unreliable call stack from #02 by reparsing thread stack
  ...
  ```

> **NOTE**
>
> The omitted information is similar to the example of the string information.

### Locating the Problematic Code Based on the Crash Stack

#### Method 1: DevEco Studio

In application development, DevEco Studio can map the stack frames of the application's own dynamic libraries in the cppcrash stack to the problematic code. Both native stack frames and JS stack frames are supported. For stack frames that fail to be parsed and located in DevEco Studio, refer to Method 2.



#### Method 2: SDK llvm-addr2line

- Obtain the symbol table.

  Obtain the .so file with symbols that corresponds to the frame in the crash stack. It must be the same build as the one used by the application or system.

  When built in DevEco Studio, the .so file of a dynamic library is generated with symbols by default in **/build/default/intermediates/libs**. You can run the Linux **file** command to check whether the BuildIDs of the two .so files match.
  BuildID is generated by the compiler and uniquely identifies a binary file. In the command output, "not stripped" indicates that the symbol table is included.

  ```text
  $ file libbabel.so
  libbabel.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[sha1]=fdb1b5432b9ea4e2a3d29780c3abf30e2a22da9d, with debug_info, not stripped
  ```

  **Note**: The symbol table of the system dynamic library is archived with the version.

- Locate the line number using llvm-addr2line.

  You can find llvm-addr2line in **[SDK DIR PATH]\OpenHarmony\11\native\llvm\bin**. The path varies with the SDK version, so you may need to search for it.

  The sample stack is as follows (some parts are omitted):

  ```text
  Generated by HiviewDFX@OpenHarmony
  ================================================================
  Device info:OpenHarmony 3.2
  Build info:OpenHarmony 5.0.0.22
  Fingerprint:50577c0a1a1b5644ac030ba8f08c241cca0092026b59f29e7b142d5d4d5bb934
  Module name:com.samples.recovery
  Version:1.0.0
  VersionCode:1000000
  PreInstalled:No
  Foreground:No
  Timestamp:2017-08-05 17:03:40.000
  Pid:2396
  Uid:20010044
  Process name:com.samples.recovery
  Process life time:7s
  Reason:Signal:SIGSEGV(SEGV_MAPERR)@0000000000 probably caused by NULL pointer dereference
  Tid:2396, Name:amples.recovery
  # 00 pc 00003510 /data/storage/el1/bundle/libs/arm/libentry.so(TriggerCrash(napi_env__*, napi_callback_info__*)+24)(446ff75d3f6a518172cc52e8f8055650b02b0e54)
  # 01 pc 0002b0c5 /system/lib/platformsdk/libace_napi.z.so(panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*)+448)(a84fbb767fd826946623779c608395bf)
  # 02 pc 001e7597 /system/lib/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaInterpreter::RunInternal(panda::ecmascript::JSThread*, unsigned char const*, unsigned long long*)+14710)(106c552f6ce4420b9feac95e8b21b792)
  # 03 pc 001e0439 /system/lib/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaInterpreter::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+984)(106c552f6ce4420b9feac95e8b21b792)
  ...
  # 39 pc 00072998 /system/lib/ld-musl-arm.so.1(libc_start_main_stage2+56)(5b1e036c4f1369ecfdbb7a96aec31155)
  # 40 pc 00005b48 /system/bin/appspawn(_start_c+84)(cb0631260fa74df0bc9b0323e30ca03d)
  # 41 pc 00005aec /system/bin/appspawn(cb0631260fa74df0bc9b0323e30ca03d)
  Registers:
  r0:00000000 r1:ffc47af8 r2:00000001 r3:f6555c94
  r4:00000000 r5:f4d90f64 r6:bd8434f8 r7:00000000
  r8:00000000 r9:ffc48808 r10:ffc47b70
  fp:f7d8a5a0 ip:00000000 sp:ffc47aac lr:f4d6b0c7 pc:bd843510
  ```

  After frame #00 is parsed by SDK llvm-addr2line, the line number of the problematic code is as follows:

  ```text
  [SDK DIR PATH]\OpenHarmony\11\native\llvm\bin> .\llvm-addr2line.exe -Cfie libentry.so 3510
  TriggerCrash(napi_env__*, napi_callback_info__*)
  D:/code/apprecovery-demo/entry/src/main/cpp/hello.cpp:48
  ```

  You can use the **llvm-addr2line.exe -fCpie libutils.z.so offset** command to parse the stack frame by frame. If there are multiple offsets, you can parse them together using the **llvm-addr2line.exe -fCpie libxxx.so 0x1bc868 0x1be28c xxx** command. If the obtained line number does not seem correct, you can adjust the address (for example, subtract 1) or disable some compilation optimizations.

#### Method 3: DevEco Studio hstack

hstack is a tool provided by DevEco Studio for you to restore the crash stack of an obfuscated release app to the source code stack. It runs on Windows, macOS, and Linux. For details, see [DevEco Studio hstack User Guide](https://developer.huawei.com/consumer/cn/doc/harmonyos-guides-V5/ide-command-line-hstack-V5).

### Reviewing Code Based on Services

After you obtain the line number of the stack top, review the code around that line. As shown in the following figure, line 48 in the **hello.cpp** file indicates a null pointer dereference.

This example is constructed. Actual scenarios are usually more complicated and need to be analyzed based on the services involved.

### Disassembling (optional)

Generally, if the problem is clear, resolving the crash address to a line of code is enough to locate it. In a few cases, if the method called in that line takes multiple parameters and the parameters involve structs, you need to use disassembly for further analysis.

Case

The header information of the CPPCRASH log is as follows:

```text
Process name:com.ohos.medialibrary.medialibrarydata
Process life time:13402s
Reason:SIGSEGV(SEGV_MAPERR)@0x0000005b3b46c000
Fault thread info:
Tid:48552, Name:UpradeTask
#00 pc 00000000000a87e4 /system/lib/ld-musl-aarch64.so.1(memcpy+356)(3c3e7fb27680dc2ee99aa08dd0f81e85)
...
```

Procedure:

1. Obtain the corresponding assembly instruction based on the PC register address and determine the current operation from the assembly instruction.

   Obtain the PC address at the top of the stack from the CPPCRASH log file and disassemble the corresponding ELF file (using the unstripped .so file and the **llvm-objdump -d -l xxx.so** command).

   For example, when a **data_abort** issue occurs during the execution of the instruction at address **00000000000a87e4**, disassemble the libc.so file corresponding to the BuildID **3c3e7fb27680dc2ee99aa08dd0f81e85**.

   Disassemble the code and view the instruction at offset **a87e4**:

   ```text
   xxx/../../third_party/optimized-routines/string/aarch64/memcpy.S:175

   a87e4: a94371aa ldp x10, x11, [x1, #48]
   ```

   Check line 175 of the **memcpy.S** source file:

   ```text
   L(loop64):
   line 170 stp A_l, A_h, [dst, 16]
   line 171 ldp A_l, A_h, [src, 16]
   line 172 stp B_l, B_h, [dst, 32]
   line 173 ldp B_l, B_h, [src, 32]
   line 174 stp C_l, C_h, [dst, 48]
   line 175 ldp C_l, C_h, [src, 48] ----> Instruction in the crash
   line 176 stp D_l, D_h, [dst, 64]
   line 177 ldp D_l, D_h, [src, 64]
   line 178 subs count, count, 64
   line 179 b.hi L(loop64)
   ```

2. Infer the code object of the current operation based on the register values and context.

   Generally, register **x0** holds the first parameter of the function, **x1** the second parameter, **x2** the third parameter, and so on. If the function is a class method, **x0** holds the address of the object, and the parameters start from **x1**. On AArch64, the first eight integer or pointer parameters are passed in **x0** to **x7**; any additional parameters are passed on the stack.

   In **void* memcpy(void* restrict dest, const void* restrict src, size_t n)** at the stack top, **x0** holds the destination address **dest**, **x1** holds the source address **src**, and **x2** holds the number of bytes to copy.

   Obtain the values of these three registers from the CPPCRASH log file. The fault address **0x0000005b3b46c000** falls within the 16 bytes that the crashing **ldp** instruction reads at **x1** + 48, so the faulty parameter is **src**, which corresponds to **x1**.

   ```text
   Register:
   x0:000005b50c3e3c4 x1:000005b3b46bfcc x2:0000000000007e88 x3:000005b50c42380
   ...
   ```

3. Determine the fault type of the code object.

   Check **Memory near registers** in the CPPCRASH log.

   ```text
   x1(/data/medialibrary/database/kvdb/3ddb6fb8b2fcb38d2f431e86bfb806dab771637860d6e86bb9430fa15df04248/single_ver/main/gen_natural_st):
   0000005b21bb1fb8 8067d0f2e727f00a
   0000005b21bb1fc0 1b10e1e9a1079f7a
   0000005b21bb1fc8 83906d9c18cdb9c1
   0000005b21bb1fd0 627dd75ab9335eb0
   0000005b21bb1fd8 aabe2bb1b00f2c03
   0000005b21bb1fe0 f981e4acb716cbc1
   0000005b21bb1fe8 806b3d5730d281ee
   0000005b21bb1ff0 3e99fedbc0a9b5e9
   0000005b21bb1ff8 a91ab9d327969682
   0000005b21bb2000 ffffffffffffffff -----> Out-of-bounds read
   0000005b21bb2008 ffffffffffffffff
   0000005b21bb2010 ffffffffffffffff
   0000005b21bb2018 ffffffffffffffff
   0000005b21bb2020 ffffffffffffffff
   0000005b21bb2028 ffffffffffffffff
   0000005b21bb2030 ffffffffffffffff
   ```

   According to the log, an out-of-bounds read occurs. The suspect arguments are the source buffer and the length passed to **memcpy**, which are **buf** and **bufSize** in the calling code.

   In this case, you only need to analyze the logic that produces the arguments passed to **memcpy** in the code.

4. Track the parameter source of the problematic object and locate the problem based on the code and logs.

   Method 1: Check whether the parameter object and range are valid. For example, check whether the size of **buf** is the same as the input **bufSize**.

   Method 2: Check whether the lifecycle of the parameter object is valid. For example, check whether **buf** has been released and whether memory corruption occurs due to multi-thread operations.

   Method 3: Follow the parameter object through the function context and look for improper operations on it. For example, trace the operations on **buf** and **bufSize**, add debugging information, and locate the improper logic.

   Code snippet:

   ```text
   static StatusInter xxxFunc(..., const uint8_t *buf, uint32_t bufSize)
   ...
   uint32_t srcSize = bufSize;
   uint32_t srcOffset = cache->appendOffset - bufSize;
   errno_t ret = memcpy_s(cache->buffer + srcOffset, srcSize, buf, bufSize);
   if (ret != EOK) {
       return MEMORY_OPERATE_FAILED_INTER;
   }
   ...
   ```

   By continuously tracing the sources of **buf** and **bufSize**, it is found that after repeated copies, **bufSize** exceeds the actual size of **buf**, causing the out-of-bounds read.

### Common CppCrash Faults and Causes

- Null pointer dereference.
  When the crash log is in the format **SIGSEGV(SEGV_MAPERR)@0x00000000**, or the values of input parameter registers such as **r0** and **r1** printed in **Registers** are **0**, check whether a null pointer is passed in when the method is invoked.
  When the crash log is in the format **SIGSEGV(SEGV_MAPERR)@0x0000000c**, or the value of an input parameter register such as **r1** printed in **Registers** is small, check whether the structs used in the call contain a null pointer.
- SIGABRT.
  Generally, this fault is triggered by the user, framework, or C library, and you can locate the problematic code in the first frame of the framework library. In this case, check whether resources such as threads and file descriptors are properly used, and whether APIs are invoked in the correct sequence.
- SIGSEGV.
  - Containers in the C++ standard library are not thread-safe. If elements are added to or removed from a container on multiple threads at the same time, a **SIGSEGV** crash may occur. If the code located by **llvm-addr2line** involves operations on containers, this could be the reason for the crash.
  - If the lifetime of a pointer does not match the lifecycle of the object it points to, for example, when a raw pointer is used to store an **sptr** or **shared_ptr** object, memory leaks and dangling pointers can occur. A raw pointer is a pointer that does not have features such as encapsulation and automatic memory management. It is only a simple pointer to the memory address.
    The memory to which the pointer points is not protected or managed. A raw pointer can directly access the memory it points to, but problems such as memory leaks and null pointer dereferences may also occur. Therefore, when using a raw pointer, pay attention to potential security problems. You are advised to use smart pointers to manage memory.
- Use after free.
  This fault occurs when a pointer or reference to a released object, for example, a stack variable that has gone out of scope, continues to be accessed.

  ```text
  #include <iostream>

  int& getStackReference() {
      int x = 5;
      return x; // Return the reference to x.
  }

  int main() {
      int& ref = getStackReference(); // Obtain the reference to x.
      // x is released when getStackReference() returns.
      // ref is now a dangling reference. If you continue to access it, undefined behavior occurs.
      std::cout << ref << std::endl; // Outputting the value of x is an undefined behavior.
      return 0;
  }
  ```

- Stack overflow occurs in recursive invocation, mutual invocation of destructors, and the use of large stack memory blocks in special stacks (signal stacks).

  ```text
  #include <iostream>

  class RecursiveClass {
  public:
      RecursiveClass() {
          std::cout << "Constructing RecursiveClass" << std::endl;
      }

      ~RecursiveClass() {
          std::cout << "Destructing RecursiveClass" << std::endl;
          // Recursive invocation of a destructor.
          RecursiveClass obj;
      }
  };

  int main() {
      RecursiveClass obj;
      return 0;
  }
  ```

  When a **RecursiveClass** object is created, its constructor is called. When the object is destroyed, its destructor is called. The destructor creates a new **RecursiveClass** object, whose destruction calls the destructor again. This recursion never ends, so the stack space is used up and the application crashes.
- Binary mismatch usually means that the Application Binary Interface (ABI) does not match. For example, when the interface or data structure definitions that a binary was compiled against differ from those of the binary it runs with, a random crash stack is generated.
- Memory corruption occurs when a wild pointer that still points to accessible memory is used to overwrite that memory with invalid values, which results in out-of-bounds access and data overwrite. In this case, a random crash stack is generated.
- SIGBUS (alignment) occurs when an address becomes unaligned after a forced pointer conversion.
- When the length of a function name exceeds 256 bytes, the stack frame does not contain the function name.
- If the ELF file does not contain **.note.gnu.build-id**, the stack frame does not contain the **build-id** information.

## Case Study

The following analyzes typical CppCrash cases based on signals, scenarios, and tools respectively.
The analysis based on signals introduces common crash signals and provides a typical case for each type of signal.
The analysis based on scenarios summarizes the scenarios in which problems frequently occur and provides a typical case for each scenario.
The analysis based on tools describes how to use various maintenance and debugging tools and provides a typical case for each tool.

### Analyzing CppCrash Based on Signals

#### Type 1: SIGSEGV Crash

The **SIGSEGV** signal indicates a segmentation fault in the program. This fault occurs when a program accesses a memory area outside its bounds (for example, writes to memory that belongs to the operating system), or accesses a memory area without the correct permission (for example, writes to read-only memory). The details are as follows:

- **SIGSEGV** is a type of memory management fault.
- **SIGSEGV** is generated in a user-mode program.
- **SIGSEGV** occurs when a user-mode program accesses a memory area outside its bounds.
- **SIGSEGV** also occurs when a user-mode program accesses memory without the correct permission.

In most cases, **SIGSEGV** is caused by an out-of-bounds pointer. However, not every out-of-bounds pointer causes **SIGSEGV**. The crash is triggered only when an out-of-bounds pointer is dereferenced, and even then it does not always occur. Whether a **SIGSEGV** crash occurs depends on the operating system, C library, compiler, and linker. The examples are as follows:

1. The memory area is read-only memory.
   The sample code is as follows:

   ```text
   static napi_value TriggerCrash(napi_env env, napi_callback_info info)
   {
       char *s = "hello world";
       s[1] = 'H';
       return 0;
   }
   ```

   This is one of the most common examples. In this case, "hello world" is a constant string that GCC places in the **.rodata** section. When the target program is generated, the **.rodata** section is merged into the **text segment** and placed together with the **code segment**. Therefore, the memory area where the **.rodata** section is located is read-only. Writing to this read-only memory area causes the **SIGSEGV(SEGV_ACCERR)** crash.

2. The memory area is out of the process address space.
   The sample code is as follows:

   ```text
   static napi_value TriggerCrash(napi_env env, napi_callback_info info)
   {
       uint64_t* p = (uint64_t*)0xffffffcfc42ae6f4;
       *p = 10;
       return 0;
   }
   ```

   In this example, the program accesses a memory address in the kernel. In real programs, a crash such as **SIGSEGV(SEGV_MAPERR)@0xffffffcfc42ae6f4** is usually triggered by accident.
   The key logs of this cpp crash are as follows:

   ```text
   Device info:xxxxxx xxxx xx xxx
   Build info:xxxxxxx
   Fingerprint:73a5dcdf3e509605563aa11ac8cb4f3d7f99b9946dc142212246b53b741c4129
   Module name:com.samples.recovery
   Version:1.0.0
   VersionCode:1000000
   PreInstalled:No
   Foreground:Yes
   Timestamp:2024-04-29 14:07:12.082
   Pid:21374
   Uid:20020144
   Process name:com.samples.recovery
   Process life time:8s
   Reason:Signal:SIGSEGV(SEGV_MAPERR)@0xffffffcfc42ae6f4
   Fault thread info:
   Tid:21374, Name:amples.recovery
   #00 pc 0000000000001ccc /data/storage/el1/bundle/libs/arm64/libentry.so(TriggerCrash(napi_env__*, napi_callback_info__*)+36)(4dd115fa8b8c1b3f37bdb5b7b67fc70f31f0dbac)
   #01 pc 0000000000033678 /system/lib64/platformsdk/libace_napi.z.so(ArkNativeFunctionCallBack(panda::JsiRuntimeCallInfo*)+372)(7d6f229764fdd4b72926465066bc475e)
   #02 pc 00000000001d7f38 /system/lib64/module/arkcompiler/stub.an(RTStub_PushCallArgsAndDispatchNative+40)
   #03 at doTriggerException entry (entry/src/main/ets/pages/FaultTriggerPage.ets:72:7)
   #04 at triggerNativeException entry (entry/src/main/ets/pages/FaultTriggerPage.ets:79:5)
   #05 at anonymous entry (entry/src/main/ets/pages/FaultTriggerPage.ets:353:19)
   #06 pc 000000000048e024 /system/lib64/platformsdk/libark_jsruntime.so(panda::FunctionRef::Call(panda::ecmascript::EcmaVM const*, panda::Local<panda::JSValueRef>, panda::Local<panda::JSValueRef> const*, int)+1040)(9fa942a1d42bd4ae607257975fbc1b77)
   ...
   #38 pc 00000000000324b0 /system/bin/appspawn(AppSpawnRun+172)(c992404f8d1cf03c84c067fbf3e1dff9)
   #39 pc 00000000000213a8 /system/bin/appspawn(main+956)(c992404f8d1cf03c84c067fbf3e1dff9)
   #40 pc 00000000000a4b98 /system/lib/ld-musl-aarch64.so.1(libc_start_main_stage2+64)(ff4c94d996663814715bedb2032b2bbc)
   ```

3. The memory does not exist.
   The sample code is as follows:

   ```text
   static napi_value TriggerCrash(napi_env env, napi_callback_info info)
   {
       int *a = NULL;
       *a = 1;
       return 0;
   }
   ```

   In practice, the most common null pointer dereference occurs because the user-mode address that the null pointer points to does not exist. The inferred cause "Reason:Signal:SIGSEGV(SEGV_MAPERR)@000000000000000000 probably caused by NULL pointer dereference" is printed in the **Reason** field of CppCrash logs, as shown in the following figure.

4. Double free.
   The sample code is as follows:

   ```text
   static napi_value TriggerCrash(napi_env env, napi_callback_info info)
   {
       void *pc = malloc(1024);
       free(pc);
       free(pc); // Double free
       printf("free ok!\n");
       return 0;
   }
   ```

   In the double-free scenario, the system throws a **SIGSEGV(SI_TKILL)** fault indicating an illegal memory operation, as shown below.

   The preceding are common causes of **SIGSEGV** crashes. Other scenarios may also trigger **SIGSEGV** crashes, including stack overflow memory access, heap overflow memory access, global wild pointer access, execution at an invalid address, and invocation with invalid parameters. **SIGSEGV** crashes are related to how the operating system and the compiler allocate and reclaim the stack.

#### Type 2: SIGABRT Crash

The **SIGABRT** signal is sent to abort the process. It can be raised by the process itself by calling **abort()** in the C standard library, or it can be sent to the process from outside, like any other signal.
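
The two origins of **SIGABRT** described above can be observed with a small POSIX program. The following is a minimal sketch, not part of FaultLogger: the helper names **WaitForTerminatingSignal**, **SignalFromAbortCall**, and **SignalFromOutside** are hypothetical and are introduced only for illustration. Each case forks a child process and reports which signal terminated it.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: waits for the child and returns the number of the
// signal that terminated it, 0 if it exited normally, or -1 on error.
static int WaitForTerminatingSignal(pid_t pid)
{
    assert(pid > 0);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

// Case 1: the child process aborts itself by calling abort().
int SignalFromAbortCall()
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1; // fork failed
    }
    if (pid == 0) {
        std::abort(); // Raises SIGABRT in the child.
    }
    return WaitForTerminatingSignal(pid);
}

// Case 2: the parent sends SIGABRT to the child from outside.
int SignalFromOutside()
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1; // fork failed
    }
    if (pid == 0) {
        for (;;) {
            pause(); // Sleep until a signal arrives.
        }
    }
    if (kill(pid, SIGABRT) != 0) {
        return -1;
    }
    return WaitForTerminatingSignal(pid);
}
```

Assuming the default **SIGABRT** disposition is in effect, both functions are expected to return **SIGABRT**, which shows that the signal alone does not tell you whether the process aborted itself or was aborted from outside.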

- Sample code that calls the **abort()** function:

  ```text
  static napi_value TriggerCrash(napi_env env, napi_callback_info info)
  {
      OH_LOG_FATAL(LOG_APP, "test fatal log.");
      abort();
      return 0;
  }
  ```

  In this scenario, **abort()** is called proactively, typically because a safety check in a basic library determines that the process is not safe to continue. The last fatal log before the process exits is printed in the crash log, as shown in the following figure:

- Sample code that calls the **assert()** function:

  ```text
  static napi_value TriggerCrash(napi_env env, napi_callback_info info)
  {
  #if 0 // If the value is 0, an error is reported. If the value is 1, it is normal.
      void *pc = malloc(1024);
  #else
      void *pc = nullptr;
  #endif
      assert(pc != nullptr);
      return 0;
  }
  ```

  In addition to the **abort()** function, other exception handling mechanisms in C++ include the **assert()** function, the **exit()** function, the exception capture mechanism (**try-catch**), and the **exception** class. The **assert()** function checks a condition during function execution. If the check fails, the process aborts. The corresponding fault scenario is shown below.

### Analyzing CppCrash Based on Scenarios

#### Type 1: Memory Access Crash

**Background**

The crash address **0x7f82764b70** is in the readable and executable segment of **libace_napi_ark.z.so**. The crash occurs because the address needs to be written, but the corresponding **maps** segment has only the read and execute permissions. In other words, when a process attempts to access a memory area in a way that is not permitted, the process crashes.

```text
7f82740000-7f8275c000 r--p 00000000 /system/lib64/libace_napi_ark.z.so
7f8275c000-7f8276e000 r-xp 0001b000 /system/lib64/libace_napi_ark.z.so <- The crash address is within this address range.
7f8276e000-7f82773000 r--p 0002c000 /system/lib64/libace_napi_ark.z.so
7f82773000-7f82774000 rw-p 00030000 /system/lib64/libace_napi_ark.z.so
```

The following figure shows the crash call stack.

**Fault Analysis**

The fault address follows a regular pattern, but it is abnormal for the node address to fall within **libace_napi_ark.z.so**. This suggests memory corruption, which [ASan Check](https://developer.huawei.com/consumer/cn/doc/harmonyos-guides-V5/ide-asan-V5) can help locate. After the problem is reproduced through stress tests with ASan enabled, the fault that ASan detects matches the crash stack above. ASan reports **heap-use-after-free**, which is actually a double free of the same address: during the second free operation, the address is used to access a member of the freed object, resulting in a use-after-free (UAF) fault.
The key logs of ASan are as follows:

```text
=================================================================
==appspawn==2029==ERROR: AddressSanitizer: heap-use-after-free on address 0x003a375eb724 at pc 0x002029ba8514 bp 0x007fd8175710 sp 0x007fd8175708
READ of size 1 at 0x003a375eb724 thread T0 (thread name)
    #0 0x2029ba8510 (/system/asan/lib64/platformsdk/libark_jsruntime.so+0xca8510) panda::ecmascript::Node::IsUsing() const at arkcompiler/ets_runtime/ecmascript/ecma_global_storage.h:82:16
(inlined by) panda::JSNApi::DisposeGlobalHandleAddr(panda::ecmascript::EcmaVM const*, unsigned long) at arkcompiler/ets_runtime/ecmascript/napi/jsnapi.cpp:749:67 BuildID[md5/uuid]=9a18e2ec0dc8a83216800b2f0dd7b76a
    #1 0x403ee94d30 (/system/asan/lib64/libace.z.so+0x6194d30) panda::CopyableGlobal<panda::ObjectRef>::Free() at arkcompiler/ets_runtime/ecmascript/napi/include/jsnapi.h:1520:9
(inlined by) panda::CopyableGlobal<panda::ObjectRef>::Reset() at arkcompiler/ets_runtime/ecmascript/napi/include/jsnapi.h:189:9
(inlined by) OHOS::Ace::Framework::JsiType<panda::ObjectRef>::Reset() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/engine/jsi/jsi_types.inl:112:13
(inlined by) OHOS::Ace::Framework::JsiWeak<OHOS::Ace::Framework::JsiObject>::~JsiWeak() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/engine/jsi/jsi_ref.h:167:16
(inlined by) OHOS::Ace::Framework::ViewFunctions::~ViewFunctions() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view_functions.h:44:5 BuildID[md5/uuid]=1330f8b9be73bdb76ae18107c2a60ca1
    #2 0x403ee9296c (/system/asan/lib64/libace.z.so+0x619296c) OHOS::Ace::Framework::ViewFunctions::~ViewFunctions() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view_functions.h:42:5
(inlined by) OHOS::Ace::Framework::ViewFunctions::~ViewFunctions() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view_functions.h:42:5 BuildID[md5/uuid]=1330f8b9be73bdb76ae18107c2a60ca1
    #3 0x403ed9b130 (/system/asan/lib64/libace.z.so+0x609b130) OHOS::Ace::Referenced::DecRefCount() at foundation/arkui/ace_engine/frameworks/base/memory/referenced.h:76:13
(inlined by) OHOS::Ace::RefPtr<OHOS::Ace::Framework::ViewFunctions>::~RefPtr() at foundation/arkui/ace_engine/frameworks/base/memory/referenced.h:148:22 BuildID[md5/uuid]=1330f8b9be73bdb76ae18107c2a60ca1
    #4 0x403ed9b838 (/system/asan/lib64/libace.z.so+0x609b838) OHOS::Ace::RefPtr<OHOS::Ace::Framework::ViewFunctions>::Reset() at foundation/arkui/ace_engine/frameworks/base/memory/referenced.h:163:9
(inlined by) OHOS::Ace::Framework::JSViewFullUpdate::~JSViewFullUpdate() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view.cpp:159:21 BuildID[md5/uuid]=1330f8b9be73bdb76ae18107c2a60ca1
    #5 0x403ed9bf24 (/system/asan/lib64/libace.z.so+0x609bf24) OHOS::Ace::Framework::JSViewFullUpdate::~JSViewFullUpdate() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view.cpp:157:1
(inlined by) OHOS::Ace::Framework::JSViewFullUpdate::~JSViewFullUpdate() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view.cpp:157:1 BuildID[md5/uuid]=1330f8b9be73bdb76ae18107c2a60ca1
...
freed by thread T0 (thread name) here:
    #0 0x2024ed3abc (/system/asan/lib64/libclang_rt.asan.so+0xd3abc)
    #1 0x2029ba8424 (/system/asan/lib64/platformsdk/libark_jsruntime.so+0xca8424) std::__h::__function::__value_func<void (unsigned long)>::operator()[abi:v15004](unsigned long&&) const at prebuilts/clang/ohos/linux-x86_64/llvm/bin/../include/libcxx-ohos/include/c++/v1/__functional/function.h:512:16
(inlined by) std::__h::function<void (unsigned long)>::operator()(unsigned long) const at prebuilts/clang/ohos/linux-x86_64/llvm/bin/../include/libcxx-ohos/include/c++/v1/__functional/function.h:1197:12
(inlined by) panda::ecmascript::JSThread::DisposeGlobalHandle(unsigned long) at arkcompiler/ets_runtime/ecmascript/js_thread.h:604:9
(inlined by) panda::JSNApi::DisposeGlobalHandleAddr(panda::ecmascript::EcmaVM const*, unsigned long) at arkcompiler/ets_runtime/ecmascript/napi/jsnapi.cpp:752:24 BuildID[md5/uuid]=9a18e2ec0dc8a83216800b2f0dd7b76a
    #2 0x403ee94b68 (/system/asan/lib64/libace.z.so+0x6194b68) panda::CopyableGlobal<panda::FunctionRef>::Free() at arkcompiler/ets_runtime/ecmascript/napi/include/jsnapi.h:1520:9
(inlined by) panda::CopyableGlobal<panda::FunctionRef>::Reset() at arkcompiler/ets_runtime/ecmascript/napi/include/jsnapi.h:189:9
(inlined by) OHOS::Ace::Framework::JsiType<panda::FunctionRef>::Reset() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/engine/jsi/jsi_types.inl:112:13
(inlined by) OHOS::Ace::Framework::JsiWeak<OHOS::Ace::Framework::JsiFunction>::~JsiWeak() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/engine/jsi/jsi_ref.h:167:16
(inlined by) OHOS::Ace::Framework::ViewFunctions::~ViewFunctions() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view_functions.h:44:5 BuildID[md5/uuid]=1330f8b9be73bdb76ae18107c2a60ca1
    #3 0x403ee9296c (/system/asan/lib64/libace.z.so+0x619296c) OHOS::Ace::Framework::ViewFunctions::~ViewFunctions() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view_functions.h:42:5
(inlined by) OHOS::Ace::Framework::ViewFunctions::~ViewFunctions() at foundation/arkui/ace_engine/frameworks/bridge/declarative_frontend/jsview/js_view_functions.h:42:5 BuildID[md5/uuid]=1330f8b9be73bdb76ae18107c2a60ca1
    #4 0x403ed9b130 (/system/asan/lib64/libace.z.so+0x609b130) OHOS::Ace::Referenced::DecRefCount() at foundation/arkui/ace_engine/frameworks/base/memory/referenced.h:76:13
(inlined by) OHOS::Ace::RefPtr<OHOS::Ace::Framework::ViewFunctions>::~RefPtr() at foundation/arkui/ace_engine/frameworks/base/memory/referenced.h:148:22 BuildID[md5/uuid]=1330f8b9be73bdb76ae18107c2a60ca1
...
previously allocated by thread T0 (thread name) here:
    #0 0x2024ed3be4 (/system/asan/lib64/libclang_rt.asan.so+0xd3be4)
    #1 0x2029ade778 (/system/asan/lib64/platformsdk/libark_jsruntime.so+0xbde778) panda::ecmascript::NativeAreaAllocator::AllocateBuffer(unsigned long) at arkcompiler/ets_runtime/ecmascript/mem/native_area_allocator.cpp:98:17 BuildID[md5/uuid]=9a18e2ec0dc8a83216800b2f0dd7b76a
    #2 0x2029a39064 (/system/asan/lib64/platformsdk/libark_jsruntime.so+0xb39064) std::__h::enable_if<!std::is_array_v<panda::ecmascript::NodeList<panda::ecmascript::WeakNode>>, panda::ecmascript::NodeList<panda::ecmascript::WeakNode>*>::type panda::ecmascript::NativeAreaAllocator::New<panda::ecmascript::NodeList<panda::ecmascript::WeakNode>>() at arkcompiler/ets_runtime/ecmascript/mem/native_area_allocator.h:61:19
(inlined by) unsigned long panda::ecmascript::EcmaGlobalStorage<panda::ecmascript::Node>::NewGlobalHandleImplement<panda::ecmascript::WeakNode>(panda::ecmascript::NodeList<panda::ecmascript::WeakNode>**, panda::ecmascript::NodeList<panda::ecmascript::WeakNode>**, unsigned long) at arkcompiler/ets_runtime/ecmascript/ecma_global_storage.h:565:34
(inlined by) panda::ecmascript::EcmaGlobalStorage<panda::ecmascript::Node>::SetWeak(unsigned long, void*, void (*)(void*), void (*)(void*)) at arkcompiler/ets_runtime/ecmascript/ecma_global_storage.h:455:26 BuildID[md5/uuid]=9a18e2ec0dc8a83216800b2f0dd7b76a
    #3 0x2029ba5620 (/system/asan/lib64/platformsdk/libark_jsruntime.so+0xca5620) std::__h::__function::__value_func<unsigned long (unsigned long, void*, void (*)(void*), void (*)(void*))>::operator()[abi:v15004](unsigned long&&, void*&&, void (*&&)(void*), void (*&&)(void*)) const at prebuilts/clang/ohos/linux-x86_64/llvm/bin/../include/libcxx-ohos/include/c++/v1/__functional/function.h:512:16
(inlined by) std::__h::function<unsigned long (unsigned long, void*, void (*)(void*), void (*)(void*))>::operator()(unsigned long, void*, void (*)(void*), void (*)(void*)) const at prebuilts/clang/ohos/linux-x86_64/llvm/bin/../include/libcxx-ohos/include/c++/v1/__functional/function.h:1197:12
(inlined by) panda::ecmascript::JSThread::SetWeak(unsigned long, void*, void (*)(void*), void (*)(void*)) at arkcompiler/ets_runtime/ecmascript/js_thread.h:610:16
(inlined by) panda::JSNApi::SetWeak(panda::ecmascript::EcmaVM const*, unsigned long) at arkcompiler/ets_runtime/ecmascript/napi/jsnapi.cpp:711:31 BuildID[md5/uuid]=9a18e2ec0dc8a83216800b2f0dd7b76a
...
```

When **JsiWeak** is destructed or reset, the **CopyableGlobal** in the parent class **JsiType** of its member (**JsiObject**/**JsiValue**/**JsiFunction**) is released, as shown in the following figure.

During Garbage Collection (GC), **IterateWeakEcmaGlobalStorage** calls **DisposeGlobalHandle** on a **WeakNode** that has no callback and releases it, as shown in the following figure.

Therefore, the same **WeakNode** can be released through two paths. If **IterateWeakEcmaGlobalStorage** releases it first during GC, **JsiWeak** is not notified through a callback to clean up, so it still retains a **CopyableGlobal** that refers to the released **WeakNode**. After the **NodeList** containing the **WeakNode** is released and returned to the operating system, the **CopyableGlobal** retained in **JsiWeak** is released again, leading to a double-free error.

**Solutions**

Pass in a callback when **JsiWeak** calls **SetWeakCallback**. The callback then notifies **JsiWeak** to reset its **CopyableGlobal** when **IterateWeakEcmaGlobalStorage** releases the **WeakNode**, which ensures that the same address is not freed twice.

**Suggestions**

When using memory, consider whether the memory is double-freed or not freed.
Additionally, when locating memory access crashes (usually **SIGSEGV** crashes), use ASan to reproduce the fault if the crash stack analysis provides no clue.

#### Type 2: Multi-thread Crash

**Background**

**napi_env** is still used after being released.

**Symptom**

The **env** passed to a **napi** API is invalid. The top of the crash stack is at **NativeEngineInterface::ClearLastError()**. The logged **env** addresses show that the **env** is used after being released.

The key crash stack is as follows.

**Solutions**

The **env** created by a thread should not be transferred to another thread.

**Suggestions**

You can select the **Multi Thread Check** option to locate multi-thread faults. For details, see "Tool 2: Ark Runtime Multi Thread Check" in this guideline.

Note: The **env** in a **napi** interface is the **arkNativeEngine** that is created together with the engine.

#### Type 3: Lifecycle Crash

**Background**

A native **napi_value** must be used together with **napi_handle_scope**, which manages the lifecycle of **napi_value**. **napi_value** can be used only within **napi_handle_scope**. Otherwise, the lifecycle of **napi_value** and its JS object is no longer protected. If the reference count is 0, **napi_value** is collected by GC. Using **napi_value** at this point means accessing freed memory, which results in faults.

**Symptom**

**napi_value** is a raw pointer (a struct pointer). It holds a JS object and maintains the lifecycle of that object so that it is not collected by GC. **napi_handle_scope** is used to manage **napi_value**.
Once out of **napi_handle_scope**, **napi_value** is collected by GC, and **napi_value** no longer holds the JS object (no longer protects the JS object's lifecycle).

**Fault Analysis**

By decompiling the crash stack, you can locate the upper-level interface that calls the problematic **napi** interface and find the problematic **napi_value** in it. Then check whether the **napi_value** is used outside its **napi_handle_scope**.

**Cases**

The **napi_value** is used outside the scope of the NAPI framework.

On the JS side, data is added using **Add()**, and on the native side, the **napi_value** is saved to a **vector**. On the JS side, data is obtained using the **get** API, and on the native side, the saved **napi_value** is returned as an array. The JS side then reads the properties of the data, and the error message "Can not get Prototype on non ECMA Object" is displayed. The **native_value** used across **napi** calls is not saved using **napi_ref**. As a result, the **native_value** is invalid.

Note: The scope of the NAPI framework is **napi_handle_scope**. You can use **napi_handle_scope** to manage the lifecycle of **napi_value**. The scope of the framework layer is embedded in the end-to-end process of a JS-to-native call. That is, the scope is opened when the native method is entered and closed when the native method ends.

#### Type 4: Pointer Crash

**Background**

Smart pointers are used without null checks, causing null pointer dereference crashes during process execution.

**Impact**

The process crashes and exits unexpectedly.

**Fault Analysis**

Null pointer crashes can be identified based on the fault cause. Run the llvm-addr2line command to parse the line number. It is found that the service code does not check whether the smart pointer is null before using it.
As a result, the service code accesses the null address, causing the crash.

**Solution**

Add protective null checks for the pointer.

**Suggestions**

Null-check pointers before using them to prevent null pointer dereferences, process crashes, and unexpected exits.

### Analyzing Cpp Crash Based on Tools

#### Tool 1: ASAN

For details, see [ASan Check](https://developer.huawei.com/consumer/en/doc/harmonyos-guides-V5/ide-asan-V5).

#### Tool 2: Ark Runtime Multi Thread Check

**Fundamentals**

JS is single-threaded. Operations on JS objects can be performed only on the JS thread. Otherwise, multi-thread security problems may occur. (JS objects created on the main thread can be operated only on the main thread, and JS objects created on a worker thread can be operated only on that worker thread.) The napi APIs involve object operations. Therefore, 95% of napi APIs can be used only on the JS thread. The multi-thread detection mechanism checks whether the ID of the calling thread is the same as the **JS thread ID** of the **VM/Env** being used. If they are different, the **VM/Env** is used across threads, causing multi-thread security problems.

Common problems:

1. Napi APIs are used in non-JS threads.
2. The **env** of another thread is used in napi APIs.

**How to Use**

Select **Multi Thread Check** on DevEco to enable Ark multi-thread detection.

**Scenario**

If the stack in the crash logs is difficult to analyze and the problem occurs with high probability, enable multi-thread detection. When multi-thread detection is enabled, if the fatal information in the **cpp_crash** log is "Fatal: ecma_vm cannot run in multi-thread! thread:3096 currentThread:3550", a multi-thread security problem has occurred. That is, the calling thread ID is **3550**, but the JS thread is created by thread **3096**. The **vm** is used across threads.
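
The check described in **Fundamentals** can be pictured with a minimal C++ sketch. It is not the Ark runtime implementation: `FakeVm`, `IsOnOwnerThread`, and `CheckFromOtherThread` are hypothetical names used only for illustration. The sketch records the ID of the thread that creates the VM and compares it with the ID of the calling thread.

```cpp
#include <thread>

// Hypothetical stand-in for a VM/env that remembers the thread that created it.
class FakeVm {
public:
    FakeVm() : ownerThread_(std::this_thread::get_id()) {}

    // Returns true only when the caller runs on the thread that created the VM.
    bool IsOnOwnerThread() const
    {
        return std::this_thread::get_id() == ownerThread_;
    }

private:
    std::thread::id ownerThread_;
};

// Runs the check on a newly started thread and reports what that thread observed.
bool CheckFromOtherThread(const FakeVm& vm)
{
    bool result = true;
    std::thread worker([&vm, &result]() { result = vm.IsOnOwnerThread(); });
    worker.join();
    return result;
}
```

In this sketch the check only returns a flag. A runtime check of this kind would typically abort the process and print a fatal message that names both thread IDs.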

**Cases**

After the multi-thread check is enabled, trigger the crash again. If the problem is caused by multiple threads, fatal information is displayed. The following is an example:

```text
Fatal: ecma_vm cannot run in multi-thread! thread:xxx currentThread:yyy
```

In this case, the key crash log below shows that the calling thread ID is **17585**, but the JS thread is created by thread **17688**. The **vm** is used across threads. The **vm** is the **napi_env__*** of the JS thread. It is the environment for running thread code. One thread uses one **vm**.
The key crash log is as follows:

```text
Reason:Signal:SIGABRT(SI_TKILL)@0x01317b9f000044b1 from:17585: 20020127
LastFatalMessage: [default] CheckThread:177 Fatal: ecma_vm cannot run in multi-thread! thread:17688 currentThread:17585
Fault thread Info:
Tid:17585, Name:xxxxx
# 00 pc 00000000000f157c /system/lib/ld-musl-aarch64-asan.so.1(__restore_sigs+52)(38eb4ca904ae601d4b4dca502e948960)
# 01 pc 00000000000f1800 /system/lib/ld-musl-aarch64-asan.so.1(raise+112)(38eb4ca904ae601d4b4dca502e948960)
# 02 pc 00000000000adc74 /system/lib/ld-musl-aarch64-asan.so.1(abort+20)(38eb4ca904ae601d4b4dca502e948960)
# 03 pc 0000000000844fdc /system/asan/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaVM::CheckThread() const+2712)(1df055932338c14060b864435aec88ab)
# 04 pc 0000000000f3d930 /system/asan/lib64/platformsdk/libark_jsruntime.so(panda::ObjectRef::New(panda::ecmascript::EcmaVM const*)+908)(1df055932338c14060b864435aec88
# 05 pc 0000000000095048 /system/asan/lib64/platformsdk/libace_napi.z.so(napi_create_object+80)(efc1b3d1378f56b4b800489fb30dcded)
# 06 pc 00000000005d9770 /data/storage/el1/bundle/libs/arm64/xxxxx.so(c0f1735eada49fadc5197745f5af0c0a52246270)
```

To analyze the multi-thread problem, perform the following steps:
i. Check the first stack frame under **libace_napi.z.so**.
In the preceding log, this frame is in **xxxxx.so**. Check whether the **napi_env** of thread **17688** is passed to thread **17585**.
ii. If the stack frame under **libace_napi.z.so** does not pass the **napi_env** as a parameter, check whether it is passed as a struct member variable.

#### Tool 3: objdump

**How to Use**

The objdump binary is a system tool. You must have the OpenHarmony compilation environment, whose project code can be obtained from Gitee. The commands are as follows:

```text
repo init -u git@gitee.com:openharmony/manifest.git -b master --no-repo-verify --no-clone-bundle --depth=1
repo sync -c
./build/prebuilts_download.sh
```

You can obtain the tool in `prebuilts/clang/ohos/linux-x86_64/llvm/bin/llvm-objdump` of the project. The command is as follows:

```text
prebuilts/clang/ohos/linux-x86_64/llvm/bin/llvm-objdump -d libark_jsruntime.so > dump.txt
```

**Scenario**

In some cases, addr2line can only identify the faulty line of code but cannot determine which variable is abnormal. In this case, you can use objdump to disassemble the code and combine it with the register information in the cppcrash log to further determine the crash cause.
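
The register section of a cppcrash log holds absolute addresses, whereas the disassembly is searched by offsets within the library. The conversion can be sketched in C++ as follows. The helper names `LoadBase` and `ToModuleOffset` are hypothetical, and the sketch assumes that the **pc** values in the stack frames are offsets relative to the library load address, which matches the values in the case that follows.

```cpp
#include <cstdint>

// Derives the load base of a module from one absolute address and its known
// module-relative offset (for example, the pc register and the frame #00 pc).
constexpr std::uint64_t LoadBase(std::uint64_t absoluteAddr, std::uint64_t relativeOffset)
{
    return absoluteAddr - relativeOffset;
}

// Converts another absolute address in the same module (for example, lr)
// into the offset to search for in the disassembly.
constexpr std::uint64_t ToModuleOffset(std::uint64_t absoluteAddr, std::uint64_t loadBase)
{
    return absoluteAddr - loadBase;
}
```

With the values from the case that follows (register **pc** 0000007f9cb492d4 and frame #00 **pc** 4492d4), the load base is 0x7f9c700000, and **lr** 0000007f9cb4b584 maps to offset 0x44b584, the instruction right after the call site reported in frame #01 (44b580).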

**Cases**

The log is as follows:

```text
Tid:6655, Name:GC_WorkerThread
# 00 pc 00000000004492d4 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::NonMovableMarker::MarkObject(unsigned int, panda::ecmascript::TaggedObject*)+124)(21cf5411626d5986a4ba6383e959b3cc)
# 01 pc 000000000044b580 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::NonMovableMarker::MarkValue(unsigned int, panda::ecmascript::ObjectSlot&, panda::ecmascript::Region*, bool)+72)(21cf5411626d5986a4ba6383e959b3cc)
# 02 pc 000000000044b4e8 /system/lib64/platformsdk/libark_jsruntime.so(std::__h::__function::__func<panda::ecmascript::NonMovableMarker::ProcessMarkStack(unsigned int)::$_2, std::__h::allocator<panda::ecmascript::NonMovableMarker::ProcessMarkStack(unsigned int)::$_2>, void (panda::ecmascript::TaggedObject*, panda::ecmascript::ObjectSlot, panda::ecmascript::ObjectSlot, panda::ecmascript::VisitObjectArea)>::operator()(panda::ecmascript::TaggedObject*&&, panda::ecmascript::ObjectSlot&&, panda::ecmascript::ObjectSlot&&, panda::ecmascript::VisitObjectArea&&)+256)(21cf5411626d5986a4ba6383e959b3cc)
# 03 pc 0000000000442ac0 /system/lib64/platformsdk/libark_jsruntime.so(void panda::ecmascript::ObjectXRay::VisitObjectBody<(panda::ecmascript::VisitType)1>(panda::ecmascript::TaggedObject*, panda::ecmascript::JSHClass*, std::__h::function<void (panda::ecmascript::TaggedObject*, panda::ecmascript::ObjectSlot, panda::ecmascript::ObjectSlot, panda::ecmascript::VisitObjectArea)> const&)+216)(21cf5411626d5986a4ba6383e959b3cc)
# 04 pc 0000000000447ccc /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::NonMovableMarker::ProcessMarkStack(unsigned int)+248)(21cf5411626d5986a4ba6383e959b3cc)
# 05 pc 0000000000438588 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::Heap::ParallelGCTask::Run(unsigned int)+148)(21cf5411626d5986a4ba6383e959b3cc)
# 06 pc 00000000004e31c8 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::Runner::Run(unsigned int)+144)(21cf5411626d5986a4ba6383e959b3cc)
# 07 pc 00000000004e3780 /system/lib64/platformsdk/libark_jsruntime.so(void* std::__h::__thread_proxy[abi:v15004]<std::__h::tuple<std::__h::unique_ptr<std::__h::__thread_struct, std::__h::default_delete<std::__h::__thread_struct>>, void (panda::ecmascript::Runner::*)(unsigned int), panda::ecmascript::Runner*, unsigned int>>(void*)+64)(21cf5411626d5986a4ba6383e959b3cc)
# 08 pc 000000000014d894 /system/lib/ld-musl-aarch64.so.1
# 09 pc 0000000000085d04 /system/lib/ld-musl-aarch64.so.1
```

Run the addr2line command to locate the error line.

The preceding information indicates that a null pointer is accessed and the process crashes when **InYoungSpace** is accessed. Therefore, it can be suspected that the **Region** is a null pointer.
Use objdump to disassemble the library and search for the error address **4492d4**. The command is as follows:

Check the **x20** register. Its value is **0x0000000000000000**. The preceding information shows that **x20** is derived from **x2** by a bitwise operation (the low 18 bits are cleared, which is a typical **Region::ObjectAddressToRange** operation). The analysis shows that **x2** is the second parameter, object, of the **MarkObject** function, and **x20** is the variable **objectRegion**.

```text
Registers: x0:0000007f0fe31560 x1:0000000000000003 x2:0000000000000000 x3:0000005593100000
           x4:0000000000000000 x5:0000000000000000 x6:0000000000000000 x7:0000005596374fa0
           x8:0000000000000000 x9:0000000000000000 x10:0000000000000000 x11:0000007f9cb42bb8
           x12:000000000000005e x13:000000000061f59e x14:00000005d73d60fb x15:0000000000000000
           x16:0000007f9cc5f200 x17:0000007f9f201f68 x18:0000000000000000 x19:0000000000000000
           x20:0000000000000000 x21:0000000000000000 x22:0000000000000000 x23:000000559313f860
           x24:000000559313f868 x25:0000000000000003 x26:00000055a0e19960 x27:0000007f9cc57b38
           x28:0000007f9f21a1c0 x29:00000055a0e19700 lr:0000007f9cb4b584 sp:00000055a0e19700 pc:0000007f9cb492d4
```

**ldrb w8, [x20]** corresponds to **packedData_.flags_.spaceFlag_** because **packedData_** is the first field of **region**, **flags_** is the first field of **packedData_**, and **spaceFlag_** is the first field of **flags_**. Therefore, the instruction reads the first byte at the **objectRegion** address.
To read assembly code, you need to be familiar with common assembly instructions and parameter passing rules. For example, for a non-inline C++ member function, **r0** (**x0** on AArch64) stores the **this** pointer. In addition, due to compiler optimization, the mapping between source code and assembly code may not be clear. The mapping can be quickly obtained based on some feature values (constants) in the code.
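
The address arithmetic behind this analysis can be reproduced in a few lines of C++. This is a sketch, not the Ark runtime source: the 18-bit mask is taken from the description above, and `ObjectAddressToRegionBase`, `FakeRegion`, `FakePackedData`, and `FakeFlags` are hypothetical simplifications. It shows why a null object yields a null region address, and why reading the first field of the first field of the first field is a load at offset 0, as in **ldrb w8, [x20]**.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption from the text: the region base is the object address with the low 18 bits cleared.
constexpr std::uint64_t REGION_MASK = (static_cast<std::uint64_t>(1) << 18) - 1;

constexpr std::uint64_t ObjectAddressToRegionBase(std::uint64_t objectAddr)
{
    return objectAddr & ~REGION_MASK;
}

// Hypothetical layout that mirrors the nesting described in the text.
struct FakeFlags {
    std::uint8_t spaceFlag_;
};
struct FakePackedData {
    FakeFlags flags_;
    std::uint32_t other_;
};
struct FakeRegion {
    FakePackedData packedData_;
    std::uint64_t begin_;
};

// Each nested first field sits at offset 0, so spaceFlag_ is the first byte of the region.
static_assert(offsetof(FakeRegion, packedData_) == 0, "packedData_ is the first field");
static_assert(offsetof(FakePackedData, flags_) == 0, "flags_ is the first field");
static_assert(offsetof(FakeFlags, spaceFlag_) == 0, "spaceFlag_ is the first field");
```

With **x2** equal to 0, the computed region base is also 0, so the byte load from the region address touches address 0 and faults.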