# Analyzing AppFreeze (Application Not Responding)

While using an application, a user may find that taps get no reaction or that the application stops responding altogether. Once such a state exceeds a defined time limit, it is classified as application not responding (appfreeze). The system provides a mechanism for detecting appfreeze and generates an appfreeze log for application developers to analyze.

> **Note:**
>
> This document applies only to applications based on the Stage model. Before analyzing logs with this guide, developers should have basic knowledge of how JS runs in the system and of C++ stack traces, as well as some familiarity with the subsystems their application uses.

## AppFreeze Detection Capabilities

Appfreeze is currently detected along the following dimensions. Understanding how each detection works is very helpful for locating and analyzing appfreeze faults.

| Fault type | Description |
| -------- | -------- |
| THREAD_BLOCK_6S | The application's main thread is blocked beyond the time limit. |
| APP_INPUT_BLOCK | The response to user input timed out. |
| LIFECYCLE_TIMEOUT | An Ability lifecycle switch timed out. |

### THREAD_BLOCK_6S: Main Thread Block Timeout

This fault indicates that the application's main thread is either stuck or has too many tasks to execute, which hurts the application's smoothness and user experience.

Detection principle: the application's watchdog thread periodically posts a liveness-check task to the main thread. If the check task has not executed within 3s, a THREAD_BLOCK_3S warning event is reported; if it still has not executed after 6s, a THREAD_BLOCK_6S main-thread-blocked event is reported. Together, the two events form the THREAD_BLOCK appfreeze log.
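As a rough illustration of this mechanism, consider the sketch below. It is not the real implementation (the actual checker lives in the system's watchdog.cpp and runs on a dedicated thread, whereas plain JS timers share one thread), so treat it purely as the liveness-check logic; `postPingToMainQueue` is a stand-in for posting a task to the main thread's event queue.

```ts
// Minimal liveness-check sketch, assuming the checker runs off the main thread.
let lastPong: number = Date.now();

// Stand-in for posting a liveness task to the main thread's event queue;
// the callback only runs once the queue gets a chance to drain.
function postPingToMainQueue(): void {
  setTimeout(() => { lastPong = Date.now(); }, 0);
}

// Watchdog: every 3s, check whether the previous ping was handled.
setInterval(() => {
  const stalledMs = Date.now() - lastPong;
  if (stalledMs >= 6000) {
    console.error(`report THREAD_BLOCK_6S: main thread stalled for ${stalledMs} ms`);
  } else if (stalledMs >= 3000) {
    console.warn(`report THREAD_BLOCK_3S warning: main thread stalled for ${stalledMs} ms`);
  }
  postPingToMainQueue();
}, 3000);
```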
### APP_INPUT_BLOCK: User Input Response Timeout

This fault means that a user tap event received no response for more than 5s.

Detection principle: when the user taps the application, the input system sends the tap event to the application side; the fault is reported if the application's response acknowledgment times out.
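As a hedged example of how this fault arises: any long stretch of synchronous work on the main thread prevents the tap from being acknowledged. The handler name below is hypothetical.

```ts
// Anti-pattern sketch: ~10s of synchronous work inside a tap handler.
// While the loop runs, the main thread cannot acknowledge input events,
// so a tap landing in this window would be reported as APP_INPUT_BLOCK.
function onTap(): void {
  const end = Date.now() + 10_000;
  while (Date.now() < end) {
    // busy-wait: stands in for heavy computation or blocking I/O
  }
}
```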
### Lifecycle Switch Timeout

Lifecycle switch timeouts cover both [Ability lifecycle](../application-models/uiability-lifecycle.md#uiability组件生命周期) switches and [PageAbility lifecycle](../application-models/pageability-lifecycle.md#pageability的生命周期) switches.

This fault occurs while a lifecycle switch is in progress and affects switching between Abilities within the current application or between different PageAbilities.

Detection principle: when a lifecycle switch starts, a timeout task is posted to the watchdog thread; when the switch completes, the task is removed. If the task has not been removed within the fixed time limit, the fault is reported.

A lifecycle switch timeout is composed of two events: LIFECYCLE_HALF_TIMEOUT and LIFECYCLE_TIMEOUT. LIFECYCLE_HALF_TIMEOUT acts as the warning event for LIFECYCLE_TIMEOUT and captures binder and other information.

Each lifecycle stage has its own timeout, as listed below:

| Lifecycle stage | Timeout |
| -------- | -------- |
| Load | 10s |
| Terminate | 10s |
| Connect | 3s |
| Disconnect | 0.5s |
| Foreground | 5s |
| Background | 3s |
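The arm/disarm pattern behind this check can be sketched as follows (illustrative only; `watchLifecycle` and `transition` are hypothetical stand-ins for the framework's internal logic):

```ts
// Sketch: arm a timeout task when the transition starts, disarm it when the
// transition completes; if the timer fires first, the switch took too long.
function watchLifecycle(stage: string, timeoutMs: number,
                        transition: () => Promise<void>): Promise<void> {
  const timer = setTimeout(() => {
    console.error(`LIFECYCLE_TIMEOUT: ${stage} did not complete within ${timeoutMs} ms`);
  }, timeoutMs);
  return transition().finally(() => clearTimeout(timer));
}

// Example: a Foreground switch must complete within 5s.
watchLifecycle('Foreground', 5000, async () => { /* lifecycle work */ });
```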
## Appfreeze Log Analysis

An appfreeze fault should be analyzed using the appfreeze log together with the hilog flow logs.

This document walks through one fault-analysis example; developers should analyze each concrete problem on its own terms.

An appfreeze log consists mainly of the following modules:

### Log Header

| Field | Description |
| -------- | -------- |
| Reason | Why the appfreeze was reported; corresponds to [AppFreeze Detection Capabilities](#appfreeze-detection-capabilities). |
| PID | pid of the process at fault time. |
| PACKAGE_NAME | Bundle name of the application process. |

```
Generated by HiviewDFX@OpenHarmony
================================================================
Device info:OpenHarmony 3.2
Build info:OpenHarmony 5.1.0.59
Fingerprint:9f232fb5053c092144eedaa39ceecd67a6997db69467973b4d5fe786f184374d
Module name:com.example.freeze
Version:1.0.0
VersionCode:1000000
PreInstalled:No
Foreground:Yes
Pid:2212
Uid:20010044
Reason:THREAD_BLOCK_6S
appfreeze: com.example.freeze THREAD_BLOCK_6S at 20170817192244
DisplayPowerInfo:powerState:UNKNOWN
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DOMAIN:AAFWK
STRINGID:THREAD_BLOCK_6S
TIMESTAMP:2017/08/17-19:22:44:952
PID:2212
UID:20010044
PACKAGE_NAME:com.example.freeze
PROCESS_NAME:com.example.freeze
*******************************************
```

### Common Log Body

All three appfreeze event types contain the following parts. Search for the key field to locate each part in the log:

| Key field | Description |
| -------- | -------- |
| EVENTNAME | The event(s) that make up this freeze detection. |
| TIMESTAMP | Time the event was reported. Together with the timeouts described in [AppFreeze Detection Capabilities](#appfreeze-detection-capabilities), it narrows the time range to inspect in the flow logs. |
| PID | pid of the process at fault time. |
| PACKAGE_NAME | Bundle name of the application process. |
| MSG | Dump or descriptive information at fault time; detailed below. |
| BinderCatcher | IPC call information, showing calls with long waits. |
| PeerBinder Stacktrace | Stack of a stuck peer process related to the faulting process. |
| cpuusage | Device-wide CPU usage over the fault period. |
| memory | Memory usage of the process at fault time. |

> **Note:**
>
> Under heavy device load, stacks may be captured in a low-overhead mode that can lose function names and build-id information.

The MSG field records the reason the freeze was reported and the state of the task queues on the application's main thread:

- The currently running task and its start time: if the start time is far from the log's report time, the running task is the main suspect for the freeze.

- Historical task timing: used to judge whether many past tasks, each taking some time, kept a new task from running promptly.

- Tasks queued but not yet executed.

**Current process stack example:**

Search for the pid to locate the application's stack. The stack below shows the window sending an event to the system over IPC and remaining in the IPC communication phase:

```
OpenStacktraceCatcher -pid==1561 packageName is com.example.myapplication
Result: 0 ( no error )
Timestamp:2017-08-08 17:06:53.000
Pid:1561
Uid:20010039
Process name:com.example.myapplication
Tid:1561,Name:i.myapplication
#00 pc 0017888c /system/lib/libark_jsruntime.so
#01 pc 00025779 /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderConnector:WriteBinder(unsigned Long,void*)+56)
#02 pc 000265a5 /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderInvoker:TransactWithDriver(bool)+216)
#03 pc 0002666f /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderInvoker:StartWorkLoop()+18)
#04 pc 000270a9 /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderInvoker:JoinThread(bool)+32)
#05 pc 00023783 /system/lib/platformsdk/libipc_core.z.so(OHOS:IPCWorkThread:ThreadHandler(void*)+290)
#06 pc 00e1c6f7 /system/lib/libace.z.so
#07 pc 0091bbdd /system/lib/libace.z.so
#08 pc 0092fd9d /system/lib/libace.z.so
#09 pc 0092fa5f /system/lib/libace.z.so
#10 pc 0092cd6b /system/lib/libace.z.so
#11 pc 009326a9 /system/lib/libace.z.so
#12 pc 0093054b /system/lib/libace.z.so
#13 pc 009324f3 /system/lib/libace.z.so
#14 pc 003989e1 /system/lib/libace.z.so
#15 pc 0045dd4b /system/lib/libace.z.so
#16 pc 00d24fef /system/lib/libace.z.so
#17 pc 0041e6e9 /system/lib/libace.z.so
#18 pc 0000b4d9 /system/lib/platformsdk/libeventhandler.z.so(OHOS:AppExecFwk:EventHandler:DistributeEvent(std::__h:unique_ptr<...>))
#19 pc 00012829 /system/lib/platformsdk/libeventhandler.z.so
#20 pc 00011841 /system/lib/platformsdk/libeventhandler.z.so(OHOS:AppExecFwk:EventRunner:Run()+64)
#21 pc 00054a8b /system/lib/libappkit_native.z.so(OHOS:AppExecFwk:MainThread:Start()+278)
#22 pc 00011503 /system/bin/appspawn
#23 pc 0001141f /system/bin/appspawn
#24 pc 0000ee97 /system/bin/appspawn
```
**BinderCatcher example:**

Search for the pid to see which process the current process is communicating with and how long a synchronous call has been waiting.

The example below shows process 1561 requesting communication with process 685 and waiting more than 10s without a response:

```
PeerBinderCatcher -pid==1561 Layer_==0

BinderCatcher --
    1561:1561 to 685:0 code 0 Wait:10.366245919 s, ns:-1:-1 to -1:-1, debug:1561:1561 to 685:0, active_code:0 active_thread:0, pending_async_proc=0
    1329:1376 to 487:794 code 0 Wait:0.12070041 s, ns:-1:-1 to -1:-1, debug:1329:1376 to 487:794, active_code:0 active_thread:0, pending_async_proc=0

pid     context    request    started    max    ready    free_async_space
1561    binder     0          3          16     4        520192
544     binder     0          4          16     5        520192
1104    binder     0          1          16     2        520192
1397    binder     0          1          16     3        520192
...
```

**PeerBinder Stacktrace example:**

The example below shows the stack of the stuck peer process 685:

```
Binder catcher stacktrace, type is peer, pid : 685
Result: 0 ( no error )
Timestamp:2017-08-08 17:06:55.000
Pid:685
Uid:1000
Process name:wifi_manager_service
Tid:658,Name:wifi_manager_service
#00 pc 000669f0 /system/lib/ld-musl-arm.so.1
#01 pc 000c60cc /system/lib/ld-musl-arm.so.1
#02 pc 000c5040 /system/lib/ld-musl-arm.so.1
#03 pc 000c6818 /system/lib/ld-musl-arm.so.1(__pthread_cond_timedwait_time64+596)
#04 pc 000bd058 /system/lib/libc++.so
#05 pc 0008592c /system/lib/ld-musl-arm.so.1(ioctl+72)
#06 pc 00025779 /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderConnector:WriteBinder(unsigned long,void*)+56)
#07 pc 000265a5 /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderInvoker:TransactWithDriver(bool)+216)
#08 pc 0002666f /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderInvoker:StartWorkLoop()+18)
#09 pc 000270a9 /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderInvoker:JoinThread(bool)+32)
#10 pc 00023783 /system/lib/platformsdk/libipc_core.z.so(OHOS:IPCWorkThread:ThreadHandler(void*)+290)
#11 pc 0007b4d9 /system/lib/platformsdk/libeventhandler.z.so
#12 pc 00072829 /system/lib/platformsdk/libeventhandler.z.so
#13 pc 00071841 /system/lib/platformsdk/libeventhandler.z.so(OHOS:AppExecFwk:EventRunner:Run()+64)
#14 pc 00094a8b /system/lib/libappkit_native.z.so(OHOS:AppExecFwk:MainThread:Start()+278)

Tid:1563,Name:IPC_0_1563
#00 pc 0009392c /system/lib/ld-musl-arm.so.1(ioctl+72)
#01 pc 00025779 /system/lib/platformsdk/libipc_core.z.so(OHOS:BinderConnector:WriteBinder(unsigned long,void*)+56)
```
**cpuusage example:**

Device-wide CPU information:

```
Load average: 2.87 / 1.45 / 0.58; the cpu load average in 1 min, 5 min and 15 min
CPU usage from 2023-03-10 17:06:53 to 2023-03-10 17:06:53
Total: 29%; User Space: 28%; Kernel Space: 1%; iowait: 6%; irq: 0%; idle: 62%
Details of Processes:
    PID    Total Usage    User Space    Kernel Space    Page Fault Minor    Page Fault Major    Name
    1561   23%            23%           0%              9985                26                  i.myapplication
    527    1%             1%            0%              3046                9                   hidumper_servic
    242    1%             1%            0%              69799               280                 hiview
```

**memory example:**

Memory information for the current process:

```
-------------------------------------------[memory]----------------------------------------
                   Pss      Shared    Shared    Private   Private    Swap     SwapPss    Heap     Heap
                   Total    Clean     Dirty     Clean     Dirty      Total    Total      Size     Alloc
                   (kB)     (kB)      (kB)      (kB)      (kB)       (kB)     (kB)       (kB)     (kB)
---------------------------------------------------------------------------------------------
guard              0        0         0         0         0          0        0          0        0
native heap        185      0         180       0         160        0        0          0        0
AnonPage other     17881    12        12376     88        15948      0        0          0        0
stack              292      0         0         0         292        0        0          0        0
.so                5053     63408     4172      1812      2640       0        0          0        0
.ttf               1133     3092      0         4         0          0        0          0        0
dev                10       0         108       8         0          0        0          0        0
FilePage other     121      556       8         0         4          0        0          0        0
---------------------------------------------------------------------------------------------
Total              34675    67068     16844     1912      19044      0        0          0        0
```
### Fault-Specific Log Body (Main Thread Block Timeout)

These are logs whose Reason is THREAD_BLOCK_6S. As described in [THREAD_BLOCK_6S: Main Thread Block Timeout](#thread_block_6s-main-thread-block-timeout), THREAD_BLOCK consists of a THREAD_BLOCK_3S part and a THREAD_BLOCK_6S part. Comparing the two parts makes it easier to tell a genuinely stuck main thread from one that cannot respond because it has too many tasks.

THREAD_BLOCK_3S appears in the earlier part of the log, with THREAD_BLOCK_6S written after it. Search for the EVENTNAME field to locate both events.

Both events contain an MSG field which, for this fault, records the main thread's event processing queues, so you can compare the queue state at the two points in time.

In the example below, the event sent at 07:22:40.931 in the VIP priority queue remains pending: it appears in both the THREAD_BLOCK_3S and the THREAD_BLOCK_6S queue dumps. This shows the main thread is genuinely stuck rather than overloaded with tasks.

Since THREAD_BLOCK_6S means the main thread is stuck, only the main thread's stack in the process stack dump needs attention (the main thread's thread ID equals the pid). In the example, the main thread stack runs from the ArkUI components into JS execution, so the freeze is in JS code. The 3s and 6s stacks show the same location, confirming that the JS code is stuck rather than that tasks piled up.

THREAD_BLOCK_3S:

```
start time: 2017/08/17-19:22:42:022
DOMAIN = AAFWK
EVENTNAME = THREAD_BLOCK_3S
TIMESTAMP = 2017/08/17-19:22:42:10
PID = 2212
UID = 20010044
TID = 2212
PACKAGE_NAME = com.example.freeze
PROCESS_NAME = com.example.freeze
eventLog_action = ffrt,t,GpuStack,cmd:m,hot
eventLog_interval = 10
MSG =
Fault time:2017/08/17-19:22:40
App main thread is not response!

Main handler dump start time: 2017-08-17 19:22:40.932
mainHandler dump is:
 EventHandler dump begin curTime: 2017-08-17 07:22:40.932
 Event runner (Thread name = , Thread ID = 2212) is running
 Current Running: start at 2017-08-17 07:22:34.930, Event { send thread = 2212, send time = 2017-08-17 07:22:29.932, handle time = 2017-08-17 07:22:34.929, trigger time = 2017-08-17 07:22:34.930, task name = uv_timer_task, caller = [ohos_loop_handler.cpp(OnTriggered:72)] }
 History event queue information:
 No. 0 : Event { send thread = 2212, send time = 2017-08-17 07:22:29.864, handle time = 2017-08-17 07:22:29.864, trigger time = 2017-08-17 07:22:29.864, completeTime time = 2017-08-17 07:22:29.865, priority = VIP, task name = MMITask }
 No. 1 : Event { send thread = 2212, send time = 2017-08-17 07:22:29.814, handle time = 2017-08-17 07:22:29.865, trigger time = 2017-08-17 07:22:29.865, completeTime time = 2017-08-17 07:22:29.865, priority = Low, task name = ArkUIIdleTask }
 No. 2 : Event { send thread = 2212, send time = 2017-08-17 07:22:29.869, handle time = 2017-08-17 07:22:29.869, trigger time = 2017-08-17 07:22:29.869, completeTime time = 2017-08-17 07:22:29.870, priority = VIP, task name = MMITask }
 ......
 VIP priority event queue information:
 No.1 : Event { send thread = 2486, send time = 2017-08-17 07:22:37.931, handle time = 2017-08-17 07:22:37.931, id = 1, caller = [watchdog.cpp(Timer:208)] }
 No.2 : Event { send thread = 2486, send time = 2017-08-17 07:22:40.931, handle time = 2017-08-17 07:22:40.931, id = 1, caller = [watchdog.cpp(Timer:208)] }
 Total size of VIP events : 2
 Immediate priority event queue information:
 Total size of Immediate events : 0
 High priority event queue information:
 Total size of High events : 0
 Low priority event queue information:
 Total size of Low events : 0
 Idle priority event queue information:
 Total size of Idle events : 0
 Total event size : 2
Main handler dump end time: 2017-08-17 19:22:40.934

Catche stack trace start time: 2017-08-17 19:22:41.098

Tid:2212, Name:.example.freeze
#00 pc 001d0d1e /system/lib/platformsdk/libark_jsruntime.so(panda::BytecodeInst<(panda::BytecodeInstMode)0>::Size(panda::BytecodeInst<(panda::BytecodeInstMode)0>::Format)+6)(ec51476a00bfde9c6d68b983a483d1ee)
#01 pc 0024b011 /system/lib/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaInterpreter::RunInternal(panda::ecmascript::JSThread*, unsigned char const*, unsigned long long*)+1824)(ec51476a00bfde9c6d68b983a483d1ee)
#02 pc 0024a75d /system/lib/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaInterpreter::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+1084)(ec51476a00bfde9c6d68b983a483d1ee)
#03 pc 002e78e5 /system/lib/platformsdk/libark_jsruntime.so(panda::ecmascript::JSFunction::Call(panda::ecmascript::EcmaRuntimeCallInfo*)+392)(ec51476a00bfde9c6d68b983a483d1ee)
#04 pc 00427fbb /system/lib/platformsdk/libark_jsruntime.so(panda::FunctionRef::CallForNapi(panda::ecmascript::EcmaVM const*, panda::JSValueRef*, panda::JSValueRef* const*, int)+1382)(ec51476a00bfde9c6d68b983a483d1ee)
#05 pc 000502a1 /system/lib/platformsdk/libace_napi.z.so(napi_call_function+156)(394b2e21899459da15b9fe3f1c3188f0)
......
```

THREAD_BLOCK_6S:

```
start time: 2017/08/17-19:22:44:962
DOMAIN = AAFWK
EVENTNAME = THREAD_BLOCK_6S
TIMESTAMP = 2017/08/17-19:22:44:952
PID = 2212
UID = 20010044
TID = 2212
PACKAGE_NAME = com.example.freeze
PROCESS_NAME = com.example.freeze
eventLog_action = t,cmd:c,cmd:cci,cmd:m,cmd:dam,tr,k:SysRqFile,hot
eventLog_interval = 10
MSG =
Fault time:2017/08/17-19:22:43
App main thread is not response!

Main handler dump start time: 2017-08-17 19:22:43.932
mainHandler dump is:
 EventHandler dump begin curTime: 2017-08-17 07:22:43.932
 Event runner (Thread name = , Thread ID = 2212) is running
 Current Running: start at 2017-08-17 07:22:34.930, Event { send thread = 2212, send time = 2017-08-17 07:22:29.933, handle time = 2017-08-17 07:22:34.930, trigger time = 2017-08-17 07:22:34.930, task name = uv_timer_task, caller = [ohos_loop_handler.cpp(OnTriggered:72)] }
 History event queue information:
 No. 0 : Event { send thread = 2212, send time = 2017-08-17 07:22:29.864, handle time = 2017-08-17 07:22:29.864, trigger time = 2017-08-17 07:22:29.864, completeTime time = 2017-08-17 07:22:29.865, priority = VIP, task name = MMITask }
 No. 1 : Event { send thread = 2212, send time = 2017-08-17 07:22:29.814, handle time = 2017-08-17 07:22:29.865, trigger time = 2017-08-17 07:22:29.865, completeTime time = 2017-08-17 07:22:29.865, priority = Low, task name = ArkUIIdleTask }
 No. 2 : Event { send thread = 2212, send time = 2017-08-17 07:22:29.869, handle time = 2017-08-17 07:22:29.869, trigger time = 2017-08-17 07:22:29.870, completeTime time = 2017-08-17 07:22:29.870, priority = VIP, task name = MMITask }
 .......
 VIP priority event queue information:
 No.1 : Event { send thread = 2486, send time = 2017-08-17 07:22:37.931, handle time = 2017-08-17 07:22:37.931, id = 1, caller = [watchdog.cpp(Timer:208)] }
 No.2 : Event { send thread = 2486, send time = 2017-08-17 07:22:40.931, handle time = 2017-08-17 07:22:40.931, id = 1, caller = [watchdog.cpp(Timer:208)] }
 No.3 : Event { send thread = 2486, send time = 2017-08-17 07:22:43.932, handle time = 2017-08-17 07:22:43.932, id = 1, caller = [watchdog.cpp(Timer:208)] }
 Total size of VIP events : 3
 Immediate priority event queue information:
 Total size of Immediate events : 0
 High priority event queue information:
 Total size of High events : 0
 Low priority event queue information:
 Total size of Low events : 0
 Idle priority event queue information:
 Total size of Idle events : 0
 Total event size : 3
Main handler dump end time: 2017-08-17 19:22:43.934

Catche stack trace start time: 2017-08-17 19:22:44.061

Tid:2212, Name:.example.freeze
#00 pc 0024afea /system/lib/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaInterpreter::RunInternal(panda::ecmascript::JSThread*, unsigned char const*, unsigned long long*)+1786)(ec51476a00bfde9c6d68b983a483d1ee)
#01 pc 0024a75d /system/lib/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaInterpreter::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+1084)(ec51476a00bfde9c6d68b983a483d1ee)
#02 pc 002e78e5 /system/lib/platformsdk/libark_jsruntime.so(panda::ecmascript::JSFunction::Call(panda::ecmascript::EcmaRuntimeCallInfo*)+392)(ec51476a00bfde9c6d68b983a483d1ee)
#03 pc 00427fbb /system/lib/platformsdk/libark_jsruntime.so(panda::FunctionRef::CallForNapi(panda::ecmascript::EcmaVM const*, panda::JSValueRef*, panda::JSValueRef* const*, int)+1382)(ec51476a00bfde9c6d68b983a483d1ee)
#04 pc 000502a1 /system/lib/platformsdk/libace_napi.z.so(napi_call_function+156)(394b2e21899459da15b9fe3f1c3188f0)
......
```

Then check the flow logs to see which part of the application code was executing at the time.

In general, also review the [common log body](#common-log-body) to check for a stuck peer in IPC communication, device-wide CPU exhaustion leaving the application unable to respond, memory leaks, or insufficient memory preventing tasks from running.
### Fault-Specific Log Body (User Input Response Timeout)

Reason APP_INPUT_BLOCK means a user tap event received no feedback for more than 5s.

The MSG text is simply a statement of the fault: the user's input got no response.

APP_INPUT_BLOCK logs can be analyzed along the lines of the [common log body](#common-log-body). Note that when input times out, the main thread is usually also stuck: compare the three stacks and two BinderCatcher sections from the two logs side by side. If there is no main-thread-block log, there may have been a burst of many small events before the input event; none large enough to block the main thread on its own, but numerous enough that the input event could not be answered in time.

### Fault-Specific Log Body (Lifecycle Switch Timeout)

Like THREAD_BLOCK_6S and THREAD_BLOCK_3S above, a log with Reason LIFECYCLE_TIMEOUT contains two events: LIFECYCLE_HALF_TIMEOUT and LIFECYCLE_TIMEOUT.

The MSG in the example states which lifecycle switch timed out.

The example shows that LIFECYCLE_TIMEOUT was reported because the Ability timed out while switching to the background; use the timeouts in [Lifecycle Switch Timeout](#lifecycle-switch-timeout) to narrow down the window in the flow logs.

LIFECYCLE_TIMEOUT:

```
DOMAIN:AAFWK
STRINGID:LIFECYCLE_TIMEOUT
TIMESTAMP:2023/03/10-17:06:53:65
PID:1561
UID:20010039
PACKAGE_NAME:com.example.myapplication
PROCESS_NAME:com.example.myapplication
MSG:ability:EntryAbility background timeout
```

The rest of the log can be analyzed per the [common log body](#common-log-body). Note that during a lifecycle switch timeout the main thread is also very likely stuck; compare the three stacks and two BinderCatcher sections from the two logs side by side.

## Application Exit

When any of the following faults occurs, the application is killed to ensure it can recover:

| Fault type | Description |
| -------- | -------- |
| THREAD_BLOCK_6S | The application's main thread is blocked beyond the time limit. |
| APP_INPUT_BLOCK | The response to user input timed out. |
| LIFECYCLE_TIMEOUT | An Ability lifecycle switch timed out. |

## Locating the Fault: Steps and Approach

To locate an appfreeze, first obtain the relevant logs, then combine the basic fault information they record with hilog and trace to pinpoint where the freeze occurred.
### Obtaining Logs

Appfreeze logs are fault logs. Like native process crashes, JS application crashes, and system process exceptions, they are managed by the FaultLog module and can be obtained in the following ways:

- Method 1: via DevEco Studio.

  DevEco Studio collects the device's fault logs and archives them under FaultLog. See [DevEco Studio User Guide - FaultLog](https://developer.huawei.com/consumer/cn/doc/harmonyos-guides/ide-fault-log).

- Method 2: subscribe via the hiAppEvent API.

  hiAppEvent provides fault-subscription interfaces covering the various fault event types; see [Introduction to HiAppEvent](hiappevent-intro.md). A minimal subscription sketch follows this list.

<!--Del-->
- Method 3: pull logs via shell on a device in ROOT mode.

  Appfreeze log file names start with "appfreeze-" and are generated under "/data/log/faultlog/faultlogger/" on the device, in the format "appfreeze-bundle name-UID-time in milliseconds.log".
<!--DelEnd-->
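For method 2, a minimal ArkTS subscription might look like the sketch below (see the HiAppEvent API reference for the authoritative signatures; the watcher name here is arbitrary):

```ts
import { hiAppEvent } from '@kit.PerformanceAnalysisKit';

// Subscribe to APP_FREEZE fault events reported in the OS domain.
hiAppEvent.addWatcher({
  name: 'freeze_watcher',
  appEventFilters: [
    { domain: hiAppEvent.domain.OS, names: [hiAppEvent.event.APP_FREEZE] }
  ],
  onReceive: (domain: string, appEventGroups: Array<hiAppEvent.AppEventGroup>) => {
    for (const group of appEventGroups) {
      for (const info of group.appEventInfos) {
        console.info(`appfreeze event received: ${JSON.stringify(info.params)}`);
      }
    }
  }
});
```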
### Confirming Basic Information

#### Get the pid of the frozen application, whether it was in the foreground, and other basics

```
Generated by HiviewDFX@OpenHarmony
============================================================
Device info:HUANEI Mate 60 Pro
Build info:ALN-AL00 x.x.x.xx(XXXXXXX)
Fingerprint:ef8bd28f8b57b54656d743b546efa73764c77866a65934bd96f2678f886813b7
Module name:com.xxx.xxx
Version:1.2.2.202
VersionCode:1002002202
PreInstalled:Yes
Foreground:No --> whether the application was in the foreground
Pid:15440
Uid:20020029
Reason:THREAD_BLOCK_6S
appfreeze: com.xxx.xxx THREAD_BLOCK_6S at 20240410164052
DisplayPowerInfo:powerState: AWAKE
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
```

#### Get the time the fault occurred

The fault report timestamp:

```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DOMAIN:AAFWK
STRINGID:THREAD_BLOCK_6S
TIMESTAMP:2024/04/10-16:40:52:743 --> fault report timestamp
PID:15440
UID:20020029
PACKAGE_NAME:com.xxx.xxx
PROCESS_NAME:com.xxx.xxx
****************************************
```

Detection durations for the fault types in their different scenarios:

| THREAD_BLOCK_6S | APP_INPUT_BLOCK | LIFECYCLE_TIMEOUT |
| -------- | -------- | -------- |
| Foreground application: 6s <br> Background application: 3s * 5 + 6s = 21s | 5s | Load: 10s <br> Active: 5s <br> Inactive: 0.5s <br> Terminate: 10s <br> Connect: 3s <br> Disconnect: 0.5s <br> Restart: 5s <br> Foreground: 5s <br> Background: 3s |

**Note:**

1. The detection duration of THREAD_BLOCK_3S / LIFECYCLE_HALF_TIMEOUT is half that of the corresponding THREAD_BLOCK_6S / LIFECYCLE_TIMEOUT. They are warning level and are not reported as standalone logs; THREAD_BLOCK_6S / LIFECYCLE_TIMEOUT are error level and are reported together with the log of their half-duration counterpart.

2. For a foreground application, a THREAD_BLOCK_3S can directly trigger the subsequent THREAD_BLOCK_6S event.

3. A background application keeps a counter backgroundReportCount_ = 0 that is incremented on each THREAD_BLOCK_3S and is not reset; THREAD_BLOCK_6S is only reported once the counter reaches 5 (i.e., five consecutive THREAD_BLOCK_3S events). Hence for background applications the THREAD_BLOCK_3S and THREAD_BLOCK_6S detection durations are 18s and 21s respectively.

Counting back the detection duration from the report time gives the approximate time the fault began.
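As a worked example using the header above (the duration table below is simplified to a few entries; the key names are ad hoc):

```ts
// Estimate when the freeze began: report timestamp minus detection duration.
const detectionMs: Record<string, number> = {
  'THREAD_BLOCK_6S:foreground': 6_000,
  'THREAD_BLOCK_6S:background': 21_000, // 3s * 5 + 6s
  'APP_INPUT_BLOCK': 5_000,
};

const reportTime = new Date('2024-04-10T16:40:52.743').getTime();
// The header shows Foreground:No, so use the background duration.
const faultStart = new Date(reportTime - detectionMs['THREAD_BLOCK_6S:background']);
console.log(`freeze began around ${faultStart.toISOString()}`); // ~16:40:31
```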
### Examining the eventHandler Information

Search the log for "mainHandler dump is" to find the eventHandler dump.

1. dump begin curTime & Current Running.

```
mainHandler dump is:
 EventHandler dump begin curTime: 2024-08-08 12:17:43.544 --> time the dump started
 Event runner (Thread name = , Thread ID = 35854) is running --> the running thread
 Current Running: start at 2024-08-08 12:17:16.629, Event { send thread = 35882, send time = 2024-08-08 12:17:16.628, handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.630, task name = , caller = xx }
 --> trigger time: when the task started running
```

Running time of the current task = dump begin curTime - trigger time. In this example the current task has been running for about 27s.

If the running time exceeds the fault detection duration, the currently running task is the one freezing the application and should be investigated.

If the running time is short, the current task is merely one of the tasks the main thread ran within the detection window, and the main cost is not necessarily this task; prefer to examine the longest-running recent tasks (under "History event queue information"). This situation usually means the thread was too busy for the watchdog task to get scheduled.
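A small helper sketch for this arithmetic (timestamp format as in the dump above):

```ts
// Compute how long the Current Running task has been executing, from the
// "dump begin curTime" and "trigger time" fields of the EventHandler dump.
function parseDumpTime(s: string): number {
  return new Date(s.replace(' ', 'T')).getTime(); // e.g. "2024-08-08 12:17:43.544"
}

const dumpBegin = parseDumpTime('2024-08-08 12:17:43.544');
const triggerTime = parseDumpTime('2024-08-08 12:17:16.630');
console.log(`current task running for ${(dumpBegin - triggerTime) / 1000}s`); // ≈ 26.9s
```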
2. History event queue information.

```
 Current Running: start at 2024-08-08 12:17:16.629, Event { send thread = 35882, send time = 2024-08-08 12:17:16.628, handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.629, task name = , caller = [extension_ability_thread.cpp(ScheduleAbilityTransaction:393)]}
 History event queue information:
 No. 0 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.525, handle time = 2024-08-08 12:17:15.525, trigger time = 2024-08-08 12:17:15.527, completeTime time = 2024-08-08 12:17:15.528, priority = High, id = 1 }
 No. 1 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.525, handle time = 2024-08-08 12:17:15.525, trigger time = 2024-08-08 12:17:15.527, completeTime time = 2024-08-08 12:17:15.527, priority = Low, task name = MainThread:SetRunnerStarted }
 No. 2 : Event { send thread = 35856, send time = 2024-08-08 12:17:15.765, handle time = 2024-08-08 12:17:15.765, trigger time = 2024-08-08 12:17:15.766, completeTime time = 2024-08-08 12:17:15.800, priority = Low, task name = MainThread:LaunchApplication }
 No. 3 : Event { send thread = 35856, send time = 2024-08-08 12:17:15.767, handle time = 2024-08-08 12:17:15.767, trigger time = 2024-08-08 12:17:15.800, completeTime time = 2024-08-08 12:17:16.629, priority = Low, task name = MainThread:LaunchAbility }
 No. 4 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.794, handle time = 2024-08-08 12:17:15.794, trigger time = 2024-08-08 12:17:16.629, completeTime time = 2024-08-08 12:17:16.629, priority = IDEL, task name = IdleTime:PostTask }
 No. 5 : Event { send thread = 35852, send time = 2024-08-08 12:17:16.629, handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.629, completeTime time = , priority = Low, task name = }
```

Look in the history queue for tasks within the fault window that took unusually long. The entry whose "completeTime time" is empty is the current task.

Task running time = completeTime time - trigger time.

Filter out the high-cost tasks and investigate how they run.
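Continuing the sketch, flagging the costly history entries might look like this (the `HistEvent` type is hypothetical and assumes the timestamps have already been parsed to milliseconds; the 500 ms threshold is arbitrary):

```ts
// Sort parsed history events by run time and keep those above a threshold.
interface HistEvent { taskName: string; triggerMs: number; completeMs: number; }

function slowTasks(events: HistEvent[], thresholdMs: number = 500): HistEvent[] {
  return events
    .filter(e => e.completeMs - e.triggerMs > thresholdMs)
    .sort((a, b) => (b.completeMs - b.triggerMs) - (a.completeMs - a.triggerMs));
}
```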
3. VIP priority event queue information.

```
 VIP priority event queue information:
 No.1 : Event { send thread = 2486, send time = 2017-08-17 07:22:37.931, handle time = 2017-08-17 07:22:37.931, id = 1, caller = [watchdog.cpp(Timer:208)] }
 No.2 : Event { send thread = 2486, send time = 2017-08-17 07:22:40.931, handle time = 2017-08-17 07:22:40.931, id = 1, caller = [watchdog.cpp(Timer:208)] }
 No.3 : Event { send thread = 2486, send time = 2017-08-17 07:22:43.932, handle time = 2017-08-17 07:22:43.932, id = 1, caller = [watchdog.cpp(Timer:208)] }
 ...
```

The watchdog task lives in this priority queue; observing the queue shows that the task is sent every 3s.

Compare against events of other priorities to see how the watchdog task moves through the queue.

To respond to the user as quickly as possible, all tasks on the user-input delivery chain are high-priority tasks. These event queues are created by the system and typically record the delivery path user input -> screen -> window -> ArkUI -> application; they are unrelated to third-party application events, and developers need not pay extra attention to them.

```
 VIP priority event queue information:
 No. 1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 5 : Event { send thread = 35862, send time = 2024-08-08 12:17:37.526, handle time = 2024-08-08 12:17:37.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 6 : Event { send thread = 35862, send time = 2024-08-08 12:17:40.526, handle time = 2024-08-08 12:17:40.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 7 : Event { send thread = 35862, send time = 2024-08-08 12:17:43.544, handle time = 2024-08-08 12:17:43.544 ,id = 1, caller = [watchdog.cpp(Timer:156)]}
 Total size of High events : 7
```

warning:

```
 VIP priority event queue information:
 No. 1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)]}
 Total size of High events : 4
```

block:

```
 VIP priority event queue information:
 No. 1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 5 : Event { send thread = 35862, send time = 2024-08-08 12:17:37.526, handle time = 2024-08-08 12:17:37.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 Total size of High events : 5
```

In the example above, the block-time queue is longer than the warning-time queue while the first task is unchanged. Two situations are possible:

- The currently running task is stuck, so the other tasks never get scheduled.
- Tasks are piling up in a higher-priority queue, so the watchdog task sitting in a lower-priority queue never gets scheduled.
### Examining the Stack

Use the Pid and Tid obtained above to examine the corresponding stack. Several situations arise:

1. A clear stuck stack.

```
Tid:3025, Name: xxx
# 00 pc 00000000001b4094 /system/lib/ld-musl-aarch64.so.1(__timedwait_cp+188)(b168f10a179cf6050a309242262e6a17)
# 01 pc 00000000001b9fc8 /system/lib/ld-musl-aarch64.so.1(__pthread_mutex_timedlock_inner+592)(b168f10a179cf6050a309242262e6a17)
# 02 pc 00000000000c3e40 /system/lib64/libc++.so(std::__h::mutex::lock()+8)(9cbc937082b3d7412696099dd58f4f78242f9512) --> stuck waiting for a lock
# 03 pc 000000000007ac4c /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteConnectionPool::Container::Release(std::__h::shared_ptr<OHOS::NativeRdb::SqliteConnectionPool::ConnNode>)+60)(5e8443def4695e8c791e5f847035ad9f)
# 04 pc 000000000007aaf4 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteConnectionPool::ReleaseNode(std::__h::shared_ptr<OHOS::NativeRdb::SqliteConnectionPool::ConnNode>)+276)(5e8443def4695e8c791e5f847035ad9f)
# 05 pc 000000000007a8c0 /system/lib64/platformsdk/libnative_rdb.z.so(5e8443def4695e8c791e5f847035ad9f)
# 06 pc 00000000000b36ec /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteSharedResultSet::Close()+324)(5e8443def4695e8c791e5f847035ad9f)
# 07 pc 000000000006da94 /system/lib64/module/data/librelationalstore.z.so(OHOS::RelationalStoreJsKit::ResultSetProxy::Close(napi_env__*, napi_callback_info__*) (.cfi)+212)(5c7c67512e12e0e53fd23e82ee576a88)
# 08 pc 0000000000034408 /system/lib64/platformsdk/libace_napi.z.so(panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*)+220)(f271f536a588ef9d0dc5328c70fce511)
# 09 pc 00000000002d71d0 /system/lib64/module/arkcompiler/stub.an(RTStub_PushCallArgsAndDispatchNative+40)
# 10 at parseResultSet (entry/build/default/cache/default/default@CompileArkTS/esmodule/release/datamanager/datawrapper/src/main/ets/database/RdbManager.ts:266:1)
# 11 at query (entry/build/default/cache/default/default@CompileArkTS/esmodule/release/datamanager/datawrapper/src/main/ets/database/RdbManager.ts:188:1)
```

The .so is clearly stuck waiting for a lock; decompile to find the corresponding code line, then examine the code context to fix the bug.

2. Stuck in an IPC request.

```
Tid:53616, Name:xxx
# 00 pc 0000000000171c1c /system/lib/ld-musl-aarch64.so.1(ioctl+176)(b168f10a179cf6050a309242262e6a17)
# 01 pc 0000000000006508 /system/lib64/chipset-pub-sdk/libipc_common.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+100)(1edec25445c569dd1093635c1da3bc0a) --> stuck in binder
# 02 pc 000000000004d500 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+296)(6151eca3b47aa2ab3e378e6e558b90f3)
# 03 pc 000000000004c6c0 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::WaitForCompletion(OHOS::MessageParcel*, int*)+128)(6151eca3b47aa2ab3e378e6e558b90f3)
# 04 pc 000000000004c304 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::SendRequest(int, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+348)(6151eca3b47aa2ab3e378e6e558b90f3)
# 05 pc 00000000000319ac /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequestInner(bool, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+124)(6151eca3b47aa2ab3e378e6e558b90f3)
# 06 pc 0000000000031cfc /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+184)(6151eca3b47aa2ab3e378e6e558b90f3)
# 07 pc 0000000000023c7c /system/lib64/libipc.dylib.so(<ipc::remote::obj::RemoteObj>::send_request+268)(7006cb5520edc22f64d04df86cb90152)
# 08 pc 000000000000b904 /system/lib64/libasset_sdk.dylib.so(<asset_sdk::Manager>::send_request+48)(4073ec22b58b83f79883d5fc8102ce77)
# 09 pc 000000000000b600 /system/lib64/libasset_sdk.dylib.so(<asset_sdk::Manager>::query+156)(4073ec22b58b83f79883d5fc8102ce77)
# 10 pc 0000000000006d94 /system/lib64/libasset_sdk_ffi.z.so(query_asset+116)(9a309896092ba014c878289a54688679)
# 11 pc 0000000000006740 /system/lib64/module/security/libasset_napi.z.so((anonymous namespace)::NapiQuerySync(napi_env__*, napi_callback_info__*) (.cfi)+220)(ef7afe850712e4822f085ed0ac184e8a)
# 12 pc 0000000000034408 /system/lib64/platformsdk/libace_napi.z.so(panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*)+220)(f271f536a588ef9d0dc5328c70fce511)
```

3. The warning and error stacks agree, with business code executing synchronously at the top of the stack.

Both the warning and error stacks are:

```
Tid:14727, Name:xxx
# 00 pc 00000000001c4c60 /system/lib/ld-musl-aarch64.so.1(pread+72)(b168f10a179cf6050a309242262e6a17)
# 01 pc 0000000000049154 /system/lib64/platformsdk/libsqlite.z.so(unixRead+180)(48485aa23da681fc87d8dc0b4be3e34c)
# 02 pc 0000000000053e98 /system/lib64/platformsdk/libsqlite.z.so(readDbPage+116)(48485aa23da681fc87d8dc0b4be3e34c)
# 03 pc 0000000000053d48 /system/lib64/platformsdk/libsqlite.z.so(getPageNormal+864)(48485aa23da681fc87d8dc0b4be3e34c)
# 04 pc 00000000000757a0 /system/lib64/platformsdk/libsqlite.z.so(getAndInitPage+216)(48485aa23da681fc87d8dc0b4be3e34c)
# 05 pc 0000000000077658 /system/lib64/platformsdk/libsqlite.z.so(moveToLeftmost+164)(48485aa23da681fc87d8dc0b4be3e34c)
# 06 pc 000000000006aa34 /system/lib64/platformsdk/libsqlite.z.so(sqlite3VdbeExec+34532)(48485aa23da681fc87d8dc0b4be3e34c)
# 07 pc 000000000002e424 /system/lib64/platformsdk/libsqlite.z.so(sqlite3_step+644)(48485aa23da681fc87d8dc0b4be3e34c)
# 08 pc 00000000000b1a70 /system/lib64/platformsdk/libnative_rdb.z.so(FillSharedBlockOpt+408)(5e8443def4695e8c791e5f847035ad9f)
# 09 pc 0000000000082a94 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteStatement::FillBlockInfo(OHOS::NativeRdb::SharedBlockInfo*) const+76)(5e8443def4695e8c791e5f847035ad9f)
# 10 pc 00000000000b4214 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteSharedResultSet::ExecuteForSharedBlock(OHOS::AppDataFwk::SharedBlock*, int, int, bool)+236)(5e8443def4695e8c791e5f847035ad9f)
```

Confirm with [trace](#combining-trace), and check whether the single top-of-stack function being called is overrunning its expected execution time.

4. Transient stacks: the warning and error stacks differ.

warning stack:

```
Tid:3108, Name:xxx
# 00 pc 0000000000146e2c /system/lib/ld-musl-aarch64.so.1(open64+224)(b168f10a179cf6050a309242262e6a17)
# 01 pc 0000000000014600 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::ZipFileReader::init()+600)(c4893935af8fc8cb36569be5ccdebfa9)
# 02 pc 0000000000014144 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::ZipFileReader::CreateZipFileReader(std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+392)(c4893935af8fc8cb36569be5ccdebfa9)
# 03 pc 000000000000f724 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::ZipFile::Open()+728)(c4893935af8fc8cb36569be5ccdebfa9)
# 04 pc 000000000000a808 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::Extractor::Init()+124)(c4893935af8fc8cb36569be5ccdebfa9)
# 05 pc 000000000000c4a4 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::ExtractorUtil::GetExtractor(std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&, bool&, bool)+596)(c4893935af8fc8cb36569be5ccdebfa9)
# 06 pc 00000000000391e4 /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::GetIndexData(char const*, std::__h::unique_ptr<unsigned char [], std::__h::default_delete<unsigned char []>>&, unsigned long&)+284)(5c4263e737507b4a8f2ee7196a152dbd)
# 07 pc 0000000000038590 /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::HapResource::LoadFromHap(char const*, std::__h::shared_ptr<OHOS::Global::Resource::ResConfigImpl>&, bool, bool, unsigned int const&)+80)(5c4263e737507b4a8f2ee7196a152dbd)
# 08 pc 00000000000384e8 /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::HapResource::Load(char const*, std::__h::shared_ptr<OHOS::Global::Resource::ResConfigImpl>&, bool, bool, unsigned int const&)+364)(5c4263e737507b4a8f2ee7196a152dbd)
# 09 pc 000000000002f118 /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::HapManager::AddResourcePath(char const*, unsigned int const&)+280)(5c4263e737507b4a8f2ee7196a152dbd)
# 10 pc 000000000002efdc /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::HapManager::AddResource(char const*, unsigned int const&)+52)(5c4263e737507b4a8f2ee7196a152dbd)
```

error stack:

```
Tid:3108, xxx
# 00 pc 00000000003e13cc /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::JSObject::GetProperty(panda::ecmascript::JSThread*, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue> const&, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue> const&, panda::ecmascript::JSShared::SCheckMode)+164)(13376099388381a01b166c00a8af99fb)
# 01 pc 00000000003d5518 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::JSIterator::IteratorStep(panda::ecmascript::JSThread*, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue> const&)+228)(13376099388381a01b166c00a8af99fb)
# 02 pc 0000000000570fa8 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::RuntimeStubs::StArraySpread(unsigned long, unsigned int, unsigned long)+592)(13376099388381a01b166c00a8af99fb)
# 03 pc 00000000002d53c0 /system/lib64/module/arkcompiler/stub.an(RTStub_CallRuntime+40)
# 04 at doTask (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/db/RdbTaskPool.ts:1:1)
# 05 at update (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/db/RdbTaskPool.ts:1:1)
# 06 at updateAppNameByAbilityInfoAndType (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/db/RdbStoreManager.ts:12:1)
# 07 at anonymous (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/model/AppModel.ts:0:1)
# 08 pc 0000000000304a94 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::InterpreterAssembly::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+144)(13376099388381a01b166c00a8af99fb)
# 09 pc 0000000000201d84 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::builtins::BuiltinsPromiseJob::PromiseReactionJob(panda::ecmascript::EcmaRuntimeCallInfo*)+348)(13376099388381a01b166c00a8af99fb)
# 10 pc 00000000002d6e14 /system/lib64/module/arkcompiler/stub.an(RTStub_AsmInterpreterEntry+208)
```

Here the stacks are captured at arbitrary moments while the thread keeps running, so there is no pattern and the thread is not stuck; it is simply busy. Combine [trace](#combining-trace) and hilog to determine the application's actual running scenario and optimize for it.
### Examining the Binder Information

When binder information is captured: for fault types that have a half-duration check, it is captured when the warning event fires; for the others it is captured after the block event.

1. Get the binder call chain.

```
PeerBinderCatcher -- pid==35854 layer_ == 1

BinderCatcher --

    35854:35854 to 52462:52462 code 3 wait:27.185154163 s frz_state:3, ns:-1:-1 to -1:-1, debug:35854:35854 to 52462:52462, active_code:0 active_thread:0, pending_async_proc=0
    ...
    52462:52462 to 1386:0 code 13 wait:24.733640622 s frz_state:3, ns:-1:-1 to -1:-1, debug:35854:35854 to 52462:52462, active_code:0 active_thread:0, pending_async_proc=0
async 1590:14743 to 1697:0 code 4 wait:124.733640622 s frz_state:1, ns:-1:-1 to -1:-1, debug:1590:14743 to 1697:0, active_code:-1 active_thread:0, pending_async_proc=1590
    ...
```

In this reference example, starting from the faulting process's main thread there is a call chain 35854:35854 -> 52462:52462 -> 1386:0; examine the peer processes' stacks to find why the peer is blocked. (A line-parsing sketch follows at the end of this section.)

2. Thread number 0.

This means the peer is IPC_FULL: all of its IPC threads are in use and no idle thread can be assigned to complete this request, so the request blocks. For process 1386 in the example above, analyze it against its stack:

```
pid    context    request    started    max    ready    free_async_space

35862  binder     0          2          16     2        519984

35854  binder     0          2          16     3        520192

35850  binder     0          2          16     3        520192

13669  binder     0          1          16     3        520192

...

1386   binder     1          15         16     0        517264 -> binderInfo

1474   binder     0          2          16     4        520192
```

Process 1386 indeed has 0 threads in the ready state, confirming the diagnosis. This usually means the process's other IPC threads are all blocked; investigate why they are not released. A common scenario: one IPC thread blocks while holding a lock, and the other threads then block waiting for that lock.

Another case is free_async_space being exhausted, leaving new IPC requests without enough buffer space to complete. Note that both synchronous and asynchronous requests consume this space; a common scenario is a large burst of asynchronous requests within a short time.

3. waitTime too small.

waitTime is the duration of this single IPC call. If it is far below the fault detection duration, this IPC call is not the root cause of the freeze.

A typical scenario: the application's main thread makes many IPC calls in a short time, and the accumulated duration causes the fault.

Directions to investigate:

- Whether a single call stays within its expected duration (e.g., an interface specified at 20 ms reaching 1s in the failure case); investigate why the interface underperforms.
- Whether the application side's frequent calling is reasonable for the scenario.

4. No call relation; the stack is an IPC stack.

Determine whether the stack is transient, i.e., whether the warning and block stacks agree. A possible scenario: the warning stack is an IPC stack while the block stack is some other transient stack, indicating that the IPC request had already finished when the binder information was captured, so this IPC call did not take long.

Note that binder information is not captured at the exact moment the fault occurs; there is some delay. For fault types with a half-duration check, binder capture is fairly accurate and almost always completes within the fault window; for the other types, a delayed report may capture binder state from after the fact.

Naturally, [trace](#combining-trace) shows binder durations far more directly.
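To mine BinderCatcher lines mechanically, a small parsing sketch can help (the regular expression matches the line format shown above):

```ts
// Extract caller/callee pids, tids and wait time from a BinderCatcher line,
// e.g. "35854:35854 to 52462:52462 code 3 wait:27.185154163 s ...".
const binderLine = /(\d+):(\d+) to (\d+):(\d+) code \w+ wait:([\d.]+) s/i;

function parseBinderLine(line: string) {
  const m = line.match(binderLine);
  if (m === null) { return null; }
  return {
    fromPid: Number(m[1]), fromTid: Number(m[2]),
    toPid: Number(m[3]), toTid: Number(m[4]),
    waitSec: Number(m[5]),
  };
}

console.log(parseBinderLine(
  '35854:35854 to 52462:52462 code 3 wait:27.185154163 s frz_state:3'));
// -> { fromPid: 35854, fromTid: 35854, toPid: 52462, toTid: 52462, waitSec: 27.185154163 }
```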
### Combining hilog

#### DFX-related log entries

1. Fault reporting (reportEvent).

2. Stack capture (signal: 35).

3. Background application detection (reported after 5 counts), around 21s.

4. Recording the kill reason.

5. APPFREEZE killing the frozen application.

#### General analysis steps

Determine the report time from the fault log (see [Get the time the fault occurred](#get-the-time-the-fault-occurred)), infer from the fault type and scenario when the freeze began, then read the hilog of that window to work out what the relevant application thread was doing:

- The application stops logging entirely: it is stuck at the call site of the last interface it logged.

  For example, in one captured case an APP_INPUT_BLOCK was reported at 07:24:08.167 while the application's main thread logged nothing after 07:24:01.581; the thing to check is whether the logic in FormManagerService: [form_mgr_proxy.cpp(GetFormsInfoByApp:1128)] timed out.

- The application logs heavily: analyze what scenario the output represents and whether it is reasonable.

  For example, in one captured case the process was producing heavy output right up to being killed by APP_FREEZE; the corresponding ImageEffect domain needs to check whether that logging is normal.

### Combining trace

The following situations are possible:

1. Each individual slice of work is short, but the slices run very densely over a long period and saturate the main thread.

   In one captured trace: PriviewArea::updateShotComponent (updating the component) -> animator (running the animation) -> a dense stretch of animation work lasting 9.2s.

   The thread busily loops on some piece of business; analyze each slice:

   - If it does not fit the business scenario (it should not be called this frequently), analyze the business code to find why it loops.
   - If it fits the scenario, analyze whether each slice overruns its expected cost and why its performance misses the design spec.

2. A single function call overruns.

   In one captured trace, OHOS::AppExecFwk::FormMgrAdapter::GetFormsInfoByApp took 8s to execute.
## Case Studies

### A typical THREAD_BLOCK case: incorrect lock usage

#### Background / principle

xxxservice reported a THREAD_BLOCK_6S appfreeze.

#### Faulty code

The lock is acquired on line 4; when the call on line 6 fails, line 7 returns directly without unlocking, leaving other threads waiting on the lock forever.

```cpp
int xxx()
{
    ...
    mutex_.lock();
    AIContext aiContext;
    if (ConvertRpcHandle2AIContext(inputs[0], aiContext) != aicp::SUCCESS) {
        return FAILED;
    }
    ...
}
```

#### Impact / symptom

A background application freezes. The user does not perceive it directly, but the related features stop working.

#### How it was located

Extract the key fault information:

```
appfreeze: com.example.hmsapp.xxx THREAD_BLOCK_6S at 20240408082432
DisplayPowerInfo:powerState:AWAKE
```

The Foreground field shows the application was in the background, so we can infer that by the time the real 3s event was reported, the background application had already been stuck for **18s**.

```
Module name:com.xxx.xxx.xxx
Version:1.2.2.202
VersionCode:1002002202
PreInstalled:Yes
Foreground:No --> background
Pid:43675
Uid:20020029
Reason:THREAD_BLOCK_6S
```

THREAD_BLOCK_3S was reported at 08:24:29:612 and THREAD_BLOCK_6S at 08:24:32:638, 3s apart, as expected.

```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DOMAIN:AAFWK
STRINGID:THREAD_BLOCK_6S
TIMESTAMP:2024/04/08-08:24:32:638
PID:43675
UID:20020029
PACKAGE_NAME:com.xxx.xxx.xxx
PROCESS_NAME:com.xxx.xxx.xxx
*******************************************
start time: 2024/04/08-08:24:29:627
DOMAIN = AAFWK
EVENTNAME = THREAD_BLOCK_3S
TIMESTAMP = 2024/04/08-08:24:29:612
PID = 43675
UID = 20020029
PACKAGE_NAME = com.xxx.xxx.xxx
PROCESS_NAME = com.xxx.xxx.xxx
```

When the 3s event was reported, the EventHandler information was captured at 08:24:29.413, as expected. The reported reason is "App main thread is not response!", i.e. the main thread is unresponsive; the currently running task started at 08:24:01.514.

```
MSG =
Fault time:2024/04/08-08:24:29
App main thread is not response!
mainHandler dump is:
 EventHandler dump begin curTime: 2024-04-08 08:24:29.413
 Event runner (Thread name = , Thread ID = 43675) is running
 Current Running: start at 2024-04-08 08:24:01.514, Event { send thread = 43675, send time = 2024-04-08 08:24:01.514, handle time = 2024-04-08 08:24:01.514, task name = uvLoopTask }
```

The watchdog task sits in the High priority event queue; the dump below shows a task being posted to the main thread every 3s, as expected.

The THREAD_BLOCK_3S and THREAD_BLOCK_6S queues agree, with the 6s queue holding one more event than the 3s queue.

The earliest event's send time is **08:24:11.388**, exactly 18s before the 3s report time of 08:24:29:612, as expected.

```
 High priority event queue information:
 No.1 : Event { send thread = 43679, send time = 2024-04-08 08:24:11.388, handle time = 2024-04-08 08:24:11.388, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.2 : Event { send thread = 43679, send time = 2024-04-08 08:24:14.458, handle time = 2024-04-08 08:24:14.458, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.3 : Event { send thread = 43679, send time = 2024-04-08 08:24:17.383, handle time = 2024-04-08 08:24:17.383, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.4 : Event { send thread = 43679, send time = 2024-04-08 08:24:20.363, handle time = 2024-04-08 08:24:20.363, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.5 : Event { send thread = 43679, send time = 2024-04-08 08:24:23.418, handle time = 2024-04-08 08:24:23.418, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.6 : Event { send thread = 43679, send time = 2024-04-08 08:24:26.369, handle time = 2024-04-08 08:24:26.369, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.7 : Event { send thread = 43679, send time = 2024-04-08 08:24:29.412, handle time = 2024-04-08 08:24:29.412, id = 1, caller = [watchdog.cpp(Timer:139)] }
```

To summarize: the application's **main thread started running this task at 08:24:01.514; the first 3s check began at 08:24:11.388; so the real freeze began around 08:24:11.**

The main thread stack runs from xxx_request_client.so -> libsamgr_proxy.z.so -> libipc_core.z.so (OHOS::BinderConnector::WriteBinder).

So **the main thread issued an IPC request, and the peer process did not return, causing the freeze. The stack is as follows:**

```
Tid:43675, Name:xxx
# 00 pc 0000000000168c44 /system/lib/ld-musl-aarch64.so.1(ioctl+176)(91b804d2409a13f27463debe9e19fb5d)
# 01 pc 0000000000049268 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+112)(e59500a4ea66775388332f6e3cc12fe3)
# 02 pc 0000000000054fd4 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+296)(e59500a4ea66775388332f6e3cc12fe3)
# 03 pc 00000000000544c8 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::WaitForCompletion(OHOS::MessageParcel*, int*)+304)(e59500a4ea66775388332f6e3cc12fe3)
# 04 pc 0000000000053c84 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::SendRequest(int, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+312)(e59500a4ea66775388332f6e3cc12fe3)
# 05 pc 000000000002d6d8 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequestInner(bool, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+128)(e59500a4ea66775388332f6e3cc12fe3)
# 06 pc 0000000000030e00 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::GetProtoInfo()+396)(e59500a4ea66775388332f6e3cc12fe3)
# 07 pc 000000000002e990 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::WaitForInit()+292)(e59500a4ea66775388332f6e3cc12fe3)
# 08 pc 0000000000036cd0 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCProcessSkeleton::FindOrNewObject(int)+116)(e59500a4ea66775388332f6e3cc12fe3)
# 09 pc 00000000000571cc /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::UnflattenObject(OHOS::Parcel&)+272)(e59500a4ea66775388332f6e3cc12fe3)
# 10 pc 00000000000463a4 /system/lib64/platformsdk/libipc_core.z.so(OHOS::MessageParcel::ReadRemoteObject()+116)(e59500a4ea66775388332f6e3cc12fe3)
# 11 pc 000000000001250c /system/lib64/chipset-pub-sdk/libsamgr_proxy.z.so(OHOS::SystemAbilityManagerProxy::CheckSystemAbility(int, bool&)+952)(6f113f37ac6ac882cfa16077ad5b406a)
# 12 pc 0000000000010e7c /system/lib64/chipset-pub-sdk/libsamgr_proxy.z.so(OHOS::SystemAbilityManagerProxy::GetSystemAbilityWrapper(int, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+232)(6f113f37ac6ac882cfa16077ad5b406a)
# 13 pc 00000000000118b8 /system/lib64/chipset-pub-sdk/libsamgr_proxy.z.so(OHOS::SystemAbilityManagerProxy::Recompute(int, int)+132)(6f113f37ac6ac882cfa16077ad5b406a)
# 14 pc 0000000000011170 /system/lib64/chipset-pub-sdk/libsamgr_proxy.z.so(OHOS::DynamicCache<int, OHOS::sptr<OHOS::IRemoteObject>>::QueryResult(int, int)+316)(6f113f37ac6ac882cfa16077ad5b406a)
# 15 pc 0000000000007e0c xxx_request_client.so(xxx::RPCRequestClient::GetService()+540)(557450139184527807025a632613fd76)
# 16 pc 0000000000008824 xxx_request_client.so(xxx::RPCRequestClient::Init()+16)(557450139184527807025a632613fd76)
# 17 pc 0000000000008d60 xxx_request_client.so(CreateRpcRequestByServiceName+104)(557450139184527807025a632613fd76)
# 18 pc 0000000000008f98 xxx_request_client.so(CreateRpcRequest+72)(557450139184527807025a632613fd76)
# 19 pc 0000000000002944 xxx_rpc_client.so(xxx::xxx::RpcRequestClient::RpcRequestClient()+224)(02343ed2fff69759d408b23eaf69fcde)
```

From the binderCatcher: **the main thread of 43675 was communicating with process 979 and had already been stuck for 27s when the binder information was captured.**

```
PeerBinderCatcher -- pid==43675 layer_ == 1

BinderCatcher --

    43675:43675 to 979:0 code 5f475249 wait:27.104545829 s frz_state:1, ns:-1:-1 to -1:-1, debug:35854:35854 to 52462:52462, active_code:0 active_thread:0, pending_async_proc=0 --> the binder call has been waiting for 27s
    57187:39147 to 28644:30753 code 0 wait:0.337894271 s frz_state:3, ns:-1:-1 to -1:-1, debug:57187:39147 to 28644:30753, active_code:0 active_thread:0, pending_async_proc=0
    57187:39151 to 28644:28652 code 7 wait:0.531140625 s frz_state:3, ns:-1:-1 to -1:-1, debug:57187:39151 to 28644:28652, active_code:0 active_thread:0, pending_async_proc=0
    57187:39150 to 28644:31297 code 0 wait:0.976419270 s frz_state:3, ns:-1:-1 to -1:-1, debug:57187:39150 to 28644:31297, active_code:0 active_thread:0, pending_async_proc=0
    57187:38979 to 28644:32554 code 0 wait:0.22108334 s frz_state:3, ns:-1:-1 to -1:-1, debug:57187:38979 to 28644:32554, active_code:0 active_thread:0, pending_async_proc=0
    57187:39149 to 28644:30754 code 0 wait:0.534261979 s frz_state:3, ns:-1:-1 to -1:-1, debug:57187:39149 to 28644:30754, active_code:0 active_thread:0, pending_async_proc=0
    57187:38949 to 28644:31301 code 0 wait:0.977779166 s frz_state:3, ns:-1:-1 to -1:-1, debug:57187:38949 to 28644:31301, active_code:0 active_thread:0, pending_async_proc=0
    57187:39172 to 28644:35667 code 0 wait:1.47387500 s frz_state:3, ns:-1:-1 to -1:-1, debug:57187:39172 to 28644:35667, active_code:0 active_thread:0, pending_async_proc=0
    57187:39173 to 28644:28822 code 0 wait:0.565389063 s frz_state:3, ns:-1:-1 to -1:-1, debug:57187:39173 to 28644:28822, active_code:0 active_thread:0, pending_async_proc=0
    1477:1676 to 1408:2026 code 17 wait:0.0 s frz_state:3, ns:-1:-1 to -1:-1, debug:1477:1676 to 1408:2026, active_code:0 active_thread:0, pending_async_proc=0
    628:8136 to 979:0 code 2 wait:13166.722870859 s frz_state:1, ns:-1:-1 to -1:-1, debug:628:8136 to 979:0, active_code:0 active_thread:0, pending_async_proc=0
```

The main thread stack of process 979 shows xxxserver waiting for a lock to be released. **This is the classic case: incorrect lock usage leading to a thread stuck waiting for a lock.**

```
Binder catcher stacktrace, type is peer, pid : 979
Result: 0 ( no error )
Timestamp:2024-04-08 08:24:29.000
Pid:979
Uid:3094
Process name:xxxserver
Process life time:60410s
Tid:979, Name:xxxserver
# 00 pc 00000000001aafc4 /system/lib/ld-musl-aarch64.so.1(__timedwait_cp+192)(91b804d2409a13f27463debe9e19fb5d)
# 01 pc 00000000001b0d50 /system/lib/ld-musl-aarch64.so.1(__pthread_mutex_timedlock_inner+592)(91b804d2409a13f27463debe9e19fb5d)
# 02 pc 00000000000c38e0 /system/lib64/libc++.so(std::__h::mutex::lock()+8)(0b61ba21a775725a1bd8802a393b133afbc503a5) --> called lock and is waiting
# 03 pc 00000000000086dc xxx_server.so(xxx::xxx::InitImpl(int, std::__h::vector<xxx::xxx::RpcHandle, std::__h::allocator<xxx::xxx::RpcHandle>> const&, std::__h::vector<xxx::xxx::RpcHandle, std::__h::allocator<xxx::xxx::RpcHandle>>&)+84)(f4bb275898d797b22eae35fe48db9009)
# 04 pc 000000000000798c xxx_request_server.so(xxx::RPCRequestStub::SyncExecute(OHOS::MessageParcel&, OHOS::MessageParcel&)+164)(70cbb10c758902c1e3e179efc93ce0af)
# 05 pc 0000000000008314 xxx_request_server.so(xxx::RPCRequestStub::OnRemoteRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+300)(70cbb10c758902c1e3e179efc93ce0af)
# 06 pc 00000000000153e4 /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPCObjectStub::SendRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+604)(25b3d63905ef47289c096ea42ba2d86a)
# 07 pc 000000000002b464 /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::OnTransaction(unsigned char const*)+1220)(25b3d63905ef47289c096ea42ba2d86a)
# 08 pc 000000000002baec /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::HandleCommandsInner(unsigned int)+368)(25b3d63905ef47289c096ea42ba2d86a)
# 09 pc 000000000002a6b0 /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::HandleCommands(unsigned int)+60)(25b3d63905ef47289c096ea42ba2d86a)
# 10 pc 000000000002a4dc /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::StartWorkLoop()+120)(25b3d63905ef47289c096ea42ba2d86a)
# 11 pc 000000000002bc2c /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::JoinThread(bool)+92)(25b3d63905ef47289c096ea42ba2d86a)
# 12 pc 0000000000004bd4 xxxserver(02cff7e5dce05d6d28096601458b6f6d)
# 13 pc 00000000000a3b58 /system/lib/ld-musl-aarch64.so.1(libc_start_main_stage2+64)(91b804d2409a13f27463debe9e19fb5d)
```

Decompiling pinpoints the code line stuck on the lock; examine the lock usage in its context.

#### Fix

```cpp
int xxx()
{
    ...
    mutex_.lock();
    AIContext aiContext;
    if (ConvertRpcHandle2AIContext(inputs[0], aiContext) != aicp::SUCCESS) {
        return FAILED;
    }
    ...
}
```

Change to:

```cpp
int xxx()
{
    ...
    mutex_.lock();
    AIContext aiContext;
    if (ConvertRpcHandle2AIContext(inputs[0], aiContext) != aicp::SUCCESS) {
        mutex_.unlock();
        return FAILED;
    }
    ...
    mutex_.unlock();
    ...
}
```

Adjust the lock usage sensibly in context. In C++, an RAII guard such as std::lock_guard releases the mutex on every return path and avoids this class of bug entirely (see the sketch below).
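The bug pattern is language-agnostic: an early return that skips cleanup. A structured-cleanup sketch (try/finally in TypeScript; std::lock_guard plays the same role in C++; the `Lock` interface is hypothetical):

```ts
// Run fn while holding the lock; every return or throw path still unlocks.
interface Lock { lock(): void; unlock(): void; }

function withLock<T>(mutex: Lock, fn: () => T): T {
  mutex.lock();
  try {
    return fn();
  } finally {
    mutex.unlock(); // runs on success, early return, and exceptions alike
  }
}
```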
#### Recommendations (summary)

1. Pay particular attention to ordering and deadlock issues whenever multiple threads interact.

### A typical APP_INPUT_BLOCK case: full component refresh

#### Background

The UI froze suddenly while the user was switching themes, and an appfreeze was reported from sceneboard.

The freeze was caused by a busy thread.

#### Faulty code

Component refresh and reuse are controlled through the component's key: when the page updates, a component whose key is unchanged is reused, while a changed key causes the component and its children to be rebuilt.

This function generates the key for desktop components and includes themeStyle, so when the user switches themes repeatedly on the desktop, the components are repeatedly rebuilt in full, freezing the UI.

```ts
private getForeachKey(item: xxx): string {
    ...
    return `${item.xxx2}${item.xxx2}...${item.themeStyle}`;
}
```
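A possible direction for the fix (illustrative only; it assumes theme changes can be applied through reactive state rather than by rebuilding the subtree, and all names below are hypothetical) is to keep volatile fields such as themeStyle out of the reuse key:

```ts
// Sketch: derive the reuse key only from stable identity fields; bind volatile
// fields such as themeStyle as reactive state so a theme switch updates
// attributes in place instead of recreating every component.
interface DesktopItem { id: string; title: string; themeStyle: string; }

function getForeachKey(item: DesktopItem): string {
  return `${item.id}:${item.title}`; // themeStyle intentionally excluded
}
```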
下面的函数用于生成桌面组件的 key,其中拼接了 themeStyle。当用户在桌面连续切换主题时,所有组件的 key 随之变化,组件反复全量刷新,最终导致卡死。

```ts
private getForeachKey(item: xxx): string {
  ...
  return `${item.xxx1}${item.xxx2}...${item.themeStyle}`; --> key 中拼接了 themeStyle
}
```

#### 影响

用户在合一桌面切换主题时页面高概率卡死,点击无响应,而后闪退到锁屏界面,严重影响用户使用体验。

#### 定位思路

提取故障关键信息:

```
appfreeze: com.example.sceneboard APP_INPUT_BLOCK at 20240319022527
DisplayPowerInfo:powerState:AWAKE
```

APP_INPUT_BLOCK 事件上报的时间为 **14:40:59:440**。

```
DOMAIN:AAFWK
STRINGID:APP_INPUT_BLOCK
TIMESTAMP:2024/03/14-14:40:59:440 --> 故障上报时间
PID:2918
UID:20020017
PACKAGE_NAME:com.example.sceneboard
PROCESS_NAME:com.example.sceneboard
```

上报的原因是:User input does not respond!,即用户输入事件没有响应。

可以看到卡死发生在主线程上(Thread ID == Pid),正在运行的任务从 **14:40:53.499** 开始运行,直到 Fault time **14:40:58** 都还没有运行完。

```
MSG =
Fault time:2024/03/14-14:40:58
User input does not respond!
mainHandler dump is:
 EventHandler dump begin curTime: 2024-03-14 02:40:58.520
 Event runner (Thread name = , Thread ID = 2918) is running
 Current Running: start at 2024-03-14 02:40:53.499, Event { send thread = 2918, send time = 2024-03-14 02:40:53.499, handle time = 2024-03-14 02:40:53.499, task name = }
```

用户输入事件需要第一时间响应,因此与 watchdog 任务一样位于 High priority event queue 中。

可以看到此时已有 200 多个 input event 阻塞在队列中未被处理:

```
 High priority event queue information:
 No.1 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.690, handle time = 2024-03-14 02:40:53.690, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.2 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.699, handle time = 2024-03-14 02:40:53.699, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.3 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.708, handle time = 2024-03-14 02:40:53.708, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.4 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.717, handle time = 2024-03-14 02:40:53.717, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.5 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.726, handle time = 2024-03-14 02:40:53.726, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.6 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.736, handle time = 2024-03-14 02:40:53.736, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.7 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.745, handle time = 2024-03-14 02:40:53.745, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.8 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.754, handle time = 2024-03-14 02:40:53.754, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 ...
 No.190 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.166, handle time = 2024-03-14 02:40:56.166, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.191 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.176, handle time = 2024-03-14 02:40:56.176, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.192 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.186, handle time = 2024-03-14 02:40:56.186, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.193 : Event { send thread = 2923, send time = 2024-03-14 02:40:56.196, handle time = 2024-03-14 02:40:56.196, id = 1, caller = [watchdog.cpp(Timer:140)] }
 No.194 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.196, handle time = 2024-03-14 02:40:56.196, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.195 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.206, handle time = 2024-03-14 02:40:56.206, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.196 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.216, handle time = 2024-03-14 02:40:56.216, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.197 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.226, handle time = 2024-03-14 02:40:56.226, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.198 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.236, handle time = 2024-03-14 02:40:56.236, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.199 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.245, handle time = 2024-03-14 02:40:56.245, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.200 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.254, handle time = 2024-03-14 02:40:56.254, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.201 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.265, handle time = 2024-03-14 02:40:56.265, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.202 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.275, handle time = 2024-03-14 02:40:56.274, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.203 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.284, handle time = 2024-03-14 02:40:56.284, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.204 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.294, handle time = 2024-03-14 02:40:56.294, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.205 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.305, handle time = 2024-03-14 02:40:56.305, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
```

从逻辑来看,input event 触发应用主线程任务开始执行,但任务超过 6s 仍未执行完、没有反馈,从而上报超时。因此只需要关心 input 触发了应用执行什么任务,以及该任务为什么执行超时。

主线程栈:此时处于运行状态,栈顶的 ark_jsruntime GetCurrentThreadId 并非持锁阻塞或长耗时函数,抓到的是瞬时栈,参考意义不大。

```
Tid:2918, Name:example.sceneboard
# 00 pc 000000000009f73c /system/lib/ld-musl-aarch64.so.1(8fa55898166cd804dad43d909b5319cc)
# 01 pc 000000000054b7b4 /system/lib64/platformsdk/libark_jsruntime.so(panda::os::thread::GetCurrentThreadId()+12)(7715646e48f750f3dc31e660b056eb43)
# 02 pc 00000000002107a4 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaVM::CheckThread() const+200)(7715646e48f750f3dc31e660b056eb43)
# 03 pc 0000000000432998 /system/lib64/platformsdk/libark_jsruntime.so(panda::JSNApi::GetHandleAddr(panda::ecmascript::EcmaVM const*, unsigned long)+64)(7715646e48f750f3dc31e660b056eb43)
# 04 pc 000000000003eeb8 /system/lib64/platformsdk/libace_napi.z.so(ArkNativeReference::Get()+32)(c3a760aff0c73a2e76accaf518321fc9)
# 05 pc 0000000000043cb4 /system/lib64/platformsdk/libace_napi.z.so(napi_get_reference_value+48)(c3a760aff0c73a2e76accaf518321fc9)
# 06 pc 0000000000007564 /system/lib64/module/events/libemitter.z.so(OHOS::AppExecFwk::SearchCallbackInfo(napi_env__*, std::__h::variant<unsigned int, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>>> const&, napi_value__*)+248)(8fe2937855aab3ea839419f952597511)
# 07 pc 0000000000007d8c /system/lib64/module/events/libemitter.z.so(OHOS::AppExecFwk::OnOrOnce(napi_env__*, napi_callback_info__*, bool)+568)(8fe2937855aab3ea839419f952597511)
# 08 pc 00000000000096d8 /system/lib64/module/events/libemitter.z.so(OHOS::AppExecFwk::JS_Once(napi_env__*, napi_callback_info__*) (.cfi)+84)(8fe2937855aab3ea839419f952597511)
# 09 pc 000000000002c8f0 /system/lib64/platformsdk/libace_napi.z.so(ArkNativeFunctionCallBack(panda::JsiRuntimeCallInfo*)+168)(c3a760aff0c73a2e76accaf518321fc9)
# 10 pc 0000000000187b48 /system/lib64/module/arkcompiler/stub.an(RTStub_PushCallArgsAndDispatchNative+40)
# 11 pc 00000000002da5fc /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::InterpreterAssembly::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+416)(7715646e48f750f3dc31e660b056eb43)
# 12 pc 00000000002da5fc /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::InterpreterAssembly::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+416)(7715646e48f750f3dc31e660b056eb43)
# 13 pc 00000000003954a0 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::JSStableArray::HandleforEachOfStable(panda::ecmascript::JSThread*, panda::ecmascript::JSHandle<panda::ecmascript::JSObject>, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue>, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue>, unsigned int, unsigned int&)+596)(7715646e48f750f3dc31e660b056eb43)
# 14 pc 000000000018f4b0 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::builtins::BuiltinsArray::ForEach(panda::ecmascript::EcmaRuntimeCallInfo*)+840)(7715646e48f750f3dc31e660b056eb43)
...
```

接下来排查流水日志。

首先找到上报 APP_INPUT_BLOCK 的时间点,大概在 14:40:59.448 左右。从这里可以看到,事件上报完成后,dfx 将卡死的 scb 进程杀掉。

往前推 6s 左右,可以看到在 14:40:53.498 左右,有一个点击事件发给了 scb。

在这之间的 6s 内存在大量的 scb 日志,判断是在做更新渲染。

查看对应时间点的 trace:

发现 scb 主线程被占满,确实非常繁忙。选取其中耗时较长的任务 **CustomNodeUpdate SwiperPage**,后续就需要排查为什么这个组件一直在刷新。

根据对应领域排查后发现:SwiperPage 把 themeStyle 加入到了组件的 key 里,key 变化后就会走控件新建流程。

即当用户切换主题或者切换图标风格时,会造成桌面上控件的全量新建,主线程繁忙,从而导致输入事件未响应。

#### 修改方法

仅当切换桌面组件风格时,才触发桌面组件的刷新,缩小刷新范围:

```ts
+ if (!CheckEmptyUtils.isEmpty(themeStyleInfo.iconResourcePath) &&
+   themeStyleInfo.iconResourcePath !== this.themeStyle.iconResourcePath) {
+   this.isStyleChanged = true;
+   this.themeStyle.iconResourcePath = themeStyleInfo.iconResourcePath; --> themeStyle 与 iconResourcePath 关联
+ }
```
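配套地,组件 key 的生成不应再拼接 themeStyle,避免主题切换时所有 key 同时变化。以下为一种可能的改法(示意,字段均为占位符,非真实修复代码):

```ts
// 示意:key 只包含决定组件结构的字段,themeStyle 变化不再触发全量重建
private getForeachKey(item: xxx): string {
  ...
  return `${item.xxx1}${item.xxx2}...`;
}
```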
#### 推荐建议(问题总结)

1、用户点击触发页面更新时,需要合理控制页面刷新的范围,重点考虑大量组件、频繁刷新等场景。

### LIFECYCLE_TIMEOUT 类典型案例——加载云图

#### 背景/原理

用户在打开云笔记时应用卡死后闪退。

#### 错误代码实例

在循环中同步获取云图:

```ts
public static xxxFunction(fileUris: string[]): void {
  ...
  for (const fileUri of fileUris) {
    let file = fs.openSync(fileUri, fs.OpenMode.READ_ONLY); --> 同步打开,弱网下可能长时间阻塞
    ...
  }
  ...
}
```

#### 影响/报错

应用拉起、切前台等过程中卡死并闪退。

#### 定位思路

以 notepad 的 LIFECYCLE_TIMEOUT 为例,提取故障关键信息:

```
sysfreeze: LIFECYCLE_TIMEOUT LIFECYCLE_TIMEOUT at 20240201100459
```

查看 MSG 信息:**foreground timeout,对应超时时长为 5s**。

```
MSG =
Fault time:2024/02/01-10:04:57
ability:MainAbility foreground timeout.
server:
312522; AbilityRecord::ForegroundAbility; the ForegroundAbility lifecycle starts.
client:
312522; AbilityThread::ScheduleAbilityTransaction; the foreground lifecycle.
```

LIFECYCLE_HALF_TIMEOUT 上报时间为 **10:04:57:538**;
LIFECYCLE_TIMEOUT 上报时间为 **10:04:59:965**。两者相隔 2.5s 左右,即 foreground 5s 超时时长的一半,符合预期。

```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DOMAIN:AAFWK
STRINGID:LIFECYCLE_TIMEOUT
TIMESTAMP:2024/02/01-10:04:59:965
PID:18083
UID:20020041
PACKAGE_NAME:com.example.notepad
PROCESS_NAME:com.example.notepad
*******************************************
start time: 2024/02/01-10:04:57:555
DOMAIN = AAFWK
EVENTNAME = LIFECYCLE_HALF_TIMEOUT
TIMESTAMP = 2024/02/01-10:04:57:538
PID = 18083
UID = 20020041
TID = 17286
PACKAGE_NAME = com.example.notepad
PROCESS_NAME = com.example.notepad
```

正在运行任务开始的时间为 **10:04:54.798**,距 LIFECYCLE_HALF_TIMEOUT 上报相隔 2.5s 左右,符合预期。

```
mainHandler dump is:
 EventHandler dump begin curTime: 2024-02-01 10:04:57.306
 Event runner (Thread name = , Thread ID = 18083) is running
 Current Running: start at 2024-02-01 10:04:54.798, Event { send thread = 18132, send time = 2024-02-01 10:04:54.778, handle time = 2024-02-01 10:04:54.778, task name = UIAbilityThread:SendResult }
 History event queue information:
 No. 0 : Event { send thread = 18083, send time = 2024-02-01 10:04:46.481, handle time = 2024-02-01 10:04:46.981, trigger time = 2024-02-01 10:04:46.982, completeTime time = 2024-02-01 10:04:46.982, task name = }
 No. 1 : Event { send thread = 18132, send time = 2024-02-01 10:04:47.149, handle time = 2024-02-01 10:04:47.149, trigger time = 2024-02-01 10:04:47.149, completeTime time = 2024-02-01 10:04:47.197, task name = MainThread:BackgroundApplication }
 No. 2 : Event { send thread = 18083, send time = 2024-02-01 10:04:44.329, handle time = 2024-02-01 10:04:47.329, trigger time = 2024-02-01 10:04:47.329, completeTime time = 2024-02-01 10:04:47.329, task name = }
 No. 3 : Event { send thread = 18087, send time = 2024-02-01 10:04:48.091, handle time = 2024-02-01 10:04:48.091, trigger time = 2024-02-01 10:04:48.091, completeTime time = 2024-02-01 10:04:48.091, task name = }
 No. 4 : Event { send thread = 18087, send time = 2024-02-01 10:04:51.047, handle time = 2024-02-01 10:04:51.047, trigger time = 2024-02-01 10:04:51.048, completeTime time = 2024-02-01 10:04:51.048, task name = }
 No. 5 : Event { send thread = 18087, send time = 2024-02-01 10:04:54.067, handle time = 2024-02-01 10:04:54.067, trigger time = 2024-02-01 10:04:54.067, completeTime time = 2024-02-01 10:04:54.067, task name = }
 ...
```

查看对应的堆栈信息,调用链为:libfs.z.so -> libdatashare_consumer.z.so -> libipc_core.z.so。

```
Tid:18083, Name:ei.example.notepad
# 00 pc 00000000001617a4 /system/lib/ld-musl-aarch64.so.1(ioctl+180)(4ca73cff61bea7c4a687eb0f71c9df69)
# 01 pc 000000000003e8a0 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+72)(3248fceb1fa676994734e0437430ce37)
# 02 pc 0000000000049f38 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+296)(3248fceb1fa676994734e0437430ce37)
# 03 pc 00000000000496f8 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::WaitForCompletion(OHOS::MessageParcel*, int*)+116)(3248fceb1fa676994734e0437430ce37)
# 04 pc 00000000000490bc /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::SendRequest(int, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+312)(3248fceb1fa676994734e0437430ce37)
# 05 pc 0000000000027700 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequestInner(bool, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+132)(3248fceb1fa676994734e0437430ce37)
# 06 pc 000000000002799c /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+140)(3248fceb1fa676994734e0437430ce37)
# 07 pc 000000000002640c /system/lib64/platformsdk/libdatashare_consumer.z.so(OHOS::DataShare::DataShareProxy::OpenFile(OHOS::Uri const&, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+440)(e93b5085235269d4b7218ea7de64b69a)
# 08 pc 0000000000014b2c /system/lib64/platformsdk/libdatashare_consumer.z.so(OHOS::DataShare::ExtSpecialController::OpenFile(OHOS::Uri const&, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+160)(e93b5085235269d4b7218ea7de64b69a)
# 09 pc 0000000000022c54 /system/lib64/platformsdk/libdatashare_consumer.z.so(OHOS::DataShare::DataShareHelperImpl::OpenFile(OHOS::Uri&, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+96)(e93b5085235269d4b7218ea7de64b69a)
# 10 pc 0000000000108b34 /system/lib64/module/file/libfs.z.so(OHOS::FileManagement::ModuleFileIO::OpenFileByDatashare(std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&, unsigned int)+468)(152580bf9c379f11f29fdded278541bd)
# 11 pc 0000000000108264 /system/lib64/module/file/libfs.z.so(OHOS::FileManagement::ModuleFileIO::OpenFileByUri(std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&, unsigned int)+1760)(152580bf9c379f11f29fdded278541bd)
# 12 pc 00000000001077fc /system/lib64/module/file/libfs.z.so(OHOS::FileManagement::ModuleFileIO::Open::Sync(napi_env__*, napi_callback_info__*) (.cfi)+1036)(152580bf9c379f11f29fdded278541bd)
# 13 pc 000000000002bbf8 /system/lib64/platformsdk/libace_napi.z.so(ArkNativeFunctionCallBack(panda::JsiRuntimeCallInfo*)+168)(f5b81db475835ee752235c606b1c5e33)
# 14 pc 0000000000132e48 /system/lib64/module/arkcompiler/stub.an
```

通过 BinderCatcher 信息可以看出,主线程 18083 正在与 5235 进程通信,同步等待时长已大于 2.5s,与超时时间吻合。

```
PeerBinderCatcher -- pid==18083 layer_ == 1

BinderCatcher --

    18083:18083 to 5235:7437 code 2 wait:2.723147396 s, ns:-1:-1 to -1:-1, debug:18083:18083 to 5235:7437, active_code:0 active_thread:0, pending_async_proc=0
    3462:3840 to 4956:4958 code 8 wait:261.213830169 s, ns:-1:-1 to -1:-1, debug:3462:3840 to 4956:4958, active_code:0 active_thread:0, pending_async_proc=0
    3462:3621 to 4956:4981 code 8 wait:273.550283291 s, ns:-1:-1 to -1:-1, debug:3462:3621 to 4956:4981, active_code:0 active_thread:0, pending_async_proc=0
```

5235 为媒体库进程,其主线程栈处于 epoll_wait 正常等待状态,无有效信息。

```
Binder catcher stacktrace, type is peer, pid : 5235
Result: 0 ( no error )
Timestamp:2024-02-01 10:04:57.000
Pid:5235
Uid:20020079
Process name:com.medialibrary.medialibrarydata
Tid:5235, Name:edialibrarydata
# 00 pc 0000000000142d1c /system/lib/ld-musl-aarch64.so.1(epoll_wait+84)(4ca73cff61bea7c4a687eb0f71c9df69)
# 01 pc 000000000000fb74 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EpollIoWaiter::WaitFor(std::__h::unique_lock<std::__h::mutex>&, long)+224)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 02 pc 0000000000019df8 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventQueue::WaitUntilLocked(std::__h::chrono::time_point<std::__h::chrono::steady_clock, std::__h::chrono::duration<long long, std::__h::ratio<1l, 1000000000l>>> const&, std::__h::unique_lock<std::__h::mutex>&)+180)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 03 pc 0000000000019c6c /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventQueue::GetEvent()+128)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 04 pc 00000000000202b8 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::(anonymous namespace)::EventRunnerImpl::Run()+1164)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 05 pc 0000000000022388 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventRunner::Run()+120)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 06 pc 000000000007ea08 /system/lib64/platformsdk/libappkit_native.z.so(OHOS::AppExecFwk::MainThread::Start()+772)(183fe2babcfdd3e1ea4bca16a0e26a5d)
# 07 pc 0000000000011ac8 /system/bin/appspawn(RunChildProcessor+236)(7b715884c45cfe57b22df46fdaeeca88)
# 08 pc 0000000000034684 /system/bin/appspawn(AppSpawnChild+264)(7b715884c45cfe57b22df46fdaeeca88)
# 09 pc 00000000000344f4 /system/bin/appspawn(AppSpawnProcessMsg+380)(7b715884c45cfe57b22df46fdaeeca88)
# 10 pc 00000000000305a0 /system/bin/appspawn(OnReceiveRequest+1820)(7b715884c45cfe57b22df46fdaeeca88)
# 11 pc 0000000000017c58 /system/lib64/chipset-pub-sdk/libbegetutil.z.so(HandleRecvMsg_+260)(22f33d1b0218f31bad0dcc75cf348b90)
# 12 pc 00000000000178fc /system/lib64/chipset-pub-sdk/libbegetutil.z.so(HandleStreamEvent_+148)(22f33d1b0218f31bad0dcc75cf348b90)
# 13 pc 0000000000015478 /system/lib64/chipset-pub-sdk/libbegetutil.z.so(ProcessEvent+112)(22f33d1b0218f31bad0dcc75cf348b90)
# 14 pc 0000000000015090 /system/lib64/chipset-pub-sdk/libbegetutil.z.so(RunLoop_+308)(22f33d1b0218f31bad0dcc75cf348b90)
# 15 pc 000000000002eff4 /system/bin/appspawn(AppSpawnRun+116)(7b715884c45cfe57b22df46fdaeeca88)
# 16 pc 000000000001f438 /system/bin/appspawn(main+724)(7b715884c45cfe57b22df46fdaeeca88)
# 17 pc 00000000000a0974 /system/lib/ld-musl-aarch64.so.1(libc_start_main_stage2+64)(4ca73cff61bea7c4a687eb0f71c9df69)
# 18 pc 000000000001106c /system/bin/appspawn(_start_c+76)(7b715884c45cfe57b22df46fdaeeca88)
```

以上可以得到信息:应用通过文件系统的 Open::Sync 接口,以 uri 方式同步打开文件,进而通过 datashare 向媒体库进程请求文件数据。

查看对应时间点的流水信息:进程调用 datashare 加载云图后卡死,与堆栈信息吻合。

查看具体代码:在循环中同步加载 fileUri。这种写法明显不合理,在弱网或同时加载大量数据的情况下极易出现卡死,应用侧需要整改。

#### 修改方法

将同步加载改为异步加载,并用标志位标识是否加载完成,加载期间用户界面展示加载中效果。

```ts
public static xxxFunction(fileUris: string[]): void {
  ...
  for (const fileUri of fileUris) {
    let file = fs.openSync(fileUri, fs.OpenMode.READ_ONLY); --> 同步打开,阻塞主线程
    ...
  }
  ...
}
```

修改为:

```ts
public static async xxxFunction(fileUris: string[]): Promise<void> { --> async 函数的返回值类型应为 Promise<void>
  ...
  AppStorage.setOrCreate<boolean>('isLoadingPic', true); --> 用于页面 load 效果展示
  for (const fileUri of fileUris) {
    let file = await fs.open(fileUri, fs.OpenMode.READ_ONLY); --> 异步打开,不再阻塞主线程
    ...
  }
  AppStorage.setOrCreate<boolean>('isLoadingPic', false); --> 加载完成,关闭 load 效果
  ...
}
```
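页面侧可通过 @StorageLink 关联该标志位,在加载期间展示加载中效果。以下为一个最小示意(页面与组件结构为假设,非真实源码):

```ts
@Entry
@Component
struct CloudPicPage {
  // 与 xxxFunction 中写入的 'isLoadingPic' 双向关联,标志位变化时自动刷新 UI
  @StorageLink('isLoadingPic') isLoadingPic: boolean = false;

  build() {
    Column() {
      if (this.isLoadingPic) {
        LoadingProgress().width(48).height(48) // 加载中效果
      } else {
        Text('加载完成') // 加载完成后的页面内容
      }
    }
  }
}
```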
#### 推荐建议(问题总结)

1、请求云侧数据时需要充分验证有网、弱网、无网等场景;

2、不要在应用生命周期函数中做耗时操作。
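针对第 2 点,可参考如下写法(示意,其中 PicLoader、fileUris 为假设的封装类与成员,非真实源码):生命周期回调中仅触发异步任务后立即返回,不做同步等待。

```ts
import UIAbility from '@ohos.app.ability.UIAbility';

export default class MainAbility extends UIAbility {
  onForeground(): void {
    // 只触发异步加载,避免阻塞 foreground 生命周期(超时时间 5s)
    PicLoader.xxxFunction(this.fileUris).catch((err: Error) => {
      console.error(`load cloud pictures failed: ${err.message}`);
    });
  }
}
```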