1Demonstrations of runqlen, the Linux eBPF/bcc version. 2 3 4This program summarizes scheduler queue length as a histogram, and can also 5show run queue occupancy. It works by sampling the run queue length on all 6CPUs at 99 Hertz. 7 8As an example, here is an idle system: 9 10# ./runqlen.py 11Sampling run queue length... Hit Ctrl-C to end. 12^C 13 runqlen : count distribution 14 0 : 1776 |****************************************| 15 16This shows a zero run queue length each time it was sampled. 17 18And now a heavily loaded system: 19 20# ./runqlen.py 21Sampling run queue length... Hit Ctrl-C to end. 22^C 23 runqlen : count distribution 24 0 : 1068 |****************************************| 25 1 : 642 |************************ | 26 2 : 369 |************* | 27 3 : 183 |****** | 28 4 : 104 |*** | 29 5 : 42 |* | 30 6 : 13 | | 31 7 : 2 | | 32 8 : 1 | | 33 34Now there is often threads queued, with one sample reaching a queue length 35of 8. This will cause run queue latency, which can be measured by the bcc 36runqlat tool. 37 38 39Here's an example of an issue that runqlen can indentify. Starting with the 40system-wide summary: 41 42# ./runqlen.py 43Sampling run queue length... Hit Ctrl-C to end. 44^C 45 runqlen : count distribution 46 0 : 1209 |****************************************| 47 1 : 372 |************ | 48 2 : 73 |** | 49 3 : 3 | | 50 4 : 1 | | 51 5 : 0 | | 52 6 : 0 | | 53 7 : 237 |******* | 54 55This shows there is often a run queue length of 7. Now using the -C option to 56see per-CPU histograms: 57 58# ./runqlen.py -C 59Sampling run queue length... Hit Ctrl-C to end. 60^C 61 62cpu = 0 63 runqlen : count distribution 64 0 : 257 |****************************************| 65 1 : 64 |********* | 66 2 : 5 | | 67 3 : 0 | | 68 4 : 0 | | 69 5 : 0 | | 70 6 : 1 | | 71 72cpu = 1 73 runqlen : count distribution 74 0 : 226 |****************************************| 75 1 : 90 |*************** | 76 2 : 11 |* | 77 78cpu = 2 79 runqlen : count distribution 80 0 : 264 |****************************************| 81 1 : 52 |******* | 82 2 : 8 |* | 83 3 : 1 | | 84 4 : 0 | | 85 5 : 0 | | 86 6 : 1 | | 87 7 : 0 | | 88 8 : 1 | | 89 90cpu = 3 91 runqlen : count distribution 92 0 : 0 | | 93 1 : 0 | | 94 2 : 0 | | 95 3 : 0 | | 96 4 : 0 | | 97 5 : 0 | | 98 6 : 0 | | 99 7 : 327 |****************************************| 100 101cpu = 4 102 runqlen : count distribution 103 0 : 255 |****************************************| 104 1 : 63 |********* | 105 2 : 9 |* | 106 107cpu = 5 108 runqlen : count distribution 109 0 : 244 |****************************************| 110 1 : 78 |************ | 111 2 : 3 | | 112 3 : 2 | | 113 114cpu = 6 115 runqlen : count distribution 116 0 : 253 |****************************************| 117 1 : 66 |********** | 118 2 : 6 | | 119 3 : 1 | | 120 4 : 1 | | 121 122cpu = 7 123 runqlen : count distribution 124 0 : 243 |****************************************| 125 1 : 74 |************ | 126 2 : 6 | | 127 3 : 1 | | 128 4 : 0 | | 129 5 : 1 | | 130 6 : 2 | | 131 132The run queue length of 7 is isolated to CPU 3. It was caused by CPU binding 133(taskset). This can sometimes happen by applications that try to auto-bind 134to CPUs, leaving other CPUs idle while work is queued. 135 136 137runqlat accepts an interval and a count. For example, with -T for timestamps: 138 139# ./runqlen.py -T 1 5 140Sampling run queue length... Hit Ctrl-C to end. 141 14219:51:34 143 runqlen : count distribution 144 0 : 635 |****************************************| 145 1 : 142 |******** | 146 2 : 13 | | 147 3 : 0 | | 148 4 : 1 | | 149 15019:51:35 151 runqlen : count distribution 152 0 : 640 |****************************************| 153 1 : 136 |******** | 154 2 : 13 | | 155 3 : 1 | | 156 4 : 0 | | 157 5 : 0 | | 158 6 : 0 | | 159 7 : 0 | | 160 8 : 0 | | 161 9 : 0 | | 162 10 : 1 | | 163 16419:51:36 165 runqlen : count distribution 166 0 : 603 |****************************************| 167 1 : 170 |*********** | 168 2 : 16 |* | 169 3 : 1 | | 170 4 : 0 | | 171 5 : 0 | | 172 6 : 0 | | 173 7 : 0 | | 174 8 : 0 | | 175 9 : 1 | | 176 17719:51:37 178 runqlen : count distribution 179 0 : 617 |****************************************| 180 1 : 154 |********* | 181 2 : 20 |* | 182 3 : 0 | | 183 4 : 0 | | 184 5 : 0 | | 185 6 : 0 | | 186 7 : 0 | | 187 8 : 0 | | 188 9 : 0 | | 189 10 : 0 | | 190 11 : 1 | | 191 19219:51:38 193 runqlen : count distribution 194 0 : 603 |****************************************| 195 1 : 161 |********** | 196 2 : 24 |* | 197 3 : 4 | | 198 199The spikes in run queue length of 11 are likely threads waking up at the same 200time (a thundering herd), and then are scheduled and complete their execution 201quickly. 202 203 204The -O option prints run queue occupancy: the percentage of time that there 205was work queued waiting its turn. Eg: 206 207# ./runqlen.py -OT 1 208Sampling run queue length... Hit Ctrl-C to end. 209 21019:54:53 211runqocc: 41.09% 212 21319:54:54 214runqocc: 41.85% 215 21619:54:55 217runqocc: 41.47% 218 21919:54:56 220runqocc: 42.35% 221 22219:54:57 223runqocc: 40.83% 224[...] 225 226This can also be examined per-CPU: 227 228# ./runqlen.py -COT 1 229Sampling run queue length... Hit Ctrl-C to end. 230 23119:55:03 232runqocc, CPU 0 32.32% 233runqocc, CPU 1 26.26% 234runqocc, CPU 2 38.38% 235runqocc, CPU 3 100.00% 236runqocc, CPU 4 26.26% 237runqocc, CPU 5 32.32% 238runqocc, CPU 6 39.39% 239runqocc, CPU 7 46.46% 240 24119:55:04 242runqocc, CPU 0 35.00% 243runqocc, CPU 1 32.32% 244runqocc, CPU 2 37.00% 245runqocc, CPU 3 100.00% 246runqocc, CPU 4 43.43% 247runqocc, CPU 5 31.31% 248runqocc, CPU 6 28.00% 249runqocc, CPU 7 31.31% 250 25119:55:05 252runqocc, CPU 0 43.43% 253runqocc, CPU 1 32.32% 254runqocc, CPU 2 45.45% 255runqocc, CPU 3 100.00% 256runqocc, CPU 4 29.29% 257runqocc, CPU 5 36.36% 258runqocc, CPU 6 36.36% 259runqocc, CPU 7 30.30% 260 26119:55:06 262runqocc, CPU 0 40.00% 263runqocc, CPU 1 38.00% 264runqocc, CPU 2 31.31% 265runqocc, CPU 3 100.00% 266runqocc, CPU 4 31.31% 267runqocc, CPU 5 28.28% 268runqocc, CPU 6 31.00% 269runqocc, CPU 7 29.29% 270[...] 271 272 273USAGE message: 274 275# ./runqlen -h 276usage: runqlen [-h] [-T] [-O] [-C] [interval] [count] 277 278Summarize scheduler run queue length as a histogram 279 280positional arguments: 281 interval output interval, in seconds 282 count number of outputs 283 284optional arguments: 285 -h, --help show this help message and exit 286 -T, --timestamp include timestamp on output 287 -O, --runqocc report run queue occupancy 288 -C, --cpus print output for each CPU separately 289 290examples: 291 ./runqlen # summarize run queue length as a histogram 292 ./runqlen 1 10 # print 1 second summaries, 10 times 293 ./runqlen -T 1 # 1s summaries and timestamps 294 ./runqlen -O # report run queue occupancy 295 ./runqlen -C # show each CPU separately 296