Name

    AMD_performance_monitor

Name Strings

    GL_AMD_performance_monitor

Contributors

    Dan Ginsburg
    Aaftab Munshi
    Dave Oldcorn
    Maurice Ribble
    Jonathan Zarge

Contact

    Dan Ginsburg (dan.ginsburg 'at' amd.com)

Status

    ???

Version

    Last Modified Date: 11/29/2007

Number

    OpenGL Extension #360
    OpenGL ES Extension #50

Dependencies

    None

Overview

    This extension enables the capture and reporting of performance monitors.
    Performance monitors contain groups of counters which hold arbitrary counted
    data. Typically, the counters hold information on performance-related
    counters in the underlying hardware. The extension is general enough to
    allow the implementation to choose which counters to expose and pick the
    data type and range of the counters. The extension also allows counting to
    start and end on arbitrary boundaries during rendering.

Issues

    1.  Should this be an EGL or OpenGL/OpenGL ES extension?

        Decision - Make this an OpenGL/OpenGL ES extension

        Reason - We would like to expose this extension in both OpenGL and
        OpenGL ES which makes EGL an unsuitable choice. Further, support for
        EGL is not a requirement and there are platforms that support OpenGL ES
        but not EGL, making it difficult to make this an EGL extension.

    2.  Should the API support multipassing?

        Decision - No.

        Reason - Multipassing should really be left to the application to do.
        Supporting it would make the API unnecessarily complicated. A major
        issue is that depending on which counters are to be sampled, the number
        of passes and which counters get selected in each pass can be difficult
        to determine. It is much easier to give a list of counters categorized
        by groups with specific information on the number of counters that can
        be selected from each group.

    3.  Should we define a 64-bit data type for UNSIGNED_INT64_AMD?

        Decision - No.

        Reason - While counters can be returned as 64-bit unsigned integers, the
        data is passed back to the application inside of a void*. Therefore,
        there is no need in this extension to define a 64-bit data type (e.g.,
        GLuint64). It will be up to the application to declare a native 64-bit
        unsigned integer and cast the returned data to that type.


New Procedures and Functions

    void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize,
                                 uint *groups)

    void GetPerfMonitorCountersAMD(uint group, int *numCounters,
                                   int *maxActiveCounters, sizei countersSize,
                                   uint *counters)

    void GetPerfMonitorGroupStringAMD(uint group, sizei bufSize, sizei *length,
                                      char *groupString)

    void GetPerfMonitorCounterStringAMD(uint group, uint counter, sizei bufSize,
                                        sizei *length, char *counterString)

    void GetPerfMonitorCounterInfoAMD(uint group, uint counter,
                                      enum pname, void *data)

    void GenPerfMonitorsAMD(sizei n, uint *monitors)

    void DeletePerfMonitorsAMD(sizei n, uint *monitors)

    void SelectPerfMonitorCountersAMD(uint monitor, boolean enable,
                                      uint group, int numCounters,
                                      uint *counterList)

    void BeginPerfMonitorAMD(uint monitor)

    void EndPerfMonitorAMD(uint monitor)

    void GetPerfMonitorCounterDataAMD(uint monitor, enum pname, sizei dataSize,
                                      uint *data, int *bytesWritten)


New Tokens

    Accepted by the <pname> parameter of GetPerfMonitorCounterInfoAMD

        COUNTER_TYPE_AMD                  0x8BC0
        COUNTER_RANGE_AMD                 0x8BC1

    Returned as a valid value in <data> parameter of
    GetPerfMonitorCounterInfoAMD if <pname> = COUNTER_TYPE_AMD

        UNSIGNED_INT                      0x1405
        FLOAT                             0x1406
        UNSIGNED_INT64_AMD                0x8BC2
        PERCENTAGE_AMD                    0x8BC3

    Accepted by the <pname> parameter of GetPerfMonitorCounterDataAMD

        PERFMON_RESULT_AVAILABLE_AMD      0x8BC4
        PERFMON_RESULT_SIZE_AMD           0x8BC5
        PERFMON_RESULT_AMD                0x8BC6

Addition to the GL specification

    Add
    a new section called Performance Monitoring

    A performance monitor consists of a number of hardware and software counters
    that can be sampled by the GPU and reported back to the application.
    Performance counters are organized as a single hierarchy where counters are
    categorized into groups. Each group has a list of counters that belong to
    the group and can be sampled, and a maximum number of counters that can be
    sampled at the same time.

    The command

        void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize,
                                     uint *groups);

    returns the number of available groups in <numGroups>, if <numGroups> is
    not NULL. If <groupsSize> is not 0 and <groups> is not NULL, then the list
    of available groups is returned. The number of entries that will be
    returned in <groups> is determined by <groupsSize>. If <groupsSize> is 0,
    no information is copied. Each group is identified by a unique unsigned int
    identifier.

    The command

        void GetPerfMonitorCountersAMD(uint group, int *numCounters,
                                       int *maxActiveCounters,
                                       sizei countersSize,
                                       uint *counters);

    returns the following information. For the given <group>, it returns the
    number of available counters in <numCounters>, the maximum number of
    counters that can be active at any time in <maxActiveCounters>, and the
    list of counters in <counters>. The number of entries that can be returned
    in <counters> is determined by <countersSize>. If <countersSize> is 0, no
    information is copied. Each counter in a group is identified by a unique
    unsigned int identifier. If <group> does not reference a valid group ID, an
    INVALID_VALUE error is generated.

    The command

        void GetPerfMonitorGroupStringAMD(uint group, sizei bufSize,
                                          sizei *length, char *groupString)

    returns the string that describes the group name identified by <group> in
    <groupString>.
    The actual number of characters written to <groupString>,
    excluding the null terminator, is returned in <length>. If <length> is
    NULL, then no length is returned. The maximum number of characters that
    may be written into <groupString>, including the null terminator, is
    specified by <bufSize>. If <bufSize> is 0 and <groupString> is NULL, the
    number of characters that would be required to hold the group string,
    excluding the null terminator, is returned in <length>. If <group>
    does not reference a valid group ID, an INVALID_VALUE error is generated.

    The command

        void GetPerfMonitorCounterStringAMD(uint group, uint counter,
                                            sizei bufSize, sizei *length,
                                            char *counterString);

    returns the string that describes the counter name identified by <group>
    and <counter> in <counterString>. The actual number of characters written
    to <counterString>, excluding the null terminator, is returned in <length>.
    If <length> is NULL, then no length is returned. The maximum number of
    characters that may be written into <counterString>, including the null
    terminator, is specified by <bufSize>. If <bufSize> is 0 and
    <counterString> is NULL, the number of characters that would be required to
    hold the counter string, excluding the null terminator, is returned in
    <length>. If <group> does not reference a valid group ID, or <counter>
    does not reference a valid counter within the group ID, an INVALID_VALUE
    error is generated.

    The command

        void GetPerfMonitorCounterInfoAMD(uint group, uint counter,
                                          enum pname, void *data);

    returns the following information about a counter. For a <counter>
    belonging to <group>, we can query the counter type and counter range. If
    <pname> is COUNTER_TYPE_AMD, then <data> returns the type. Valid type
    values returned are UNSIGNED_INT, UNSIGNED_INT64_AMD, PERCENTAGE_AMD, FLOAT.
    If the type value returned is PERCENTAGE_AMD, then this describes a float
    value that is in the range [0.0 .. 100.0]. If <pname> is COUNTER_RANGE_AMD,
    <data> returns two values representing a minimum and a maximum. The
    counter's type is used to determine the format in which the range values
    are returned. If <group> does not reference a valid group ID, or <counter>
    does not reference a valid counter within the group ID, an INVALID_VALUE
    error is generated.

    The command

        void GenPerfMonitorsAMD(sizei n, uint *monitors)

    returns a list of <n> monitor IDs in <monitors>. These monitors can then be
    used to select groups/counters to be sampled, to start multiple monitoring
    sessions and to return counter information sampled by the GPU. At creation
    time, the performance monitor object has all counters disabled. The value
    of the PERFMON_RESULT_AVAILABLE_AMD, PERFMON_RESULT_AMD, and
    PERFMON_RESULT_SIZE_AMD queries will all initially be 0.

    The command

        void DeletePerfMonitorsAMD(sizei n, uint *monitors)

    is used to delete the list of monitors created by a previous call to
    GenPerfMonitorsAMD. If a monitor ID in the list <monitors> does not
    reference a previously generated performance monitor, an INVALID_VALUE
    error is generated.

    The command

        void SelectPerfMonitorCountersAMD(uint monitor, boolean enable,
                                          uint group, int numCounters,
                                          uint *counterList);

    is used to enable or disable a list of counters from a group to be monitored
    as identified by <monitor>. The <enable> argument determines whether the
    counters should be enabled or disabled. <group> specifies the group
    ID under which counters will be enabled or disabled. The <numCounters>
    argument gives the number of counters to be selected from the list
    <counterList>. If <monitor> is not a valid monitor created by
    GenPerfMonitorsAMD, then an INVALID_VALUE error will be generated.
    If <group> is not a valid group, an INVALID_VALUE error will be generated.
    If <numCounters> is less than 0, an INVALID_VALUE error will be generated.

    When SelectPerfMonitorCountersAMD is called on a monitor, any outstanding
    results for that monitor become invalidated and the result queries
    PERFMON_RESULT_SIZE_AMD and PERFMON_RESULT_AVAILABLE_AMD are reset to 0.

    The command

        void BeginPerfMonitorAMD(uint monitor);

    is used to start a monitor session. Note that BeginPerfMonitorAMD calls
    cannot be nested. In addition, given the groups and per-group counters
    enabled for a monitor, the implementation may be unable to sample all of
    the requested counters, in which case the monitor session fails to start
    and an INVALID_OPERATION error will be generated.

    While BeginPerfMonitorAMD does mark the beginning of performance counter
    collection, the counters do not begin collecting immediately. Rather, the
    counters begin collection when BeginPerfMonitorAMD is processed by
    the hardware. That is, the API is asynchronous, and performance counter
    collection does not begin until the graphics hardware processes the
    BeginPerfMonitorAMD command.

    The command

        void EndPerfMonitorAMD(uint monitor);

    ends a monitor session started by BeginPerfMonitorAMD. If a performance
    monitor is not currently started, an INVALID_OPERATION error will be
    generated.

    Note that there is an implied overhead to collecting performance counters
    that may or may not distort performance depending on the implementation.
    For example, some counters may require a pipeline flush thereby causing a
    change in the performance of the application. Further, the frequency at
    which an application samples may distort the accuracy of counters which are
    variant (e.g., non-deterministic based on the input).
    While the effects
    of sampling frequency are implementation dependent, general guidance can
    be given that sampling at a high frequency may distort both performance
    of the application and the accuracy of variant counters.

    The command

        void GetPerfMonitorCounterDataAMD(uint monitor, enum pname,
                                          sizei dataSize,
                                          uint *data, int *bytesWritten);

    is used to return counter values that have been sampled for a monitor
    session. If <pname> is PERFMON_RESULT_AVAILABLE_AMD, then <data> will
    indicate whether the result is available or not. If <pname> is
    PERFMON_RESULT_SIZE_AMD, <data> will contain the actual size, in bytes, of
    all counter results being sampled. If <pname> is PERFMON_RESULT_AMD, <data>
    will contain the results. For each counter of a group that was selected to
    be sampled, the information is returned as group ID, followed by counter
    ID, followed by counter value. The size of the counter value returned will
    depend on the counter value type. The argument <dataSize> specifies the
    number of bytes available in the <data> buffer for writing. If
    <bytesWritten> is not NULL, it gives the number of bytes written into the
    <data> buffer. It is an INVALID_OPERATION error for <data> to be NULL. If
    <pname> is PERFMON_RESULT_AMD and <dataSize> is less than the number of
    bytes required to store the results as reported by a
    PERFMON_RESULT_SIZE_AMD query, then results will be written only up to the
    number of bytes specified by <dataSize>.

    If no BeginPerfMonitorAMD/EndPerfMonitorAMD has been issued for a monitor,
    then the result of querying for PERFMON_RESULT_AVAILABLE_AMD and
    PERFMON_RESULT_SIZE_AMD will be 0.
    When SelectPerfMonitorCountersAMD is called
    on a monitor, the results stored for the monitor become invalidated and
    the PERFMON_RESULT_AVAILABLE_AMD and PERFMON_RESULT_SIZE_AMD queries behave
    as if no BeginPerfMonitorAMD/EndPerfMonitorAMD has been issued for
    the monitor.

Errors

    INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is unable
    to begin monitoring with the currently selected counters.

    INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is called
    when a performance monitor is already active.

    INVALID_OPERATION error will be generated if EndPerfMonitorAMD is called
    when a performance monitor is not currently started.

    INVALID_VALUE error will be generated if the <group> parameter to
    GetPerfMonitorCountersAMD, GetPerfMonitorGroupStringAMD,
    GetPerfMonitorCounterStringAMD, GetPerfMonitorCounterInfoAMD, or
    SelectPerfMonitorCountersAMD does not reference a valid group ID.

    INVALID_VALUE error will be generated if the <counter> parameter to
    GetPerfMonitorCounterStringAMD or GetPerfMonitorCounterInfoAMD does not
    reference a valid counter ID in the group specified by <group>.

    INVALID_VALUE error will be generated if any of the monitor IDs
    in the <monitors> parameter to DeletePerfMonitorsAMD do not reference
    a valid generated monitor ID.

    INVALID_VALUE error will be generated if the <monitor> parameter to
    SelectPerfMonitorCountersAMD does not reference a monitor created by
    GenPerfMonitorsAMD.

    INVALID_VALUE error will be generated if the <numCounters> parameter to
    SelectPerfMonitorCountersAMD is less than 0.
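
    As an illustration of the PERFMON_RESULT_AMD layout described for
    GetPerfMonitorCounterDataAMD (group ID, then counter ID, then a counter
    value whose size depends on the counter type), the following is a minimal
    sketch of a decoder that runs without a GL context. It is not part of the
    extension: PerfResult, CounterTypeFn, decodePerfResults and demoTypeOf are
    hypothetical names, and the sketch assumes that 64-bit values occupy two
    32-bit words and all other types occupy one. Only the token values are
    taken from the New Tokens section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Token values copied from the "New Tokens" section. */
#define GL_UNSIGNED_INT        0x1405
#define GL_FLOAT               0x1406
#define GL_UNSIGNED_INT64_AMD  0x8BC2
#define GL_PERCENTAGE_AMD      0x8BC3

/* Hypothetical callback returning the type of (group, counter).  A real
   application would wrap glGetPerfMonitorCounterInfoAMD with
   COUNTER_TYPE_AMD here. */
typedef uint32_t (*CounterTypeFn)(uint32_t group, uint32_t counter);

/* Hypothetical record holding one decoded counter result. */
typedef struct
{
    uint32_t group;
    uint32_t counter;
    uint32_t type;
    union { uint32_t u32; uint64_t u64; float f32; } value;
} PerfResult;

/* Decodes up to maxResults entries from a result buffer of `bytes` bytes.
   Returns the number of entries decoded, or -1 if an entry is truncated
   or has an unrecognized type. */
static int
decodePerfResults(const uint32_t *data, size_t bytes, CounterTypeFn typeOf,
                  PerfResult *out, int maxResults)
{
    size_t words = bytes / sizeof(uint32_t);
    size_t w = 0;
    int    n = 0;

    assert(data != NULL && typeOf != NULL && out != NULL);

    while (w < words && n < maxResults)
    {
        PerfResult r;
        size_t     valueWords;

        if (words - w < 3)
            return -1;            /* need two IDs plus at least one word */

        r.group   = data[w];
        r.counter = data[w + 1];
        r.type    = typeOf(r.group, r.counter);

        switch (r.type)
        {
        case GL_UNSIGNED_INT64_AMD:
            valueWords = 2;       /* assumption: two 32-bit words */
            break;
        case GL_UNSIGNED_INT:
        case GL_FLOAT:
        case GL_PERCENTAGE_AMD:   /* a float in [0.0 .. 100.0] */
            valueWords = 1;
            break;
        default:
            return -1;
        }

        if (words - w < 2 + valueWords)
            return -1;            /* value is cut off by the buffer end */

        /* memcpy avoids unaligned or type-punned reads of the value. */
        memset(&r.value, 0, sizeof r.value);
        memcpy(&r.value, &data[w + 2], valueWords * sizeof(uint32_t));

        out[n++] = r;
        w += 2 + valueWords;
    }

    return n;
}

/* Hypothetical stand-in for the GL type query, used only to exercise the
   decoder: counter 0 is a 64-bit integer, every other counter is a float. */
static uint32_t
demoTypeOf(uint32_t group, uint32_t counter)
{
    (void)group;
    return counter == 0 ? GL_UNSIGNED_INT64_AMD : GL_FLOAT;
}
```

    An application would pass the buffer and the <bytesWritten> value obtained
    from a PERFMON_RESULT_AMD query. Returning -1 on a truncated entry covers
    the case where <dataSize> was smaller than the size reported by
    PERFMON_RESULT_SIZE_AMD.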



New State

Sample Usage

    #include <stdlib.h>   // malloc, free
    #include <string.h>   // strcmp

    typedef struct
    {
        GLuint *counterList;
        int     numCounters;
        int     maxActiveCounters;
    } CounterInfo;

    void
    getGroupAndCounterList(GLuint **groupsList, int *numGroups,
                           CounterInfo **counterInfo)
    {
        GLint        n;
        GLuint      *groups;
        CounterInfo *counters;

        // first query the number of groups, then fetch the group IDs
        glGetPerfMonitorGroupsAMD(&n, 0, NULL);
        groups = (GLuint*) malloc(n * sizeof(GLuint));
        glGetPerfMonitorGroupsAMD(NULL, n, groups);
        *numGroups = n;

        *groupsList = groups;
        counters = (CounterInfo*) malloc(sizeof(CounterInfo) * n);
        for (int i = 0 ; i < n; i++ )
        {
            glGetPerfMonitorCountersAMD(groups[i], &counters[i].numCounters,
                                     &counters[i].maxActiveCounters, 0, NULL);

            counters[i].counterList = (GLuint*)malloc(counters[i].numCounters *
                                                      sizeof(GLuint));

            glGetPerfMonitorCountersAMD(groups[i], NULL, NULL,
                                        counters[i].numCounters,
                                        counters[i].counterList);
        }

        *counterInfo = counters;
    }

    static int countersInitialized = 0;

    int
    getCounterByName(char *groupName, char *counterName, GLuint *groupID,
                     GLuint *counterID)
    {
        // static so that the cached lists remain valid on later calls
        static int          numGroups;
        static GLuint      *groups;
        static CounterInfo *counters;
        int                 i = 0;

        if (!countersInitialized)
        {
            getGroupAndCounterList(&groups, &numGroups, &counters);
            countersInitialized = 1;
        }

        for ( i = 0; i < numGroups; i++ )
        {
            char curGroupName[256];
            glGetPerfMonitorGroupStringAMD(groups[i], 256, NULL, curGroupName);
            if (strcmp(groupName, curGroupName) == 0)
            {
                *groupID = groups[i];
                break;
            }
        }

        if ( i == numGroups )
            return -1;           // error - could not find the group name

        for ( int j = 0; j < counters[i].numCounters; j++ )
        {
            char curCounterName[256];

            glGetPerfMonitorCounterStringAMD(groups[i],
                                             counters[i].counterList[j],
                                             256, NULL, curCounterName);
            if (strcmp(counterName, curCounterName) == 0)
            {
                *counterID = counters[i].counterList[j];
                return 0;
            }
        }

        return -1;           // error - could not find the counter name
    }

    void
    drawFrameWithCounters(void)
    {
        GLuint group[2];
        GLuint counter[2];
        GLuint monitor;
        GLuint *counterData;

        // Get group/counter IDs by name.  Note that normally the
        // counter and group names need to be queried for because
        // each implementation of this extension on different hardware
        // could define different names and groups.  This is just provided
        // to demonstrate the API.
        getCounterByName("HW", "Hardware Busy", &group[0],
                         &counter[0]);
        getCounterByName("API", "Draw Calls", &group[1],
                         &counter[1]);

        // create perf monitor ID
        glGenPerfMonitorsAMD(1, &monitor);

        // enable the counters
        glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, group[0], 1,
                                       &counter[0]);
        glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, group[1], 1,
                                       &counter[1]);

        glBeginPerfMonitorAMD(monitor);

        // RENDER FRAME HERE
        // ...

        glEndPerfMonitorAMD(monitor);

        // The API is asynchronous, so poll until the results are available
        // (a simple busy-wait, for illustration only)
        GLuint resultAvailable = 0;
        while ( !resultAvailable )
        {
            glGetPerfMonitorCounterDataAMD(monitor,
                                           GL_PERFMON_RESULT_AVAILABLE_AMD,
                                           sizeof(GLuint), &resultAvailable,
                                           NULL);
        }

        // read the counters
        GLuint resultSize;
        glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_SIZE_AMD,
                                       sizeof(GLuint), &resultSize, NULL);

        counterData = (GLuint*) malloc(resultSize);

        GLsizei bytesWritten;
        glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_AMD,
                                       resultSize, counterData, &bytesWritten);

        // display or log counter info
        GLsizei wordCount = 0;

        while ( (4 * wordCount) < bytesWritten )
        {
            GLuint groupId = counterData[wordCount];
            GLuint counterId = counterData[wordCount + 1];

            // Determine the counter type
            GLuint counterType;
            glGetPerfMonitorCounterInfoAMD(groupId, counterId,
                                           GL_COUNTER_TYPE_AMD, &counterType);

            if ( counterType == GL_UNSIGNED_INT64_AMD )
            {
                unsigned __int64 counterResult =
                           *(unsigned __int64*)(&counterData[wordCount + 2]);

                // Print counter result

                wordCount += 4;
            }
            else if ( counterType == GL_FLOAT )
            {
                float counterResult = *(float*)(&counterData[wordCount + 2]);

                // Print counter result

                wordCount += 3;
            }
            else
            {
                // GL_UNSIGNED_INT and GL_PERCENTAGE_AMD (a float) values
                // each occupy one 32-bit word; handle them here.  Always
                // advancing wordCount keeps the loop from spinning forever.
                wordCount += 3;
            }
        }

        // release the result buffer and the monitor
        free(counterData);
        glDeletePerfMonitorsAMD(1, &monitor);
    }

Revision History
    11/29/2007 - dginsburg
       + Clarified the default state of a performance monitor object on creation

    11/09/2007 - dginsbur
       + Clarify what happens if SelectPerfMonitorCountersAMD is called on
         a monitor with outstanding query results.
       + Rename counterSize to countersSize
       + Remove some ';' typos

    06/13/2007 - dginsbur
       + Add language on the asynchronous nature of the API and
         counter accuracy/performance distortion.
       + Add myself as the contact
       + Remove INVALID_OPERATION error when countersList is NULL
       + Clarify 64-bit issue
       + Make PERCENTAGE_AMD counters float rather than uint
       + Clarify accuracy distortion on variant counters only
       + Tweak to overview language

    06/09/2007 - dginsbur
       + Fill in errors section and make many more errors explicit
       + Fix the example code so it compiles

    06/08/2007 - dginsbur
       + Modified GetPerfMonitorGroupString and GetPerfMonitorCounterString to
         be more client/server friendly.
       + Modified example.
       + Renamed parameters/variables to follow GL conventions.
       + Modified several 'int' param types to 'sizei'
       + Modified counters type from 'int' to 'uint'
       + Renamed argument 'cb' and 'cbret'
       + Better documented GetPerfMonitorCounterData
       + Add AMD adornment in many places that were missing it

    06/07/2007 - dginsbur
       + Cleanup formatting, remove tabs, make fit in proper page width
       + Add FLOAT and UNSIGNED_INT to list of COUNTER_TYPEs
       + Fix some bugs in the example code
       + Rewrite introduction
       + Clarified Issue 1 reasoning
       + Added Issue 3 regarding use of 64-bit data types
       + Added revision history

    03/21/2007 - Initial version written.  Written by amunshi.