Name

    INTEL_performance_query

Name Strings

    GL_INTEL_performance_query

Contact

    Tomasz Madajczak, Intel (tomasz.madajczak 'at' intel.com)

Contributors

    Piotr Uminski, Intel
    Slawomir Grajewski, Intel

Status

    Complete, shipping on selected Intel graphics.

Version

    Last Modified Date: December 20, 2013
    Revision: 3

Number

    OpenGL Extension #443
    OpenGL ES Extension #164

Dependencies

    OpenGL dependencies:

        OpenGL 3.0 is required.

        The extension is written against the OpenGL 4.4 Specification, Core
        Profile, October 18, 2013.

    OpenGL ES dependencies:

        This extension is written against the OpenGL ES 2.0.25 Specification
        and OpenGL ES 3.0.2 Specification.

Overview

    The purpose of this extension is to expose Intel proprietary hardware
    performance counters to OpenGL applications. Performance counters may
    count:

    - the number of hardware events, such as the number of spawned vertex
      shaders. In this case the result represents the number of events.

    - the duration of a certain activity, such as the time taken by all
      fragment shader invocations. In that case the result usually represents
      the number of clocks in which the particular HW unit was busy. In order
      to use such a counter efficiently, it should be normalized to the range
      of <0,1> by dividing its value by the number of render clocks.

    - the used throughput of certain memory types, such as texture memory. In
      that case the result of the performance counter usually represents the
      number of bytes transferred between the GPU and memory.

    This extension specifies a universal API to manage performance counters
    on different Intel hardware platforms. Performance counters are grouped
    together into proprietary, hardware-specific, fixed sets of counters that
    are measured together by the GPU.
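    The normalization of duration counters mentioned in the list above can be
    sketched in C as follows. This is only an illustration: normalize_duration
    is a hypothetical helper, not part of this extension, and it assumes that
    the busy-clock value and the render-clock count were both taken from the
    same query result. Clamping the ratio to 1 is a design choice of this
    sketch, made to absorb small measurement inconsistencies.

```c
#include <stdint.h>

/* Hypothetical helper (not part of this extension): converts a raw duration
 * counter into a utilization value in the range <0,1>. */
static double normalize_duration(uint64_t busy_clocks, uint64_t render_clocks)
{
    double ratio;

    if (render_clocks == 0)
        return 0.0;                     /* nothing was measured */

    ratio = (double)busy_clocks / (double)render_clocks;
    return ratio > 1.0 ? 1.0 : ratio;   /* clamp, see the note above */
}
```

    For example, a unit that was busy for 50 clocks out of 200 render clocks
    would be reported as 0.25, that is, 25% utilized.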

    It is assumed that performance counters are started and ended on
    arbitrary boundaries during rendering.

    A set of performance counters is represented by a unique query type. Each
    query type is identified by an assigned name and ID. Multiple query types
    (sets of performance counters) are supported by the Intel hardware.
    However, each Intel hardware generation supports different sets of
    performance counters, so the query types can differ between hardware
    generations. The definition of query types and their result structures
    can be learned through the API. It is also documented in a separate
    document, the Intel OGL Performance Counters Specification, issued for
    each new hardware generation.

    The API allows creating multiple instances of any query type and sampling
    different fragments of 3D rendering with such instances. Query instances
    are identified with handles.

New Procedures and Functions

    void GetFirstPerfQueryIdINTEL(uint *queryId);

    void GetNextPerfQueryIdINTEL(uint queryId, uint *nextQueryId);

    void GetPerfQueryIdByNameINTEL(char *queryName, uint *queryId);

    void GetPerfQueryInfoINTEL(uint queryId,
        uint queryNameLength, char *queryName,
        uint *dataSize, uint *noCounters,
        uint *noInstances, uint *capsMask);

    void GetPerfCounterInfoINTEL(uint queryId, uint counterId,
        uint counterNameLength, char *counterName,
        uint counterDescLength, char *counterDesc,
        uint *counterOffset, uint *counterDataSize, uint *counterTypeEnum,
        uint *counterDataTypeEnum, uint64 *rawCounterMaxValue);

    void CreatePerfQueryINTEL(uint queryId, uint *queryHandle);

    void DeletePerfQueryINTEL(uint queryHandle);

    void BeginPerfQueryINTEL(uint queryHandle);

    void EndPerfQueryINTEL(uint queryHandle);

    void GetPerfQueryDataINTEL(uint queryHandle, uint flags,
        sizei dataSize, void *data, uint *bytesWritten);

New Tokens

    Returned by the
    capsMask parameter of GetPerfQueryInfoINTEL

        PERFQUERY_SINGLE_CONTEXT_INTEL             0x0000
        PERFQUERY_GLOBAL_CONTEXT_INTEL             0x0001

    Accepted by the flags parameter of GetPerfQueryDataINTEL

        PERFQUERY_WAIT_INTEL                       0x83FB
        PERFQUERY_FLUSH_INTEL                      0x83FA
        PERFQUERY_DONOT_FLUSH_INTEL                0x83F9

    Returned by the GetPerfCounterInfoINTEL function as the counter type
    enumeration in the location pointed by counterTypeEnum

        PERFQUERY_COUNTER_EVENT_INTEL              0x94F0
        PERFQUERY_COUNTER_DURATION_NORM_INTEL      0x94F1
        PERFQUERY_COUNTER_DURATION_RAW_INTEL       0x94F2
        PERFQUERY_COUNTER_THROUGHPUT_INTEL         0x94F3
        PERFQUERY_COUNTER_RAW_INTEL                0x94F4
        PERFQUERY_COUNTER_TIMESTAMP_INTEL          0x94F5

    Returned by the GetPerfCounterInfoINTEL function as the counter data type
    enumeration in the location pointed by counterDataTypeEnum

        PERFQUERY_COUNTER_DATA_UINT32_INTEL        0x94F8
        PERFQUERY_COUNTER_DATA_UINT64_INTEL        0x94F9
        PERFQUERY_COUNTER_DATA_FLOAT_INTEL         0x94FA
        PERFQUERY_COUNTER_DATA_DOUBLE_INTEL        0x94FB
        PERFQUERY_COUNTER_DATA_BOOL32_INTEL        0x94FC

    Accepted by the <pname> parameter of GetIntegerv:

        PERFQUERY_QUERY_NAME_LENGTH_MAX_INTEL      0x94FD
        PERFQUERY_COUNTER_NAME_LENGTH_MAX_INTEL    0x94FE
        PERFQUERY_COUNTER_DESC_LENGTH_MAX_INTEL    0x94FF

    Accepted by the <pname> parameter of GetBooleanv:

        PERFQUERY_GPA_EXTENDED_COUNTERS_INTEL      0x9500

Add new Section 4.4 to Chapter 4, Event Model for OpenGL 4.4
Add new Section 2.18 to Chapter 2, OpenGL ES Operation for OpenGL ES 3.0.2

    4.4 Performance Queries (for OpenGL 4.4)
    2.18 Performance Queries (for OpenGL ES 3.0.2)

    Hardware and software performance counters can be used to obtain
    information about GPU activity. Performance counters are grouped into
    query types. Different query types can be supported on different hardware
    platforms and/or driver versions. One or more instances of the query
    types can be created.

    Each query type has a unique query ID. The query IDs supported on a given
    platform can be queried at run time. The function:

        void GetFirstPerfQueryIdINTEL(uint *queryId);

    returns the identifier of the first performance query type that is
    supported on a given platform. The result is passed in the location
    pointed by the queryId parameter. If the given hardware platform doesn't
    support any performance queries, then the value of 0 is returned and an
    INVALID_OPERATION error is raised. If the queryId pointer is equal to 0,
    an INVALID_VALUE error is generated.

    Subsequent query IDs can be obtained by repeated calls to the function:

        void GetNextPerfQueryIdINTEL(uint queryId, uint *nextQueryId);

    This function returns the integer identifier of the performance query
    type that follows, on a given platform, the one specified by queryId.
    The result is passed in the location pointed by nextQueryId. If the query
    identified by queryId is the last query available, the value of 0 is
    returned. If the specified performance query identifier is invalid, an
    INVALID_VALUE error is generated. If the nextQueryId pointer is equal to
    0, an INVALID_VALUE error is generated. Whenever an error is generated,
    the value of 0 is returned.

    Each performance query type has a name and a unique identifier. The query
    identifier for a given query name can be read using the function:

        void GetPerfQueryIdByNameINTEL(char *queryName, uint *queryId);

    This function returns the identifier of the query type specified by the
    string provided as the queryName parameter. If queryName does not
    reference a valid query name, an INVALID_VALUE error is generated.
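    The enumeration of query types described above can be sketched in C as
    follows. This is a minimal sketch under stated assumptions: the
    GetFirstIdFn and GetNextIdFn pointer types and enumerate_query_ids are
    hypothetical names that only mirror the signatures of
    GetFirstPerfQueryIdINTEL and GetNextPerfQueryIdINTEL, and the fake_*
    functions stand in for a driver so that the sketch is self-contained. A
    real application would pass the entry points it loaded by name instead.

```c
#include <stddef.h>

/* Hypothetical pointer types mirroring GetFirstPerfQueryIdINTEL and
 * GetNextPerfQueryIdINTEL; a real program loads the entry points by name. */
typedef void (*GetFirstIdFn)(unsigned int *queryId);
typedef void (*GetNextIdFn)(unsigned int queryId, unsigned int *nextQueryId);

/* Walks all query types, storing up to maxIds identifiers in ids.
 * Returns the total number of query types reported. */
static size_t enumerate_query_ids(GetFirstIdFn getFirst, GetNextIdFn getNext,
                                  unsigned int *ids, size_t maxIds)
{
    size_t count = 0;
    unsigned int id = 0;

    getFirst(&id);                 /* 0: no query types are supported */
    while (id != 0) {
        unsigned int next = 0;

        if (count < maxIds)
            ids[count] = id;
        count++;

        getNext(id, &next);        /* 0: id was the last query type */
        id = next;
    }
    return count;
}

/* Stand-in driver exposing query types 1, 2 and 3, for illustration only. */
static void fake_get_first(unsigned int *queryId)
{
    *queryId = 1;
}

static void fake_get_next(unsigned int queryId, unsigned int *nextQueryId)
{
    *nextQueryId = (queryId >= 1 && queryId < 3) ? queryId + 1 : 0;
}
```

    Because the return value counts every query type even when the ids array
    is too small, a caller can call the helper once with maxIds set to 0 to
    learn how much storage is needed.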

    A general description of a query type can be read using the function:

        void GetPerfQueryInfoINTEL(uint queryId, uint queryNameLength,
                                   char *queryName, uint *dataSize,
                                   uint *noCounters, uint *maxInstances,
                                   uint *noActiveInstances, uint *capsMask);

    The function returns information about the performance query specified by
    the queryId parameter, particularly:

    - the query name in the queryName location. The maximal length of the
      copied name is specified by queryNameLength

    - the size of the query output structure in bytes in the dataSize
      location

    - the number of performance counters in the query output structure in the
      noCounters location

    - the maximal allowed number of query instances that can be created on a
      given architecture in the maxInstances location. Because queries of
      other types are created using the same resources, the actual number of
      instances that can be created may be smaller than the returned number

    - the actual number of already created query instances in the
      noActiveInstances location

    - the mask of query capabilities in the capsMask location.

    If the mask returned in capsMask contains the
    PERFQUERY_SINGLE_CONTEXT_INTEL token, the query supports context
    sensitive measurements. Otherwise, if the mask contains the
    PERFQUERY_GLOBAL_CONTEXT_INTEL token, the query doesn't support that
    feature and the counters will be updated for all render contexts, as they
    are global for the hardware.

    If queryId does not reference a valid query type, an INVALID_VALUE error
    is generated.

    Performance counters that belong to the same query type have unique
    IDs. Performance counter ID values start with 1. Performance counter ID 0
    is reserved as an invalid counter.
    Information about performance counters that belong to a given query type
    can be read using the function:

        void GetPerfCounterInfoINTEL(uint queryId, uint counterId,
            uint counterNameLength, char *counterName,
            uint counterDescLength, char *counterDesc,
            uint *counterOffset, uint *counterDataSize, uint *counterTypeEnum,
            uint *counterDataTypeEnum, uint64 *rawCounterMaxValue);

    The function returns descriptive information about each particular
    performance counter that is an element of the performance query. The
    counter is identified by the pair of queryId and counterId parameters.
    The following parameters are returned:

    - the counter name in the counterName location. The maximal length of the
      copied name is specified by counterNameLength.

    - the counter description text in the counterDesc location. The maximal
      length of the copied text is specified by counterDescLength.

    - the byte offset of the counter from the start of the query structure in
      the counterOffset location.

    - the counter size in bytes in the counterDataSize location.

    - the counter type enumeration in the counterTypeEnum location. It can be
      one of the following tokens:
          PERFQUERY_COUNTER_EVENT_INTEL
          PERFQUERY_COUNTER_DURATION_NORM_INTEL
          PERFQUERY_COUNTER_DURATION_RAW_INTEL
          PERFQUERY_COUNTER_THROUGHPUT_INTEL
          PERFQUERY_COUNTER_RAW_INTEL
          PERFQUERY_COUNTER_TIMESTAMP_INTEL

    - the counter data type enumeration in the counterDataTypeEnum location.
      It can be one of the following tokens:
          PERFQUERY_COUNTER_DATA_UINT32_INTEL
          PERFQUERY_COUNTER_DATA_UINT64_INTEL
          PERFQUERY_COUNTER_DATA_FLOAT_INTEL
          PERFQUERY_COUNTER_DATA_DOUBLE_INTEL
          PERFQUERY_COUNTER_DATA_BOOL32_INTEL

    - for some raw counters for which the maximal value is deterministic, the
      maximal value of the counter in 1 second is returned in the location
      pointed by rawCounterMaxValue. Otherwise, the location is written with
      the value of 0.
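    The offset and data type returned for a counter are what an application
    needs to pick that counter out of a query result buffer. The following C
    fragment is a minimal sketch of that step, not a definitive
    implementation: read_counter is a hypothetical helper, the token values
    are copied from the New Tokens section, and the sizes of 4 bytes for the
    UINT32, FLOAT and BOOL32 types and 8 bytes for the UINT64 and DOUBLE
    types are assumptions inferred from the token names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Token values as listed under "New Tokens". */
#ifndef GL_PERFQUERY_COUNTER_DATA_UINT32_INTEL
#define GL_PERFQUERY_COUNTER_DATA_UINT32_INTEL 0x94F8
#define GL_PERFQUERY_COUNTER_DATA_UINT64_INTEL 0x94F9
#define GL_PERFQUERY_COUNTER_DATA_FLOAT_INTEL  0x94FA
#define GL_PERFQUERY_COUNTER_DATA_DOUBLE_INTEL 0x94FB
#define GL_PERFQUERY_COUNTER_DATA_BOOL32_INTEL 0x94FC
#endif

/* Hypothetical helper: reads the counter stored at counterOffset in a buffer
 * filled by GetPerfQueryDataINTEL and widens it to double (64-bit integers
 * above 2^53 lose precision). Returns 0 on success, -1 for an unknown data
 * type or a counter that does not fit in the buffer. */
static int read_counter(const void *data, size_t dataSize,
                        uint32_t counterOffset, uint32_t counterDataTypeEnum,
                        double *value)
{
    const unsigned char *src;
    size_t size;
    uint32_t u32;
    uint64_t u64;
    float f32;
    double f64;

    switch (counterDataTypeEnum) {
    case GL_PERFQUERY_COUNTER_DATA_UINT32_INTEL:
    case GL_PERFQUERY_COUNTER_DATA_BOOL32_INTEL:
    case GL_PERFQUERY_COUNTER_DATA_FLOAT_INTEL:
        size = 4;                   /* assumed size, see the note above */
        break;
    case GL_PERFQUERY_COUNTER_DATA_UINT64_INTEL:
    case GL_PERFQUERY_COUNTER_DATA_DOUBLE_INTEL:
        size = 8;                   /* assumed size, see the note above */
        break;
    default:
        return -1;
    }
    if (counterOffset > dataSize || size > dataSize - counterOffset)
        return -1;

    /* memcpy avoids unaligned and type-punned reads from the raw buffer. */
    src = (const unsigned char *)data + counterOffset;
    switch (counterDataTypeEnum) {
    case GL_PERFQUERY_COUNTER_DATA_UINT32_INTEL:
        memcpy(&u32, src, sizeof u32); *value = (double)u32; break;
    case GL_PERFQUERY_COUNTER_DATA_BOOL32_INTEL:
        memcpy(&u32, src, sizeof u32); *value = u32 ? 1.0 : 0.0; break;
    case GL_PERFQUERY_COUNTER_DATA_FLOAT_INTEL:
        memcpy(&f32, src, sizeof f32); *value = (double)f32; break;
    case GL_PERFQUERY_COUNTER_DATA_UINT64_INTEL:
        memcpy(&u64, src, sizeof u64); *value = (double)u64; break;
    default: /* GL_PERFQUERY_COUNTER_DATA_DOUBLE_INTEL */
        memcpy(&f64, src, sizeof f64); *value = f64; break;
    }
    return 0;
}
```

    In practice the counterDataSize value returned by GetPerfCounterInfoINTEL
    could be checked against the assumed size before the read.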

    If the pair of queryId and counterId does not reference a valid counter,
    an INVALID_VALUE error is generated.

    A single instance of the performance query of a given type can be created
    using the function:

        void CreatePerfQueryINTEL(uint queryId, uint *queryHandle);

    The handle to the newly created query instance is returned in the
    queryHandle location. If queryId does not reference a valid query type,
    an INVALID_VALUE error is generated. If the query instance cannot be
    created because the number of allowed instances is exceeded or because
    the driver fails query creation due to insufficient memory, an
    OUT_OF_MEMORY error is generated, and the location pointed by queryHandle
    returns NULL.

    An existing query instance can be deleted using the function:

        void DeletePerfQueryINTEL(uint queryHandle);

    queryHandle must be a query instance handle returned by
    CreatePerfQueryINTEL(). If a query handle doesn't reference a previously
    created performance query instance, an INVALID_VALUE error is generated.

    A new measurement session for a given query instance can be started using
    the function:

        void BeginPerfQueryINTEL(uint queryHandle);

    where queryHandle must be a query instance handle returned by
    CreatePerfQueryINTEL(). If a query handle doesn't reference a previously
    created performance query instance, an INVALID_VALUE error is
    generated. Note that some query types cannot be collected at the same
    time. Therefore calls of BeginPerfQueryINTEL() cannot be nested if they
    refer to queries of such different types. In such a case an
    INVALID_OPERATION error is generated.

    The counters may not start immediately after BeginPerfQueryINTEL().
    Because the API and GPU are asynchronous, the start of performance
    counters is delayed until the graphics hardware actually executes the
    hardware commands issued by this function.
    However, it is guaranteed that the collecting of performance counters
    will start before any draw calls specified in the same context after the
    call to BeginPerfQueryINTEL().

    Collecting performance counters may be stopped by the function:

        void EndPerfQueryINTEL(uint queryHandle);

    where queryHandle must be a query instance handle returned by
    CreatePerfQueryINTEL(). The function ends the measurement session started
    by BeginPerfQueryINTEL(). If a performance query is not currently
    started, an INVALID_OPERATION error will be generated. As in the
    BeginPerfQueryINTEL() case, the execution of EndPerfQueryINTEL() is not
    immediate. The end of measurement is delayed until the graphics hardware
    completes processing of the hardware commands issued by this
    function. However, it is guaranteed that the results of any draw calls
    specified in the same context after the call to EndPerfQueryINTEL() will
    not be measured by this query.

    The query result can be read using the function:

        void GetPerfQueryDataINTEL(uint queryHandle, uint flags, sizei
                                   dataSize, void *data, uint *bytesWritten);

    The function returns the values of counters which have been measured
    within the query session identified by queryHandle. The call may end
    without returning any data if they are not ready for reading because the
    measurement session is still pending (the EndPerfQueryINTEL() command
    processing is not finished by hardware). In this case the location
    pointed by the bytesWritten parameter will be set to 0. The meaning of
    the flags parameter is as follows:

    - PERFQUERY_DONOT_FLUSH_INTEL means that the call of
      GetPerfQueryDataINTEL() is non-blocking: it checks for results and
      returns them if they are available. Otherwise (if the results of the
      query are not ready) it returns without flushing any outstanding 3D
      commands to the GPU.
      The use case for this is when a flush of outstanding 3D commands to
      the GPU has already been ensured with other OpenGL API calls.

    - PERFQUERY_FLUSH_INTEL means that the call of GetPerfQueryDataINTEL() is
      non-blocking: it checks for results and returns them if they are
      available. Otherwise, it implicitly submits any outstanding 3D commands
      to the GPU for execution. In that case a subsequent call of
      GetPerfQueryDataINTEL() may return data once the query completes.

    - PERFQUERY_WAIT_INTEL means that the call of GetPerfQueryDataINTEL() is
      blocking: it waits until the query results are available and returns
      them. If the query results are not yet available, it implicitly
      submits any outstanding 3D commands to the GPU and waits for the
      query completion.

    If the measurement session identified by queryHandle is completed, then
    the call of GetPerfQueryDataINTEL() always writes the query result to the
    location pointed by the data parameter, and the number of bytes written
    is stored in the location pointed by the bytesWritten parameter.

    If the bytesWritten or data pointers are NULL, then an INVALID_VALUE
    error is generated.


New Implementation Dependent State

Add new Table 23.75 to Chapter 23, State Tables (OpenGL 4.4)
Add new Table 6.37 to Chapter 6.2, State Tables (OpenGL ES 3.0.2)


    Get Value                                Type  Get Command  Value  Description
    ---------------------------------------  ----  -----------  -----  ---------------------------
    PERFQUERY_QUERY_NAME_LENGTH_MAX_INTEL    Z+    GetIntegerv  256    max query name length
    PERFQUERY_COUNTER_NAME_LENGTH_MAX_INTEL  Z+    GetIntegerv  256    max counter name length
    PERFQUERY_COUNTER_DESC_LENGTH_MAX_INTEL  Z+    GetIntegerv  1024   max description length
    PERFQUERY_GPA_EXTENDED_COUNTERS_INTEL    B     GetBooleanv  -      extended counters available


Issues

    1. What is the usage model of this extension?

       Generally there are two approaches to measuring performance with
       Intel OGL Performance Queries:

       - Per draw call measurements - performance counters can be used to
         assess how busy particular 3D hardware units are, under the
         assumption that the 3D hardware is busy almost 100% of the time from
         the CPU point of view.

       - Per 3D scene measurements - performance counters can be used to
         assess the balance of CPU and GPU processing times. Such an
         assessment shows whether the workload is CPU or GPU bound.

    2. How are per draw call measurements performed?

       In the per draw call usage model each call to a draw routine
       (e.g. glDrawArrays, glDrawElements) should be surrounded by a
       dedicated query instance. That means that each draw operation should
       be measured independently. It is recommended to measure the GPU
       performance characteristics for a single draw call to find possible
       bottlenecks for the application executed on a given hardware.

    3. How are per scene measurements performed?

       The usage model assumes that one performance query instance measures a
       complete scene. This model is recommended for figuring out whether the
       workload is CPU or GPU bound. It should be noted that:

       - For a longer scope of a performance query the probability of a 3D
         hardware frequency change is higher. A higher probability of
         frequency change means that a larger percentage of results may be
         biased with gross errors.

       - For complicated 3D scenes the condition of render commands split is
         always met.

       Thus, to calculate an average 3D hardware unit utilization for a
       longer period of time it is recommended to use a larger number of per
       draw call queries rather than a lower number of per 3D scene
       queries. It is recommended to use this method when the application
       uses full screen mode, as the current implementation of queries
       supports only the global context.

    4.
       How can the results of a query be read?

       Results of the queries cannot be read before the entire drawing is
       done by the GPU. This means that the application programmer has to
       decide which synchronization method to use to read the query
       results. There are the following options:

       - Use glFlush to trigger submission of any pending commands to the
         GPU. Later check results availability with repeated non-blocking
         calls to the glGetPerfQueryDataINTEL function using the
         synchronization flag GL_PERFQUERY_DONOT_FLUSH_INTEL.

       - Use the flag GL_PERFQUERY_FLUSH_INTEL in glGetPerfQueryDataINTEL to
         trigger submission of any pending commands to the GPU. If results
         are not immediately available, check their availability with
         repeated non-blocking calls to the glGetPerfQueryDataINTEL function
         using the synchronization flag GL_PERFQUERY_DONOT_FLUSH_INTEL.

       - Do a blocking call to glGetPerfQueryDataINTEL() with the
         GL_PERFQUERY_WAIT_INTEL flag set. The flag ensures that any pending
         GPU commands are submitted and that the function blocks until GPU
         results are available.

       It is allowed to perform simultaneous measurements with multiple
       active queries of the same type. However, it may not be allowed to
       perform simultaneous measurements of queries with different types, as
       it may require reprogramming of the same hardware part and could
       destroy the hardware settings of the previous query.

    5. Are query results always accurate?

       There are certain hardware conditions which may cause the results
       of performance counters expressed in hardware clocks to be inaccurate.
       The conditions may include:

       - Render clock change - this condition usually makes all counter
         values expressed in hardware clocks incorrect. It is indicated by
         the FrequencyChanged flag.

       - Render commands split - in some cases the GPU has to split the
         execution of drawing operations surrounded by the query into at
         least two parts. This condition usually makes counter values
         expressed in time domains (in microseconds) substantially larger
         than the average values of that counter. It is indicated by the
         SplitOccured flag.

       - Rendering preemption - if the GPU is shared among two or more 3D
         applications, the hardware counters gathered in a global mode
         contain additive results for these applications. This condition is
         also indicated by the SplitOccured flag.

       The above conditions are indicated in special fields in the query
       result structures. It is up to the user to decide if the results are
       to be processed further or dropped. In certain cases it can be
       determined that the render commands split condition always occurs and
       has to be accepted.

    6. Are query results per-context or global?

       Some GPU platforms and/or driver versions support only global GPU
       counters. In such cases, the query instance has to have the
       GL_PERFQUERY_GLOBAL_CONTEXT_INTEL flag set when it is created.
       Otherwise, creation will fail and an INVALID_OPERATION error will be
       generated.

       Support for a global context means that a single query instance
       measures all GPU activities performed between query start and query
       end. The query measures not only the current OpenGL context but also
       the activities of other OpenGL contexts, other 3D APIs such as DX, and
       operating system window draw calls.

Program examples

    1.
       Reading counter metadata example

        // query data has proprietary predefined structure layout
        // associated with the vendor query ID
        GL_QUERY_PIPELINE_METRICS * pQueryData;

        uint queryId;
        uint nextQueryId;
        uint queryHandle;
        uint dataSize;
        uint noCounters;
        uint noInstances;
        uint capsMask;

        const uint queryNameLen = 32;
        char queryName[queryNameLen];

        const uint counterNameLen = 32;
        char counterName[counterNameLen];

        const uint counterDescLen = 256;
        char counterDesc[counterDescLen];

        // get first vendor queryID
        glGetFirstPerfQueryIdINTEL(&queryId);

        nextQueryId = queryId;
        while(nextQueryId)
        {
            glGetPerfQueryInfoINTEL(
                nextQueryId,
                queryNameLen,
                queryName,
                &dataSize,
                &noCounters,
                &noInstances,
                &capsMask);

            // counter ids start with 1; id 0 is reserved as invalid
            for(uint counterId = 1; counterId <= noCounters; counterId++)
            {
                uint   counterOffset;
                uint   counterDataSize;
                uint   counterTypeEnum;
                uint   counterDataTypeEnum;
                uint64 rawCounterMaxValue;

                glGetPerfCounterInfoINTEL(
                    nextQueryId,
                    counterId,
                    counterNameLen,
                    counterName,
                    counterDescLen,
                    counterDesc,
                    &counterOffset,
                    &counterDataSize,
                    &counterTypeEnum,
                    &counterDataTypeEnum,
                    &rawCounterMaxValue);

                // use returned values here
                ...
            }

            // advance to the next query type; 0 is returned after the
            // last one, which ends the loop
            queryId = nextQueryId;
            glGetNextPerfQueryIdINTEL(queryId, &nextQueryId);
        }

    2. Measuring a single draw call example

       Note that GL_QUERY_PIPELINE_METRICS is a proprietary structure defined
       by the vendor and is used as an example, and that functions named
       according to the convention of glFunctionINTEL are wrappers to
       dynamically linked-by-name procedures.

        // query data has proprietary predefined structure layout
        // associated with the vendor query ID
        GL_QUERY_PIPELINE_METRICS * pQueryData;

        uint queryId;
        uint queryHandle;
        char queryName[] = "Intel_Pipeline_Query";

        // get vendor queryID by name
        glGetPerfQueryIdByNameINTEL(queryName, &queryId);

        // create query instance of queryId type
        glCreatePerfQueryINTEL(queryId, &queryHandle);

        glBeginPerfQueryINTEL(queryHandle);    // Start query

        glDrawElements(...);    // Issue graphics commands, do whatever

        glEndPerfQueryINTEL(queryHandle);      // End query

        // perform other application activities

        uint bytesWritten = 0;
        uint dataSize = sizeof(GL_QUERY_PIPELINE_METRICS);

        pQueryData = (GL_QUERY_PIPELINE_METRICS *) malloc(dataSize);

        // for the first time use GL_PERFQUERY_FLUSH_INTEL flag to ensure
        // graphics commands were submitted to hardware

        glGetPerfQueryDataINTEL(
            queryHandle,
            GL_PERFQUERY_FLUSH_INTEL,
            dataSize,
            pQueryData,
            &bytesWritten);

        while(bytesWritten == 0)
        {
            // Now enough to use GL_PERFQUERY_DONOT_FLUSH_INTEL flag
            glGetPerfQueryDataINTEL(
                queryHandle,
                GL_PERFQUERY_DONOT_FLUSH_INTEL,
                dataSize,
                pQueryData,
                &bytesWritten);
        }

        if(bytesWritten == dataSize)
        {
            // Use counters' data here
            uint64 vertexShaderKernelsRunCount =
                pQueryData->VertexShaderInvocations;
            uint64 fragmentShaderKernelsRunCount =
                pQueryData->FragmentShaderInvocations;
            ...
        }
        else
        {
            // error handling case
        }

        glDeletePerfQueryINTEL(queryHandle);   // query instance is released
        free(pQueryData);                      // result buffer is released

    3.
       Measuring multiple draw calls with synchronous wait for result

       Note that GL_QUERY_HD_HW_METRICS is a proprietary structure defined by
       the vendor and is used as an example, and that functions named
       according to the convention of glFunctionINTEL are wrappers to
       dynamically linked-by-name procedures.

        // query data has proprietary predefined structure layout
        // associated with the vendor query ID
        GL_QUERY_HD_HW_METRICS * pQueryData;

        uint queryId;
        uint queryHandle[1000];
        char queryName[] = "Intel_HD_Hardware_Counters";

        // get vendor queryID by name
        glGetPerfQueryIdByNameINTEL(queryName, &queryId);

        // create memory for 1000 results
        uint dataSize = sizeof(GL_QUERY_HD_HW_METRICS);
        pQueryData = (GL_QUERY_HD_HW_METRICS *) malloc(dataSize * 1000);

        // create 1000 query instances of queryId type
        for(int i = 0; i < 1000; i++)
        {
            glCreatePerfQueryINTEL(queryId, &queryHandle[i]);
        }

        uint currentDrawNumber = 0;

        // start 1st query
        glBeginPerfQueryINTEL(queryHandle[currentDrawNumber]);

        glDrawElements(...);    // Issue graphics commands

        // end query
        glEndPerfQueryINTEL(queryHandle[currentDrawNumber++]);

        ...

        // start nth query
        glBeginPerfQueryINTEL(queryHandle[currentDrawNumber]);

        glDrawElements(...);    // Issue graphics commands

        // end query
        glEndPerfQueryINTEL(queryHandle[currentDrawNumber++]);

        ...

        // assume currentDrawNumber == 1000 here
        // so get all results after these 1000 draws

        GL_QUERY_HD_HW_METRICS *pData = pQueryData;

        for(int i = 0; i < 1000; i++)
        {
            uint bytesWritten = 0;

            // use GL_PERFQUERY_WAIT_INTEL flag so that the function waits
            // for the query completion
            glGetPerfQueryDataINTEL(
                queryHandle[i],
                GL_PERFQUERY_WAIT_INTEL,
                dataSize,
                pData,
                &bytesWritten);

            if(bytesWritten != sizeof(GL_QUERY_HD_HW_METRICS))
            {
                // query error case
                assert(false);
                ...
                // some cleanup needed also
                ...
                return ERROR;
            }

            pData++;
        }

        // use counters data
        ...

        // repeat measurements if needed reusing the query instances
        ...

        // query instances are no longer needed so release all of them
        for(int i = 0; i < 1000; i++)
        {
            glDeletePerfQueryINTEL(queryHandle[i]);
        }

        return SUCCESS;

Revision History

    1.3  20/12/13  Jon Leech    Assign extension #s and enum values. Fix
                                a few typos (Bug 11345).

    1.2  29/11/13  sgrajewski   Extension upgraded to 4.4 core specification.
                                ES3.0.2 dependencies added.

    1.1  06/06/11  puminski     Initial revision.