Name

    INTEL_performance_query

Name Strings

    GL_INTEL_performance_query

Contact

    Tomasz Madajczak, Intel (tomasz.madajczak 'at' intel.com)

Contributors

    Piotr Uminski, Intel
    Slawomir Grajewski, Intel

Status

    Complete, shipping on selected Intel graphics.

Version

    Last Modified Date: December 20, 2013
    Revision: 3

Number

    OpenGL Extension #443
    OpenGL ES Extension #164

Dependencies

    OpenGL dependencies:

        OpenGL 3.0 is required.

        The extension is written against the OpenGL 4.4 Specification, Core
        Profile, October 18, 2013.

    OpenGL ES dependencies:

        This extension is written against the OpenGL ES 2.0.25 Specification
        and OpenGL ES 3.0.2 Specification.

Overview

    The purpose of this extension is to expose Intel proprietary hardware
    performance counters to OpenGL applications. Performance counters may
    count:

    - the number of hardware events, such as the number of spawned vertex
      shaders. In this case the result represents the number of events.

    - the duration of a certain activity, such as the time taken by all
      fragment shader invocations. In this case the result usually
      represents the number of clocks in which the particular HW unit was
      busy. To use such a counter effectively, it should be normalized to
      the range [0,1] by dividing its value by the number of render clocks.

    - the used throughput of certain memory types, such as texture memory.
      In this case the result usually represents the number of bytes
      transferred between the GPU and memory.
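
    The normalization of a duration counter described above can be
    sketched in C as follows. This is only an illustration: the function
    name and both parameter names are hypothetical, and the clamp to 1.0 is
    an added safeguard, not something this extension specifies.

```c
#include <stdint.h>

/* Normalize a raw duration counter (clocks in which one hardware unit was
 * busy) to the range [0,1] by dividing it by the number of render clocks
 * that elapsed during the same measurement. Names are illustrative. */
double normalize_duration(uint64_t busy_clocks, uint64_t render_clocks)
{
    if (render_clocks == 0) {
        return 0.0;                     /* nothing measured; avoid dividing by zero */
    }
    double ratio = (double)busy_clocks / (double)render_clocks;
    return ratio > 1.0 ? 1.0 : ratio;   /* assumed safeguard against counter skew */
}
```

    For example, a unit that was busy for 250 of 1000 render clocks yields
    a utilization of 0.25.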

    This extension specifies a universal API to manage performance counters
    on different Intel hardware platforms. Performance counters are grouped
    together into proprietary, hardware-specific, fixed sets of counters that
    are measured together by the GPU.

    It is assumed that performance counters may be started and stopped at
    arbitrary boundaries during rendering.

    A set of performance counters is represented by a unique query type. Each
    query type is identified by an assigned name and ID. Multiple query types
    (sets of performance counters) are supported by Intel hardware. However,
    each Intel hardware generation supports different sets of performance
    counters, so the query types can differ between hardware generations.
    The definitions of query types and their result structures can be
    learned through the API. They are also documented in the Intel OGL
    Performance Counters Specification, a separate document issued for each
    new hardware generation.

    The API allows an application to create multiple instances of any query
    type and to sample different fragments of 3D rendering with such
    instances. Query instances are identified by handles.

New Procedures and Functions

    void GetFirstPerfQueryIdINTEL(uint *queryId);

    void GetNextPerfQueryIdINTEL(uint queryId, uint *nextQueryId);

    void GetPerfQueryIdByNameINTEL(char *queryName, uint *queryId);

    void GetPerfQueryInfoINTEL(uint queryId,
             uint queryNameLength, char *queryName,
             uint *dataSize, uint *noCounters,
             uint *noInstances, uint *capsMask);

    void GetPerfCounterInfoINTEL(uint queryId, uint counterId,
             uint counterNameLength, char *counterName,
             uint counterDescLength, char *counterDesc,
             uint *counterOffset, uint *counterDataSize, uint *counterTypeEnum,
             uint *counterDataTypeEnum, uint64 *rawCounterMaxValue);

    void CreatePerfQueryINTEL(uint queryId, uint *queryHandle);

    void DeletePerfQueryINTEL(uint queryHandle);

    void BeginPerfQueryINTEL(uint queryHandle);

    void EndPerfQueryINTEL(uint queryHandle);

    void GetPerfQueryDataINTEL(uint queryHandle, uint flags,
             sizei dataSize, void *data, uint *bytesWritten);

New Tokens

    Returned by the capsMask parameter of GetPerfQueryInfoINTEL

        PERFQUERY_SINGLE_CONTEXT_INTEL          0x0000
        PERFQUERY_GLOBAL_CONTEXT_INTEL          0x0001

    Accepted by the flags parameter of GetPerfQueryDataINTEL

        PERFQUERY_WAIT_INTEL                    0x83FB
        PERFQUERY_FLUSH_INTEL                   0x83FA
        PERFQUERY_DONOT_FLUSH_INTEL             0x83F9

    Returned by the GetPerfCounterInfoINTEL function as the counter type
    enumeration in the location pointed to by counterTypeEnum

        PERFQUERY_COUNTER_EVENT_INTEL           0x94F0
        PERFQUERY_COUNTER_DURATION_NORM_INTEL   0x94F1
        PERFQUERY_COUNTER_DURATION_RAW_INTEL    0x94F2
        PERFQUERY_COUNTER_THROUGHPUT_INTEL      0x94F3
        PERFQUERY_COUNTER_RAW_INTEL             0x94F4
        PERFQUERY_COUNTER_TIMESTAMP_INTEL       0x94F5

    Returned by the GetPerfCounterInfoINTEL function as the counter data
    type enumeration in the location pointed to by counterDataTypeEnum

        PERFQUERY_COUNTER_DATA_UINT32_INTEL     0x94F8
        PERFQUERY_COUNTER_DATA_UINT64_INTEL     0x94F9
        PERFQUERY_COUNTER_DATA_FLOAT_INTEL      0x94FA
        PERFQUERY_COUNTER_DATA_DOUBLE_INTEL     0x94FB
        PERFQUERY_COUNTER_DATA_BOOL32_INTEL     0x94FC

    Accepted by the <pname> parameter of GetIntegerv:

        PERFQUERY_QUERY_NAME_LENGTH_MAX_INTEL   0x94FD
        PERFQUERY_COUNTER_NAME_LENGTH_MAX_INTEL 0x94FE
        PERFQUERY_COUNTER_DESC_LENGTH_MAX_INTEL 0x94FF

    Accepted by the <pname> parameter of GetBooleanv:

        PERFQUERY_GPA_EXTENDED_COUNTERS_INTEL   0x9500

Add new Section 4.4 to Chapter 4, Event Model for OpenGL 4.4
Add new Section 2.18 to Chapter 2, OpenGL ES Operation for OpenGL ES 3.0.2

    4.4 Performance Queries (for OpenGL 4.4)
    2.18 Performance Queries (for OpenGL ES 3.0.2)

    Hardware and software performance counters can be used to obtain
    information about GPU activity. Performance counters are grouped into query
    types. Different query types can be supported on different hardware
    platforms and/or driver versions. One or more instances of the query types
    can be created.

    Each query type has a unique query ID. The query IDs supported on a given
    platform can be queried at run time. The function:

        void GetFirstPerfQueryIdINTEL(uint *queryId);

    returns the identifier of the first performance query type that is
    supported on a given platform. The result is written to the location
    pointed to by the queryId parameter. If the given hardware platform
    doesn't support any performance queries, then the value of 0 is returned
    and an INVALID_OPERATION error is generated. If the queryId pointer is
    equal to 0, an INVALID_VALUE error is generated.

    Subsequent query IDs can be obtained by repeated calls to the function:

        void GetNextPerfQueryIdINTEL(uint queryId, uint *nextQueryId);

    This function returns the integer identifier of the performance query
    that follows, on a given platform, the query specified by queryId. The
    result is written to the location pointed to by nextQueryId. If the query
    identified by queryId is the last query available, the value of 0 is
    returned. If the specified performance query identifier is invalid, an
    INVALID_VALUE error is generated. If the nextQueryId pointer is equal to
    0, an INVALID_VALUE error is generated. Whenever an error is generated,
    the value of 0 is returned.
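
    The enumeration described above amounts to following a chain of IDs
    until 0 is returned. A minimal sketch in C is shown below. It takes the
    two entry points as function pointers so that it can be exercised
    without a GL context; gl_uint, enumerate_query_ids and the fake_*
    functions are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int gl_uint;   /* stand-in for the GL "uint" type */

/* Function-pointer types matching the two prototypes above. */
typedef void (*GetFirstPerfQueryIdFn)(gl_uint *queryId);
typedef void (*GetNextPerfQueryIdFn)(gl_uint queryId, gl_uint *nextQueryId);

/* Walk the chain of query IDs until 0 is returned. Up to maxIds IDs are
 * stored in ids[] (which may be NULL); the total number seen is returned. */
size_t enumerate_query_ids(GetFirstPerfQueryIdFn getFirst,
                           GetNextPerfQueryIdFn getNext,
                           gl_uint *ids, size_t maxIds)
{
    assert(getFirst != NULL && getNext != NULL);

    size_t count = 0;
    gl_uint id = 0;
    getFirst(&id);
    while (id != 0) {
        if (ids != NULL && count < maxIds) {
            ids[count] = id;
        }
        count++;

        gl_uint next = 0;
        getNext(id, &next);     /* 0 means "id" was the last query type */
        id = next;
    }
    return count;
}

/* Hypothetical stand-in driver exposing query IDs 1, 2 and 3. */
void fake_get_first(gl_uint *queryId) { *queryId = 1; }

void fake_get_next(gl_uint queryId, gl_uint *nextQueryId)
{
    *nextQueryId = (queryId >= 1 && queryId < 3) ? queryId + 1 : 0;
}
```

    In an application the two pointers would be the dynamically resolved
    glGetFirstPerfQueryIdINTEL and glGetNextPerfQueryIdINTEL entry points.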

    Each performance query type has a name and a unique identifier. The query
    identifier for a given query name can be read using the function:

        void GetPerfQueryIdByNameINTEL(char *queryName, uint *queryId);

    This function returns the identifier of the query type specified by the
    string provided as the queryName parameter. If queryName does not
    reference a valid query name, an INVALID_VALUE error is generated.

    A general description of a query type can be read using the function:

        void GetPerfQueryInfoINTEL(uint queryId, uint queryNameLength,
            char *queryName, uint *dataSize,
            uint *noCounters, uint *maxInstances,
            uint *noActiveInstances, uint *capsMask);

    The function returns information about the performance query specified by
    the queryId parameter, in particular:

    -  the query name in the queryName location. The maximal length of the
       copied name is specified by queryNameLength

    -  the size of the query output structure in bytes in the dataSize
       location

    -  the number of performance counters in the query output structure in
       the noCounters location

    -  the maximal allowed number of query instances that can be created on a
       given architecture in the maxInstances location. Because queries of
       other types are created using the same resources, the number of
       instances that can actually be created may be smaller than the
       returned number

    -  the actual number of already created query instances in the
       noActiveInstances location

    -  the mask of query capabilities in the capsMask location.

    If the mask returned in capsMask contains the
    PERFQUERY_SINGLE_CONTEXT_INTEL token, the query supports context
    sensitive measurements. Otherwise, if the mask contains the
    PERFQUERY_GLOBAL_CONTEXT_INTEL token, the query doesn't support that
    feature and the counters will be updated for all render contexts, as
    they are global for the hardware.
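
    Because PERFQUERY_SINGLE_CONTEXT_INTEL has the value 0x0000, it cannot
    be detected with a bitwise AND; a capability check has to test the
    PERFQUERY_GLOBAL_CONTEXT_INTEL bit instead. A minimal sketch in C,
    using the token values from the New Tokens section (the function name
    perfquery_is_global is hypothetical):

```c
#include <stdbool.h>

/* Token values from the New Tokens section of this extension. */
#define PERFQUERY_SINGLE_CONTEXT_INTEL 0x0000
#define PERFQUERY_GLOBAL_CONTEXT_INTEL 0x0001

/* Returns true when the query only supports global (all-context) counters,
 * false when it supports context sensitive measurements. */
bool perfquery_is_global(unsigned int capsMask)
{
    return (capsMask & PERFQUERY_GLOBAL_CONTEXT_INTEL) != 0;
}
```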

    If queryId does not reference a valid query type, an INVALID_VALUE error is
    generated.

    Performance counters that belong to the same query type have unique
    IDs. Performance counter ID values start with 1. Performance counter ID 0
    is reserved as an invalid counter. Information about the performance
    counters that belong to a given query type can be read using the function:

        void GetPerfCounterInfoINTEL(uint queryId, uint counterId,
             uint counterNameLength, char *counterName,
             uint counterDescLength, char *counterDesc,
             uint *counterOffset, uint *counterDataSize, uint *counterTypeEnum,
             uint *counterDataTypeEnum, uint64 *rawCounterMaxValue);

    The function returns descriptive information about each particular
    performance counter that is an element of the performance query. The
    counter is identified by a pair of queryId and counterId parameters. The
    following parameters are returned:

    -  the counter name in the counterName location. The maximal length of
       the copied name is specified by counterNameLength.

    -  the counter description text in the counterDesc location. The maximal
       length of the copied text is specified by counterDescLength.

    -  the byte offset of the counter from the start of the query structure
       in the counterOffset location.

    -  the counter size in bytes in the counterDataSize location.

    -  the counter type enumeration in the counterTypeEnum location. It can
       be one of the following tokens:
           PERFQUERY_COUNTER_EVENT_INTEL
           PERFQUERY_COUNTER_DURATION_NORM_INTEL
           PERFQUERY_COUNTER_DURATION_RAW_INTEL
           PERFQUERY_COUNTER_THROUGHPUT_INTEL
           PERFQUERY_COUNTER_RAW_INTEL
           PERFQUERY_COUNTER_TIMESTAMP_INTEL

    -  the counter data type enumeration in the counterDataTypeEnum
       location. It can be one of the following tokens:
           PERFQUERY_COUNTER_DATA_UINT32_INTEL
           PERFQUERY_COUNTER_DATA_UINT64_INTEL
           PERFQUERY_COUNTER_DATA_FLOAT_INTEL
           PERFQUERY_COUNTER_DATA_DOUBLE_INTEL
           PERFQUERY_COUNTER_DATA_BOOL32_INTEL

    -  for some raw counters for which the maximal value is deterministic, the
       maximal value of the counter in 1 second is returned in the location
       pointed to by rawCounterMaxValue. Otherwise, the value of 0 is written
       to that location.

    If the pair of queryId and counterId does not reference a valid counter,
    an INVALID_VALUE error is generated.
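
    The offset and data type returned for a counter are what an application
    needs in order to pick that counter out of a query result buffer
    without knowing the proprietary structure layout. A minimal sketch in
    C follows. The function name read_counter is hypothetical, and the
    4-byte and 8-byte widths are inferred from the token names rather than
    stated by this extension; the counterDataSize value reported by the
    driver should take precedence where they differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Token values from the New Tokens section of this extension. */
#define PERFQUERY_COUNTER_DATA_UINT32_INTEL 0x94F8
#define PERFQUERY_COUNTER_DATA_UINT64_INTEL 0x94F9
#define PERFQUERY_COUNTER_DATA_FLOAT_INTEL  0x94FA
#define PERFQUERY_COUNTER_DATA_DOUBLE_INTEL 0x94FB
#define PERFQUERY_COUNTER_DATA_BOOL32_INTEL 0x94FC

/* Read one counter from a query result buffer and widen it to double.
 * Returns false if the data type is unknown or the counter does not fit
 * inside the buffer. Widths (4 or 8 bytes) are assumed from token names. */
bool read_counter(const void *data, size_t dataSize,
                  unsigned int counterOffset,
                  unsigned int counterDataTypeEnum, double *value)
{
    assert(data != NULL && value != NULL);

    size_t size;
    switch (counterDataTypeEnum) {
    case PERFQUERY_COUNTER_DATA_UINT32_INTEL:
    case PERFQUERY_COUNTER_DATA_FLOAT_INTEL:
    case PERFQUERY_COUNTER_DATA_BOOL32_INTEL:
        size = 4;
        break;
    case PERFQUERY_COUNTER_DATA_UINT64_INTEL:
    case PERFQUERY_COUNTER_DATA_DOUBLE_INTEL:
        size = 8;
        break;
    default:
        return false;
    }
    if (counterOffset > dataSize || size > dataSize - counterOffset) {
        return false;           /* counter would lie outside the buffer */
    }

    /* memcpy avoids unaligned or type-punned reads from the raw buffer. */
    const unsigned char *src = (const unsigned char *)data + counterOffset;
    uint32_t u32;
    uint64_t u64;
    float f32;
    double f64;
    switch (counterDataTypeEnum) {
    case PERFQUERY_COUNTER_DATA_UINT32_INTEL:
    case PERFQUERY_COUNTER_DATA_BOOL32_INTEL:
        memcpy(&u32, src, sizeof u32);
        *value = (double)u32;
        break;
    case PERFQUERY_COUNTER_DATA_UINT64_INTEL:
        memcpy(&u64, src, sizeof u64);
        *value = (double)u64;
        break;
    case PERFQUERY_COUNTER_DATA_FLOAT_INTEL:
        memcpy(&f32, src, sizeof f32);
        *value = (double)f32;
        break;
    default:                    /* PERFQUERY_COUNTER_DATA_DOUBLE_INTEL */
        memcpy(&f64, src, sizeof f64);
        *value = f64;
        break;
    }
    return true;
}
```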

    A single instance of the performance query of a given type can be created
    using the function:

        void CreatePerfQueryINTEL(uint queryId, uint *queryHandle);

    The handle to the newly created query instance is returned in the
    queryHandle location. If queryId does not reference a valid query type,
    an INVALID_VALUE error is generated. If the query instance cannot be
    created because the number of allowed instances is exceeded or because
    the driver fails query creation due to insufficient memory, an
    OUT_OF_MEMORY error is generated, and the location pointed to by
    queryHandle is set to NULL.

    An existing query instance can be deleted using the function:

        void DeletePerfQueryINTEL(uint queryHandle);

    queryHandle must be a query instance handle returned by
    CreatePerfQueryINTEL(). If a query handle doesn't reference a previously
    created performance query instance, an INVALID_VALUE error is generated.

    A new measurement session for a given query instance can be started using
    the function:

        void BeginPerfQueryINTEL(uint queryHandle);

    where queryHandle must be a query instance handle returned by
    CreatePerfQueryINTEL(). If a query handle doesn't reference a previously
    created performance query instance, an INVALID_VALUE error is
    generated. Note that some query types cannot be collected at the same
    time. Therefore calls of BeginPerfQueryINTEL() cannot be nested if they
    refer to queries of such different types. In such a case an
    INVALID_OPERATION error is generated.

    The counters may not start immediately after BeginPerfQueryINTEL().
    Because the API and GPU are asynchronous, the start of performance counters
    is delayed until the graphics hardware actually executes the hardware
    commands issued by this function. However, it is guaranteed that
    collection of performance counters will start before any draw calls
    specified in the same context after the call to BeginPerfQueryINTEL().

    Collection of performance counters may be stopped by the function:

        void EndPerfQueryINTEL(uint queryHandle);

    where queryHandle must be a query instance handle returned by
    CreatePerfQueryINTEL(). The function ends the measurement session started
    by BeginPerfQueryINTEL(). If a performance query is not currently started,
    an INVALID_OPERATION error will be generated. As in the
    BeginPerfQueryINTEL() case, the execution of EndPerfQueryINTEL() is not
    immediate. The end of the measurement is delayed until the graphics
    hardware completes processing of the hardware commands issued by this
    function. However, it is guaranteed that any draw calls specified in
    the same context after the call to EndPerfQueryINTEL() will not be
    measured by this query.

    The query result can be read using the function:

        void GetPerfQueryDataINTEL(uint queryHandle, uint flags, sizei
            dataSize, void *data, uint *bytesWritten);

    The function returns the values of the counters that have been measured
    within the query session identified by queryHandle. The call may end
    without returning any data if the results are not ready for reading
    because the measurement session is still pending (the hardware has not
    finished processing the EndPerfQueryINTEL() command). In this case the
    location pointed to by the bytesWritten parameter will be set to 0. The
    meaning of the flags parameter is as follows:

    -  PERFQUERY_DONOT_FLUSH_INTEL means that the call of
       GetPerfQueryDataINTEL() is non-blocking: it checks for results and
       returns them if they are available. Otherwise (if the results of the
       query are not ready), it returns without flushing any outstanding 3D
       commands to the GPU. The use case for this is when a flush of
       outstanding 3D commands to the GPU has already been ensured with other
       OpenGL API calls.

    -  PERFQUERY_FLUSH_INTEL means that the call of GetPerfQueryDataINTEL() is
       non-blocking: it checks for results and returns them if they are
       available. Otherwise, it implicitly submits any outstanding 3D commands
       to the GPU for execution. In that case a subsequent call of
       GetPerfQueryDataINTEL() may return data once the query completes.

    -  PERFQUERY_WAIT_INTEL means that the call of GetPerfQueryDataINTEL() is
       blocking: it waits until the query results are available and returns
       them. If the query results are not yet available, it implicitly
       submits any outstanding 3D commands to the GPU and waits for the
       query to complete.

    If the measurement session identified by queryHandle is completed, the
    call of GetPerfQueryDataINTEL() always writes the query result to the
    location pointed to by the data parameter, and the number of bytes
    written is stored in the location pointed to by the bytesWritten
    parameter.

    If the bytesWritten or data pointer is NULL, an INVALID_VALUE error is
    generated.
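
    The non-blocking flags lend themselves to a "flush once, then poll"
    pattern. A minimal sketch in C is shown below. The entry point is
    passed as a function pointer so the loop can be exercised without a GL
    context. poll_query_data, its maxPolls limit and the fake_* driver are
    hypothetical and exist only for this illustration; a real application
    would typically do other work between polls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token values from the New Tokens section of this extension. */
#define PERFQUERY_FLUSH_INTEL       0x83FA
#define PERFQUERY_DONOT_FLUSH_INTEL 0x83F9

/* Function-pointer type matching GetPerfQueryDataINTEL (sizei is int). */
typedef void (*GetPerfQueryDataFn)(unsigned int queryHandle,
                                   unsigned int flags, int dataSize,
                                   void *data, unsigned int *bytesWritten);

/* Ask for results with PERFQUERY_FLUSH_INTEL once, then keep polling with
 * PERFQUERY_DONOT_FLUSH_INTEL. Gives up after maxPolls calls. Returns the
 * number of bytes written, or 0 if the results never became available. */
unsigned int poll_query_data(GetPerfQueryDataFn getData,
                             unsigned int queryHandle, int dataSize,
                             void *data, unsigned int maxPolls)
{
    assert(getData != NULL && data != NULL);

    unsigned int bytesWritten = 0;
    unsigned int flags = PERFQUERY_FLUSH_INTEL;
    for (unsigned int i = 0; i < maxPolls && bytesWritten == 0; i++) {
        getData(queryHandle, flags, dataSize, data, &bytesWritten);
        flags = PERFQUERY_DONOT_FLUSH_INTEL;   /* commands already submitted */
    }
    return bytesWritten;
}

/* Hypothetical stand-in driver: reports "not ready" on the first two
 * calls, then writes 4 bytes of result data. */
int fake_calls = 0;

void fake_get_data(unsigned int queryHandle, unsigned int flags,
                   int dataSize, void *data, unsigned int *bytesWritten)
{
    (void)queryHandle;
    (void)flags;
    fake_calls++;
    if (fake_calls < 3 || dataSize < 4) {
        *bytesWritten = 0;
        return;
    }
    memset(data, 0xAB, 4);
    *bytesWritten = 4;
}
```

    Bounding the number of polls is a design choice of this sketch; it
    keeps a query that never completes from hanging the caller.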


New Implementation Dependent State

Add new Table 23.75 to Chapter 23, State Tables (OpenGL 4.4)
Add new Table 6.37 to Chapter 6.2, State Tables (OpenGL ES 3.0.2)


    Get Value                                Type  Get Command  Value  Description
    ---------------------------------------  ----  -----------  -----  ---------------------------
    PERFQUERY_QUERY_NAME_LENGTH_MAX_INTEL    Z+    GetIntegerv  256    max query name length
    PERFQUERY_COUNTER_NAME_LENGTH_MAX_INTEL  Z+    GetIntegerv  256    max counter name length
    PERFQUERY_COUNTER_DESC_LENGTH_MAX_INTEL  Z+    GetIntegerv  1024   max description length
    PERFQUERY_GPA_EXTENDED_COUNTERS_INTEL    B     GetBooleanv  -      extended counters available


Issues

    1. What is the usage model of this extension?

       Generally there are two approaches to measuring performance with
       Intel OGL Performance Queries:

       - Per draw call measurements - performance counters can be used to
         assess how busy particular 3D hardware units are, under the
         assumption that the 3D hardware is busy almost 100% of the time
         from the CPU point of view.

       - Per 3D scene measurements - performance counters can be used to
         assess the balance of CPU and GPU processing times. Such an
         assessment shows whether the workload is CPU or GPU bound.

    2. How are per draw call measurements performed?

       In the per draw call usage model each call to a draw routine
       (e.g. glDrawArrays, glDrawElements) should be surrounded by a dedicated
       query instance. That means that each draw operation should be measured
       independently. It is recommended to measure the GPU performance
       characteristics for a single draw call to find possible bottlenecks
       for the application executed on a given hardware.

    3. How are per scene measurements performed?

       The usage model assumes that one performance query instance measures a
       complete scene. It is recommended for determining whether the workload
       is CPU or GPU bound. It should be noted that:

       - For a longer scope of performance query the probability of a 3D
         hardware frequency change is higher. As a result, a larger
         percentage of results may be biased with gross errors.

       - For complicated 3D scenes the render commands split condition is
         always met.

       Thus, to calculate an average 3D hardware unit utilization for a longer
       period of time it is recommended to use a larger number of per draw call
       queries rather than a smaller number of per 3D scene queries. It is
       recommended to use this method when the application uses full screen
       mode, as the current implementation of queries supports only the
       global context.

    4. How can the results of the query be read?

       Results of the queries cannot be read before the entire drawing is done
       by the GPU. This means that the application programmer has to decide
       which synchronization method to use to read the query results. There
       are the following options:

       - Use glFlush to trigger submission of any pending commands to the
         GPU. Later check results availability with repeated non-blocking
         calls to the GetPerfQueryDataINTEL function using the
         synchronization flag GL_PERFQUERY_DONOT_FLUSH_INTEL.

       - Use the GL_PERFQUERY_FLUSH_INTEL flag in glGetPerfQueryDataINTEL to
         trigger submission of any pending commands to the GPU. If results are
         not immediately available, check their availability with repeated
         non-blocking calls to the GetPerfQueryDataINTEL function using the
         synchronization flag GL_PERFQUERY_DONOT_FLUSH_INTEL.

       - Do a blocking call to glGetPerfQueryDataINTEL() with the
         GL_PERFQUERY_WAIT_INTEL flag set. The flag ensures that any pending
         GPU commands are submitted and that the function blocks until GPU
         results are available.

       It is allowed to perform simultaneous measurements with multiple active
       queries of the same type. However, it may not be allowed to perform
       simultaneous measurements of queries with different types, as this may
       require reprogramming of the same hardware part and could destroy the
       hardware settings of the previous query.

    5. Are query results always accurate?

       There are certain hardware conditions which may cause the results
       of performance counters expressed in hardware clocks to be inaccurate.
       The conditions may include:

       - Render clock change - this condition usually makes all counter
         values expressed in hardware clocks incorrect. It is indicated by
         the FrequencyChanged flag.

       - Render commands split - in some cases the GPU has to split execution
         of the drawing operations surrounded by the query into at least two
         parts. This condition usually makes counter values expressed in
         the time domain (in microseconds) substantially larger than the
         average values of that counter. It is indicated by the SplitOccured
         flag.

       - Rendering preemption - if the GPU is shared among two or more 3D
         applications, the hardware counters gathered in global mode contain
         additive results for these applications. This condition is also
         indicated by the SplitOccured flag.

       The above conditions are indicated in special fields in the query
       results structures. It is up to the user to decide if the results are to
       be processed further or dropped. In certain cases it can be determined
       that the render commands split condition always occurs and has to be
       accepted.
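
       One way to act on these flags is to drop flagged samples before
       averaging per draw call results. A minimal sketch in C follows. The
       DrawSample structure is hypothetical (real result layouts are
       proprietary and differ per query type); only the two flag names are
       taken from the text above, and the clock-weighted average is an
       assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per draw call sample. Real query result structures are
 * proprietary; only the two flag names come from this extension. */
typedef struct {
    uint64_t busyClocks;        /* clocks in which the unit was busy */
    uint64_t renderClocks;      /* render clocks elapsed in the session */
    int      FrequencyChanged;  /* non-zero: render clock changed */
    int      SplitOccured;      /* non-zero: commands split or preempted */
} DrawSample;

/* Clock-weighted average utilization over the samples that pass the
 * filter. Samples with FrequencyChanged set are always dropped; samples
 * with SplitOccured set are kept only if acceptSplit is non-zero.
 * Returns -1.0 when no usable sample remains. */
double average_utilization(const DrawSample *samples, size_t count,
                           int acceptSplit)
{
    uint64_t busy = 0;
    uint64_t clocks = 0;
    for (size_t i = 0; i < count; i++) {
        if (samples[i].FrequencyChanged) {
            continue;
        }
        if (samples[i].SplitOccured && !acceptSplit) {
            continue;
        }
        busy += samples[i].busyClocks;
        clocks += samples[i].renderClocks;
    }
    return clocks == 0 ? -1.0 : (double)busy / (double)clocks;
}
```

       Passing a non-zero acceptSplit corresponds to the case where the
       render commands split condition always occurs and has to be accepted.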

    6. Are query results per-context or global?

       Some GPU platforms and/or driver versions support only global GPU
       counters. In such cases, the GL_PERFQUERY_GLOBAL_CONTEXT_INTEL flag
       has to be set when creating the query instance. Otherwise, creation
       will fail and an INVALID_OPERATION error will be generated.

       Support for a global context means that a single query instance measures
       all GPU activities performed between query start and query end. The
       query measures not only the current OpenGL context but also the
       activities of other OpenGL contexts, other 3D APIs such as DX, and
       operating system window draw calls.

Program examples

    1. Reading counter meta data example

       // query data has proprietary predefined structure layout
       // associated with the vendor query ID
       GL_QUERY_PIPELINE_METRICS * pQueryData;

       uint queryId;
       uint nextQueryId;
       uint queryHandle;
       uint dataSize;
       uint noCounters;
       uint noInstances;
       uint capsMask;

       const uint queryNameLen = 32;
       char queryName[queryNameLen];

       const uint counterNameLen = 32;
       char counterName[counterNameLen];

       const uint counterDescLen = 256;
       char counterDesc[counterDescLen];

       // get first vendor queryID
       glGetFirstPerfQueryIdINTEL(&queryId);

       nextQueryId = queryId;
       while(nextQueryId)
       {
           glGetPerfQueryInfoINTEL(
               nextQueryId,
               queryNameLen,
               queryName,
               &dataSize,
               &noCounters,
               &noInstances,
               &capsMask);

           // counter ids start with 1; id 0 is reserved as invalid
           for(uint counterId = 1; counterId <= noCounters; counterId++)
           {
               uint counterOffset;
               uint counterDataSize;
               uint counterTypeEnum;
               uint counterDataTypeEnum;
               uint64 rawCounterMaxValue;

               glGetPerfCounterInfoINTEL(
                   nextQueryId,
                   counterId,
                   counterNameLen,
                   counterName,
                   counterDescLen,
                   counterDesc,
                   &counterOffset,
                   &counterDataSize,
                   &counterTypeEnum,
                   &counterDataTypeEnum,
                   &rawCounterMaxValue);

               // use returned values here
               ...
           }

           // advance to the next query type; 0 ends the loop
           glGetNextPerfQueryIdINTEL(nextQueryId, &nextQueryId);
       }

    2. Measuring a single draw call example

       Note that GL_QUERY_PIPELINE_METRICS is a proprietary structure defined
       by the vendor and is used here as an example, and that functions named
       according to the glFunctionINTEL convention are wrappers around
       dynamically linked-by-name procedures.

       // query data has proprietary predefined structure layout
       // associated with the vendor query ID
       GL_QUERY_PIPELINE_METRICS * pQueryData;

       uint queryId;
       uint queryHandle;
       char queryName[] = "Intel_Pipeline_Query";

       // get vendor queryID by name
       glGetPerfQueryIdByNameINTEL(queryName, &queryId);

       // create query instance of queryId type
       glCreatePerfQueryINTEL(queryId, &queryHandle);

       glBeginPerfQueryINTEL(queryHandle); // Start query

       glDrawElements(...); // Issue graphics commands, do whatever

       glEndPerfQueryINTEL(queryHandle); // End query

       // perform other application activities

       uint bytesWritten = 0;
       uint dataSize = sizeof(GL_QUERY_PIPELINE_METRICS);

       pQueryData = (GL_QUERY_PIPELINE_METRICS *) malloc(dataSize);

       // for the first time use GL_PERFQUERY_FLUSH_INTEL flag to ensure graphics
       // commands were submitted to hardware

       glGetPerfQueryDataINTEL(
           queryHandle,
           GL_PERFQUERY_FLUSH_INTEL,
           dataSize,
           pQueryData,
           &bytesWritten);

       while(bytesWritten == 0)
       {
           // Now it is enough to use GL_PERFQUERY_DONOT_FLUSH_INTEL flag
           glGetPerfQueryDataINTEL(
               queryHandle,
               GL_PERFQUERY_DONOT_FLUSH_INTEL,
               dataSize,
               pQueryData,
               &bytesWritten);
       }

       if(bytesWritten == dataSize)
       {
           // Use counters' data here
           uint64 vertexShaderKernelsRunCount =
                pQueryData->VertexShaderInvocations;
           uint64 fragmentShaderKernelsRunCount =
                pQueryData->FragmentShaderInvocations;
           ...
       }
       else
       {
           // error handling case
       }

       glDeletePerfQueryINTEL(queryHandle); // query instance is released
       free(pQueryData);                    // result buffer is released

    3. Measuring multiple draw calls with synchronous wait for result

       Note that GL_QUERY_HD_HW_METRICS is a proprietary structure defined by
       the vendor and is used here as an example, and that functions named
       according to the glFunctionINTEL convention are wrappers around
       dynamically linked-by-name procedures.

       // query data has proprietary predefined structure layout
       // associated with the vendor query ID
       GL_QUERY_HD_HW_METRICS * pQueryData;

       uint queryId;
       uint queryHandle[1000];
       char queryName[] = "Intel_HD_Hardware_Counters";

       // get vendor queryID by name
       glGetPerfQueryIdByNameINTEL(queryName, &queryId);

       // create memory for 1000 results
       uint dataSize = sizeof(GL_QUERY_HD_HW_METRICS);
       pQueryData = (GL_QUERY_HD_HW_METRICS *) malloc(dataSize * 1000);

       // create 1000 query instances of queryId type
       for(int i = 0; i < 1000; i++)
       {
           glCreatePerfQueryINTEL(queryId, &queryHandle[i]);
       }

       uint currentDrawNumber = 0;

       // start 1st query
       glBeginPerfQueryINTEL(queryHandle[currentDrawNumber]);

       glDrawElements(...); // Issue graphics commands

       // end query
       glEndPerfQueryINTEL(queryHandle[currentDrawNumber++]);

       ...

       // start nth query
       glBeginPerfQueryINTEL(queryHandle[currentDrawNumber]);

       glDrawElements(...); // Issue graphics commands

       // end query
       glEndPerfQueryINTEL(queryHandle[currentDrawNumber++]);

       ...

       // assume currentDrawNumber == 1000 here
       // so get all results after these 1000 draws

       GL_QUERY_HD_HW_METRICS *pData = pQueryData;

       for(int i = 0; i < 1000; i++)
       {
           uint bytesWritten = 0;

           // use GL_PERFQUERY_WAIT_INTEL flag to make the function wait
           // for the query completion
           glGetPerfQueryDataINTEL(
               queryHandle[i],
               GL_PERFQUERY_WAIT_INTEL,
               dataSize,
               pData,
               &bytesWritten);

           if(bytesWritten != sizeof(GL_QUERY_HD_HW_METRICS))
           {
               // query error case
               assert(false);
               ...
               // some cleanup needed also
               ...
               return ERROR;
           }

           pData++;
       }

       // use counters data
       ...

       // repeat measurements if needed reusing the query instances
       ...

       // query instances are no longer needed so release all of them
       for(int i = 0; i < 1000; i++)
       {
           glDeletePerfQueryINTEL(queryHandle[i]);
       }

       free(pQueryData); // release the result buffer

       return SUCCESS;

Revision History

    1.3   20/12/13 Jon Leech  Assign extension #s and enum values. Fix
                              a few typos (Bug 11345).

    1.2   29/11/13 sgrajewski Extension upgraded to 4.4 core specification.
                              ES3.0.2 dependencies added.

    1.1   06/06/11 puminski   Initial revision.
