Lines Matching full:quantized
16 tensors at lower bitwidths than floating point precision. A quantized model
24 speed up inference and only the forward pass is supported for quantized
35 At a lower level, PyTorch provides a way to represent quantized tensors and
51 (1). A programmable API for configuring how a model is quantized that can scale to many more use cases
53 …reference quantized model representation that can represent quantized computation with integer ope…
105 1. dynamic quantization (weights quantized with activations read/stored in
106 floating point and quantized for compute)
107 2. static quantization (weights quantized, activations quantized, calibration
109 3. static quantization aware training (weights quantized, activations quantized,
156 quantized ahead of time but the activations are dynamically quantized
170 # dynamically quantized model
192 # create a quantized model instance
196 dtype=torch.qint8) # the target dtype for quantized weights
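The fragments above are from the post-training dynamic quantization example. A minimal self-contained sketch of the same API (the toy model here is an assumption for illustration)::

    import torch

    # original fp32 model
    model_fp32 = torch.nn.Sequential(
        torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))

    # create a dynamically quantized model instance
    model_int8 = torch.ao.quantization.quantize_dynamic(
        model_fp32,           # the original model
        {torch.nn.Linear},    # a set of layer types to dynamically quantize
        dtype=torch.qint8)    # the target dtype for quantized weights

    res = model_int8(torch.randn(1, 4))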
225 # statically quantized model
235 # define a floating point model where some layers could be statically quantized
239 # QuantStub converts tensors from floating point to quantized
243 # DeQuantStub converts tensors from quantized to floating point
248 # point to quantized in the quantized model
252 # manually specify where tensors will be converted from quantized
253 # to floating point in the quantized model
287 # Convert the observed model to a quantized model. This does several things:
289 # used with each activation tensor, and replaces key operators with quantized
307 activations are quantized, and activations are fused into the preceding layer
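Pulling the static quantization fragments above together, a hedged end-to-end sketch of eager-mode post-training static quantization (the toy model is an assumption; the ``'x86'`` qconfig string assumes a recent PyTorch, older releases use ``'fbgemm'``)::

    import torch

    # define a floating point model where some layers could be statically quantized
    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # QuantStub converts tensors from floating point to quantized
            self.quant = torch.ao.quantization.QuantStub()
            self.conv = torch.nn.Conv2d(1, 1, 1)
            self.relu = torch.nn.ReLU()
            # DeQuantStub converts tensors from quantized to floating point
            self.dequant = torch.ao.quantization.DeQuantStub()

        def forward(self, x):
            x = self.quant(x)
            x = self.relu(self.conv(x))
            return self.dequant(x)

    model_fp32 = M().eval()
    model_fp32.qconfig = torch.ao.quantization.get_default_qconfig('x86')
    model_fused = torch.ao.quantization.fuse_modules(model_fp32, [['conv', 'relu']])
    model_prepared = torch.ao.quantization.prepare(model_fused)  # insert observers
    model_prepared(torch.randn(4, 1, 4, 4))                      # calibrate
    model_int8 = torch.ao.quantization.convert(model_prepared)   # quantize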
326 # quantized model
340 # QuantStub converts tensors from floating point to quantized
345 # DeQuantStub converts tensors from quantized to floating point
385 # Convert the observed model to a quantized model. This does several things:
388 # and replaces key operators with quantized implementations.
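The fragments above are from the quantization aware training example. A hedged sketch of the QAT flow, reusing the ``QuantStub``/``DeQuantStub`` model ``M`` from the static sketch above (the training loop is elided)::

    import torch

    model_fp32 = M().train()  # M as defined in the static sketch (an assumption)
    model_fp32.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86')
    model_prepared = torch.ao.quantization.prepare_qat(model_fp32)  # insert fake-quant
    # ... fine-tune model_prepared with the usual training loop ...
    model_int8 = torch.ao.quantization.convert(model_prepared.eval())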
409 2. Specify which parts of the model need to be quantized either by assigning
412 ``model.conv`` layer will not be quantized, and setting
420 1. Specify where activations are quantized and de-quantized. This is done using
423 2. Use :class:`~torch.ao.nn.quantized.FloatFunctional` to wrap tensor operations
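A hedged sketch of point 2, wrapping a tensor add with :class:`~torch.ao.nn.quantized.FloatFunctional` so that prepare/convert can observe and later quantize the operation (the module name is an assumption)::

    import torch

    class Add(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.ff = torch.ao.nn.quantized.FloatFunctional()

        def forward(self, x, y):
            # use ``self.ff.add(x, y)`` instead of ``x + y`` so the add can be
            # observed and replaced with a quantized add during convert
            return self.ff.add(x, y)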
544 # want the model to be quantized
572 …quantized model. So at a high level the quantization stack can be split into two parts: 1). The buil…
574 Quantized Model
576 Quantized Tensor
579 quantized data in Tensors. A Quantized Tensor allows for storing
580 quantized data (represented as int8/uint8/int32) along with quantization
581 parameters like scale and zero\_point. Quantized Tensors allow for many
582 useful operations making quantized arithmetic easy, in addition to
583 allowing for serialization of data in a quantized format.
585 …quantized the same way with the same quantization parameters. Per channel means that for each dime…
596 Here are a few key attributes of a quantized Tensor:
605 * dtype (torch.dtype): data type of the quantized Tensor
626 …rs, but activations in the quantized model are quantized, so we need operators to convert between …
628 * Quantize (float -> quantized)
635 * Dequantize (quantized -> float)
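A small sketch tying these pieces together: creating per-tensor and per-channel quantized Tensors, reading their quantization parameters, and dequantizing (the concrete scales/zero_points are assumptions)::

    import torch

    x = torch.randn(2, 3)
    # Quantize (float -> quantized): one scale/zero_point for the whole tensor
    xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
    print(xq.q_scale(), xq.q_zero_point(), xq.dtype)  # -> 0.1 0 torch.qint8
    print(xq.int_repr())    # the underlying int8 storage
    print(xq.dequantize())  # Dequantize (quantized -> float)

    # per-channel variant: one scale/zero_point per slice along ``axis``
    wq = torch.quantize_per_channel(
        torch.randn(3, 2), scales=torch.tensor([0.1, 0.05, 0.2]),
        zero_points=torch.tensor([0, 0, 0]), axis=0, dtype=torch.qint8)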
640 Quantized Operators/Modules
642 * Quantized Operators are operators that take quantized Tensors as inputs and output a quantiz…
643 * Quantized Modules are PyTorch Modules that perform quantized operations. They are typically defi…
645 Quantized Engine
647 …quantized model is executed, the qengine (torch.backends.quantized.engine) specifies which backend…
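A one-liner sketch for inspecting and selecting the qengine (the printed list is illustrative and varies by build)::

    import torch

    print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'x86', 'qnnpack']
    torch.backends.quantized.engine = 'x86'            # backend used to execute quantized ops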
690 * convert a calibrated/trained model to a quantized model
701 - Weight Only Quantization (only weight is statically quantized)
702 - Dynamic Quantization (weight is statically quantized, activation is dynamically quantized)
703 - Static Quantization (both weight and activations are statically quantized)
705 …e can have post training quantization that has both statically and dynamically quantized operators.
717 | | |quantized (fp16, | …
719 | | |quantized, weight | …
720 | | |statically quantized| …
725 | | |quantized (int8) | …
733 | | |quantized | …
737 | | |quantized | …
823 Today, PyTorch supports the following backends for running quantized operators efficiently:
826 …qnnpack <https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native/quantized/cpu/qnnpack>`_
832 We expose both `x86` and `qnnpack` with the same native PyTorch quantized operators, so we need add…
834 When preparing a quantized model, it is necessary to ensure that qconfig
835 and the engine used for quantized computations match the backend on which
849 torch.backends.quantized.engine = 'x86'
858 torch.backends.quantized.engine = 'qnnpack'
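A hedged sketch of keeping the two settings consistent, assuming ``model_fp32`` is the float model being prepared::

    import torch

    # server (x86)
    model_fp32.qconfig = torch.ao.quantization.get_default_qconfig('x86')
    torch.backends.quantized.engine = 'x86'

    # mobile (ARM)
    model_fp32.qconfig = torch.ao.quantization.get_default_qconfig('qnnpack')
    torch.backends.quantized.engine = 'qnnpack'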
898 of quantization APIs, such as quantization passes, quantized tensor operations,
899 and supported quantized modules and functions.
941 ``nn.quantized.Conv2d``) submodules in the model's module hierarchy. It
949 to specify a module quantized in a custom way, with user-defined logic for
956 3. The Python type of the quantized module (provided by user). This module needs
957 to define a `from_observed` function which defines how the quantized module is
980 import torch.ao.nn.quantized as nnq
1009 # custom quantized module, provided by user
1024 quantized = cls(nnq.Linear.from_float(observed_module.linear))
1025 return quantized
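Putting the custom-module fragments above into one hedged sketch (the class and attribute names are assumptions; ``from_observed`` is the required hook)::

    import torch
    import torch.ao.nn.quantized as nnq

    # custom quantized module, provided by user
    class QuantizedCustomModule(torch.nn.Module):
        def __init__(self, linear):
            super().__init__()
            self.linear = linear

        def forward(self, x):
            return self.linear(x)

        @classmethod
        def from_observed(cls, observed_module):
            # swap the observed float linear for its quantized counterpart
            quantized = cls(nnq.Linear.from_float(observed_module.linear))
            return quantized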
1089 1. How can I do quantized inference on GPU?:
1094 2. Where can I get ONNX support for my quantized model?
1109 Passing a non-quantized Tensor into a quantized kernel
1114 RuntimeError: Could not run 'quantized::some_operator' with arguments from the 'CPU' backend...
1116 This means that you are trying to pass a non-quantized Tensor to a quantized
1134 Passing a quantized Tensor into a non-quantized kernel
1141 This means that you are trying to pass a quantized Tensor to a non-quantized
1151 # this module will not be quantized (see `qconfig = None` logic below)
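A hedged sketch of the two usual fixes for these errors: convert explicitly at the quantized/fp32 boundary, or set ``qconfig = None`` to keep a submodule in floating point (``model.head`` is an assumed placeholder)::

    import torch

    # keep one submodule in fp32: prepare/convert will skip it
    model.head.qconfig = None

    # convert explicitly at the boundary between quantized and fp32 kernels
    x_fp32 = torch.randn(1, 4)
    xq = torch.quantize_per_tensor(x_fp32, 0.1, 0, torch.quint8)  # fp32 -> quantized
    x_back = xq.dequantize()                                      # quantized -> fp32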
1171 Saving and Loading Quantized models
1174 When calling ``torch.load`` on a quantized model, if you see an error like::
1178 This is because directly saving and loading a quantized model using ``torch.save`` and ``torch.load…
1179 is not supported. To save/load quantized models, the following ways can be used:
1181 1. Saving/Loading the quantized model state_dict
1207 quantized = convert_fx(prepared)
1209 quantized.load_state_dict(torch.load(b))
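A minimal self-contained version of the state_dict round trip, assuming ``quantized`` is a converted model as above::

    import io
    import torch

    b = io.BytesIO()
    torch.save(quantized.state_dict(), b)  # save only the state_dict
    b.seek(0)
    quantized.load_state_dict(torch.load(b))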
1211 2. Saving/Loading scripted quantized models using ``torch.jit.save`` and ``torch.jit.load``
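A hedged sketch of option 2, assuming ``quantized`` is a converted model::

    import torch

    scripted = torch.jit.script(quantized)         # script the quantized model
    torch.jit.save(scripted, 'quantized_model.pt')
    loaded = torch.jit.load('quantized_model.pt')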
1243 .. py:module:: torch.ao.nn.quantized
1244 .. py:module:: torch.ao.nn.quantized.reference
1245 .. py:module:: torch.ao.nn.quantized.reference.modules
1247 .. py:module:: torch.ao.nn.sparse.quantized
1248 .. py:module:: torch.ao.nn.sparse.quantized.dynamic
1259 .. py:module:: torch.ao.nn.intrinsic.quantized.dynamic.modules.linear_relu
1260 .. py:module:: torch.ao.nn.intrinsic.quantized.modules.bn_relu
1261 .. py:module:: torch.ao.nn.intrinsic.quantized.modules.conv_add
1262 .. py:module:: torch.ao.nn.intrinsic.quantized.modules.conv_relu
1263 .. py:module:: torch.ao.nn.intrinsic.quantized.modules.linear_relu
1270 .. py:module:: torch.ao.nn.quantized.dynamic.modules.conv
1271 .. py:module:: torch.ao.nn.quantized.dynamic.modules.linear
1272 .. py:module:: torch.ao.nn.quantized.dynamic.modules.rnn
1273 .. py:module:: torch.ao.nn.quantized.modules.activation
1274 .. py:module:: torch.ao.nn.quantized.modules.batchnorm
1275 .. py:module:: torch.ao.nn.quantized.modules.conv
1276 .. py:module:: torch.ao.nn.quantized.modules.dropout
1277 .. py:module:: torch.ao.nn.quantized.modules.embedding_ops
1278 .. py:module:: torch.ao.nn.quantized.modules.functional_modules
1279 .. py:module:: torch.ao.nn.quantized.modules.linear
1280 .. py:module:: torch.ao.nn.quantized.modules.normalization
1281 .. py:module:: torch.ao.nn.quantized.modules.rnn
1282 .. py:module:: torch.ao.nn.quantized.modules.utils
1283 .. py:module:: torch.ao.nn.quantized.reference.modules.conv
1284 .. py:module:: torch.ao.nn.quantized.reference.modules.linear
1285 .. py:module:: torch.ao.nn.quantized.reference.modules.rnn
1286 .. py:module:: torch.ao.nn.quantized.reference.modules.sparse
1287 .. py:module:: torch.ao.nn.quantized.reference.modules.utils
1288 .. py:module:: torch.ao.nn.sparse.quantized.dynamic.linear
1289 .. py:module:: torch.ao.nn.sparse.quantized.linear
1290 .. py:module:: torch.ao.nn.sparse.quantized.utils
1364 .. py:module:: torch.nn.intrinsic.quantized.dynamic.modules.linear_relu
1365 .. py:module:: torch.nn.intrinsic.quantized.modules.bn_relu
1366 .. py:module:: torch.nn.intrinsic.quantized.modules.conv_relu
1367 .. py:module:: torch.nn.intrinsic.quantized.modules.linear_relu
1374 .. py:module:: torch.nn.quantized.dynamic.modules.conv
1375 .. py:module:: torch.nn.quantized.dynamic.modules.linear
1376 .. py:module:: torch.nn.quantized.dynamic.modules.rnn
1377 .. py:module:: torch.nn.quantized.functional
1378 .. py:module:: torch.nn.quantized.modules.activation
1379 .. py:module:: torch.nn.quantized.modules.batchnorm
1380 .. py:module:: torch.nn.quantized.modules.conv
1381 .. py:module:: torch.nn.quantized.modules.dropout
1382 .. py:module:: torch.nn.quantized.modules.embedding_ops
1383 .. py:module:: torch.nn.quantized.modules.functional_modules
1384 .. py:module:: torch.nn.quantized.modules.linear
1385 .. py:module:: torch.nn.quantized.modules.normalization
1386 .. py:module:: torch.nn.quantized.modules.rnn
1387 .. py:module:: torch.nn.quantized.modules.utils