# Graph Transform Tool

## Table of Contents

* [Introduction](#introduction)
* [Using the Graph Transform Tool](#using-the-graph-transform-tool)
* [Inspecting Graphs](#inspecting-graphs)
* [Common Use Cases](#common-use-cases)
  * [Optimizing for Deployment](#optimizing-for-deployment)
  * [Fixing Missing Kernel Errors on Mobile](#fixing-missing-kernel-errors-on-mobile)
  * [Shrinking File Size](#shrinking-file-size)
  * [Eight-bit Calculations](#eight-bit-calculations)
* [Transform Reference](#transform-reference)
  * [add_default_attributes](#add_default_attributes)
  * [backport_concatv2](#backport_concatv2)
  * [flatten_atrous_conv](#flatten_atrous_conv)
  * [fold_batch_norms](#fold_batch_norms)
  * [fold_constants](#fold_constants)
  * [fold_old_batch_norms](#fold_old_batch_norms)
  * [freeze_requantization_ranges](#freeze_requantization_ranges)
  * [fuse_convolutions](#fuse_convolutions)
  * [insert_logging](#insert_logging)
  * [merge_duplicate_nodes](#merge_duplicate_nodes)
  * [obfuscate_names](#obfuscate_names)
  * [quantize_nodes](#quantize_nodes)
  * [quantize_weights](#quantize_weights)
  * [remove_attribute](#remove_attribute)
  * [remove_device](#remove_device)
  * [remove_control_dependencies](#remove_control_dependencies)
  * [remove_nodes](#remove_nodes)
  * [rename_attribute](#rename_attribute)
  * [rename_op](#rename_op)
  * [round_weights](#round_weights)
  * [sparsify_gather](#sparsify_gather)
  * [set_device](#set_device)
  * [sort_by_execution_order](#sort_by_execution_order)
  * [strip_unused_nodes](#strip_unused_nodes)
* [Writing Your Own Transforms](#writing-your-own-transforms)
  * [Transform Functions](#transform-functions)
  * [Pattern Syntax](#pattern-syntax)
  * [ReplaceMatchingOpTypes](#replacematchingoptypes)
  * [Parameters](#parameters)
  * [Function Libraries](#function-libraries)
  * [Registering](#registering)

## Introduction

When you have finished training a model and want to deploy it in production, you'll often want to modify it to better run in its final environment. For example, if you're targeting a phone you might want to shrink the file size by quantizing the weights, or optimize away batch normalization or other training-only features. The Graph Transform framework offers a suite of tools for modifying computational graphs, and makes it easy to write your own modifications.

This guide is structured into three main parts: first, some tutorials on how to perform common tasks; second, a reference covering all of the different transformations that are included, together with the options that apply to them; and third, a guide to creating your own transforms.

## Using the Graph Transform Tool

The Graph Transform tool is designed to work on models that are saved as GraphDef files, usually in a binary protobuf format. This is the low-level definition of a TensorFlow computational graph, including a list of nodes and the input and output connections between them. If you're using a Python API to train your model, this will usually be saved out in the same directory as your checkpoints, and usually has a '.pb' suffix.

If you want to work with the values of your trained parameters, for example to quantize weights, you'll need to run [tensorflow/python/tools/freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py) to convert the checkpoint values into embedded constants within the graph file itself.
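
A typical invocation looks something like the sketch below; the graph file, checkpoint prefix, and output node name are placeholders you'd replace with your own model's values:

```bash
bazel build tensorflow/python/tools:freeze_graph
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/tmp/my_graph.pb \
--input_checkpoint=/tmp/model.ckpt-1000 \
--input_binary=true \
--output_graph=/tmp/frozen_graph.pb \
--output_node_names=softmax
```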

You call the Graph Transform tool itself like this:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_old_batch_norms
'
```

The arguments here specify where to read the graph from, where to write the transformed version to, what the input and output layers are, and which transforms to modify the graph with. The transforms are given as a list of names, and each can have arguments of its own. The transforms define a pipeline of modifications that are applied in order to produce the output. Sometimes you need some transforms to happen before others, and the ordering within the list lets you specify which happen first. Note that the optimization `remove_nodes(op=Identity, op=CheckNumerics)` will break models that contain control flow operations, such as `tf.cond`, `tf.map_fn`, and `tf.while_loop`.

## Inspecting Graphs

Many of the transforms that the tool supports need to know what the input and output layers of the model are. The best source for these is the model training process, where for a classifier the inputs will be the nodes that receive the data from the training set, and the output will be the predictions. If you're unsure, the [`summarize_graph`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/summarize_graph_main.cc) tool can inspect the model and provide guesses about likely input and output nodes, as well as other information that's useful for debugging. Here's an example of how to use it on the [Inception V3 graph](https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz):

```bash
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=tensorflow_inception_graph.pb
```

## Common Use Cases

This section has small guides for some of the most frequently-used transformation pipelines, aimed at users who want to quickly accomplish one of these tasks. Many of them use the Inception V3 model in their examples, which can be downloaded from [https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz](https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz).

### Optimizing for Deployment

If you've finished training your model and want to deploy it on a server or a mobile device, you'll want it to run as fast as possible, and with as few non-essential dependencies as possible. This recipe removes all of the nodes that aren't called during inference, shrinks expressions that are always constant into single nodes, and optimizes away some multiply operations used during batch normalization by pre-multiplying the weights for convolutions.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

The batch norm folding is included twice because there are two different flavors of batch normalization used in TensorFlow. The older version was implemented with a single op like BatchNormWithGlobalNormalization or FusedBatchNorm, but those were deprecated in favor of a newer approach that uses individual ops to implement the same computation. The two transforms are in there so that both styles are recognized and optimized.

### Fixing Missing Kernel Errors on Mobile

The mobile version of TensorFlow is focused on inference, and so by default the list of supported ops (defined in [tensorflow/core/kernels/BUILD:android_extended_ops](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/BUILD) for Bazel and [tensorflow/contrib/makefile/tf_op_files.txt](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/tf_op_files.txt) for make builds) doesn't include many ops that are only used during training. This can cause `No OpKernel was registered to support Op` errors when a GraphDef is loaded, even if the op isn't going to be executed.

If you see this error and it's an op that you do actually want to run on mobile, then you'll need to make local modifications to the build files to include the right .cc file that defines it. In a lot of cases the op is just a vestigial remnant from the training process though, and if that's true then you can run the [strip_unused_nodes](#strip_unused_nodes) transform, specifying the inputs and outputs of your inference usage, to remove those unnecessary nodes:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

### Shrinking File Size

If you're looking to deploy your model as part of a mobile app, then keeping the download size as small as possible is important. For most TensorFlow models, the largest contributors to the file size are the weights passed in to convolutional and fully-connected layers, so anything that can reduce the storage size for those is very useful. Luckily, most neural networks are resistant to noise, so it's possible to change the representation of those weights in a lossy way without losing very much accuracy overall.

On both iOS and Android, app packages are compressed before download, so the simplest way to reduce the bandwidth your users need to receive your app is to provide raw data that compresses more easily.
By default the weights are stored as floating-point values, and even tiny differences between numbers result in very different bit patterns, so these don't compress very well. If you round the weights so that nearby numbers are stored as exactly the same values, the resulting bit stream has a lot more repetition and so compresses down a lot more effectively. To try this technique on your model, run the [round_weights](#round_weights) transform.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  round_weights(num_steps=256)'
```

You should see that the `optimized_inception_graph.pb` output file is the same size as the input, but if you run zip on it to compress it, it's almost 70% smaller than if you zip the original! The nice thing about this transform is that it doesn't change the structure of the graph at all, so it's running exactly the same operations and should have the same latency and memory usage as before. You can adjust the `num_steps` parameter to control how many values each weight buffer is rounded to, so lower numbers will increase the compression at the cost of accuracy.

As a further step, you can store the weights into eight-bit values directly. Here's the recipe for that:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights'
```

You should see that the size of the output graph is about a quarter of the original. The downside to this approach compared to round_weights is that extra decompression ops are inserted to convert the eight-bit values back into floating point, but optimizations in TensorFlow's runtime should ensure these results are cached, so you shouldn't see the graph run any more slowly.

So far we've been concentrating on weights because those generally take up the most space. If you have a graph with a lot of small nodes in it, the names of those nodes can start to take up a noticeable amount of space too. To shrink those down, you can run the [obfuscate_names](#obfuscate_names) transform, which replaces all the names (except for inputs and outputs) with short, cryptic, but unique ids:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
  obfuscate_names'
```

### Eight-bit Calculations

For some platforms it's very helpful to be able to do as many calculations as possible in eight-bit, rather than floating-point.
The support for this in TensorFlow is still experimental and evolving, but you can convert models into quantized form using the graph transform tool:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  add_default_attributes
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights
  quantize_nodes
  strip_unused_nodes
  sort_by_execution_order'
```

This process converts all the operations in the graph that have eight-bit quantized equivalents, and leaves the rest in floating point. Only a subset of ops are supported, and on many platforms the quantized code may actually be slower than the float equivalents, but this is a way of increasing performance substantially when all the circumstances are right.

A full guide to optimizing for quantization is beyond the scope of this document, but one thing that can help is using the FakeQuantWithMinMaxVars op after Conv2D or similar operations during training. This trains the min/max variables that control the range used for quantization, so that the range doesn't have to be calculated dynamically by RequantizationRange during inference.

## Transform Reference

The --transforms string is parsed as a series of transform names, each of which can have multiple named arguments inside parentheses. Arguments are separated by commas, and double-quotes (") can be used to hold argument values if they themselves contain commas (for example shape definitions).

The --inputs and --outputs are shared across all transforms, since it's common to need to know what the ingoing and outgoing nodes in the graph are. You should make sure you set these correctly before calling the graph transform tool, and if you're in doubt check with the model's author, or use the [`summarize_graph`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs) tool to examine likely inputs and outputs.

All transforms can be passed the `ignore_errors` flag, with the value set to either true or false. By default any errors that happen within a transform will abort the whole process, but if you enable this then an error will just be logged and the transform skipped. This is especially useful for optional transforms where version errors or other unimportant problems may trigger an error.

### add_default_attributes

Args: None

When attributes are added to ops in new versions of TensorFlow, they often have defaults to ensure backwards compatible behavior with their original versions. These defaults usually get added when the graph is loaded by the runtime, but if your model is going to be processed outside of the main TensorFlow framework it can be useful to run this update process as a transform. This process finds any op attributes that are defined in the current TensorFlow list of ops but not within the saved model, and sets them to the defined default for that attribute.
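
Following the same invocation pattern as the examples above, a minimal run of this transform might look like this (the file and layer names are placeholders):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=updated_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='add_default_attributes'
```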

### backport_concatv2

Args: None

If you have a GraphDef file that has been produced by a newer version of the TensorFlow framework and includes ConcatV2, and you want to run it on an older version that only supports Concat, this transform will take care of converting those newer ops to the equivalent older form.

### flatten_atrous_conv

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform flattens atrous convolution, corresponding to a sequence of SpaceToBatchND-Conv2D-BatchToSpaceND operations, converting it to a regular Conv2D op with upsampled filters. This transform should only be used in order to run graphs having atrous convolution on platforms that do not yet natively support SpaceToBatchND and BatchToSpaceND operations. You will need to make sure you run [fold_constants](#fold_constants) after this transform. If applicable, you should run this transform before [fold_batch_norms](#fold_batch_norms).

### fold_batch_norms

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform tries to optimize away the Mul that's introduced after a Conv2D (or a MatMul) when batch normalization has been used during training. It scans the graph for any channel-wise multiplies immediately after convolutions, and multiplies the convolution's (or matrix multiplication's) weights with the Mul instead so this can be omitted at inference time. You'll need to make sure you run [fold_constants](#fold_constants) first, since the pattern can only be spotted if the normal complex expression that's produced by training for the Mul input is collapsed down into a simple constant.

### fold_constants

Args:

* clear_output_shapes: Clears tensor shape information saved as attributes. Some older graphs contain out-of-date information and may cause import errors. Defaults to true.

Prerequisites: None

Looks for any sub-graphs within the model that always evaluate to constant expressions, and replaces them with those constants. This optimization is always executed at run-time after the graph is loaded, so running it offline first won't help latency, but it can simplify the graph and so make further processing easier. It's often useful to call this with `fold_constants(ignore_errors=true)` to continue on past transient errors, since this is just an optimization phase.

### fold_old_batch_norms

Args: None \
Prerequisites: None

In the early days of TensorFlow, batch normalization was implemented using single monolithic ops like `BatchNormWithGlobalNormalization` or `FusedBatchNorm`. In modern versions, adding batch normalization from Python will give you a series of smaller math ops instead, to achieve the same effect without special-purpose code. If you have a graph that uses the older style, this transform will recognize and optimize those ops for inference, in the same way that the [fold_batch_norms](#fold_batch_norms) transform does for the newer approach.

### freeze_requantization_ranges

Args:

* min_max_log_file: Path to a log file containing ranges for ops.
* min_percentile: Percentage cutoff to use to calculate an overall min. Defaults to 5.
* max_percentile: Percentage cutoff to use to calculate an overall max. Defaults to 5.
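
Once you've collected a log file using the process described below, the transform itself is applied like this (a sketch reusing the file names from the example runs that follow; the output path is a placeholder):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/frozen_ranges_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='freeze_requantization_ranges(min_max_log_file="/tmp/min_max_log_small.txt")'
```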

Quantized operations like convolution or matrix multiplication take their inputs as eight-bit, but produce 32-bit results. To do further operations on these, they need to be converted back down to the lower bit depth. To make the most of those eight bits, you need to scale the 32 bits of original data down using a scale that matches the range that's actually being used.

Because that range information isn't stored in the original graph, the [quantization process](#eight-bit-calculations) inserts RequantizationRange ops before each conversion from 32 to 8 bits. This op looks at the 32-bit output and calculates the current min and max every time it's run.

This isn't incredibly time-consuming, but it is extra work that's nice to avoid if possible. One way of optimizing it away is to replace those RequantizationRange ops with a pair of Const nodes holding known min/max values, so the scaling down can be done without having to inspect the output every time.

That's what this transform does. It's usually used in conjunction with a copy of the graph that's had [insert_logging](#insert_logging) run on it to instrument it to record the min/max values to stderr. Why is logging used rather than writing to a normal file? As you'll see later, to get the best results you want to collect data from a lot of runs on real data, and for mobile apps especially it's a lot easier to do this by copying log files. As an example, here's how you'd add the logging operations for a quantized version of the Inception v3 graph:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/logged_quantized_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='
insert_logging(op=RequantizationRange, show_name=true, message="__requant_min_max:")
'
```

Now, when you run the `/tmp/logged_quantized_inception.pb` graph, it will write out log statements that show the value of the min and max calculated by each RequantizationRange op. Here's an example of running label_image and saving the log:

```bash
bazel build tensorflow/examples/label_image:label_image
bazel-bin/tensorflow/examples/label_image/label_image \
--image=${HOME}/Downloads/grace_hopper.jpg \
--input_layer=Mul \
--output_layer=softmax \
--graph=/tmp/logged_quantized_inception.pb \
--labels=${HOME}/Downloads/imagenet_comp_graph_label_strings.txt \
2>/tmp/min_max_log_small.txt
```

If you look in `/tmp/min_max_log_small.txt`, you'll see a lot of lines like this:

```
I0108 21:45:42.261883 1972 logging_ops.cc:79] ;conv/Conv2D/eightbit/requant_range__print__;__requant_min_max:[-20.887871][22.274715]
```

This is a simple way of serializing the name of the RequantizationRange op and its min/max values every time it's run. It's a file like this that you pass into the transform as the `min_max_log_file` argument. The transform will attempt to extract all of the min/max values associated with ops, ignoring any irrelevant lines in the log, and replace the RequantizationRange ops with two Const nodes containing the found values.

This isn't the whole story though.
The min/max values can vary a lot depending on what the particular inputs to the graph are on any given run, which means picking ranges based on just one run can lead to clipping of values and a loss of accuracy. To get better results, you need to run your network against a range of different inputs. In Inception's case, I often use a thousand different images from the training set. You can then pass the whole concatenated log from all of the runs into the transform, and it will pick ranges based on the aggregate of the values found for each RequantizationRange op.

To ensure that outliers don't increase the range too much, and so decrease the accuracy by putting too many bits into rare extreme values, the `min_percentile` and `max_percentile` arguments control how the overall min and max are chosen. At their default value of 5, the lowest 5% of the minimum values are discarded and the minimum of the remainder is taken, with the equivalent applied to the maximum.

### fuse_convolutions

Args: None \
Prerequisites: None

For graphs that use ResizeBilinear or MirrorPad ops before convolutions (e.g. to scale up in the later stages of an image style transfer model), it can improve memory usage and latency to combine the spatial transformations with the convolution's im2col patch generation. This transform looks out for that particular pattern of ops and replaces them with a fused version that combines the resizing and padding with the convolution.

### insert_logging

Args:

* op: Insert a Print after every occurrence of this op type. Can be repeated to cover multiple types. If not present, all op types will be instrumented.
* prefix: Insert a Print after every node whose name starts with this value. Can be repeated to cover multiple nodes. If not present, all node names will be matched.
* show_op: If true, the op type will be prepended to all log messages.
* show_name: If true, the node's name will be prepended to all log messages.
* message: Arbitrary text to log before the values.
* first_n: How many times to print before suppressing. Defaults to -1, which means never stop.
* summarize: How long numerical results can be before they're truncated. Defaults to 1024.

The Print operator writes strings to stderr when it's run inside a graph, and prints out the numerical results of the node that it's reading from. This can be very useful when you're debugging and want to follow particular internal values while a graph is running. This transform allows you to insert those ops at particular points in the graph, and to customize the message that's displayed. It's also used in conjunction with the [freeze_requantization_ranges](#freeze_requantization_ranges) transform to output the information that it needs.

### merge_duplicate_nodes

Args: None \
Prerequisites: None

If there are Const nodes with the same types and contents, or nodes with the same inputs and attributes, this transform will merge them together. It can be useful when you want to cut down the number of nodes in a graph that has a lot of redundancy (e.g. this transform is always run as part of [quantize_nodes](#quantize_nodes), since the processing there can introduce duplicates of constants that are used in the quantize/dequantize process).
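
Like the other no-argument transforms, it can also be run on its own if you just want to deduplicate a graph (a sketch with placeholder file names):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=deduplicated_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='merge_duplicate_nodes'
```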

### obfuscate_names

Args: None \
Prerequisites: None

Replaces all nodes' names with short generated ids, other than the inputs and outputs. This also updates all references within the graph so that the structure is preserved. This can be useful if you want to shrink the file size, or if you want to make it harder to understand the architecture of your model before releasing it.

### quantize_nodes

Args:

* input_min: The lowest float value for any quantized placeholder inputs.
* input_max: The highest float value for any quantized placeholder inputs. If both input_min and input_max are set, then any float placeholders in the graph will be replaced with quantized versions, and consts will be created to pass the range to subsequent operations.
* fallback_min: The lowest float value to use for requantizing activation layers.
* fallback_max: The highest float value to use for requantizing activation layers. If both fallback_min and fallback_max are set, then instead of using RequantizationRange ops to figure out the useful range dynamically when converting the 32-bit output of ops like QuantizedConv2D and QuantizedBiasAdd, hardwired consts with these values will be used instead. This can help performance, if you know the range of your activation layers ahead of time.

Prerequisites: [quantize_weights](#quantize_weights)

Replaces any calculation nodes with their eight-bit equivalents (if available), and adds in conversion layers to allow remaining float operations to interoperate. This is one of the most complex transforms, and involves multiple passes and a lot of rewriting. It's also still an active area of research, so results may vary depending on the platform and operations you're using in your model. You should run quantize_weights first to ensure your Const ops are in eight-bit form.

### quantize_weights

Args:

* minimum_size: Tensors with fewer elements than this won't be quantized. Defaults to 1024.

Prerequisites: None

Converts any large (more than minimum_size) float Const op into an eight-bit equivalent, followed by a float conversion op so that the result is usable by subsequent nodes. This is mostly useful for [shrinking file sizes](#shrinking-file-size), but also helps with the more advanced [quantize_nodes](#quantize_nodes) transform. Even though there are no prerequisites, it is advisable to run [fold_batch_norms](#fold_batch_norms) or [fold_old_batch_norms](#fold_old_batch_norms) first, because rounding variances down to zero may cause significant loss of precision.

### remove_attribute

Args:

* attribute_name: Name of the attribute you want to remove.
* op_name: Optional name of a single op to restrict the removal to.

Prerequisites: None

Deletes the given attribute from either all nodes, or just the one specified in `op_name`. This can be a dangerous transform since it's easy to leave your graph in an invalid state if you remove a required attribute. It can be useful in special circumstances though.

### remove_device

Args: None \
Prerequisites: None

All ops can have a hardware device specified. This can be a problem when you're loading a graph on a different system than the model was trained on, since some specified devices may not be available. In order to work with graphs like these, you can run this transform to wipe the slate clean and delete the device specifier from all ops.
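
For example, to strip the device assignments from a graph that was trained with explicit placements (a sketch; the file names are placeholders):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=trained_graph.pb \
--out_graph=portable_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='remove_device'
```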

### remove_control_dependencies

Args: None \
Prerequisites: None

Removes all control dependencies from the graph.

### remove_nodes

Args:

* op: The name of the op you want to remove. Can be repeated to remove multiple ops.

Prerequisites: None

This is a potentially dangerous transform that looks for single-input, single-output ops with the given names, removes them from the graph, and rewires all inputs that used to pull from them to pull from the preceding node instead. This is most useful for getting rid of ops like `CheckNumerics` that are useful during training but just complicate the graph and increase latency during inference. It's dangerous because it's possible that removing some ops may change the output of your graph, so make sure you check the overall accuracy after using this.

### rename_attribute

Args:

* old_attribute_name: Current name of the attribute you want to rename.
* new_attribute_name: Name that you want the attribute to have now.
* op_name: If this is set, only change attributes for a given op type, otherwise apply to all nodes with attribute names that match.

Prerequisites: None

Changes the name of the given attribute. This is often useful for upgrading graph files as op definitions change over versions, since the renaming is often enough to deal with minor changes.

### rename_op

Args:

* old_op_name: Current name of the operation.
* new_op_name: Name to change to.

Prerequisites: None

Finds all ops with the given name, and changes them to the new one. This can be useful for version upgrading if the changes between ops are minor apart from the name.

### round_weights

Args:

* num_steps: How many unique values to use in each buffer.

Prerequisites: None

Rounds all float values in large Const ops (more than 15 elements) to the given number of steps. The unique values are chosen per buffer by linearly allocating between the largest and smallest values present. This is useful when you'll be deploying on mobile, and you want a model that will compress effectively. See [shrinking file size](#shrinking-file-size) for more details. Even though there are no prerequisites, it is advisable to run [fold_batch_norms](#fold_batch_norms) or [fold_old_batch_norms](#fold_old_batch_norms) first, because rounding variances down to zero may cause significant loss of precision.

### sparsify_gather

Args: None \
Prerequisites: None

Transforms 'Gather' ops into a sparsified version, where the 'params' input of the 'Gather' is changed from a dense 'Const' to a 'HashTable', and the 'Gather' op itself is replaced by a hashtable lookup. This is mostly useful for reducing the memory footprint of sparse TF.learn linear models.

### set_device

Args:

* device: What device to assign to ops.
* if_default: If this is true, only assign to ops with empty existing devices.

Updates nodes to use the specified device. A device is a way to tell the code that executes the graph which piece of hardware it should run particular nodes on. The right assignment to use may change between training and deployment, so this transform (and [remove_device](#remove_device)) provide a way of updating the placement. If the `if_default` parameter is set, then only ops that don't already have a device assigned will be updated. This is mostly useful for preprocessing graphs for other stages that expect all ops to have an explicit device assigned.
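
For example, to pin every op that doesn't already have a device to the CPU (a sketch; the file names are placeholders):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=placed_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='set_device(device="/cpu:0", if_default=true)'
```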

### sort_by_execution_order

Args: None \
Prerequisites: None

Arranges the nodes in the GraphDef in topological order, so that the inputs of any given node are always earlier than the node itself. This is especially useful when you're targeting a minimal inference engine, since you can just execute the nodes in the given order knowing that the inputs will be computed before they're needed.

### strip_unused_nodes

Args:

* type: Default type for any new Placeholder nodes generated, for example int32, float, quint8.
* shape: Default shape for any new Placeholder nodes generated, as comma-separated dimensions. For example shape="1,299,299,3". The double quotes are important, since otherwise the commas will be taken as argument separators.
* name: Identifier for the placeholder arguments.
* type_for_name: What type to use for the previously-given name.
* shape_for_name: What shape to use for the previously-given name.

Prerequisites: None

Removes all nodes not used in calculating the layers given in `--outputs`, fed by `--inputs`. This is often useful for removing training-only nodes like save-and-restore or summary ops. It's also handy for solving the [missing kernel errors problem](#fixing-missing-kernel-errors-on-mobile) when there are decode or other ops you don't need in the inference path.

The biggest complication is that it sometimes has to create new Placeholder ops, so there are options to control their characteristics. This will happen if you bypass a DecodeJpeg op by specifying an input layer deeper in the network, for example, so you can pass in a raw image array instead of an encoded string as an input. The decode op will be removed, together with the Placeholder that fed it, but a new Placeholder is needed for the input layer you specify. The type and shape arguments let you control the attributes of any new Placeholders that are created. Plain `type` and `shape` set global defaults, but if you have different inputs with varying characteristics, you'll need to pass in a list of arguments where the preceding name specifies what layer each applies to. For example, if you had two inputs in1 and in2, you could call `strip_unused_nodes(name=in1, type_for_name=int32, shape_for_name="2,3", name=in2, type_for_name=float, shape_for_name="1,10,10,3")`.

## Writing Your Own Transforms

The Graph Transform Tool is designed to make it as easy as possible to create your own optimization, modification, and pre-processing transforms. At their heart, all of the transforms take in a valid GraphDef, make some changes, and output a new GraphDef. Each GraphDef is just a list of NodeDefs, each defining one node in the graph and its connections.
You can find more information on the format at [this guide to TensorFlow model files](https://www.tensorflow.org/versions/master/extend/tool_developers/index.html), but for a simple example take a look at [tensorflow/tools/graph_transforms/rename_op.cc](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/rename_op.cc), which implements the [rename_op](#rename_op) transform:

```C++
Status RenameOp(const GraphDef& input_graph_def,
                const TransformFuncContext& context,
                GraphDef* output_graph_def) {
  if (!context.params.count("old_op_name") ||
      (context.params.at("old_op_name").size() != 1) ||
      !context.params.count("new_op_name") ||
      (context.params.at("new_op_name").size() != 1)) {
    return errors::InvalidArgument(
        "rename_op expects exactly one 'old_op_name' and 'new_op_name' "
        "argument, e.g. rename_op(old_op_name=Mul, new_op_name=Multiply)");
  }

  const string old_op_name = context.params.at("old_op_name")[0];
  const string new_op_name = context.params.at("new_op_name")[0];
  output_graph_def->Clear();
  for (const NodeDef& node : input_graph_def.node()) {
    NodeDef* new_node = output_graph_def->mutable_node()->Add();
    new_node->CopyFrom(node);
    if (node.op() == old_op_name) {
      new_node->set_op(new_op_name);
    }
  }

  return Status::OK();
}

REGISTER_GRAPH_TRANSFORM("rename_op", RenameOp);
```

The heart of this transform is the loop through the input_graph_def's nodes. We go through each op, add a new one to the output, copy the original's contents, and then change the op over if it matches the parameters. There's a standard set of parameters for every transform, so they all take in a GraphDef and context, and write out into a new GraphDef. The registration macro at the bottom lets the tool know what function to call when it finds the `rename_op` string in a transforms list.

### Transform Functions

The standard signature that all transform functions have is defined as `TransformFunc`, which takes in an input GraphDef, a `TransformFuncContext` containing environment information, writes to an output GraphDef, and returns a Status indicating whether the transform succeeded.

The `TransformFuncContext` has a list of the inputs and outputs for the graph, and the [parameter arguments](#parameters) that were passed into the transform by the user.

If you write a function that matches this signature, and [register it](#registering), the graph transform tool will take care of calling it.

### Pattern Syntax

The `rename_op` example only needs to look at a single node at a time, but one of the most common needs is to modify small sub-graphs within a model. To make this easy, the Graph Transform Tool provides the `OpTypePattern` syntax. This is a simple and compact way to specify patterns of nodes that you want to look for. The format is:

```
OP_TYPE_PATTERN ::= "{" OP "," INPUTS "}"
INPUTS ::= OP_TYPE_PATTERN
```

The `OP` field can either contain a single "*", which means match any op type, one op type (for example "Const"), or a set of op types separated by `|` symbols (for example "Conv2D|MatMul|BiasAdd"). General regex patterns are not supported, just these special cases.

You can think of these patterns as very limited regular expressions designed to pick out sub-trees in graphs.
They are deliberately very constrained to the kind of things we commonly find ourselves needing to do, to make creating and debugging as straightforward as possible.

For example, if you want all Conv2D nodes that have a constant as their second input, you would set up a pattern like this, using C++ initializer lists to populate the structure:

```C++
OpTypePattern conv_pattern({"Conv2D", {{"*"}, {"Const"}}});
```

It can be easier to visualize these initializers using indentation to show the tree structure more clearly:

```C++
OpTypePattern conv_pattern({
  "Conv2D",
  {
    {"*"},
    {"Const"}
  }
});
```

In plain English this is saying: a Conv2D op with two inputs, the first of which is any op type, and the second of which is a Const op.

Here's a much more complex example, from the [quantize_nodes](#quantize_nodes) transform:

```C++
{"QuantizeV2",
  {
    {"Dequantize"},
    {"Min",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
    {"Max",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
  }
}
```

This is looking for QuantizeV2 nodes, with three inputs, the first of which is a Dequantize, the second is a Min that ultimately pulls from a Dequantize, and the third is a Max which does the same. Assuming we know the Dequantize ops are pulling from the same eight-bit buffer, the end result of this sub-graph is a no-op, since it's just turning the eight-bit buffer into float, and then immediately converting it back to eight-bits, so if we look for this pattern and remove it we can optimize the graph without changing the result.

### ReplaceMatchingOpTypes

It's very common to want to find all occurrences of a particular sub-graph in a model, and replace them all with a different sub-graph that keeps the same local input and output connections. For example with [fuse_convolutions](#fuse_convolutions), we needed to find all Conv2D ops that read their inputs from ResizeBilinear ops, and replace those combinations with a single FusedResizeAndPadConv2D op, but without affecting other ops.

To make that sort of transformation easy, we created the `ReplaceMatchingOpTypes` helper. This takes in a graph, an `OpTypePattern` defining the sub-graph to look for, and a callback function to run for every occurrence it finds. The job of this callback function is to look at the `NodeMatch` that contains information about the current sub-graph, and return a new sub-graph in the new_nodes list that will be used to replace the old sub-graph.

You can see how it's used in practice in the [fuse_convolutions](#fuse_convolutions) code:

```C++
TF_RETURN_IF_ERROR(ReplaceMatchingOpTypes(
    input_graph_def,  // clang-format off
    {"Conv2D",
      {
        {"ResizeBilinear"},
        {"*"}
      }
    },  // clang-format on
    [](const NodeMatch& match, const std::set<string>& input_nodes,
       const std::set<string>& output_nodes,
       std::vector<NodeDef>* new_nodes) {
      // Find all the nodes we expect in the subgraph.
      const NodeDef& conv_node = match.node;
      const NodeDef& resize_node = match.inputs[0].node;
      const NodeDef& weights_node = match.inputs[1].node;

      // We'll be reusing the old weights.
      new_nodes->push_back(weights_node);

      // Create a 'no-op' mirror padding node that has no effect.
      NodeDef pad_dims_node;
      pad_dims_node.set_op("Const");
      pad_dims_node.set_name(conv_node.name() + "_dummy_paddings");
      SetNodeAttr("dtype", DT_INT32, &pad_dims_node);
      SetNodeTensorAttr<int32>("value", {4, 2}, {0, 0, 0, 0, 0, 0, 0, 0},
                               &pad_dims_node);
      new_nodes->push_back(pad_dims_node);

      // Set up the new fused version of the convolution op.
      NodeDef fused_conv;
      fused_conv.set_op("FusedResizeAndPadConv2D");
      fused_conv.set_name(match.node.name());
      AddNodeInput(resize_node.input(0), &fused_conv);
      AddNodeInput(resize_node.input(1), &fused_conv);
      AddNodeInput(pad_dims_node.name(), &fused_conv);
      AddNodeInput(conv_node.input(1), &fused_conv);
      CopyNodeAttr(resize_node, "align_corners", "resize_align_corners",
                   &fused_conv);
      SetNodeAttr("mode", "REFLECT", &fused_conv);
      CopyNodeAttr(conv_node, "T", "T", &fused_conv);
      CopyNodeAttr(conv_node, "padding", "padding", &fused_conv);
      CopyNodeAttr(conv_node, "strides", "strides", &fused_conv);
      new_nodes->push_back(fused_conv);

      return Status::OK();
    },
    {}, &replaced_graph_def));
```

Here you can see we define the pattern to look for, and in the callback function use information from each of the nodes in the old sub-graph to create a new fused node. We also copy over the old weights input node so that it isn't lost.

There are a few things to know about the `ReplaceMatchingOpTypes` function:

* All of the nodes in any matching sub-graphs are removed from the new graph created by the function. If any of them are needed, it's the callback function's responsibility to add them back in. There's a `CopyOriginalMatch` convenience call that will copy over all of the original nodes if you decide you don't actually want to modify a particular sub-graph.

* It is assumed that the same nodes will never appear in more than one matched sub-graph. This is to ensure that sub-trees are only replaced once, but it may mean that some sub-graphs aren't spotted if they overlap with earlier matches.

* The calling framework tries to ensure that the graph remains sane, by looking at the new_nodes that are returned and making sure that no nodes which are needed as inputs by nodes outside the sub-graph are removed. These important nodes are listed in the `output_nodes` argument that's passed into each replacement function call. You can disable this checking by setting `allow_inconsistencies` to true in the options, but otherwise any replacements that break the graph constraints will be canceled. If you do allow inconsistencies, it's your transform's responsibility to fix them up before you return your final result. Functions like `RenameNodeInputs` can be useful if you are doing wholesale node renaming, for example.

### Parameters

The arguments that are in parentheses after the transform name when the tool is called are parsed and placed into the params member of the TransformFuncContext that's given to each transform. For every named argument, there's a vector of strings containing all the values that it was given, in the order they were given. These are treated a bit like command-line parameters, and it's the transform's responsibility to parse them into the data types it needs, and to raise errors by returning a bad Status if any of them are ill-formed.

As an example, here's a hypothetical transform call:

```
some_transform(foo=a, foo=b, bar=2, bob="1,2,3")
```

Here's what the std::map of strings looks like in the params member:

```
{{"foo", {"a", "b"}}, {"bar", {"2"}}, {"bob", {"1,2,3"}}}
```

The double quotes around the comma-separated argument to `bob` are important, because otherwise the values will be treated as separate arguments and the parsing will fail.

Here's an example of how [round_weights](#round_weights) reads its `num_steps` parameter:

```C++
TF_RETURN_IF_ERROR(context.GetOneInt32Parameter("num_steps", 256, &num_steps));
```

If the conversion fails, or the parameter occurs more than once, the helper function will raise a meaningful error through the status result of the transform. If the parameter isn't specified at all then the default will be used.

### Function Libraries

A newer feature of TensorFlow is the ability to create libraries of functions as part of graphs. These are a bit like templates, which define macro operations in terms of smaller components, which can then be instantiated with different input and output connections inside the graph just like regular ops. Right now the graph transform tool just copies these libraries between the input and output graphs, but it's likely that more complex operations will be supported on them in the future.

### Registering

The Graph Transform Tool associates names of transforms with the code to implement them using the `REGISTER_GRAPH_TRANSFORM()` macro. This takes a string and a function, and automatically registers the transform with the tool. You will need to watch out for a few things though:

* Because it's using global C++ objects in each file under the hood, the linker can sometimes strip them out and lose the registration. In Bazel you need to make sure you're linking any new transforms in as libraries, and set the `alwayslink` flag on those `cc_library` rules so the registration code isn't discarded. A sketch of this is shown below.

* You should be able to create your own copy of the transform_graph tool by linking against the transform_graph_main_lib library in tensorflow/tools/graph_transforms/BUILD. This contains all the `main()` logic to parse command line arguments and call transforms.
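
As a rough sketch of what the first point looks like in a BUILD file (the target name and source file here are hypothetical, and the `:transform_utils` dependency is an assumption about what your transform needs):

```python
# Hypothetical addition to tensorflow/tools/graph_transforms/BUILD.
cc_library(
    name = "my_transform",
    srcs = ["my_transform.cc"],
    deps = [
        ":transform_utils",
    ],
    # alwayslink keeps the linker from stripping the global object created by
    # REGISTER_GRAPH_TRANSFORM(), which is what registers the transform.
    alwayslink = 1,
)
```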