# Graph Transform Tool

## Table of Contents

*   [Introduction](#introduction)
*   [Using the Graph Transform Tool](#using-the-graph-transform-tool)
*   [Inspecting Graphs](#inspecting-graphs)
*   [Common Use Cases](#common-use-cases)
    *   [Optimizing for Deployment](#optimizing-for-deployment)
    *   [Fixing Missing Kernel Errors on
        Mobile](#fixing-missing-kernel-errors-on-mobile)
    *   [Shrinking File Size](#shrinking-file-size)
    *   [Eight-bit Calculations](#eight-bit-calculations)
*   [Transform Reference](#transform-reference)
    *   [add_default_attributes](#add_default_attributes)
    *   [backport_concatv2](#backport_concatv2)
    *   [flatten_atrous_conv](#flatten_atrous_conv)
    *   [fold_batch_norms](#fold_batch_norms)
    *   [fold_constants](#fold_constants)
    *   [fold_old_batch_norms](#fold_old_batch_norms)
    *   [freeze_requantization_ranges](#freeze_requantization_ranges)
    *   [fuse_convolutions](#fuse_convolutions)
    *   [insert_logging](#insert_logging)
    *   [merge_duplicate_nodes](#merge_duplicate_nodes)
    *   [obfuscate_names](#obfuscate_names)
    *   [quantize_nodes](#quantize_nodes)
    *   [quantize_weights](#quantize_weights)
    *   [remove_attribute](#remove_attribute)
    *   [remove_device](#remove_device)
    *   [remove_control_dependencies](#remove_control_dependencies)
    *   [remove_nodes](#remove_nodes)
    *   [rename_attribute](#rename_attribute)
    *   [rename_op](#rename_op)
    *   [round_weights](#round_weights)
    *   [sparsify_gather](#sparsify_gather)
    *   [set_device](#set_device)
    *   [sort_by_execution_order](#sort_by_execution_order)
    *   [strip_unused_nodes](#strip_unused_nodes)
*   [Writing Your Own Transforms](#writing-your-own-transforms)
    *   [Transform Functions](#transform-functions)
    *   [Pattern Syntax](#pattern-syntax)
    *   [ReplaceMatchingOpTypes](#replacematchingoptypes)
    *   [Parameters](#parameters)
    *   [Function Libraries](#function-libraries)
    *   [Registering](#registering)

## Introduction

When you have finished training a model and want to deploy it in production,
you'll often want to modify it to run better in its final environment. For
example, if you're targeting a phone you might want to shrink the file size by
quantizing the weights, or optimize away batch normalization and other
training-only features. The Graph Transform framework offers a suite of tools
for modifying computational graphs, and makes it easy to write your own
transforms as well.

This guide is split into three main parts: first, tutorials on how to perform
common tasks; second, a reference covering all of the included transforms,
together with the options that apply to them; and third, a guide to creating
your own transforms.

## Using the Graph Transform Tool

The Graph Transform tool is designed to work on models that are saved as
GraphDef files, usually in a binary protobuf format. This is the low-level
definition of a TensorFlow computational graph, including a list of nodes and
the input and output connections between them. If you're using the Python API
to train your model, this will usually be saved out in the same directory as
your checkpoints, typically with a '.pb' suffix.

If you want to work with the values of your trained parameters, for example to
quantize weights, you'll need to run
[tensorflow/python/tools/freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py)
to convert the checkpoint values into embedded constants within the graph file
itself.

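If you haven't used it before, a typical invocation looks something like the
following sketch. The paths and the output node name here are placeholders
you'd replace with your own model's values:

```bash
bazel build tensorflow/python/tools:freeze_graph
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/tmp/model/graph.pb \
--input_checkpoint=/tmp/model/model.ckpt \
--output_graph=/tmp/model/frozen_graph.pb \
--output_node_names=softmax
```
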
You call the Graph Transform tool itself like this:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_old_batch_norms
'
```

The arguments here specify where to read the graph from, where to write the
transformed version to, what the input and output layers are, and what
transforms to modify the graph with. The transforms are given as a list of
names, and each can have arguments of its own. These transforms define the
pipeline of modifications that are applied in order to produce the output.
Sometimes you need some transforms to happen before others, and the ordering
within the list lets you specify which happen first.
Note that the optimization
`remove_nodes(op=Identity, op=CheckNumerics)` will break models that contain
control flow operations, such as `tf.cond`, `tf.map_fn`, and `tf.while`.

## Inspecting Graphs

Many of the transforms that the tool supports need to know what the input and
output layers of the model are. The best source for these is the model training
process, where for a classifier the inputs will be the nodes that receive the
data from the training set, and the output will be the predictions. If you're
unsure, the
[`summarize_graph`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/summarize_graph_main.cc)
tool can inspect the model and provide guesses about likely input and output nodes,
as well as other information that's useful for debugging. Here's an example of
how to use it on the [Inception V3
graph](https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz):

```bash
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=tensorflow_inception_graph.pb
```

## Common Use Cases

This section has small guides for some of the most frequently-used
transformation pipelines, aimed at users who want to quickly accomplish one of
these tasks. A lot of them will use the Inception V3 model for their examples,
which can be downloaded from
[https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz](https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz).

### Optimizing for Deployment

If you've finished training your model and want to deploy it on a server or a
mobile device, you'll want it to run as fast as possible, and with as few
non-essential dependencies as you can. This recipe removes all of the nodes that
aren't called during inference, shrinks expressions that are always constant
into single nodes, and optimizes away some multiply operations used during batch
normalization by pre-multiplying the weights for convolutions.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

The batch norm folding is included twice because there are two different flavors
of batch normalization used in TensorFlow. The older version was implemented
with a single monolithic op like BatchNormWithGlobalNormalization or
FusedBatchNorm, which has been deprecated in favor of a more recent approach
using individual math ops to implement the same computation. The two transforms
are there so that both styles are recognized and optimized.

### Fixing Missing Kernel Errors on Mobile

The mobile version of TensorFlow is focused on inference, and so by default the
list of supported ops (defined in
[tensorflow/core/kernels/BUILD:android_extended_ops](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/BUILD)
for Bazel and
[tensorflow/contrib/makefile/tf_op_files.txt](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/tf_op_files.txt)
for make builds) doesn't include a lot of ops that are training-related. This
can cause `No OpKernel was registered to support Op` errors when a GraphDef is
loaded, even if the op isn't going to be executed.

If you see this error and it's an op that you do actually want to run on mobile,
then you'll need to make local modifications to the build files to include the
right .cc file that defines it. In a lot of cases the op is just a vestigial
remnant from the training process though, and if that's true then you can run
the [strip_unused_nodes](#strip_unused_nodes) transform, specifying the inputs
and outputs of your inference usage, to remove those unnecessary nodes:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

### Shrinking File Size

If you're looking to deploy your model as part of a mobile app, then keeping the
download size as small as possible is important. For most TensorFlow models, the
largest contributors to the file size are the weights passed in to convolutional
and fully-connected layers, so anything that can reduce the storage size for
those is very useful. Luckily most neural networks are resistant to noise, so
it's possible to change the representation of those weights in a lossy way
without losing very much accuracy overall.

On both iOS and Android, app packages are compressed before download, so the
simplest way to reduce the bandwidth your users need to receive your app is to
provide raw data that compresses more easily. By default the weights are stored
as floating-point values, and even tiny differences between numbers result in
very different bit patterns, so these don't compress very well. If you round
the weights so that nearby numbers are stored as exactly the same values, the
resulting bit stream has a lot more repetition and so compresses down a lot more
effectively. To try this technique on your model, run the
[round_weights](#round_weights) transform.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  round_weights(num_steps=256)'
```

You should see that the `optimized_inception_graph.pb` output file is the same
size as the input, but if you run zip on it to compress it, it's almost 70%
smaller than if you zip the original! The nice thing about this transform is
that it doesn't change the structure of the graph at all, so it's running
exactly the same operations and should have the same latency and memory usage as
before. You can adjust the `num_steps` parameter to control how many unique
values each weight buffer is rounded to; lower numbers will increase the
compression at the cost of accuracy.

As a further step, you can store the weights as eight-bit values directly.
Here's the recipe for that:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights'
```

You should see that the size of the output graph is about a quarter of the
original. The downside to this approach compared to round_weights is that extra
decompression ops are inserted to convert the eight-bit values back into
floating point, but optimizations in TensorFlow's runtime should ensure these
results are cached and so you shouldn't see the graph run any more slowly.

So far we've been concentrating on weights because those generally take up the
most space. If you have a graph with a lot of small nodes in it, the names of
those nodes can start to take up a noticeable amount of space too. To shrink
those down, you can run the [obfuscate_names](#obfuscate_names) transform, which
replaces all the names (except for inputs and outputs) with short, cryptic but
unique ids:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
  obfuscate_names'
```

### Eight-bit Calculations

For some platforms it's very helpful to be able to do as many calculations as
possible in eight-bit, rather than floating-point. The support for this in
TensorFlow is still experimental and evolving, but you can convert models into
quantized form using the graph transform tool:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  add_default_attributes
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights
  quantize_nodes
  strip_unused_nodes
  sort_by_execution_order'
```

This process converts all the operations in the graph that have eight-bit
quantized equivalents, and leaves the rest in floating point. Only a subset of
ops are supported, and on many platforms the quantized code may actually be
slower than the float equivalents, but this is a way of increasing performance
substantially when all the circumstances are right.

A full guide to optimizing for quantization is beyond the scope of this guide,
but one thing that can help is using the FakeQuantWithMinMaxVars op after Conv2D
or similar operations during training. This trains the min/max variables that
control the range used for quantization, so that the range doesn't have to be
calculated dynamically by RequantizationRange during inference.

## Transform Reference

The `--transforms` string is parsed as a series of transform names, each of
which can have multiple named arguments inside parentheses. Arguments are
separated by commas, and double-quotes (") can be used to hold argument values
if they themselves contain commas (for example shape definitions).

The `--inputs` and `--outputs` flags are shared across all transforms, since
it's common to need to know what the ingoing and outgoing nodes in the graph
are. You should make sure you set these correctly before calling the graph
transform tool, and if you're in doubt check with the model's author, or use the
[`summarize_graph`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs)
tool to examine likely inputs and outputs.

All transforms can be passed the `ignore_errors` flag, with the value set to
either true or false. By default any errors that happen within a transform will
abort the whole process, but if you enable this then an error will just be
logged and the transform skipped. This is especially useful for optional
transforms where version errors or other unimportant problems may trigger an
error.

### add_default_attributes

Args: None

When attributes are added to ops in new versions of TensorFlow, they often have
defaults to ensure backwards compatible behavior with their original versions.
These defaults usually get added when the graph is loaded by the runtime, but if
your model is going to be processed outside of the main TensorFlow framework it
can be useful to run this update process as a transform. This process finds any
op attributes that are defined in the current TensorFlow list of ops but not
within the saved model, and sets them to the defined default for that attribute.

### backport_concatv2

Args: None

If you have a GraphDef file that has been produced by a newer version of the
TensorFlow framework and includes ConcatV2, and you want to run it on an older
version that only supports Concat, this transform will take care of converting
those newer ops to the equivalent older form.

### flatten_atrous_conv

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform flattens atrous convolution, corresponding to a sequence of
SpaceToBatchND-Conv2D-BatchToSpaceND operations, converting it to a regular
Conv2D op with upsampled filters. This transform should only be used in order
to run graphs having atrous convolution on platforms that do not yet natively
support SpaceToBatchND and BatchToSpaceND operations. You will need to make
sure you run [fold_constants](#fold_constants) after this transform. If
applicable, you should run this transform before
[fold_batch_norms](#fold_batch_norms).

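As a sketch, a pipeline that respects those ordering constraints might look
like this. The graph file names are placeholders, and the input and output
layers are the Inception ones used elsewhere in this guide:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=atrous_conv_graph.pb \
--out_graph=flattened_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  fold_constants(ignore_errors=true)
  flatten_atrous_conv
  fold_constants(ignore_errors=true)
  fold_batch_norms'
```
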
### fold_batch_norms

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform tries to optimize away the Mul that's introduced after a Conv2D
(or a MatMul) when batch normalization has been used during training. It scans
the graph for any channel-wise multiplies immediately after convolutions, and
folds the multiplication into the convolution's (or matrix multiplication's)
weights instead, so the Mul can be omitted at inference time. You'll need to
make sure you run [fold_constants](#fold_constants) first, since the pattern can
only be spotted if the normal complex expression that's produced by training for
the Mul input is collapsed down into a simple constant.

### fold_constants

Args:

*   clear_output_shapes: Clears tensor shape information saved as attributes.
    Some older graphs contain out-of-date information and may cause import
    errors. Defaults to true.

Prerequisites: None

Looks for any sub-graphs within the model that always evaluate to constant
expressions, and replaces them with those constants. This optimization is always
executed at run-time after the graph is loaded, so running it offline first
won't help latency, but it can simplify the graph and so make further processing
easier. It's often useful to call this with `fold_constants(ignore_errors=true)`
to continue on past transient errors, since this is just an optimization phase.

### fold_old_batch_norms

Args: None \
Prerequisites: None

In the early days of TensorFlow, batch normalization was implemented using
single monolithic ops like `BatchNormWithGlobalNormalization` or
`FusedBatchNorm`. In modern versions, adding batch normalization from Python
will give you a series of smaller math ops instead, to achieve the same effect
without special-purpose code. If you have a graph that uses the older style,
this transform will recognize and optimize those ops for inference, in the same
way that the [fold_batch_norms](#fold_batch_norms) transform does for the new
approach.

### freeze_requantization_ranges

Args:

*   min_max_log_file: Path to a log file containing ranges for ops.
*   min_percentile: Percentage cutoff to use to calculate an overall min.
    Defaults to 5.
*   max_percentile: Percentage cutoff to use to calculate an overall max.
    Defaults to 5.

Quantized operations like convolution or matrix multiplies take their inputs as
8-bit, but produce 32-bit results. To do further operations on these, they need
to be converted back down to the lower depth. To make the most of those eight
bits, you need to scale the thirty-two bits of original data down using a scale
that matches the range that's actually being used.

Because that range information isn't stored in the original graph, the
[quantization process](#eight-bit-calculations) inserts RequantizationRange ops
before each conversion from 32 to 8 bits. This op looks at the 32-bit output and
calculates the current min and max every time it's run.

This isn't incredibly time-consuming, but it is extra work that's nice to avoid
if possible. One way of optimizing that away is replacing those
RequantizationRange ops with a pair of Const nodes holding known min/max values,
so the scaling down can be done without having to inspect the output every time.

That's what this transform does. It's usually used in conjunction with a copy of
the graph that's had [insert_logging](#insert_logging) run on it to instrument
it to record the min/max values to stderr. Why is logging used rather than
writing to a normal file? As you'll see later, to get best results you want to
collect data from a lot of runs on real data, and for mobile apps especially
it's a lot easier to do this by copying log files. As an example, here's how
you'd add the logging operations for a quantized version of the Inception v3
graph:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/logged_quantized_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='
insert_logging(op=RequantizationRange, show_name=true, message="__requant_min_max:")\
'
```

Now, when you run the `/tmp/logged_quantized_inception.pb` graph, it will write
out log statements that show the value of the min and max calculated by each
RequantizationRange op. Here's an example of running label_image and saving the
log:

```bash
bazel build tensorflow/examples/label_image:label_image
bazel-bin/tensorflow/examples/label_image/label_image \
--image=${HOME}/Downloads/grace_hopper.jpg \
--input_layer=Mul \
--output_layer=softmax \
--graph=/tmp/logged_quantized_inception.pb \
--labels=${HOME}/Downloads/imagenet_comp_graph_label_strings.txt \
2>/tmp/min_max_log_small.txt
```

If you look in `/tmp/min_max_log_small.txt`, you'll see a lot of lines like
this:

```
I0108 21:45:42.261883    1972 logging_ops.cc:79] ;conv/Conv2D/eightbit/requant_range__print__;__requant_min_max:[-20.887871][22.274715]
```

This is a simple way of serializing the name of the RequantizationRange op and
its min/max values every time it's run. It's a file like this that you pass into
the transform as the `min_max_log_file` argument. The transform will attempt to
extract all of the min/max values associated with ops, ignoring any irrelevant
lines in the log, and replace the RequantizationRange ops with two Const nodes
containing the found values.

This isn't the whole story though. The min/max values can vary a lot depending
on what the particular inputs to the graph are on any given run, which means
picking ranges based on just one run can lead to clipping of values and a loss
of accuracy. To get better results, you need to run your network against a range
of different inputs. In Inception's case, I often use a thousand different
images from the training set. You can then pass the whole concatenated log from
all of the runs into the transform, and it will pick ranges based on the
aggregate of the values found for each RequantizationRange op.

To ensure that outliers don't increase the range too much, and so decrease the
accuracy by putting too many bits into rare extreme values, the `min_percentile`
and `max_percentile` arguments control how the overall min and max are chosen.
At their default values of 5, this means that the lowest 5% of the minimum
values will be discarded, taking the minimum of the remainder, and the
equivalent for the maximum.

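Once you have a log file (ideally the concatenation of logs from many runs),
you can feed it back into this transform. As a sketch, using the log captured
above (the output graph name is just illustrative):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/frozen_ranges_quantized_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='freeze_requantization_ranges(min_max_log_file=/tmp/min_max_log_small.txt)'
```
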
### fuse_convolutions

Args: None \
Prerequisites: None

For graphs that use ResizeBilinear or MirrorPad ops before convolutions (e.g. to
scale up in the later stages of an image style transfer model), it can improve
memory usage and latency to combine the spatial transformations with the
convolution's im2col patch generation. This transform looks out for that
particular pattern of ops and replaces them with a fused version that combines
the resizing and padding with the convolution.

### insert_logging

Args:

*   op: Insert a Print after every occurrence of this op type. Can be repeated
    to cover multiple types. If not present, all op types will be instrumented.
*   prefix: Insert a Print after every node whose name starts with this value.
    Can be repeated to cover multiple nodes. If not present, all node names will
    be matched.
*   show_op: If true, the op type will be prepended to all log messages.
*   show_name: If true, the node's name will be prepended to all log messages.
*   message: Arbitrary text to log before the values.
*   first_n: How many times to print before suppressing. Defaults to -1, which
    means never stop.
*   summarize: How long numerical results can be before they're truncated.
    Defaults to 1024.

The Print operator writes strings to stderr when it's run inside a graph, and
prints out the numerical results of the node that it's reading from. This can be
very useful when you're debugging and want to follow particular internal values
while a graph is running. This transform allows you to insert those ops at
particular points in the graph, and customize the message that's displayed. It's
also used in conjunction with the
[freeze_requantization_ranges](#freeze_requantization_ranges) transform to
output information that it needs.

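For general debugging, here's a sketch of instrumenting every node whose name
starts with a (hypothetical) `conv` prefix, printing at most ten times per node:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=logged_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='insert_logging(prefix=conv, show_op=true, show_name=true, first_n=10)'
```
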
### merge_duplicate_nodes

Args: None \
Prerequisites: None

If there are Const nodes with the same types and contents, or nodes with the
same inputs and attributes, this transform will merge them together. It can be
useful when you want to cut down the number of nodes in a graph that has a lot
of redundancy (e.g. this transform is always run as part of
[quantize_nodes](#quantize_nodes) since the processing there can introduce
duplicates of constants that are used in the quantize/dequantize process).

### obfuscate_names

Args: None \
Prerequisites: None

Replaces all nodes' names with short generated ids, other than the inputs and
outputs. This also updates all references within the graph so that the structure
is preserved. This can be useful if you want to shrink the file size, or if you
want to make it harder to understand the architecture of your model before
releasing it.

### quantize_nodes

Args:

*   input_min: The lowest float value for any quantized placeholder inputs.
*   input_max: The highest float value for any quantized placeholder inputs. If
    both input_min and input_max are set, then any float placeholders in the
    graph will be replaced with quantized versions, and consts will be created
    to pass the range to subsequent operations.
*   fallback_min: The lowest float value to use for requantizing activation
    layers.
*   fallback_max: The highest float value to use for requantizing activation
    layers. If both fallback_min and fallback_max are set, then instead of using
    RequantizationRange ops to figure out the useful range dynamically when
    converting the 32-bit output of ops like QuantizedConv2D and
    QuantizedBiasAdd, hardwired consts with these values will be used instead.
    This can help performance, if you know the range of your activation layers
    ahead of time.

Prerequisites: [quantize_weights](#quantize_weights)

Replaces any calculation nodes with their eight-bit equivalents (if available),
and adds in conversion layers to allow remaining float operations to
interoperate. This is one of the most complex transforms, and involves multiple
passes and a lot of rewriting. It's also still an active area of research, so
results may vary depending on the platform and operations you're using in your
model. You should run quantize_weights first to ensure your Const ops are in
eight-bit form.

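If you do know your activation ranges ahead of time, a pipeline using the
fallback arguments might look like the following sketch. The -10/10 range is
purely illustrative; you'd need to measure the values appropriate to your own
model:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=quantized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  quantize_weights
  quantize_nodes(fallback_min=-10.0, fallback_max=10.0)'
```
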
### quantize_weights

Args:

*   minimum_size: Tensors with fewer elements than this won't be quantized
    (defaults to 1024)

Prerequisites: None

Converts any large (more than minimum_size) float Const op into an eight-bit
equivalent, followed by a float conversion op so that the result is usable by
subsequent nodes. This is mostly useful for [shrinking file
sizes](#shrinking-file-size), but also helps with the more advanced
[quantize_nodes](#quantize_nodes) transform. Even though there are no
prerequisites, it is advisable to run [fold_batch_norms](#fold_batch_norms) or
[fold_old_batch_norms](#fold_old_batch_norms), because rounding variances down
to zero may cause significant loss of precision.

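For example, to leave smaller tensors in float and only quantize buffers with
at least 4,096 elements (an arbitrary threshold, purely for illustration), you
could pass this as part of your `--transforms` string:

```bash
--transforms='
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights(minimum_size=4096)'
```
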
### remove_attribute

Args:

*   attribute_name: Name of the attribute you want to remove.
*   op_name: Optional name of a single op to restrict the removal to.

Prerequisites: None

Deletes the given attribute from either all nodes, or just the one specified in
`op_name`. This can be a dangerous transform since it's easy to leave your graph
in an invalid state if you remove a required attribute. It can be useful in
special circumstances though.

### remove_device

Args: None \
Prerequisites: None

All ops can have a hardware device specified. This can be a problem when you're
loading a graph on a different system than the model was trained on, since some
specified devices may not be available. In order to work with graphs like these,
you can run this transform to wipe the slate clean and delete the device
specifier from all ops.

### remove_control_dependencies

Args: None \
Prerequisites: None

Removes all control dependencies from the graph.

### remove_nodes

Args:

*   op: The name of the op you want to remove. Can be repeated to remove
    multiple ops.

Prerequisites: None

This is a potentially dangerous transform that looks for single-input,
single-output ops with the given names, removes them from the graph, and rewires
all inputs that used to pull from them to pull from the preceding node instead.
This is most useful for getting rid of ops like `CheckNumerics` that are useful
during training but just complicate the graph and increase latency during
inference. It's dangerous because it's possible that removing some ops may
change the output of your graph, so make sure you check the overall accuracy
after using this.

### rename_attribute

Args:

*   old_attribute_name: Current name of the attribute you want to rename.
*   new_attribute_name: Name that you want the attribute to have now.
*   op_name: If this is set, only change attributes for a given op type,
    otherwise apply to all nodes with attribute names that match.

Prerequisites: None

Changes the name of the given attribute. This is often useful for upgrading
graph files as op definitions change over versions, since the renaming is often
enough to deal with minor changes.

### rename_op

Args:

*   old_op_name: Current name of the operation.
*   new_op_name: Name to change to.

Prerequisites: None

Finds all ops with the given name, and changes them to the new one. This can be
useful for version upgrading if the changes between ops are minor apart from the
name.

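For example, reusing the hypothetical rename from the error message in the
implementation shown later in this guide, you'd pass this as part of your
`--transforms` string:

```bash
--transforms='rename_op(old_op_name=Mul, new_op_name=Multiply)'
```
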
### round_weights

Args:

*   num_steps: How many unique values to use in each buffer.

Prerequisites: None

Rounds all float values in large Const ops (more than 15 elements) to the given
number of steps. The unique values are chosen per buffer by linearly allocating
between the largest and smallest values present. This is useful when you'll be
deploying on mobile, and you want a model that will compress effectively. See
[shrinking file size](#shrinking-file-size) for more details. Even though there
are no prerequisites, it is advisable to run
[fold_batch_norms](#fold_batch_norms) or
[fold_old_batch_norms](#fold_old_batch_norms), because rounding variances down
to zero may cause significant loss of precision.

### sparsify_gather

Args: None \
Prerequisites: None

Transforms 'Gather' ops into a sparsified version, where the dense 'Const' that
feeds the 'params' input is replaced with a 'HashTable', and the 'Gather' op
itself is replaced by a hashtable lookup. This is mostly useful for reducing the
memory footprint of sparse TF.learn linear models.

### set_device

Args:

*   device: What device to assign to ops.
*   if_default: If this is true, only assign to ops with empty existing devices.

Prerequisites: None

Updates nodes to use the specified device. A device is a way to tell the code
that executes the graph which piece of hardware it should run particular nodes
on. The right assignment to use may change between training and deployment, so
this transform (and [remove_device](#remove_device)) provide a way of updating
the placement. If the `if_default` parameter is set, then only ops that don't
have a device assigned already will be updated. This is mostly useful for
preprocessing of graphs for other stages that expect all ops to have an explicit
device assigned.

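For example, here's a sketch of pinning every op that doesn't already have a
device assigned to the CPU:

```bash
--transforms='set_device(device="/cpu:0", if_default=true)'
```
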
### sort_by_execution_order

Args: None \
Prerequisites: None

Arranges the nodes in the GraphDef in topological order, so that the inputs of
any given node are always earlier than the node itself. This is especially
useful when you're targeting a minimal inference engine, since you can just
execute the nodes in the given order knowing that the inputs will be computed
before they're needed.

### strip_unused_nodes

Args:

*   type: Default type for any new Placeholder nodes generated, for example
    int32, float, quint8.
*   shape: Default shape for any new Placeholder nodes generated, as
    comma-separated dimensions. For example shape="1,299,299,3". The double
    quotes are important, since otherwise the commas will be taken as argument
    separators.
*   name: Identifier for the placeholder arguments.
*   type_for_name: What type to use for the previously-given name.
*   shape_for_name: What shape to use for the previously-given name.

Prerequisites: None

Removes all nodes not used in calculating the layers given in `--outputs`, fed
by `--inputs`. This is often useful for removing training-only nodes like
save-and-restore or summary ops. It's also handy for solving the [missing kernel
errors problem](#fixing-missing-kernel-errors-on-mobile) when there are decode
or other ops you don't need in the inference path.

The biggest complication is that it sometimes has to create new Placeholder ops,
so there are options to control their characteristics. This will happen if you
bypass a DecodeJpeg op by specifying an input layer deeper in the network, for
example, so you can pass in a raw image array instead of an encoded string as an
input. The decode op will be removed, together with the Placeholder that fed it,
but a new Placeholder is needed for the input layer you specify. The type and
shape arguments let you control the attributes of any new Placeholders that are
created. Plain `type` and `shape` set global defaults, but if you have different
inputs with varying characteristics, you'll need to pass in a list of arguments
where the preceding name specifies what layer each applies to. For example, if
you had two inputs in1 and in2, you could call `strip_unused_nodes(name=in1,
type_for_name=int32, shape_for_name="2,3", name=in2, type_for_name=float,
shape_for_name="1,10,10,3")`.

## Writing Your Own Transforms

The Graph Transform Tool is designed to make it as easy as possible to create
your own optimization, modification, and pre-processing transforms. At their
heart, all of the transforms take in a valid GraphDef, make some changes, and
output a new GraphDef. Each GraphDef is just a list of NodeDefs, each defining
one node in the graph and its connections. You can find more information on the
format at [this guide to TensorFlow model
files](https://www.tensorflow.org/versions/master/extend/tool_developers/index.html),
but for a simple example take a look at
[tensorflow/tools/graph_transforms/rename_op.cc](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/rename_op.cc),
which implements the [rename_op](#rename_op) transform:

```C++
Status RenameOp(const GraphDef& input_graph_def,
                const TransformFuncContext& context,
                GraphDef* output_graph_def) {
  if (!context.params.count("old_op_name") ||
      (context.params.at("old_op_name").size() != 1) ||
      !context.params.count("new_op_name") ||
      (context.params.at("new_op_name").size() != 1)) {
    return errors::InvalidArgument(
        "rename_op expects exactly one 'old_op_name' and 'new_op_name' "
        "argument, e.g. rename_op(old_op_name=Mul, new_op_name=Multiply)");
  }

  const string old_op_name = context.params.at("old_op_name")[0];
  const string new_op_name = context.params.at("new_op_name")[0];
  output_graph_def->Clear();
  for (const NodeDef& node : input_graph_def.node()) {
    NodeDef* new_node = output_graph_def->mutable_node()->Add();
    new_node->CopyFrom(node);
    if (node.op() == old_op_name) {
      new_node->set_op(new_op_name);
    }
  }

  return Status::OK();
}

REGISTER_GRAPH_TRANSFORM("rename_op", RenameOp);
```

The heart of this transform is the loop through the input_graph_def's nodes. We
go through each op, add a new one to the output, copy the original's contents,
and then change the op over if it matches the parameters. There's a standard set
of parameters for every transform, so they all take in a GraphDef and context,
and write out into a new GraphDef. The registration macro at the bottom lets the
tool know what function to call when it finds the `rename_op` string in a
transforms list.

### Transform Functions

The standard signature that all transform functions have is defined as
`TransformFunc`, which takes in an input GraphDef, a `TransformFuncContext`
containing environment information, writes to an output GraphDef, and returns a
Status indicating whether the transform succeeded.

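In C++ terms, a minimal transform matching that signature looks something like
the skeleton below. The function name is a placeholder, and a real transform
would make changes rather than just copying the graph through:

```C++
Status MyTransform(const GraphDef& input_graph_def,
                   const TransformFuncContext& context,
                   GraphDef* output_graph_def) {
  // Start from a copy of the input graph, then edit output_graph_def in place.
  *output_graph_def = input_graph_def;
  return Status::OK();
}
```
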
The `TransformFuncContext` has a list of the inputs and outputs for the graph,
and the [parameter arguments](#parameters) that were passed into the transform
by the user.

If you write a function that matches this signature, and [register
it](#registering), the graph transform tool will take care of calling it.

### Pattern Syntax

The `rename_op` example only needs to look at a single node at a time, but one
of the most common needs is to modify small sub-graphs within a model. To make
this easy, the Graph Transform Tool provides the `OpTypePattern` syntax. This is
a simple and compact way to specify patterns of nodes that you want to look for.
The format is:

```
OP_TYPE_PATTERN ::= "{" OP "," INPUTS "}"
INPUTS ::= OP_TYPE_PATTERN
```

The `OP` field can either contain a single "*", which means match any op type,
one op type (for example "Const"), or a set of op types separated by `|` symbols
(for example "Conv2D|MatMul|BiasAdd"). General regex patterns are not supported,
just these special cases.

You can think of these patterns as very limited regular expressions designed to
pick out sub-trees in graphs. They are deliberately very constrained to the kind
of things we commonly find ourselves needing to do, to make creating and
debugging as straightforward as possible.

For example, if you want all Conv2D nodes that have a constant as their second
input, you would set up a pattern like this, using C++ initializer lists to
populate the structure:

```C++
OpTypePattern conv_pattern({"Conv2D", {{"*"}, {"Const"}}});
```

It can be easier to visualize these initializers using indentation to show the
tree structure more clearly:

```C++
OpTypePattern conv_pattern({
  "Conv2D",
  {
    {"*"},
    {"Const"}
  }
});
```

In plain English, this says: a Conv2D op with two inputs, the first of which can
be any op type, and the second of which must be a Const op.

Here's a much more complex example, from the [quantize_nodes](#quantize_nodes)
transform:

```C++
{"QuantizeV2",
  {
    {"Dequantize"},
    {"Min",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
    {"Max",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
  }
}
```

This is looking for QuantizeV2 nodes with three inputs, the first of which is a
Dequantize, the second is a Min that ultimately pulls from a Dequantize, and the
third is a Max which does the same. Assuming we know the Dequantize ops are
pulling from the same eight-bit buffer, the end result of this sub-graph is a
no-op: it converts the eight-bit buffer to float and then immediately back to
eight bits. If we look for this pattern and remove it, we can optimize the graph
without changing the result.

### ReplaceMatchingOpTypes

It's very common to want to find all occurrences of a particular sub-graph in a
model, and replace them all with a different sub-graph that keeps the same local
input and output connections. For example with
[fuse_convolutions](#fuse_convolutions), we needed to find all Conv2D ops that
read their inputs from ResizeBilinear ops, and replace those combinations with a
single FusedResizeAndPadConv2D op, but without affecting other ops.

To make that sort of transformation easy, we created the
`ReplaceMatchingOpTypes` helper. This takes in a graph, an `OpTypePattern`
defining the sub-graph to look for, and a callback function to run for every
occurrence it finds. The job of this callback function is to look at the
`NodeMatch` that contains information about the current sub-graph, and return a
new sub-graph in the new_nodes list that will be used to replace the old
sub-graph.

You can see how it's used in practice in the
[fuse_convolutions](#fuse_convolutions) code:

```C++
TF_RETURN_IF_ERROR(ReplaceMatchingOpTypes(
    input_graph_def,  // clang-format off
    {"Conv2D",
        {
            {"ResizeBilinear"},
            {"*"}
        }
    },  // clang-format on
    [](const NodeMatch& match, const std::set<string>& input_nodes,
       const std::set<string>& output_nodes,
       std::vector<NodeDef>* new_nodes) {
      // Find all the nodes we expect in the subgraph.
      const NodeDef& conv_node = match.node;
      const NodeDef& resize_node = match.inputs[0].node;
      const NodeDef& weights_node = match.inputs[1].node;

      // We'll be reusing the old weights.
      new_nodes->push_back(weights_node);

      // Create a 'no-op' mirror padding node that has no effect.
      NodeDef pad_dims_node;
      pad_dims_node.set_op("Const");
      pad_dims_node.set_name(conv_node.name() + "_dummy_paddings");
      SetNodeAttr("dtype", DT_INT32, &pad_dims_node);
      SetNodeTensorAttr<int32>("value", {4, 2}, {0, 0, 0, 0, 0, 0, 0, 0},
                               &pad_dims_node);
      new_nodes->push_back(pad_dims_node);

      // Set up the new fused version of the convolution op.
      NodeDef fused_conv;
      fused_conv.set_op("FusedResizeAndPadConv2D");
      fused_conv.set_name(match.node.name());
      AddNodeInput(resize_node.input(0), &fused_conv);
      AddNodeInput(resize_node.input(1), &fused_conv);
      AddNodeInput(pad_dims_node.name(), &fused_conv);
      AddNodeInput(conv_node.input(1), &fused_conv);
      CopyNodeAttr(resize_node, "align_corners", "resize_align_corners",
                   &fused_conv);
      SetNodeAttr("mode", "REFLECT", &fused_conv);
      CopyNodeAttr(conv_node, "T", "T", &fused_conv);
      CopyNodeAttr(conv_node, "padding", "padding", &fused_conv);
      CopyNodeAttr(conv_node, "strides", "strides", &fused_conv);
      new_nodes->push_back(fused_conv);

      return Status::OK();
    },
    {}, &replaced_graph_def));
```

Here you can see we define the pattern to look for, and in the callback function
use information from each of the nodes in the old sub-graph to create a new
fused node. We also copy over the old weights input node so that isn't lost.

There are a few things to know about the `ReplaceMatchingOpTypes` function:

*   All of the nodes in any matching sub-graphs are removed from the new graph
    created by the function. If any of them are needed, it's the callback
    function's responsibility to add them back in. There's a `CopyOriginalMatch`
    convenience call that will copy over all of the original nodes if you decide
    you don't actually want to modify a particular sub-graph.

*   It is assumed that the same nodes will never appear in more than one matched
    sub-graph. This is to ensure that sub-trees are only replaced once, but it
    may mean that some sub-graphs aren't spotted if they overlap with earlier
    matches.

*   The calling framework tries to ensure that the graph remains sane, by
    looking at the new_nodes that are returned and making sure that no nodes
    which are needed as inputs by nodes outside the sub-graph are removed. These
    important nodes are listed in the `output_nodes` argument that's passed into
    each replacement function call. You can disable this checking by setting
    `allow_inconsistencies` to true in the options, but otherwise any
    replacements that break the graph constraints will be canceled. If you do
    allow inconsistencies, it's your transform's responsibility to fix them up
    before you return your final result. Functions like `RenameNodeInputs` can
    be useful if you are doing wholesale node renaming for example.

### Parameters

The arguments that are in parentheses after the transform name when the tool is
called are parsed and placed into the `params` member of the
`TransformFuncContext` that's given to each transform. For every named argument,
there's a vector of strings containing all the values that it was given, in the
order they were given. These are treated a bit like command-line parameters, and
it's the transform's responsibility to parse them into the data types it needs,
and to raise errors by returning a bad Status if any of them are ill-formed.

As an example, here's a hypothetical transform call:

```
some_transform(foo=a, foo=b, bar=2, bob="1,2,3")
```

Here's what the `std::map` of strings looks like in the `params` member:

```
{{"foo", {"a", "b"}}, {"bar", {"2"}}, {"bob", {"1,2,3"}}}
```

The double quotes around the comma-separated argument to `bob` are important
because otherwise they'll be treated as separate arguments, and the parsing will
fail.

Here's an example of how [round_weights](#round_weights) reads its `num_steps`
parameter:

```C++
TF_RETURN_IF_ERROR(context.GetOneInt32Parameter("num_steps", 256, &num_steps));
```

If the conversion fails or the parameter occurs more than once, the helper
function will raise a meaningful error through the status result of the
transform. If the parameter isn't specified at all, then the default will be
used.

### Function Libraries

A newer feature of TensorFlow is the ability to create libraries of functions as
part of graphs. These are a bit like templates, which define macro operations in
terms of smaller components, which can then be instantiated with different input
and output connections inside the graph just like regular ops. Right now the
graph transform tool just copies these libraries between the input and output
graphs, but it's likely that more complex operations will be supported on them
in the future.

### Registering

The Graph Transform Tool associates names of transforms with the code to
implement them using the `REGISTER_GRAPH_TRANSFORM()` macro. This takes a string
and a function, and automatically registers the transform with the tool. You
will need to watch out for a few things though:

*   Because it's using global C++ objects in each file under the hood, the
    linker can sometimes strip them out and lose the registration. In Bazel you
    need to make sure you're linking any new transforms in as `cc_library`
    rules with the `alwayslink` flag set, so that the `cc_binary` they're
    linked into doesn't discard them.

*   You should be able to create your own copy of the transform_graph tool by
    linking against the transform_graph_main_lib library in
    tensorflow/tools/graph_transforms/BUILD. This contains all the `main()`
    logic to parse command line arguments and call transforms.