# Benchmarks

## Overview

A selection of image classification models was tested across multiple platforms
to create a point of reference for the TensorFlow community. The
[Methodology](#methodology) section details how the tests were executed and has
links to the scripts used.

## Results for image classification models

InceptionV3 ([arXiv:1512.00567](https://arxiv.org/abs/1512.00567)), ResNet-50
([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), ResNet-152
([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), VGG16
([arXiv:1409.1556](https://arxiv.org/abs/1409.1556)), and
[AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)
were tested using the [ImageNet](http://www.image-net.org/) data set. Tests were
run on Google Compute Engine, Amazon Elastic Compute Cloud (Amazon EC2), and an
NVIDIA® DGX-1™. Most of the tests were run with both synthetic and real data.
Testing with synthetic data was done by using a `tf.Variable` set to the same
shape as the data expected by each model for ImageNet. We believe it is
important to include real data measurements when benchmarking a platform,
because this load tests both the underlying hardware and the framework's
ability to prepare data for actual training. We start with synthetic data to
remove disk I/O as a variable and to set a baseline. Real data is then used to
verify that the TensorFlow input pipeline and the underlying disk I/O can keep
the compute units saturated. Throughput is reported in images per second.

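As a concrete illustration, a synthetic-data run and a real-data run of the
benchmark script (see [Methodology](#methodology)) might be launched as sketched
below. The flag names follow `tf_cnn_benchmarks` but may differ between versions
of the script, and the data path is illustrative.

```bash
# Sketch only: not the exact commands used to produce the results below.
# With no --data_dir the script falls back to synthetic data shaped like the
# ImageNet input the model expects.
python tf_cnn_benchmarks.py --model=resnet50 --batch_size=64 --num_gpus=8

# Real data: point the script at a local copy of ImageNet (path is illustrative).
python tf_cnn_benchmarks.py --model=resnet50 --batch_size=64 --num_gpus=8 \
    --data_name=imagenet --data_dir=/data/imagenet
```
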
### Training with NVIDIA® DGX-1™ (NVIDIA® Tesla® P100)

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:80%" src="../images/perf_summary_p100_single_server.png">
</div>

Details and additional results are in the [Details for NVIDIA® DGX-1™ (NVIDIA®
Tesla® P100)](#details_for_nvidia_dgx-1tm_nvidia_tesla_p100) section.

### Training with NVIDIA® Tesla® K80

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:80%" src="../images/perf_summary_k80_single_server.png">
</div>

Details and additional results are in the [Details for Google Compute Engine
(NVIDIA® Tesla® K80)](#details_for_google_compute_engine_nvidia_tesla_k80) and
[Details for Amazon EC2 (NVIDIA® Tesla®
K80)](#details_for_amazon_ec2_nvidia_tesla_k80) sections.

### Distributed training with NVIDIA® Tesla® K80

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:80%" src="../images/perf_summary_k80_aws_distributed.png">
</div>

Details and additional results are in the [Details for Amazon EC2 Distributed
(NVIDIA® Tesla® K80)](#details_for_amazon_ec2_distributed_nvidia_tesla_k80)
section.

### Compare synthetic with real data training

**NVIDIA® Tesla® P100**

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:35%" src="../images/perf_summary_p100_data_compare_inceptionv3.png">
  <img style="width:35%" src="../images/perf_summary_p100_data_compare_resnet50.png">
</div>

**NVIDIA® Tesla® K80**

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:35%" src="../images/perf_summary_k80_data_compare_inceptionv3.png">
  <img style="width:35%" src="../images/perf_summary_k80_data_compare_resnet50.png">
</div>

## Details for NVIDIA® DGX-1™ (NVIDIA® Tesla® P100)

### Environment

*   **Instance type:** NVIDIA® DGX-1™
*   **GPU:** 8x NVIDIA® Tesla® P100
*   **OS:** Ubuntu 16.04 LTS with tests run via Docker
*   **CUDA / cuDNN:** 8.0 / 5.1
*   **TensorFlow GitHub hash:** b1e174e
*   **Benchmark GitHub hash:** 9165a70
*   **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
    //tensorflow/tools/pip_package:build_pip_package`
*   **Disk:** Local SSD
*   **DataSet:** ImageNet
*   **Test Date:** May 2017

The batch size and optimizer used for each model are listed in the table below.
In addition to the batch sizes listed in the table, InceptionV3, ResNet-50,
ResNet-152, and VGG16 were tested with a batch size of 32. Those results are in
the *Other Results* section.

Options            | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
------------------ | ----------- | --------- | ---------- | ------- | -----
Batch size per GPU | 64          | 64        | 64         | 512     | 64
Optimizer          | sgd         | sgd       | sgd        | sgd     | sgd

Configuration used for each model.

Model       | variable_update        | local_parameter_device
----------- | ---------------------- | ----------------------
InceptionV3 | parameter_server       | cpu
ResNet-50   | parameter_server       | cpu
ResNet-152  | parameter_server       | cpu
AlexNet     | replicated (with NCCL) | n/a
VGG16       | replicated (with NCCL) | n/a

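As a rough guide to how this configuration table maps onto the benchmark
script's flags, the sketch below shows a parameter-server run (InceptionV3) and
a replicated run with NCCL (AlexNet). The flag names follow `tf_cnn_benchmarks`
but may differ between script versions; these are not the exact commands used to
produce the results.

```bash
# Sketch only: InceptionV3 with parameter_server updates and variables on the CPU.
python tf_cnn_benchmarks.py --model=inception3 --batch_size=64 --num_gpus=8 \
    --variable_update=parameter_server --local_parameter_device=cpu

# Sketch only: AlexNet with replicated variables and NCCL all-reduce.
python tf_cnn_benchmarks.py --model=alexnet --batch_size=512 --num_gpus=8 \
    --variable_update=replicated --use_nccl=True
```
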
### Results

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:80%" src="../images/perf_summary_p100_single_server.png">
</div>

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:35%" src="../images/perf_dgx1_synth_p100_single_server_scaling.png">
  <img style="width:35%" src="../images/perf_dgx1_real_p100_single_server_scaling.png">
</div>

**Training synthetic data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
---- | ----------- | --------- | ---------- | ------- | -----
1    | 142         | 219       | 91.8       | 2987    | 154
2    | 284         | 422       | 181        | 5658    | 295
4    | 569         | 852       | 356        | 10509   | 584
8    | 1131        | 1734      | 716        | 17822   | 1081

**Training real data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
---- | ----------- | --------- | ---------- | ------- | -----
1    | 142         | 218       | 91.4       | 2890    | 154
2    | 278         | 425       | 179        | 4448    | 284
4    | 551         | 853       | 359        | 7105    | 534
8    | 1079        | 1630      | 708        | N/A     | 898

Training AlexNet with real data on 8 GPUs was excluded from the graph and table
above because it maxed out the input pipeline.

### Other Results

The results below are all with a batch size of 32.

**Training synthetic data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16
---- | ----------- | --------- | ---------- | -----
1    | 128         | 195       | 82.7       | 144
2    | 259         | 368       | 160        | 281
4    | 520         | 768       | 317        | 549
8    | 995         | 1485      | 632        | 820

**Training real data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16
---- | ----------- | --------- | ---------- | -----
1    | 130         | 193       | 82.4       | 144
2    | 257         | 369       | 159        | 253
4    | 507         | 760       | 317        | 457
8    | 966         | 1410      | 609        | 690

## Details for Google Compute Engine (NVIDIA® Tesla® K80)

### Environment

*   **Instance type:** n1-standard-32-k80x8
*   **GPU:** 8x NVIDIA® Tesla® K80
*   **OS:** Ubuntu 16.04 LTS
*   **CUDA / cuDNN:** 8.0 / 5.1
*   **TensorFlow GitHub hash:** b1e174e
*   **Benchmark GitHub hash:** 9165a70
*   **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
    //tensorflow/tools/pip_package:build_pip_package`
*   **Disk:** 1.7 TB Shared SSD persistent disk (800 MB/s)
*   **DataSet:** ImageNet
*   **Test Date:** May 2017

The batch size and optimizer used for each model are listed in the table below.
In addition to the batch sizes listed in the table, InceptionV3 and ResNet-50
were tested with a batch size of 32. Those results are in the *Other Results*
section.

Options            | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
------------------ | ----------- | --------- | ---------- | ------- | -----
Batch size per GPU | 64          | 64        | 32         | 512     | 32
Optimizer          | sgd         | sgd       | sgd        | sgd     | sgd

The configuration used for each model was `variable_update` equal to
`parameter_server` and `local_parameter_device` equal to `cpu`.

### Results

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:35%" src="../images/perf_gce_synth_k80_single_server_scaling.png">
  <img style="width:35%" src="../images/perf_gce_real_k80_single_server_scaling.png">
</div>

**Training synthetic data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
---- | ----------- | --------- | ---------- | ------- | -----
1    | 30.5        | 51.9      | 20.0       | 656     | 35.4
2    | 57.8        | 99.0      | 38.2       | 1209    | 64.8
4    | 116         | 195       | 75.8       | 2328    | 120
8    | 227         | 387       | 148        | 4640    | 234

**Training real data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
---- | ----------- | --------- | ---------- | ------- | -----
1    | 30.6        | 51.2      | 20.0       | 639     | 34.2
2    | 58.4        | 98.8      | 38.3       | 1136    | 62.9
4    | 115         | 194       | 75.4       | 2067    | 118
8    | 225         | 381       | 148        | 4056    | 230

### Other Results

**Training synthetic data**

GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
---- | --------------------------- | -------------------------
1    | 29.3                        | 49.5
2    | 55.0                        | 95.4
4    | 109                         | 183
8    | 216                         | 362

**Training real data**

GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
---- | --------------------------- | -------------------------
1    | 29.5                        | 49.3
2    | 55.4                        | 95.3
4    | 110                         | 186
8    | 216                         | 359

## Details for Amazon EC2 (NVIDIA® Tesla® K80)

### Environment

*   **Instance type:** p2.8xlarge
*   **GPU:** 8x NVIDIA® Tesla® K80
*   **OS:** Ubuntu 16.04 LTS
*   **CUDA / cuDNN:** 8.0 / 5.1
*   **TensorFlow GitHub hash:** b1e174e
*   **Benchmark GitHub hash:** 9165a70
*   **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
    //tensorflow/tools/pip_package:build_pip_package`
*   **Disk:** 1 TB Amazon EFS (burst 100 MiB/sec for 12 hours, continuous 50
    MiB/sec)
*   **DataSet:** ImageNet
*   **Test Date:** May 2017

The batch size and optimizer used for each model are listed in the table below.
In addition to the batch sizes listed in the table, InceptionV3 and ResNet-50
were tested with a batch size of 32. Those results are in the *Other Results*
section.

Options            | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
------------------ | ----------- | --------- | ---------- | ------- | -----
Batch size per GPU | 64          | 64        | 32         | 512     | 32
Optimizer          | sgd         | sgd       | sgd        | sgd     | sgd

Configuration used for each model.

Model       | variable_update           | local_parameter_device
----------- | ------------------------- | ----------------------
InceptionV3 | parameter_server          | cpu
ResNet-50   | replicated (without NCCL) | gpu
ResNet-152  | replicated (without NCCL) | gpu
AlexNet     | parameter_server          | gpu
VGG16       | parameter_server          | gpu

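For example, the ResNet-50 row above (replicated variables without NCCL,
parameters kept on the GPU) corresponds roughly to the following
`tf_cnn_benchmarks` invocation. This is a sketch, not the exact command used,
and the flag names may differ between script versions.

```bash
# Sketch only: ResNet-50 with replicated variables, NCCL disabled, GPU parameters.
python tf_cnn_benchmarks.py --model=resnet50 --batch_size=64 --num_gpus=8 \
    --variable_update=replicated --use_nccl=False --local_parameter_device=gpu
```
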
### Results

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:35%" src="../images/perf_aws_synth_k80_single_server_scaling.png">
  <img style="width:35%" src="../images/perf_aws_real_k80_single_server_scaling.png">
</div>

**Training synthetic data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
---- | ----------- | --------- | ---------- | ------- | -----
1    | 30.8        | 51.5      | 19.7       | 684     | 36.3
2    | 58.7        | 98.0      | 37.6       | 1244    | 69.4
4    | 117         | 195       | 74.9       | 2479    | 141
8    | 230         | 384       | 149        | 4853    | 260

**Training real data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
---- | ----------- | --------- | ---------- | ------- | -----
1    | 30.5        | 51.3      | 19.7       | 674     | 36.3
2    | 59.0        | 94.9      | 38.2       | 1227    | 67.5
4    | 118         | 188       | 75.2       | 2201    | 136
8    | 228         | 373       | 149        | N/A     | 242

Training AlexNet with real data on 8 GPUs was excluded from the graph and table
above because our EFS setup did not provide enough throughput.

### Other Results

**Training synthetic data**

GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
---- | --------------------------- | -------------------------
1    | 29.9                        | 49.0
2    | 57.5                        | 94.1
4    | 114                         | 184
8    | 216                         | 355

**Training real data**

GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
---- | --------------------------- | -------------------------
1    | 30.0                        | 49.1
2    | 57.5                        | 95.1
4    | 113                         | 185
8    | 212                         | 353

## Details for Amazon EC2 Distributed (NVIDIA® Tesla® K80)

### Environment

*   **Instance type:** p2.8xlarge
*   **GPU:** 8x NVIDIA® Tesla® K80
*   **OS:** Ubuntu 16.04 LTS
*   **CUDA / cuDNN:** 8.0 / 5.1
*   **TensorFlow GitHub hash:** b1e174e
*   **Benchmark GitHub hash:** 9165a70
*   **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
    //tensorflow/tools/pip_package:build_pip_package`
*   **Disk:** 1.0 TB EFS (burst 100 MB/sec for 12 hours, continuous 50 MB/sec)
*   **DataSet:** ImageNet
*   **Test Date:** May 2017

The batch size and optimizer used for the tests are listed in the table below.
In addition to the batch sizes listed in the table, InceptionV3 and ResNet-50
were tested with a batch size of 32. Those results are in the *Other Results*
section.

Options            | InceptionV3 | ResNet-50 | ResNet-152
------------------ | ----------- | --------- | ----------
Batch size per GPU | 64          | 64        | 32
Optimizer          | sgd         | sgd       | sgd

Configuration used for each model.

Model       | variable_update        | local_parameter_device | cross_replica_sync
----------- | ---------------------- | ---------------------- | ------------------
InceptionV3 | distributed_replicated | n/a                    | True
ResNet-50   | distributed_replicated | n/a                    | True
ResNet-152  | distributed_replicated | n/a                    | True

To simplify server setup, the EC2 instances (p2.8xlarge) running worker servers
also ran parameter servers. Equal numbers of parameter servers and worker
servers were used, with the following exceptions:

*   InceptionV3: 8 instances / 6 parameter servers
*   ResNet-50 (batch size 32): 8 instances / 4 parameter servers
*   ResNet-152: 8 instances / 4 parameter servers

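A distributed run along these lines launches one `tf_cnn_benchmarks` process per
parameter server and per worker, roughly as sketched below. The hostnames,
ports, and task index are illustrative, and the flag names may differ between
script versions.

```bash
# Sketch only: one command like this runs per process, with --job_name=ps on the
# parameter-server processes and --job_name=worker on the worker processes.
python tf_cnn_benchmarks.py --model=resnet50 --batch_size=64 --num_gpus=8 \
    --variable_update=distributed_replicated --cross_replica_sync=True \
    --job_name=worker --task_index=0 \
    --ps_hosts=host1:50000,host2:50000 \
    --worker_hosts=host1:50001,host2:50001
```
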
### Results

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:80%" src="../images/perf_summary_k80_aws_distributed.png">
</div>

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:70%" src="../images/perf_aws_synth_k80_distributed_scaling.png">
</div>

**Training synthetic data**

GPUs | InceptionV3 | ResNet-50 | ResNet-152
---- | ----------- | --------- | ----------
1    | 29.7        | 52.4      | 19.4
8    | 229         | 378       | 146
16   | 459         | 751       | 291
32   | 902         | 1388      | 565
64   | 1783        | 2744      | 981

### Other Results

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:50%" src="../images/perf_aws_synth_k80_multi_server_batch32.png">
</div>

**Training synthetic data**

GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
---- | --------------------------- | -------------------------
1    | 29.2                        | 48.4
8    | 219                         | 333
16   | 427                         | 667
32   | 820                         | 1180
64   | 1608                        | 2315

## Methodology

This
[script](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks)
was run on the various platforms to generate the results above.
@{$performance_models$High-Performance Models} details the techniques used in
the script, along with examples of how to execute it.

To make the results as repeatable as possible, each test was run 5 times and the
results were averaged. GPUs were run in their default state on the given
platform. For the NVIDIA® Tesla® K80 this meant leaving [GPU
Boost](https://devblogs.nvidia.com/parallelforall/increase-performance-gpu-boost-k80-autoboost/)
enabled. For each test, 10 warmup steps were run and then the next 100 steps
were averaged.
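
In terms of `tf_cnn_benchmarks` flags, a single timing run along these lines
might look like the sketch below. The flag names are taken from the script but
may differ between versions, and the model and batch size are only examples.

```bash
# Sketch only: 10 warmup steps followed by 100 measured steps.
python tf_cnn_benchmarks.py --model=resnet50 --batch_size=64 --num_gpus=8 \
    --num_warmup_batches=10 --num_batches=100
```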
415