# TensorFlow Bazel Clang

This is a specialized toolchain that pairs an old Debian release with a new
Clang that can cross compile to any x86_64 microarchitecture. It's intended to
build Linux binaries that require only the following ABIs:

- GLIBC_2.18
- CXXABI_1.3.7 (GCC 4.8.3)
- GCC_4.2.0

These ABIs are available on at least the following Linux platforms:

- Ubuntu 14+
- CentOS 7+
- Debian 8+
- SuSE 13.2+
- Mint 17.3+
- Manjaro 0.8.11

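To check that a built binary stays within these ABIs, its dynamic symbol table
can be inspected. The following is a minimal sketch, assuming GNU binutils
(`objdump`) and GNU `sort -V` are available; the helper name `required_abis`
and the binary path are illustrative and not part of this toolchain:

```sh
# required_abis: read `objdump -T` output on stdin and print every
# GLIBC_*, CXXABI_* and GCC_* version tag once, in version order.
# (Hypothetical helper, not shipped with this toolchain.)
required_abis() {
  grep -oE '(GLIBC|CXXABI|GCC)_[0-9]+(\.[0-9]+)*' | sort -u -V
}

# Example (placeholder path):
#   objdump -T bazel-bin/path/to/binary | required_abis
```

Any tag newer than the versions listed above suggests the binary may not load
on the oldest of these platforms.
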
# System Install

On Debian 8 (Jessie), Clang 6.0 can be installed as follows:

```sh
# Run as root. Add the LLVM apt repository for Jessie.
cat >>/etc/apt/sources.list <<'EOF'
deb http://apt.llvm.org/jessie/ llvm-toolchain-jessie main
deb-src http://apt.llvm.org/jessie/ llvm-toolchain-jessie main
EOF
# Import the LLVM signing key, then confirm its fingerprint.
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
apt-key fingerprint 2>&1 | grep '6084 F3CF 814B 57C1 CF12 EFD5 15CF 4D18 AF4F 7421'
apt-get update
apt-get install clang lld
```

# Bazel Configuration

This toolchain can compile TensorFlow in 2m30s on a 96-core Skylake GCE VM if
the following `.bazelrc` settings are added:

```
startup --host_jvm_args=-Xmx30G
startup --host_jvm_args=-Xms30G
startup --host_jvm_args=-XX:MaxNewSize=3g
startup --host_jvm_args=-XX:-UseAdaptiveSizePolicy
startup --host_jvm_args=-XX:+UseConcMarkSweepGC
startup --host_jvm_args=-XX:TargetSurvivorRatio=70
startup --host_jvm_args=-XX:SurvivorRatio=6
startup --host_jvm_args=-XX:+UseCMSInitiatingOccupancyOnly
startup --host_jvm_args=-XX:CMSFullGCsBeforeCompaction=1
startup --host_jvm_args=-XX:CMSInitiatingOccupancyFraction=75

build --jobs=100
build --local_resources=200000,100,100
build --crosstool_top=@local_config_clang6//clang6
build --noexperimental_check_output_files
build --nostamp
build --config=opt
build --copt=-march=native
build --host_copt=-march=native
```

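The settings above tune both target code and host tools for the build machine.
To target a specific microarchitecture instead, `--copt` can name it while
`--host_copt` stays native, since host tools only run during the build. A
minimal sketch, assuming a hypothetical helper `march_flags` and a placeholder
target label:

```sh
# march_flags MARCH: print Bazel flags that compile target code for MARCH
# while host tools keep using the build machine's native instruction set.
# (Hypothetical helper for illustration.)
march_flags() {
  printf '%s %s\n' "--copt=-march=$1" "--host_copt=-march=native"
}

# Example (placeholder target label):
#   bazel build $(march_flags haswell) //your/package:target
```

If `.bazelrc` already sets `--copt=-march=native`, this relies on the later
`-march` on the compiler command line taking effect, which is worth confirming
with `bazel build -s`.
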
# x86_64 Microarchitectures

## Intel CPU Line

- 2003 P6 M SSE SSE2
- 2004 prescott SSE3 (-march=prescott)
- 2006 core X64 SSSE3 SSE4.1 (only on 45nm variety) (-march=core2)
- 2008 nehalem SSE4.2 VT-x VT-d (-march=nehalem)
- 2010 westmere CLMUL AES (-march=westmere)
- 2012 sandybridge AVX TXT (-march=sandybridge)
- 2012 ivybridge F16C MOVBE (-march=ivybridge)
- 2013 haswell AVX2 TSX BMI2 FMA (-march=haswell)
- 2014 broadwell RDSEED ADCX PREFETCHW (-march=broadwell - works on trusty
  gcc4.9)
- 2015 skylake SGX ADX MPX AVX-512 (Xeon only)
  (-march=skylake / -march=skylake-avx512 - needs gcc7)
- 2018 cannonlake AVX-512 SHA (-march=cannonlake - needs clang5)

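Whether a machine has the features a given target needs can be checked against
the `flags` line of `/proc/cpuinfo`. A minimal sketch, assuming Linux's
lowercase flag spellings (`avx2`, `fma`, `bmi2`); the helper `has_cpu_flags` is
hypothetical, and passing this check is necessary but not sufficient for a
given `-march`:

```sh
# has_cpu_flags FLAG...: succeed only if every FLAG appears in the
# /proc/cpuinfo-style "flags" line read from stdin.
# (Hypothetical helper for illustration.)
has_cpu_flags() {
  # Take the first "flags" line and pad it with spaces so each flag can be
  # matched as a whole word.
  cpu_flags=" $(awk -F: '/^flags/ { print $2; exit }') "
  for want in "$@"; do
    case "$cpu_flags" in
      *" $want "*) ;;      # present, keep checking
      *) return 1 ;;       # missing
    esac
  done
  return 0
}

# Example: three of the haswell features listed above.
#   has_cpu_flags avx2 fma bmi2 </proc/cpuinfo && echo "haswell features present"
```
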
## Intel Low Power CPU Line

- 2013 silvermont SSE4.1 SSE4.2 VT-x (-march=silvermont)
- 2016 goldmont SHA (-march=goldmont - needs clang5)

## AMD CPU Line

- 2003 k8 SSE SSE2 (-march=k8)
- 2005 k8 (Venus) SSE3 (-march=k8-sse3)
- 2008 barcelona SSE4a?! (-march=barcelona)
- 2011 bulldozer SSE4.1 SSE4.2 CLMUL AVX AES FMA4?! (-march=bdver1)
- 2011 piledriver FMA (-march=bdver2)
- 2015 excavator AVX2 BMI2 MOVBE (-march=bdver4)

## Google Compute Engine Supported CPUs

- 2012 sandybridge 2.6GHz -march=sandybridge
- 2012 ivybridge 2.5GHz -march=ivybridge
- 2013 haswell 2.3GHz -march=haswell
- 2014 broadwell 2.2GHz -march=broadwell
- 2015 skylake 2.0GHz -march=skylake-avx512

See: <https://cloud.google.com/compute/docs/cpu-platforms>
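
On a GCE VM, the CPU platform name can be mapped to the `-march` values listed
above. A minimal sketch; the helper `gce_march` is hypothetical, and both the
platform strings (for example `Intel Skylake`) and the `cpu-platform` metadata
path are assumptions to verify against the page linked above:

```sh
# gce_march PLATFORM: print the -march value from the list above that
# matches a GCE CPU platform name, or fail if the platform is not listed.
# (Hypothetical helper; platform strings are assumed, not confirmed here.)
gce_march() {
  case "$1" in
    *"Sandy Bridge"*) echo sandybridge ;;
    *"Ivy Bridge"*)   echo ivybridge ;;
    *Haswell*)        echo haswell ;;
    *Broadwell*)      echo broadwell ;;
    *Skylake*)        echo skylake-avx512 ;;
    *)                return 1 ;;
  esac
}

# Example (assumed metadata path; only works from inside a GCE VM):
#   platform=$(curl -s -H 'Metadata-Flavor: Google' \
#     http://metadata.google.internal/computeMetadata/v1/instance/cpu-platform)
#   gce_march "$platform"
```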