# How Protobuf supports multiple C++ build systems

This document explains how the Protobuf project supports multiple C++ build
systems.

## Background

Protobuf primarily uses [Bazel](https://bazel.build) to build the Protobuf C++
runtime and Protobuf compiler[^historical_sot]. However, there are several
different build systems in common use for C++, each one of which requires
essentially a complete copy of the same build definitions.

[^historical_sot]:
    On a historical note, prior to its [release as Open Source
    Software](https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html),
    the Protobuf project was developed using Google's internal build system, which
    was the predecessor to Bazel (the vast majority of Google's contributions
    continue to be developed this way). The Open Source Protobuf project, however,
    historically used Autoconf to build the C++ implementation.
    Over time, other build systems (including Bazel) have been added, thanks in
    large part to substantial contributions from the Open Source community. Since
    the Protobuf project deals with multiple languages (all of which ultimately
    rely upon C++, for the Protobuf compiler), Bazel is a natural choice for a
    project-wide build system -- in fact, Bazel (and its predecessor, Blaze)
    was designed in large part to support exactly this type of rich,
    multi-language build.

Currently, C++ Protobuf can be built with Bazel and CMake. These build systems
differ in semantics and structure, but they share the list of files needed to
build the runtime and compiler.

## Design

### Extracting information from Bazel

Bazel's Starlark API provides [aspects](https://bazel.build/rules/aspects) to
traverse the build graph, inspect build rules, define additional actions, and
expose information through
[providers](https://bazel.build/rules/rules#providers).
For example, the
`cc_proto_library` rule uses an aspect to traverse the dependency graph of
`proto_library` rules, and dynamically attaches actions to generate C++ code
using the Protobuf compiler and compile it using the C++ compiler.

In order to support multiple build systems, the overall build structure is
defined once for each system, and frequently-changing metadata is exposed
from Bazel in a way that can be included from the build definition. Primarily,
this means exposing the list of source files in a way that can be included
in other build definitions.

Two aspects are used to extract this information from the Bazel build
definitions:

*   `cc_file_list_aspect` extracts `srcs`, `hdrs`, and `textual_hdrs` from build
    rules like `cc_library`. The sources are exposed through a provider named
    `CcFileList`.
*   `proto_file_list_aspect` extracts the `srcs` from a `proto_library`, and
    also computes the filenames that the Protobuf compiler would be expected
    to generate. This information is exposed through a provider named
    `ProtoFileList`.

On their own, these aspects have limited utility. However, they can be
instantiated by custom rules, so that an ordinary `BUILD.bazel` target can
produce outputs based on the information gleaned from these aspects.

### (Aside) Distribution libraries

Bazel's native `cc_library` rule is typically used on a "fine-grained" level, so
that, for example, lightweight unit tests can be written with narrow scope.
Although Bazel does build library artifacts (such as `.so` and `.a` files on
Linux), they correspond to individual `cc_library` rules.

Since the entire "Protobuf library" includes many constituent `cc_library`
rules, a special rule, `cc_dist_library`, combines several fine-grained
libraries into a single, monolithic library.

For the Protobuf project, these "distribution libraries" are intended to match
the granularity of the CMake-based builds.
Since the Bazel-built
distribution library covers the rules with the source files needed by other
builds, the `cc_dist_library` rule invokes the `cc_file_list_aspect` on its
input libraries. The result is that a `cc_dist_library` rule not only produces
composite library artifacts, but also collects and provides the list of sources
that were inputs.

For example:

```
$ cat cc_dist_library_example/BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_library")
load("//pkg:cc_dist_library.bzl", "cc_dist_library")

cc_library(
    name = "a",
    srcs = ["a.cc"],
)

cc_library(
    name = "b",
    srcs = ["b.cc"],
    deps = [":c"],
)

# N.B.: not part of the cc_dist_library, even though it is in the deps of 'b':
cc_library(
    name = "c",
    srcs = ["c.cc"],
)

cc_dist_library(
    name = "lib",
    deps = [
        ":a",
        ":b",
    ],
    visibility = ["//visibility:public"],
)

# Note: the output below has been formatted for clarity:
$ bazel cquery //cc_dist_library_example:lib \
  --output=starlark \
  --starlark:expr='providers(target)["//pkg:cc_dist_library.bzl%CcFileList"]'
struct(
    hdrs = depset([]),
    internal_hdrs = depset([]),
    srcs = depset([
        <source file cc_dist_library_example/a.cc>,
        <source file cc_dist_library_example/b.cc>,
    ]),
    textual_hdrs = depset([]),
)
```

The upshot is that the "coarse-grained" library can be defined by the Bazel
build, which then exports the list of source files needed to reproduce the
library in a different build system.

One major difference from most Bazel rule types is that the file list aspects do
not propagate. In other words, they only expose the immediate dependency's
sources, not transitive sources. This is for two reasons:

1.  Immediate dependencies are conceptually simple, while transitivity requires
    substantially more thought.
    For example, if transitive dependencies were
    considered, then some way would be needed to exclude dependencies that
    should not be part of the final library (for example, a distribution library
    for `//:protobuf` could be defined not to include all of
    `//:protobuf_lite`). While dependency elision is an interesting design
    problem, the protobuf library is small enough that directly listing
    dependencies should not be problematic.
2.  Dealing only with immediate dependencies gives finer-grained control over
    what goes into the composite library. For example, a Starlark `select()`
    could conditionally add fine-grained libraries to some builds, but not
    others.

Another subtlety, which affects tests, is due to Bazel internals. Internally, a
slightly different configuration is used when evaluating `cc_test` rules as
compared to `cc_dist_library`. If `cc_test` targets are included in a
`cc_dist_library` rule, and both are evaluated by Bazel, this can result in a
build-time error: the configuration used for the test contains additional
options, telling Bazel how to execute the test, that the configuration used by
`cc_file_list_aspect` lacks. Bazel detects this as two conflicting actions
generating the same outputs. (For `cc_test` rules, the simplest workaround is to
provide sources through a `filegroup` or similar.)

### File list generation

Lists of input files are generated by Bazel in a format that can be imported to
other build systems. Currently only CMake-style files can be generated.

The lists of files are derived from Bazel build targets.
The sources can be:
*   `cc_dist_library` rules (as described above)
*   `proto_library` rules
*   individual files
*   `filegroup` rules
*   `pkg_files` or `pkg_filegroup` rules from
    https://github.com/bazelbuild/rules_pkg

For example:

```
$ cat gen_file_lists_example/BUILD.bazel
load("@protobuf//bazel:proto_library.bzl", "proto_library")
load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

filegroup(
    name = "doc_files",
    srcs = [
        "README.md",
        "englilsh_paper.md",
    ],
)

proto_library(
    name = "message",
    srcs = ["message.proto"],
)

gen_cmake_file_lists(
    name = "source_lists",
    out = "source_lists.cmake",
    src_libs = {
        ":doc_files": "docs",
        ":message": "buff",
        "//cc_dist_library_example:lib": "distlib",
    },
)

$ bazel build gen_file_lists_example:source_lists
$ cat bazel-bin/gen_file_lists_example/source_lists.cmake
# Auto-generated by //gen_file_lists_example:source_lists
#
# This file contains lists of sources based on Bazel rules. It should
# be included from a hand-written CMake file that defines targets.
#
# Changes to this file will be overwritten based on Bazel definitions.

if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)
  include_guard()
endif()

# //gen_file_lists_example:doc_files
set(docs_files
  gen_file_lists_example/README.md
  gen_file_lists_example/englilsh_paper.md
)

# //gen_file_lists_example:message
set(buff_proto_srcs
  gen_file_lists_example/message.proto
)

# //gen_file_lists_example:message
set(buff_srcs
  gen_file_lists_example/message.proto.pb.cc
)

# //gen_file_lists_example:message
set(buff_hdrs
  gen_file_lists_example/message.proto.pb.h
)

# //gen_file_lists_example:message
set(buff_files
  gen_file_lists_example/message-descriptor-set.proto.bin
)

# //cc_dist_library_example:lib
set(distlib_srcs
  cc_dist_library_example/a.cc
  cc_dist_library_example/b.cc
)

# //cc_dist_library_example:lib
set(distlib_hdrs

)
```

A hand-written CMake build rule could then use the generated file to define
libraries, such as:

```
include(source_lists.cmake)
add_library(distlib ${distlib_srcs} ${buff_srcs})
```

### Protobuf usage

The main C++ runtimes (lite and full) and the Protobuf compiler use their
corresponding `cc_dist_library` rules to generate file lists. For
`proto_library` targets, the file list generation can extract the source files
directly. For other targets, notably `cc_test` targets, the file list generators
use `filegroup` rules.

In general, adding new targets to a non-Bazel build system in Protobuf (or
adding a new build system altogether) requires some one-time setup:

1.  The overall structure of the new build system has to be defined. It should
    import lists of files and refer to them by variable, instead of listing
    files directly.
2.  (Only if the build system is new) A new rule type has to be added to
    `//pkg:build_systems.bzl`.
    Most of the implementation is shared, but a
    "fragment generator" is needed to declare a file list variable, and the rule
    type itself has to be defined and has to call the shared implementation.

When files are added or deleted, or when the Protobuf Bazel structure is
changed, these changes may need to be reflected in the file list logic. These
are some example scenarios:

*   Files are added to (or removed from) the `srcs` of an existing `cc_library`:
    no changes needed. If the `cc_library` is already part of a
    `cc_dist_library`, then regenerating the source lists will reflect the
    change.
*   A `cc_library` is added: the new target may need to be added to the Protobuf
    `cc_dist_library` targets, as appropriate.
*   A `cc_library` is deleted: if a `cc_dist_library` depends upon the deleted
    target, then a build-time error will result. The library needs to be removed
    from the `cc_dist_library`.
*   A `cc_test` is added or deleted: test sources are handled by `filegroup`
    rules defined in the same package as the `cc_test` rule. The `filegroup`s
    are usually given a name like `"test_srcs"`, and often use `glob()` to find
    sources. This means that adding or removing a test may not require any extra
    work, but this can be verified within the same package as the test rule.
*   Test-only proto files are added: the `proto_library` might need to be added
    to the file list map in `//pkg:BUILD.bazel`, and then the file added to
    various build systems. However, most test-only protos are already exposed
    through libraries like `//src/google/protobuf:test_protos`.

If there are changes, then the regenerated file lists need to be copied back
into the repo. That way, the corresponding build systems can be used with a git
checkout, without needing to run Bazel first.
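
The "fragment generator" mentioned in the one-time setup steps can be pictured
with a minimal sketch. It is written in plain Python rather than Starlark, and
the function name `cmake_var_fragment` and its parameters are hypothetical, not
the actual code in `//pkg:build_systems.bzl`. It only illustrates the idea of
turning one list of files into one variable declaration, here in the CMake
`set()` form:

```python
def cmake_var_fragment(owner, varname, prefix, entries):
    """Render one CMake `set()` block for a list of file paths.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not the real API in //pkg:build_systems.bzl.
    """
    # A comment naming the Bazel target the list came from, then the variable.
    lines = ["# " + owner, "set(" + varname]
    # One path per line; the two-space indent is an arbitrary choice here.
    lines += ["  " + prefix + entry for entry in entries]
    lines.append(")")
    return "\n".join(lines) + "\n"


fragment = cmake_var_fragment(
    owner="//cc_dist_library_example:lib",
    varname="distlib_srcs",
    prefix="cc_dist_library_example/",
    entries=["a.cc", "b.cc"],
)
print(fragment)
```

Under this sketch, supporting another build system would mostly mean supplying
a different fragment function, while the logic that collects the files from the
providers stays shared.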

### (Aside) Distribution archives

A very similar set of rules is defined in `//pkg` to build source distribution
archives for releases. In addition to the full sources, Protobuf releases also
include source archives sliced by language, so that, for example, a Ruby-based
project can get just the sources needed to build the Ruby runtime. (The
per-language slices also include sources needed to build the protobuf compiler,
so they all effectively include the C++ runtime.)

These archives are defined using rules from the
[rules_pkg](https://github.com/bazelbuild/rules_pkg) project. Although they are
similar to `cc_dist_library` and the file list generation rules, the goals are
different: the build system file lists described above only apply to C++, and
are organized according to what should or should not be included in different
parts of the build (e.g., no tests are included in the main library). On the
other hand, the distribution archives deal with languages other than C++, and
contain all the files that need to be distributed as part of a release (even for
C++, this is more than just the C++ sources).

While it might be possible to use information from the `CcFileList` and
`ProtoFileList` providers to define the distribution files, additional files
(such as the various `BUILD.bazel` files) are also needed in the distribution
archive. The lists of distribution files can usually be generated by `glob()`,
anyhow, so sharing logic with the file list aspects may not be beneficial.

Currently, all of the file lists are checked in. However, it would be possible
to build the file lists on-the-fly and include them in the distribution
archives, rather than checking them in.